The code of version 3 can be downloaded here:
Here's the list of possible improvements I wanted to implement:
1. Letting the user type in which languages are learned, so that the files will indicate the L1-L2 combination. DONE
2. Splitting the GMS-like files after a certain number of sentence pairs. Letting the user decide how many sentences the GMS-like files should contain. DONE
3. Simultaneous creation of GMS-like files. DONE
4. Letting the user decide which kinds of files should be created (GMS-A, GMS-B, GMS-C, and/or GSR-like files). Not everyone needs every kind of file. DONE
5. Making the program work with uneven numbers of sentences. DONE
6. Adding an "outtro" (a short tone, for example) to the GMS- and GSR-like files, so that the user instantly knows when he is done with the file. This is useful if the files are used on a smartphone with an app like "Smart AudioBook Player", which plays the sound files one after the other without any break. DONE
7. Letting the user decide how many new sentences shall be introduced each day. DONE
8. Letting the user decide for how many days the sentences shall be revised. DONE
9. Letting the user decide how often the sentences will be repeated each day (five, four, three, or two times, or just once). DONE
10. Letting the user decide whether he first wants to learn the new sentences and then revise older sentences, or the other way around. Letting the user decide whether he wants to revise the oldest or the newest sentences first. DONE
Code:
# Overlearning File Creator V3.0
import pydub
from pydub import AudioSegment
def create_sentence_pair(filenumber, silence_length):
    # Creates one sentence pair from source and target file
    source_filename = L1+"-"+filenumber+".mp3"
    target_filename = L2+"-"+filenumber+".mp3"
    source = AudioSegment.from_file(source_filename)
    target = AudioSegment.from_file(target_filename)
    if not fixed_silence_length:
        target_length = len(target)+added_silence
        if target_length > maximum_silence_length: target_length = maximum_silence_length
        elif target_length < minimum_silence_length: target_length = minimum_silence_length
        silence = AudioSegment.silent(duration=target_length)
    else: silence = AudioSegment.silent(duration=silence_length)
    pair = source + silence + target + silence
    pair_filename = title+"-"+L1+"-"+L2+"-"+filenumber+".mp3"
    print(pair_filename)
    if embedded_cover:
        pair.export("GSP/"+pair_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": pair_filename, "track": filenumber}, cover="cover.png")
    else:
        pair.export("GSP/"+pair_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": pair_filename, "track": filenumber})

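def create_pattern_list (pattern):
    # Applies the relevant pattern.
    # Works on a copy, so that the global patterns are never modified,
    # and avoids shadowing the built-in name "list".
    pattern = pattern[:]
    pattern_list = []
    for number in range(new_senteces_per_day):
        pattern_list.extend(pattern)
        # Shift every entry by one; an entry that would become 0 wraps to the last sentence of the day.
        for counter in range(len(pattern)):
            pattern[counter] = (pattern[counter]+1)%new_senteces_per_day
            if pattern[counter] == 0:
                pattern[counter] = new_senteces_per_day
    return pattern_list
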
def add_number_of_first_sentence_to_pattern_list (list, number_of_first_sentence):
    # Determines the precise numbers of the sentence pairs to be added.
    for number in range(len(list)):
        list[number] = list[number]+number_of_first_sentence

def create_list_of_sentences_to_be_added_to_sessionfile (first_sentence, pattern_number):
    # Creates a list of sentences to be added to sessionfile
    if pattern_number > 5: pattern_number = 5
    pattern = patterns[pattern_number]
    list_of_sentences = create_pattern_list(pattern)
    add_number_of_first_sentence_to_pattern_list (list_of_sentences, first_sentence)
    return list_of_sentences

def create_sessionfile(sessionnumber, days_of_revision, first, last, step):
    # Creates a sessionfile, which consists of new sentences, if any, and sentences to be revised, if any (or vice versa).
    session_filename = (title+"-"+L1+"-"+L2+"-GSR-DAY")+str(sessionnumber).zfill(3)+".mp3"
    print(session_filename)
    if intro:
        session_file = AudioSegment.from_file("intro.mp3")
    else:
        session_file = AudioSegment.empty()
    reps = 0
    for number_of_revision in range(first, last, step):
        first_sentence = sessionnumber*new_senteces_per_day-(number_of_revision*new_senteces_per_day+(new_senteces_per_day-1))
        if first_sentence > 0 and first_sentence < number_of_necessary_sentences:
            sentences = create_list_of_sentences_to_be_added_to_sessionfile (first_sentence-1, number_of_revision)
            if number_of_revision == 0: print ("New sentences to add: ", sentences)
            else: print ("Sentences to add for revision #", number_of_revision, ": ", sentences, sep="")
            for counter in range(len(sentences)):
                if not sentences[counter] > number_of_sentences:
                    source_filename = L1+"-"+str(sentences[counter])+".mp3"
                    target_filename = L2+"-"+str(sentences[counter])+".mp3"
                    source = AudioSegment.from_file(source_filename)
                    target = AudioSegment.from_file(target_filename)
                    if not fixed_silence_length:
                        target_length = len(target)+added_silence
                        if target_length > maximum_silence_length: target_length = maximum_silence_length
                        elif target_length < minimum_silence_length: target_length = minimum_silence_length
                        silence = AudioSegment.silent(duration=target_length)
                    else:
                        silence = AudioSegment.silent(duration=silence_length)
                    if number_of_revision == 0 and not new_sentences_once:
                        print("Adding:", source_filename, target_filename, target_filename)
                        session_file = session_file + source + silence + target + silence + target + silence
                        reps += 2
                    else:
                        print("Adding:", source_filename, target_filename)
                        session_file = session_file + source + silence + target + silence
                        reps += 1
    if outtro:
        session_file = session_file + AudioSegment.from_file("outtro.mp3")
    print("Reps: ", reps)
    print("Writing session file. Please wait.\n")
    if embedded_cover:
        session_file.export("GSR/"+session_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": session_filename, "track": sessionnumber}, cover="cover.png")
    else:
        session_file.export("GSR/"+session_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": session_filename, "track": sessionnumber})

# Start
# Standard preferences
sentence_pair_files = False # Sentence pair files will not be created
a_files = True # GMS-A-files will be created
b_files = True # GMS-B-files will be created
c_files = True # GMS-C-files will be created
spaced_repetition_files = True # GSR-files will be created
sentences_per_mass_sentences_file = 50 # GMS-files will have 50 sentences
first_mass_sentences_file = 1 # Creation of mass sentences files will begin with the first file
new_senteces_per_day = 10 # 10 new sentences per day
new_sentences_once = True # New sentences will only be presented once
days_of_revision = 4 # GSR-files will have 4 x 10 old sentences
revision_first = True # GSR-files will first present the oldest, then the newest sentences
patterns = [[1, 2, 3, 4, 5], [1, 2, 4, 7], [1, 2, 4, 7], [1, 3, 6], [1, 4], [1]] # Standard patterns
number_of_different_revision_patterns = days_of_revision # 4 different patterns for revision
first_session_file = 1 # Creation of GSR-files will begin with the file for day one
bit_rate = "64k" # Bit rate of output soundfiles
embedded_cover = False # Soundfiles will have no cover
intro = False # No intro
outtro = False # No outtro
fixed_silence_length = True # Silence between sentences has fixed length
silence_length = 2000 # Fixed silence length is 2 seconds
maximum_silence_length = 10000 # Maximum silence length of 10 seconds
minimum_silence_length = 1000 # Minimum silence length of 1 second
added_silence = 0 # Added silence in case of flexible silence length
# Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Overlearning File Creator!")
# Gathering basic information about project
print("\nGathering basic information about project:")
title = input("\n1. What is your project title? ")
L1 = input("2. What is your source language? ")
L2 = input("3. What is your target language? ")
number_of_sentences = int(input("4. How many sentence pairs are there? "))
standard_preferences = bool(input("5. Do you want all standard preferences (y/n)? ") =="y")
# Gathering information about user defined preferences
if not standard_preferences:
    print("\nGathering information about user defined preferences:")
    # Which kind of files shall be created?
    print("\n1. Types of files to be created")
    sentence_pair_files = bool(input("1.1 Do you want sentence pair files for shuffling (y/n)? ")=="y")
    a_files = bool(input("1.2 Do you want sentence training mass sentences files (L1, L2, L2) (y/n)? ")=="y")
    b_files = bool(input("1.3 Do you want interpretation training mass sentences files (L1, L2) (y/n)? ")=="y")
    c_files = bool(input("1.4 Do you want repetition/shadowing mass sentences files (L2 only) (y/n)? ")=="y")
    spaced_repetition_files = bool(input("1.5 Do you want overlearning/spaced repetition files (y/n)? ")=="y")
    # Specifications of mass sentences files
    print("\n2. Specifications of mass sentences files")
    if a_files or b_files or c_files:
        standard_mass_sentences = bool(input("2.1 Do you want 50 sentences per file (y/n)? ")=="y")
        if not standard_mass_sentences: sentences_per_mass_sentences_file = int(input(" How many sentences per file do you want? "))
        start_with_first_mass_sentences_file = bool(input("2.2 Do you want to begin with the first (set of) mass sentences file(s) (y/n)? ")=="y")
        if not start_with_first_mass_sentences_file: first_mass_sentences_file = int(input(" With which (set of) mass sentences file(s) do you want to begin? "))
    else:
        print("N/A")
    # Specifications of overlearning/spaced repetition files
    print("\n3. Specifications of overlearning/spaced repetition files")
    if spaced_repetition_files:
        ten_new_sentences_per_day = bool(input("3.1 Do you want 10 new sentences per day (y/n)? ")=="y")
        if not ten_new_sentences_per_day:
            new_senteces_per_day = int(input(" How many new sentences per day do you want? "))
        new_sentences_once = bool(input("3.2 Do you want new sentences to be presented once (otherwise twice) (y/n)? ")=="y")
        if not new_sentences_once:
            print(" New sentences shall be presented twice.")
        standard_days_of_revision = bool(input("3.3 Do you want 4 days of revision (y/n)? ")=="y")
        if not standard_days_of_revision:
            days_of_revision = int(input(" How many days do you want to revise the sentences? "))
        revision_first = bool(input("3.4 Do you want revisions first (otherwise new sentences first) (y/n)? ")=="y")
        if not revision_first: print(" New sentences shall be presented first, then the revisions.")
        standard_patterns = bool(input("3.5 Do you want to use the standard patterns (y/n)? ")=="y")
        if not standard_patterns:
            patterns_according_to_number_of_presentations = [[1], [1, 4], [1, 3, 6], [1, 2, 4, 7], [1, 2, 3, 4, 5]]
            if days_of_revision > 5: number_of_different_revision_patterns = 5
            else: number_of_different_revision_patterns = days_of_revision
            for counter in range(number_of_different_revision_patterns+1):
                if counter == 0:
                    number_of_presentations = int(input(" How often shall new sentences be presented (1-5)? "))-1
                elif counter == 5 and days_of_revision > number_of_different_revision_patterns:
                    number_of_presentations = int(input(" How often shall sentences of revision #5 and up be presented (1-5)? "))-1
                else:
                    print(" How often shall sentences of revision #", counter, " be presented (1-5)? ", end='', sep='')
                    number_of_presentations = int(input())-1
                patterns[counter] = patterns_according_to_number_of_presentations[number_of_presentations]
                if counter == 0:
                    print(" Pattern for new sentences:", patterns[counter])
                elif counter == 5 and days_of_revision > number_of_different_revision_patterns:
                    print(" Pattern for revision # 5 and up:", patterns[counter])
                else:
                    print(" Pattern for revision #", counter, ":", patterns[counter])
        start_with_first_day = bool(input("3.6 Do you want to begin with the first session file (y/n)? ")=="y")
        if not start_with_first_day: first_session_file = int(input(" With which session file do you want to begin? "))
    else:
        print("N/A")
    # General Specifications
    print("\n4. General specifications")
    if sentence_pair_files or a_files or b_files or c_files or spaced_repetition_files:
        standard_bit_rate = bool(input("4.1 Do you want the standard bitrate of 64k (y/n)? ")=="y")
        if not standard_bit_rate:
            bit_rate = input(" Which bitrate do you prefer (at least 32k)? ")
            if bit_rate[-1] != "k": bit_rate = bit_rate + "k"
        embedded_cover = bool(input("4.2 Do you want an embedded cover (y/n)? ")=="y")
        if a_files or b_files or c_files or spaced_repetition_files:
            intro = bool(input("4.3 Do you want to include an intro (y/n)? ")=="y")
            outtro = bool(input("4.4 Do you want to include an outtro (y/n)? ")=="y")
        fixed_silence_length = bool(input("4.5 Do you want fixed silence length between sentences (y/n)? ")=="y")
        if fixed_silence_length:
            two_seconds_of_silence = bool(input(" Do you want 2 seconds of silence between sentences (y/n)? ")=="y")
            if not two_seconds_of_silence:
                silence_length = float(input(" Desired silence length between sentences (in seconds)? "))*1000
                if silence_length > maximum_silence_length: silence_length = maximum_silence_length
                elif silence_length < minimum_silence_length: silence_length = minimum_silence_length
            added_silence = 0
        else:
            print(" Silence length shall correspond to the length of the respective sentence.")
            added_silence = float(input(" Added silence (in seconds)? "))*1000
    else:
        print("N/A")
# Informing user about set specifications of project
print("\nSet specifications of your project:")
print("\n1. Types of files to be created")
print("1.1 Sentence pair files:", sentence_pair_files)
print("1.2 Sentence training mass sentences files:", a_files)
print("1.3 Interpretation training mass sentences files:", b_files)
print("1.4 Repetition/shadowing mass sentences files:", c_files)
print("1.5 Overlearning/spaced repetition files:", spaced_repetition_files)
print("\n2. Specifications of mass sentences files")
if a_files or b_files or c_files:
    print("2.1 Sentences per mass sentences file:", sentences_per_mass_sentences_file)
    print("2.2 First mass sentences file:", first_mass_sentences_file)
else:
    print("N/A")
print("\n3. Specifications of overlearning/spaced repetition files")
if spaced_repetition_files:
    print("3.1 New sentences per day:", new_senteces_per_day)
    print("3.2 New sentences presented once:", new_sentences_once)
    print("3.3 Days of revision:", days_of_revision)
    print("3.4 Revision first:", revision_first)
    for counter in range(number_of_different_revision_patterns+1):
        if counter == 0:
            print("3.5 Pattern for new sentences:", patterns[counter])
        elif counter == 5 and days_of_revision > number_of_different_revision_patterns:
            print(" Pattern for revision # 5 and up:", patterns[counter])
        else:
            print(" Pattern for revision #", counter, ":", patterns[counter])
    print("3.6 First session file:", first_session_file)
else:
    print("N/A")
print("\n4. General Specifications")
if sentence_pair_files or a_files or b_files or c_files or spaced_repetition_files:
    print("4.1 Bitrate:", bit_rate)
    print("4.2 Embedded cover:", embedded_cover)
    if a_files or b_files or c_files or spaced_repetition_files:
        print("4.3 Intro:", intro)
        print("4.4 Outtro:", outtro)
    print("4.5 Fixed length:", fixed_silence_length)
    if fixed_silence_length:
        print(" Silence length:", silence_length/1000, "seconds")
    else:
        print(" Flexible silence length:", True)
        print(" Added silence:", added_silence/1000, "seconds")
else:
    print("N/A")
# Creation of sound files
print("\nSoundfiles will now be created.")
import time
import math
# Step 1: Creation of sentence pairs.
if sentence_pair_files:
    start_time = time.time()
    if not os.path.exists("GSP"):
        os.makedirs("GSP")
    print("\nStep 1: ", number_of_sentences, " sentence pair files (e.g. for shuffle play) will be created.")
    for number in range(1, number_of_sentences+1):
        filenumber=str(number)
        create_sentence_pair(filenumber, silence_length)
    end_time = time.time()
    elapsed_time = math.ceil((end_time-start_time)/60)
    print("Completed in ", str(elapsed_time//60).zfill(2), ":", str(elapsed_time%60).zfill(2), " hours.", sep='')
else:
    print("\nStep 1: Creation of sentence pair files: N/A")
# Step 2: Creation of mass sentences files
if a_files or b_files or c_files:
    start_time = time.time()
    if a_files:
        if not os.path.exists("GMS/GMS-A"):
            os.makedirs("GMS/GMS-A")
    if b_files:
        if not os.path.exists("GMS/GMS-B"):
            os.makedirs("GMS/GMS-B")
    if c_files:
        if not os.path.exists("GMS/GMS-C"):
            os.makedirs("GMS/GMS-C")
    if number_of_sentences/sentences_per_mass_sentences_file != number_of_sentences//sentences_per_mass_sentences_file:
        number_of_mass_sentences_files = round(number_of_sentences//sentences_per_mass_sentences_file)+1
    else:
        number_of_mass_sentences_files = int(number_of_sentences/sentences_per_mass_sentences_file)
    print("\nStep 2: ", number_of_mass_sentences_files-first_mass_sentences_file+1, " (sets of) mass sentences file(s) will be created.")
    for number in range(first_mass_sentences_file, number_of_mass_sentences_files+1):
        number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
        if a_files:
            a_file_name = (title+"-"+L1+"-"+L2+"-GMS-A-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
            print(a_file_name)
        if b_files:
            b_file_name = (title+"-"+L1+"-"+L2+"-GMS-B-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
            print(b_file_name)
        if c_files:
            c_file_name = (title+"-"+L1+"-"+L2+"-GMS-C-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
            print(c_file_name)
        if intro:
            a_file = AudioSegment.from_file("intro.mp3")
            b_file = AudioSegment.from_file("intro.mp3")
            c_file = AudioSegment.from_file("intro.mp3")
        else:
            a_file = AudioSegment.empty()
            b_file = AudioSegment.empty()
            c_file = AudioSegment.empty()
        # The last file may contain fewer sentences than the others.
        if number*sentences_per_mass_sentences_file > number_of_sentences:
            maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
        else:
            maximum_files_to_add = sentences_per_mass_sentences_file
        for counter in range(maximum_files_to_add):
            filenumber = number_of_first_audiofile + counter
            source_filename = L1+"-"+str(filenumber)+".mp3"
            target_filename = L2+"-"+str(filenumber)+".mp3"
            source = AudioSegment.from_file(source_filename)
            target = AudioSegment.from_file(target_filename)
            if not fixed_silence_length:
                target_length = len(target)+added_silence
                if target_length > maximum_silence_length: target_length = maximum_silence_length
                elif target_length < minimum_silence_length: target_length = minimum_silence_length
                silence = AudioSegment.silent(duration=target_length)
            else:
                silence = AudioSegment.silent(duration=silence_length)
            if a_files:
                a_file = a_file + source + silence + target + silence + target + silence
                print("Adding to A-file:", source_filename, target_filename, target_filename)
            if b_files:
                b_file = b_file + source + silence + target + silence
                print("Adding to B-file:", source_filename, target_filename)
            if c_files:
                c_file = c_file + target + silence
                # The C-file contains the target sentence only, so report the target file.
                print("Adding to C-file:", target_filename)
        if outtro:
            a_file = a_file + AudioSegment.from_file("outtro.mp3")
            b_file = b_file + AudioSegment.from_file("outtro.mp3")
            c_file = c_file + AudioSegment.from_file("outtro.mp3")
        print("Writing mass sentences file(s). Please wait.\n")
        if a_files:
            if embedded_cover:
                a_file.export("GMS/GMS-A/"+a_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": a_file_name, "track": number}, cover="cover.png")
            else:
                a_file.export("GMS/GMS-A/"+a_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": a_file_name, "track": number})
        if b_files:
            if embedded_cover:
                b_file.export("GMS/GMS-B/"+b_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": b_file_name, "track": number}, cover="cover.png")
            else:
                b_file.export("GMS/GMS-B/"+b_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": b_file_name, "track": number})
        if c_files:
            if embedded_cover:
                c_file.export("GMS/GMS-C/"+c_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": c_file_name, "track": number}, cover="cover.png")
            else:
                c_file.export("GMS/GMS-C/"+c_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": c_file_name, "track": number})
    end_time = time.time()
    elapsed_time = math.ceil((end_time-start_time)/60)
    print("Completed in ", str(elapsed_time//60).zfill(2), ":", str(elapsed_time%60).zfill(2), " hours.", sep='')
else:
    print("\nStep 2: Creation of mass sentences files: N/A")
# Step 3: Creation of sessionfiles
if spaced_repetition_files:
    start_time = time.time()
    if not os.path.exists("GSR"):
        os.makedirs("GSR")
    number_of_sessions = math.ceil(number_of_sentences/new_senteces_per_day) + days_of_revision - first_session_file + 1
    print("\nStep 3: ", number_of_sessions, "session files will be created. ")
    number_of_necessary_sentences = math.ceil(number_of_sentences/new_senteces_per_day)*new_senteces_per_day
    if revision_first:
        first = days_of_revision
        last = -1
        step = -1
    else:
        first = 0
        last = days_of_revision+1
        step = +1
    for number in range(first_session_file, first_session_file + number_of_sessions):
        create_sessionfile(number, days_of_revision, first, last, step)
    end_time = time.time()
    elapsed_time = math.ceil((end_time-start_time)/60)
    print("Completed in ", str(elapsed_time//60).zfill(2), ":", str(elapsed_time%60).zfill(2), " hours.", sep='')
else:
    print("\nStep 3: Creation of spaced repetition files: N/A\n")
os.system('pause' if os.name == 'nt' else 'read')
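
If you want to check which sentences end up in which session file before spending hours on encoding, the scheduling part of the script can be tried out without any audio. The following is only a minimal sketch of that logic as I read it from create_pattern_list and create_sessionfile. The names rotate_pattern and session_plan are made up for this illustration and are not part of the program.

```python
import math

def rotate_pattern(pattern, per_day):
    """Return the offsets (1-based) played for one day's batch of sentences.

    Hypothetical helper that mirrors create_pattern_list: the pattern is
    emitted once per sentence of the day and shifted by one each time.
    """
    current = list(pattern)  # copy, so the caller's pattern stays untouched
    offsets = []
    for _ in range(per_day):
        offsets.extend(current)
        # Shift every entry by one; an entry past the end of the day wraps to 1.
        current = [(p % per_day) + 1 for p in current]
    return offsets

def session_plan(day, per_day, days_of_revision, total, patterns, revision_first=True):
    """Return a list of (revision number, sentence numbers) for one session.

    Hypothetical helper that mirrors the loop in create_sessionfile;
    revision number 0 stands for the new sentences of that day.
    """
    padded_total = math.ceil(total / per_day) * per_day
    if revision_first:
        revisions = range(days_of_revision, -1, -1)
    else:
        revisions = range(0, days_of_revision + 1)
    plan = []
    for revision in revisions:
        first = day * per_day - (revision * per_day + (per_day - 1))
        if 0 < first < padded_total:
            pattern = patterns[min(revision, 5)]
            numbers = [first - 1 + offset for offset in rotate_pattern(pattern, per_day)]
            # Sentence numbers beyond the last existing pair are skipped.
            plan.append((revision, [n for n in numbers if n <= total]))
    return plan

if __name__ == "__main__":
    # Standard patterns and standard preferences from the script above.
    standard = [[1, 2, 3, 4, 5], [1, 2, 4, 7], [1, 2, 4, 7], [1, 3, 6], [1, 4], [1]]
    for day in range(1, 4):
        print("Day", day, session_plan(day, 10, 4, 1854, standard))
```

Running it prints the plan for the first three days with the standard preferences. Changing the arguments shows how other settings would change the files.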
Here you can get an impression of the settings the program now offers:
As you can see, I used the program with 1854 sentences from an Assimil course, and it worked perfectly, circumventing Pydub's notorious memory error. The mp3 files produced total more than 2.4 gigabytes. The whole process took about 10 hours. The log file can be viewed here: https://tinyurl.com/ya594o3s.
I already have ideas about future improvements, which I might implement if I have time to do so:
1. The user shall have the possibility to choose between Overlearning Files and Spaced Repetition Files. Depending on this, he shall have the possibility to determine the number and the timing of the repetitions. This should be quite easy to implement. In this way, this tool also becomes an alternative to Gradint.
2. Possibility to learn a varying number of new sentences each day, depending on the number of sentences in each lesson of a course, instead of the same number every day. This would be particularly useful, for example, to learn all sentences from one Assimil lesson every day and at the same time to repeat all sentences from certain earlier Assimil lessons. However, this will have a major impact on the algorithm, which is why it will be very difficult to implement.
3. Integration of the Cue Creator into the Overlearning File Creator, so that the user has the possibility to choose if he wants to use already existing cue sound files or have them created.
4. Possibility to also use a comma-separated values file (CSV file) containing all the cues in written form and have them converted to sound files with integrated text-to-speech software.
5. Converting the Python script into a Windows executable file to make the program accessible to the user who does not want or is not able to install Python and Libav first. In this context, the following changes must also be implemented:
- Check if Libav is installed. If not, Libav should be installed by the program without any further action on the part of the user.
- Query the folder containing the sound files. Automatic detection of the number of sound files. Check if all relevant files are present or skip non-existent files to prevent the program from crashing.
- Insert a subroutine that checks whether the user has given the correct input to prevent the program from crashing.
- Insert a help file that explains all settings and possible uses to the user.
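
The input check and the check for missing files from point 5 could look roughly like this. It is only a sketch under my own assumptions: the function names ask_int, ask_yes_no and missing_pairs are made up for the illustration, and the file naming (language, hyphen, number, ".mp3") is taken from the script above.

```python
import os

def ask_int(prompt, minimum=1, maximum=None, read=input):
    """Ask again until the user enters a whole number within the allowed range."""
    while True:
        answer = read(prompt).strip()
        try:
            value = int(answer)
        except ValueError:
            print("Please enter a whole number.")
            continue
        if value < minimum or (maximum is not None and value > maximum):
            print("This value is out of range.")
            continue
        return value

def ask_yes_no(prompt, read=input):
    """Ask again until the user answers with y or n (upper or lower case)."""
    while True:
        answer = read(prompt).strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        print("Please answer with y or n.")

def missing_pairs(folder, source_language, target_language, count):
    """Return the sentence numbers for which the source or the target mp3 is missing."""
    missing = []
    for number in range(1, count + 1):
        names = [language + "-" + str(number) + ".mp3"
                 for language in (source_language, target_language)]
        if not all(os.path.isfile(os.path.join(folder, name)) for name in names):
            missing.append(number)
    return missing
```

With helpers like these, a question such as "How many sentence pairs are there?" would be asked with ask_int instead of int(input(...)), and the sentences returned by missing_pairs could be skipped or reported before the creation of the sound files starts. The read parameter only exists so that the functions can be tried out without typing.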
Did I say I enjoy programming? It's almost as enjoyable as learning languages ...