Hello everybody,
Lately, I have had some time for coding again, and I have implemented some of the desired features into the Overlearning File Creator. Here is what has been achieved so far:
neumanc wrote:
1. Letting the user type in which languages are learned, so that the files will indicate the L1-L2-combination. DONE
2. Splitting the GMS-like files after a certain number of sentence pairs. Letting the user decide how many sentences the GMS-like files should contain. DONE
3. Simultaneous creation of GMS-like files. DONE
4. Letting the user decide which kind of files should be created (GMS-A, GMS-B, GMS-C, and/or GSR-like files). Not everyone needs every kind of file. DONE
5. Make the program work with uneven numbers of sentences.
6. Adding an "outtro" (a short tone for example) to the GMS- and GSR-like files, so that the user instantly knows when he is done with the file. This is useful if the files are used on a smartphone with an app like "Smart AudioBook Player" which plays the sound files one after the other without any break.
7. Letting the user decide how many new sentences shall be introduced each day.
8. Letting the user decide for how many days the sentences shall be revised. DONE
9. Letting the user decide how often the sentences will be repeated each day (five-, four-, three-, or twofold, or just once).
10. Letting the user decide if he first wants to learn the new sentences and then revise older sentences or the other way around. Letting the user decide if he wants to revise the oldest or the newest sentences first. DONE
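There are also a couple of other improvements, e.g. the user can now set the silence length between sentences, the bit rate of the output files, and the session file with which the creation of GSR-like files begins.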
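To illustrate point 2 (splitting the GMS-like files), here is a minimal sketch of how the sentence numbers can be divided into files of a chosen size. The function name mass_file_ranges is only made up for this illustration and does not appear in the script; the script does the same arithmetic inline.

```python
def mass_file_ranges(number_of_sentences, sentences_per_file):
    """Return (first, last) sentence numbers for each mass sentences file.

    Hypothetical helper for illustration; the last file simply gets
    whatever sentences are left over.
    """
    if sentences_per_file < 1:
        raise ValueError("sentences_per_file must be at least 1")
    ranges = []
    for first in range(1, number_of_sentences + 1, sentences_per_file):
        last = min(first + sentences_per_file - 1, number_of_sentences)
        ranges.append((first, last))
    return ranges


# 120 sentences with 50 sentences per file give three files
print(mass_file_ranges(120, 50))  # [(1, 50), (51, 100), (101, 120)]
```

The first number of each range is the number that ends up in the file name (e.g. "...-GMS-A-0051.mp3").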
And here's the code of version 2.5:
Code:
# Overlearning File Creator V2.5
import pydub
from pydub import AudioSegment
def create_sentence_pair(filenumber, silence_length):
    # Creates one sentence pair from source and target file
    source_filename = L1+"-"+filenumber+".mp3"
    target_filename = L2+"-"+filenumber+".mp3"
    source = AudioSegment.from_mp3(source_filename)
    target = AudioSegment.from_mp3(target_filename)
    if not fixed_silence_length:
        target_length = len(target)+added_silence
        if target_length > maximum_silence_length: target_length = maximum_silence_length
        elif target_length < minimum_silence_length: target_length = minimum_silence_length
        silence = AudioSegment.silent(duration=target_length)
    else: silence = AudioSegment.silent(duration=silence_length)
    pair = source + silence + target + silence
    pair_filename = title+"-"+L1+"-"+L2+"-"+filenumber+".mp3"
    print(pair_filename)
    pair.export(pair_filename, format="mp3", bitrate=bit_rate)
def create_pattern_list (pattern):
    # Repeats the pattern ten times, shifting every entry by one sentence each time (after 10 comes 1 again).
    pattern_list = []
    for number in range(10):
        pattern_list.extend(pattern)
        for counter in range(0, len(pattern)):
            pattern[counter] = (pattern[counter]+1)%10
            if pattern[counter] == 0:
                pattern[counter] = 10
    return pattern_list
def add_number_of_first_sentence_to_pattern_list (pattern_list, number_of_first_sentence):
    # Determines the precise numbers of the sentence pairs to be added.
    for number in range(0, len(pattern_list)):
        pattern_list[number] = pattern_list[number]+number_of_first_sentence
def create_list_of_sentences_to_be_added_to_sessionfile (first_sentence, pattern_number):
    # Creates a list of sentences to be added to sessionfile
    if pattern_number == 0: pattern = [1, 2, 3, 4, 5]
    elif pattern_number == 1: pattern = [1, 2, 4, 7]
    elif pattern_number == 2: pattern = [1, 2, 4, 7]
    elif pattern_number == 3: pattern = [1, 3, 6]
    elif pattern_number == 4: pattern = [1, 4]
    else: pattern = [1]
    list_of_sentences = create_pattern_list(pattern)
    add_number_of_first_sentence_to_pattern_list (list_of_sentences, first_sentence)
    return list_of_sentences
def create_sessionfile(sessionnumber, days_of_revision, first, last, step):
    # Creates a sessionfile, which consists of new sentences, if any, and sentences to be revised, if any (or vice versa).
    session_filename = (title+"-"+L1+"-"+L2+"-GSR-DAY")+str(sessionnumber).zfill(3)+".mp3"
    print(session_filename)
    session_file = AudioSegment.empty()
    reps = 0
    for number in range(first, last, step):
        first_sentence = sessionnumber*10-(number*10+9)
        if first_sentence > 0 and first_sentence < number_of_sentences:
            sentences = create_list_of_sentences_to_be_added_to_sessionfile (first_sentence-1, number)
            if number == 0: print ("New sentences to add: ", sentences)
            else: print ("Sentences to add for revision #", number, ": ", sentences, sep="")
            for counter in range(len(sentences)):
                source_filename = L1+"-"+str(sentences[counter])+".mp3"
                target_filename = L2+"-"+str(sentences[counter])+".mp3"
                source = AudioSegment.from_mp3(source_filename)
                target = AudioSegment.from_mp3(target_filename)
                if not fixed_silence_length:
                    target_length = len(target)+added_silence
                    if target_length > maximum_silence_length: target_length = maximum_silence_length
                    elif target_length < minimum_silence_length: target_length = minimum_silence_length
                    silence = AudioSegment.silent(duration=target_length)
                else:
                    silence = AudioSegment.silent(duration=silence_length)
                print("Adding:", source_filename, target_filename)
                session_file = session_file + source + silence + target + silence
                reps += 1
    print("Reps: ", reps)
    print("Writing session file. Please wait.\n")
    session_file.export(session_filename, format="mp3", bitrate=bit_rate)
# Start
# Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Overlearning File Creator!\n")
# Determining project title, source and target language
title = input("What is your project title? ")
L1 = input("What is your source language? ")
L2 = input("What is your target language? ")
# Determining how many sentence pairs there are to be processed, only multiples of 10 will do.
number_of_sentences = int(input("\nHow many sentence pairs are there? "))
if number_of_sentences%10 >0:
    number_of_sentences = (number_of_sentences//10)*10
    if number_of_sentences == 0: print("At least 10 sentence pairs are necessary.")
    else: print("Only", number_of_sentences, "sentence pairs can be processed (only multiples of 10).")
# Processing requires at least 10 sentence pairs.
if number_of_sentences >= 10:
    # Setting standard preferences
    fixed_silence_length = True # Silence between sentences has a fixed length
    silence_length = 2000 # Fixed silence length is 2 seconds
    maximum_silence_length = 10000 # Maximum silence length of 10 seconds
    minimum_silence_length = 1000 # Minimum silence length of 1 second
    added_silence = 0 # Added silence in case of flexible silence length
    bit_rate = "64k" # Bit rate of output soundfiles
    sentence_pair_files = False # Sentence pair files will not be created
    a_files = True # GMS-A-files will be created
    b_files = True # GMS-B-files will be created
    c_files = True # GMS-C-files will be created
    sentences_per_mass_sentences_file = 50 # GMS-files will have 50 sentences
    spaced_repetition_files = True # GSR-files will be created
    days_of_revision = 4 # GSR-files will have 4 x 10 old sentences
    revision_first = True # GSR-files will first present the oldest, then the newest sentences
    first_session_file = 1 # Creation of GSR-files will begin with the file for day one
    # User defined preferences?
    standard_preferences = bool(input("\nDo you want all standard preferences (default) (y/n)? ") =="y")
    if not standard_preferences:
        # Fixed or flexible silence length?
        fixed_silence_length = bool(input("\nDo you want fixed silence length between sentences (default) (y/n)? ")=="y")
        if fixed_silence_length:
            two_seconds_of_silence = bool(input("Do you want 2 seconds of silence between sentences (default) (y/n)? ")=="y")
            if not two_seconds_of_silence:
                silence_length = float(input("Desired silence length between sentences (in seconds)? "))*1000
                # Keeping the silence within the minimum and maximum lengths (long silences can cause memory errors with Pydub)
                if silence_length > maximum_silence_length: silence_length = maximum_silence_length
                elif silence_length < minimum_silence_length: silence_length = minimum_silence_length
            added_silence = 0
        else:
            print("Silence length shall correspond to the length of the sentences.")
            added_silence = float(input("Added silence (in seconds)? "))*1000
        # Determining the bit rate of the sound files to be created
        standard_bit_rate = bool(input("\nDo you want a bitrate of 64k (default) (y/n)? ")=="y")
        if not standard_bit_rate:
            bit_rate = input("Which bitrate shall the sound files have (e.g. 32k, 48k, 64k, 128k, 192k, etc.)? ")
            if bit_rate[-1] != "k": bit_rate = bit_rate + "k"
        # Determining which kind of files shall be created
        sentence_pair_files = bool(input("\nDo you want sentence pair files (e.g. for shuffling) (y/n)? ")=="y")
        a_files = bool(input("Do you want mass sentence files (L1, L2, L2) for sentence training (default) (y/n)? ")=="y")
        b_files = bool(input("Do you want mass sentence files (L1, L2) for interpretation training (default) (y/n)? ")=="y")
        c_files = bool(input("Do you want mass sentence files (L2 only) for sentence repetition/shadowing (default) (y/n)? ")=="y")
        if a_files or b_files or c_files:
            standard_mass_sentences = bool(input("Do you want 50 sentences per mass sentences file (default) (y/n)? ")=="y")
            if not standard_mass_sentences: sentences_per_mass_sentences_file = int(input("How many sentences per mass sentences file do you want? "))
        print()
        spaced_repetition_files = bool(input("Do you want overlearning/spaced repetition files (default) (y/n)? ")=="y")
        if spaced_repetition_files:
            standard_days_of_revision = bool(input("Do you want 4 days of revision (default) (y/n)? ")=="y")
            if not standard_days_of_revision:
                days_of_revision = int(input("How many days do you want to revise the sentences? "))
            if days_of_revision > 0: revision_first = bool(input("Do you want revisions first (default) (y/n)? ")=="y")
            if not revision_first: print("New sentences shall be presented first, then the revisions.")
            start_with_first_day = bool(input("Do you want to begin the process with the first session file (default) (y/n)? ")=="y")
            if not start_with_first_day: first_session_file = int(input("With which session file shall the process begin (1, 2, ...)? "))
    # Informing user about specifications of soundfiles
    print("\nSpecifications of soundfiles:")
    print("\nSentence pair files:", sentence_pair_files)
    if sentence_pair_files:
        print("Fixed length:", fixed_silence_length)
        if fixed_silence_length:
            print("Silence length:", silence_length/1000, "seconds")
        else:
            print("Flexible silence length:", True)
            print("Added silence:", added_silence/1000, "seconds")
    print("Bit rate:", bit_rate)
    print("A-files:", a_files)
    print("B-files:", b_files)
    print("C-files:", c_files)
    if a_files or b_files or c_files: print("Sentences per mass sentences file:", sentences_per_mass_sentences_file)
    print("Spaced repetition files:", spaced_repetition_files)
    if spaced_repetition_files:
        print("Days of revision:", days_of_revision)
        if days_of_revision > 0: print("Revision first:", revision_first)
        print("First session file:", first_session_file)
    # Creation of sound files
    print("\nSoundfiles will now be created.")
    # Step 1: Creation of sentence pairs.
    if sentence_pair_files:
        print("\nStep 1: ", number_of_sentences, " sentence pair files (e.g. for shuffle play) will be created.")
        for number in range(1, number_of_sentences+1):
            filenumber=str(number)
            create_sentence_pair(filenumber, silence_length)
        print("Completed.")
    else:
        print("\nStep 1: Creation of sentence pair files: N/A")
    # Step 2: Creation of mass sentences files
    if a_files or b_files or c_files:
        if number_of_sentences/sentences_per_mass_sentences_file != number_of_sentences//sentences_per_mass_sentences_file:
            number_of_mass_sentences_files = round(number_of_sentences//sentences_per_mass_sentences_file)+1
        else:
            number_of_mass_sentences_files = int(number_of_sentences/sentences_per_mass_sentences_file)
        print("\nStep 2: ", number_of_mass_sentences_files, " (sets of) mass sentences file(s) will be created.")
        for number in range(1, number_of_mass_sentences_files+1):
            number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
            a_file_name = (title+"-"+L1+"-"+L2+"-GMS-A-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
            b_file_name = (title+"-"+L1+"-"+L2+"-GMS-B-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
            c_file_name = (title+"-"+L1+"-"+L2+"-GMS-C-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
            print(a_file_name, b_file_name, c_file_name)
            a_file = AudioSegment.empty()
            b_file = AudioSegment.empty()
            c_file = AudioSegment.empty()
            # The last file of a set may contain fewer sentences than the others
            if number*sentences_per_mass_sentences_file > number_of_sentences:
                maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
            else:
                maximum_files_to_add = sentences_per_mass_sentences_file
            for counter in range(maximum_files_to_add):
                filenumber = number_of_first_audiofile + counter
                source_filename = L1+"-"+str(filenumber)+".mp3"
                target_filename = L2+"-"+str(filenumber)+".mp3"
                source = AudioSegment.from_mp3(source_filename)
                target = AudioSegment.from_mp3(target_filename)
                if not fixed_silence_length:
                    target_length = len(target)+added_silence
                    if target_length > maximum_silence_length: target_length = maximum_silence_length
                    elif target_length < minimum_silence_length: target_length = minimum_silence_length
                    silence = AudioSegment.silent(duration=target_length)
                else:
                    silence = AudioSegment.silent(duration=silence_length)
                if a_files:
                    a_file = a_file + source + silence + target + silence + target + silence
                    print("Adding to A-file:", source_filename, target_filename, target_filename)
                if b_files:
                    b_file = b_file + source + silence + target + silence
                    print("Adding to B-file:", source_filename, target_filename)
                if c_files:
                    c_file = c_file + target + silence
                    print("Adding to C-file:", target_filename)
            print("Writing mass sentence file(s). Please wait.\n")
            if a_files: a_file.export(a_file_name, format="mp3", bitrate=bit_rate)
            if b_files: b_file.export(b_file_name, format="mp3", bitrate=bit_rate)
            if c_files: c_file.export(c_file_name, format="mp3", bitrate=bit_rate)
            del a_file
            del b_file
            del c_file
        print("Completed.")
    else:
        print("\nStep 2: Creation of mass sentences files: N/A")
    # Step 3: Creation of sessionfiles
    if spaced_repetition_files:
        number_of_sessions = int(number_of_sentences/10) + days_of_revision - first_session_file + 1
        print("\nStep 3: ", number_of_sessions, " session files will be created. ")
        if revision_first:
            first = days_of_revision
            last = -1
            step = -1
        else:
            first = 0
            last = days_of_revision+1
            step = +1
        for number in range(first_session_file, first_session_file + number_of_sessions):
            create_sessionfile(number, days_of_revision, first, last, step)
        print("Completed.")
    else:
        print("\nStep 3: Creation of spaced repetition files: N/A\n")
os.system('pause' if os.name == 'nt' else 'read')
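For those who want to see which sentences end up in which session file without reading the whole script, here is a small sketch of the schedule. It only reproduces the arithmetic of create_sessionfile and the lengths of the patterns above; it does not touch any audio. The names session_plan and REPS_PER_SENTENCE are made up for this illustration.

```python
# Repetitions per sentence, taken from the pattern lengths in the script:
# new sentences 5 times, first and second revision 4 times, then 3, then 2.
# Any later revision uses the pattern [1], i.e. one repetition.
REPS_PER_SENTENCE = {0: 5, 1: 4, 2: 4, 3: 3, 4: 2}


def session_plan(session_number, days_of_revision, number_of_sentences,
                 revision_first=True):
    """Return (revision, first_sentence, last_sentence, reps_per_sentence)
    for every block of ten sentences in the given session file."""
    if revision_first:
        revisions = range(days_of_revision, -1, -1)
    else:
        revisions = range(0, days_of_revision + 1)
    plan = []
    for revision in revisions:
        first = session_number * 10 - (revision * 10 + 9)
        # Same condition as in create_sessionfile
        if 0 < first < number_of_sentences:
            reps = REPS_PER_SENTENCE.get(revision, 1)
            plan.append((revision, first, first + 9, reps))
    return plan


# Day 3 with the standard preferences and 50 sentence pairs:
for block in session_plan(3, 4, 50):
    print(block)
```

For day 3 this gives sentences 1-10 (second revision), 11-20 (first revision) and 21-30 (new), in that order, because revisions come first by default.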
How to use the Overlearning File Creator on Windows 10
1. How to install WinPython (portable version)
a) Go to https://sourceforge.net/projects/winpython/files/WinPython_3.6/3.6.5.0/, then click on "Download Latest Version". At the time of writing, this is WinPython 3.6.5.0Qt5-64bit. You should use the 64bit version in order to prevent the memory issues of Pydub described by MattG.
b) Click on the downloaded exe-file; this will install WinPython Portable on your system.
2. How to install Libav (Pydub relies on Libav, so this needs to be done)
a) Go to http://builds.libav.org/windows/nightly-lgpl/ to get the newest nightly build of Libav. Choose the newest file at the bottom of the page. At the time of writing, this is libav-x86_64-w64-mingw32-20180108.7z dating from 8th January 2018. Download it by clicking on it. Warning: This is a nightly build, so it may contain errors. The latest release version can be found at http://builds.libav.org/windows/release-lgpl/, but it's already three (!) years old.
b) Extract the downloaded 7z-file somewhere on your computer where it won't be deleted. You will need the freeware "7-zip" for this, which you can download at http://www.7-zip.de.
c) Open the folder that contains the extracted files. Within this folder, open the sub-folder "usr", then the sub-sub-folder "bin". Click on the address bar and copy the path to this sub-sub-folder (CTRL-C).
d) Still using the Windows Explorer, search on the left sidebar for "This PC" and right-click on it. A context menu will open, where you click on "Properties".
e) A new window will open where you can "View basic information about your computer". In this window, on the left sidebar, click on "Advanced system settings".
f) A new window will open called "System Properties", where you must click on "Environment Variables".
g) A new window will open called "Environment Variables". On the bottom, you will find the "System Variables". Click on the system variable "Path".
h) A new window will open where you can edit the environment variable. Click on "New" and paste the path to the above-mentioned sub-sub-folder "bin". Save and close everything.
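If you want to check whether the Path entry from step 2 took effect, the following small sketch might help. As far as I know, Pydub looks for a converter named "avconv" (Libav) or "ffmpeg" on the Path; the function name find_converter is made up for this illustration. Open a new command prompt or restart IDLEX before running it, since already running programs do not see the changed Path.

```python
import shutil


def find_converter(candidates=("avconv", "ffmpeg")):
    """Return the full path of the first converter found on the Path, or None.

    Hypothetical helper: it only searches the Path, it does not start
    the converter.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print(find_converter() or "No converter found on the Path.")
```

If this prints a path ending in the "bin" folder you added, Pydub should be able to find Libav.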
3. How to install Pydub (without Pydub, the script will not work)
a) Go to https://pypi.org/project/pydub/#files and click on "pydub-0.22.1.tar.gz".
b) Go to the folder where WinPython is installed and start the program "WinPython Control Panel.exe" by double-clicking on it.
c) In the tab "Install/upgrade packages", click on "Add packages".
d) Go to your download folder and click on "pydub-0.22.1.tar.gz" and then on "Open"; this will install Pydub.
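To see whether the installation in step 3 worked, you could run this short sketch with the Python that comes with WinPython (the function name is_installed is made up for this illustration). It only checks that Python can find the package; it does not import it or test the audio conversion.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the given top-level package."""
    return importlib.util.find_spec(module_name) is not None


print("pydub installed:", is_installed("pydub"))
```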
4. How to get the Overlearning File Creator
a) Create a new text file on your desktop.
b) Rename it properly (e.g. Overlearning File Creator) and change the ending to ".py".
c) Mark all the above code in this post and copy it (CTRL-C).
d) Open the renamed file and paste the code into it (CTRL-V).
e) Save the renamed file.
5. How to prepare the audio
a) Create a new folder on your desktop and name it properly (e.g. "English-French audio").
b) Move the py-file into this folder.
c) Move all the source and target audio (mp3) files into this folder. The source and the target files should contain corresponding sentences (or short passages) and must be numbered accordingly, e.g. "EN-1.mp3" and "FR-1.mp3" and up.
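Since the script stops with an error as soon as one audio file is missing, it may be worth checking the folder first. The following is a minimal sketch under the naming scheme described in step 5 ("EN-1.mp3", "FR-1.mp3", and so on); the function name missing_audio_files and the example values "EN", "FR" and 10 are only assumptions for this illustration.

```python
from pathlib import Path


def missing_audio_files(folder, source_language, target_language,
                        number_of_sentences):
    """Return the names of the expected mp3 files that are not in the folder."""
    folder = Path(folder)
    missing = []
    for number in range(1, number_of_sentences + 1):
        for language in (source_language, target_language):
            name = f"{language}-{number}.mp3"
            if not (folder / name).is_file():
                missing.append(name)
    return missing


# Example: check the current folder for EN-1.mp3 ... EN-10.mp3 and FR-1.mp3 ... FR-10.mp3
print(missing_audio_files(".", "EN", "FR", 10))
```

An empty list means that every file the script will ask for is there.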
6. How to run the script
a) Go to the folder where WinPython is installed and open the program "IDLEX (Python GUI).exe" by double-clicking on it.
b) Click on the tab "File", then on "Open". Choose the folder on your desktop containing the script and the audio.
c) Now choose the py-file and click on "Open". This will open a new WinPython window where you can read the script.
d) Click on the tab "Run" and then on "Run Module". Voilà, the script should be running on your machine.
e) Answer all the questions the script asks you. As project title, you could choose the source where you got the audio from. Every file the script creates will have a name beginning with the project title. Then you must let the script know which are the source and the target language (e.g. "EN" and "FR") so that the script can find the audio files. If you choose the "standard preferences", you will get mass sentences files and spaced repetition files that will be exactly like those of the old Glossika, with the exception that the silence length between sentences is always two seconds instead of one (this will give you a second more to think before speaking with the audio). Otherwise, you can set the preferences as you like. Just try it out.
f) Unfortunately, the 64bit version of WinPython runs very slowly (but reliably). If you want to have faster progress and don't shy away from memory errors, you could also use the 32bit version. For this, go to the folder where WinPython is installed and open the program "WinPython Command Prompt.exe" by double-clicking on it. This will start the Windows command prompt. Use the cd-command to choose the folder containing the audio and the script. Start the script by typing in its name including the extension and pressing "return". Everything else should work exactly the same, but much faster. On my machine, each GSR-like file takes about 2 minutes to create. In order to avoid a memory error, your audio files should be very short. Furthermore, you should choose a short silence length (e.g. 2 seconds).
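To get a feeling for why long audio files and long silences lead to memory errors: as far as I understand it, Pydub keeps the audio uncompressed in memory while a session file is being put together. The following sketch estimates the size of that uncompressed audio. The function name is made up, and the default values (44100 Hz, stereo, 16 bit) are only assumptions; your files may differ.

```python
def estimated_pcm_megabytes(reps, source_seconds, target_seconds,
                            silence_seconds, frame_rate=44100, channels=2,
                            sample_width=2):
    """Rough size of the uncompressed audio of one session file in megabytes.

    Each repetition is source + silence + target + silence, as in the script.
    The defaults (44100 Hz, stereo, 2 bytes per sample) are assumptions.
    """
    seconds_per_rep = source_seconds + target_seconds + 2 * silence_seconds
    total_bytes = reps * seconds_per_rep * frame_rate * channels * sample_width
    return total_bytes / (1024 * 1024)


# A full standard session has 180 repetitions (10 sentences x (5+4+4+3+2)).
# With sentences of 3 seconds and 2 seconds of silence:
print(round(estimated_pcm_megabytes(180, 3, 3, 2)), "MB")  # 303 MB
```

The real peak is probably higher, because every "+" in the script builds a new, longer audio segment while the old one still exists for a moment. This is only a rough estimate and not a measurement.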
Important note: If you follow any of the above instructions, you do so at your own risk. I am neither a computer scientist nor a professional programmer and I do not know enough about any risks that might accompany the installation of WinPython, Libav and Pydub. This is what runs on my computer without it seemingly being damaged and what I wanted to share with you.
What can I do if I don't have bilingual audio?
If you don't have bilingual audio (for example, if you only have the split mp3 files from Assimil), you can still use the Overlearning File Creator to create Glossika-like files. How so? If you don't have cues in your native language (or any other language) and if you don't want to create them in a very tedious process, you must search for another kind of cue. While working through my Assimil files by listening, repeating and making heavy use of the pause button, I noticed that I only had to hear the beginning of a sentence to remember the whole sentence. In my opinion, the best cue for a sentence is its beginning! The first one or two words together with their intonation and the speaker's voice are enough. Obviously, this will only work with sentences you can understand just fine (but for some reason can't reproduce fluently enough yet). That's why the idea came up to write another script that cuts off the beginnings of sentences and saves them as new files that can serve as cues. Voilà:
Code:
#Cue Creator
import pydub
from pydub import AudioSegment
#Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Cue Creator!")
#Gathering necessary information
number_of_sentences = int(input("\nHow many sentences are there? "))
#Cue length in milliseconds, at least 250 ms
length_of_cue = int(float(input("Desired cue length (in seconds)? "))*1000)
if length_of_cue < 250:
    length_of_cue = 250
#Creating Cues
print("Creating cues:")
for filenumber in range(1, number_of_sentences+1):
    #The sentences are read from target-1.mp3, target-2.mp3, ...
    sentence_filename="target-"+str(filenumber)+".mp3"
    sentence = AudioSegment.from_mp3(sentence_filename)
    #The cues are written to source-1.mp3, source-2.mp3, ...
    cue_filename="source-"+str(filenumber)+".mp3"
    #250 ms are added to the cue and then faded out
    cue_raw = sentence[:length_of_cue+250]
    cue = cue_raw.fade_out(250)
    cue.export(cue_filename, format="mp3")
    print(cue_filename)
print("Completed.")
os.system('pause' if os.name == 'nt' else 'read')
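The timing used by the Cue Creator can be sketched without any audio: the cue is the desired length (at least 250 ms) plus 250 ms that are faded out. The function name cue_window is made up for this illustration, and the clamping to the length of the sentence is my own addition; the script above does not do this explicitly.

```python
def cue_window(sentence_ms, cue_seconds, fade_ms=250, minimum_cue_ms=250):
    """Return (end_of_cue_ms, fade_length_ms) for one sentence.

    Hypothetical helper: the cue never gets longer than the sentence,
    and the fade never gets longer than the cue.
    """
    cue_ms = max(int(cue_seconds * 1000), minimum_cue_ms)
    end_ms = min(cue_ms + fade_ms, sentence_ms)
    return end_ms, min(fade_ms, end_ms)


# A 4 second sentence with a 1 second cue: 1250 ms are kept,
# the last 250 ms of which are faded out.
print(cue_window(4000, 1.0))  # (1250, 250)
```

With Pydub, the two numbers would correspond to sentence[:end_of_cue_ms] and fade_out(fade_length_ms). Note that the Cue Creator reads files called "target-1.mp3" and writes "source-1.mp3", so in the Overlearning File Creator you would then enter "source" and "target" as languages.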
Enjoy!