Solved: How to create your own Glossika-like GSR files?

MattG
Posts: 3
Joined: Fri Oct 30, 2015 1:25 am
Location: Huntsville, AL (USA)
Languages: English (N)
Spanish (~B2), Hebrew (A2), Persian (A1)
x 3

Re: Solved: How to create your own Glossika-like GSR files?

Postby MattG » Thu May 03, 2018 1:31 am

neumanc wrote:If you have any ideas on how to improve the script, let's discuss them. One improvement could be to let the user choose how many repetitions he wants, because I for myself think that repeating each sentence 18 times within 5 learning sessions is sometimes a little bit too much. I tried to implement this, but it didn't work so far. Maybe I will implement this later.

Great job on the script, neumanc. I was able to test it on my system and it works great, although as you suggested it took some effort figuring out how to get pydub and ffmpeg to work. Would it be possible for you to add a couple of things to your script?
1) Prompt the user for the L1 (Source) and L2 (Target) language names, then use those names to read in the files (e.g., English-1.mp3 and Hebrew-1.mp3). Disregard - I modified the script to do that.
2) Allow the user to specify if they want the traditional 50 sentences/file for the "GMS"-like files or all of them merged into 1 file.
neumanc wrote:Thank you, sfuqua, for making me aware that also a spreadsheet could be used to produce a gsr-like experience. What a clever idea! This could especially be helpful if one doesn't have native quality audio files but only text. Furthermore, I have the impression that this could also be useful to generate automatically audio files with tts-voices, e.g. the L1-translations needed for use with my little Python script, so that one wouldn't have to record these oneself.

As for TTS: I'm not a programmer by any means, but I was able to put together a simple Python script that reads in a text file, converts it line by line to audio, and outputs individual mp3 files:

Code:

# Python 3 code to convert a text file to speech, one mp3 file per line
from gtts import gTTS

def makeMP3(words, mp3name, language="en"):
   # Send the text to the TTS service and save the audio as <mp3name>.mp3
   tts = gTTS(text=words, lang=language)
   tts.save("%s.mp3" % mp3name)
   print("File %s.mp3 created" % mp3name)

numLines = 0

with open('exampleFile.txt', 'r') as f:

   for text in f:
      # Drop the trailing newline; gTTS refuses empty text, so skip blank lines
      text = text.strip()
      if not text:
         continue
      numLines += 1
      print(text)
      output_file = "test" + str(numLines)

      makeMP3(text, output_file)
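
For anyone who wants to feed TTS output straight into neumanc's script: that script expects files named after the language plus a running number (e.g., English-1.mp3 and Hebrew-1.mp3). Here is a minimal sketch of the naming step only. The helper name `tts_jobs` and the sample sentences are made up for illustration, and the gTTS call is left as a comment so the snippet runs without a network connection.

```python
def tts_jobs(lines, language_name):
    """Pair each non-empty line with a file name like 'English-1.mp3'."""
    jobs = []
    for text in lines:
        text = text.strip()
        if not text:
            continue  # nothing to speak on a blank line
        filename = "%s-%d.mp3" % (language_name, len(jobs) + 1)
        jobs.append((filename, text))
    return jobs

# Hypothetical input; in practice this would be the lines of the text file.
sample = ["Good morning.\n", "\n", "Where is the station?\n"]

for filename, text in tts_jobs(sample, "English"):
    print(filename, "<-", text)
    # With gTTS installed, the audio could then be written with:
    # gTTS(text=text, lang="en").save(filename)
```

Numbering only the non-empty lines keeps the L1 and L2 files in step, provided both text files have their sentences in the same order.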

Postby MattG » Thu May 03, 2018 10:33 pm

Neumanc, I did run into a problem with the script. I was processing a block of 180 sentence pairs and it died. It successfully created the 180 "pair*" files and the "mass sentences for interpretation training (L1, L2).mp3" file. But then it crashed during the next section after exactly 90 sentences. I did a screen capture of the "memory error" message:

[Attachment: MemoryError.JPG, a screen capture of the memory error message]


Any ideas what the problem might be?

Thanks,
Matt

rdearman
Site Admin
Posts: 3992
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836
x 8896

Postby rdearman » Fri May 04, 2018 9:43 am

MattG wrote:Neumanc, I did run into a problem with the script. I was processing a block of 180 sentence pairs and it died. It successfully created the 180 "pair*" files and the "mass sentences for interpretation training (L1, L2).mp3" file. But then it crashed during the next section after exactly 90 sentences. I did a screen capture of the "memory error" message:

[Attachment: MemoryError.JPG]


Any ideas what the problem might be?

Thanks,
Matt

Looks like you had a memory error when loading the last segment. Perhaps you need to cut down the amount you're processing in one go, or increase your computer's memory. This is just a guess. You'd be better off asking programming questions on Stack Exchange.
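
A rough way to check whether memory is a plausible culprit: as far as I know, pydub decodes every mp3 into uncompressed PCM before working on it, so the relevant size is the playing time, not the size of the mp3 file. The sketch below estimates that size. The default format (44.1 kHz, 16-bit, stereo) and the 10 seconds per sentence pair are assumptions for illustration, not measurements of Matt's files, and `decoded_size_mb` is a made-up helper name.

```python
def decoded_size_mb(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size of uncompressed PCM audio in megabytes (1 MB = 1,000,000 bytes)."""
    return seconds * sample_rate * channels * bytes_per_sample / 1_000_000

# Hypothetical session file: 180 repetitions of about 10 seconds each.
session_seconds = 180 * 10
print("%.1f MB" % decoded_size_mb(session_seconds))  # prints "317.5 MB"
```

Under those assumptions a single long file already needs a few hundred megabytes, and joining segments with `+` creates a new segment each time, so the peak can be higher still.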

neumanc
Orange Belt
Posts: 114
Joined: Sat Jul 18, 2015 11:19 am
Location: Düsseldorf (Germany)
Languages: Speaks: German (native), English
Studies: French (advanced), Dutch (advanced), Spanish (false beginner)
Mostly forgotten: Italian, Latin
x 344

Postby neumanc » Fri May 04, 2018 11:33 am

Hello MattG, thank you very much for your feedback on the script. I hope it will be useful to you. I really appreciate your input.

MattG wrote:1) Prompt the user for the L1 (Source) and L2 (Target) language names, then use those names to read in the files (e.g., English-1.mp3 and Hebrew-1.mp3). Disregard - I modified the script to do that.
2) Allow the user to specify if they want the traditional 50 sentences/file for the "GMS"-like files or all of them merged into 1 file.

MattG wrote:Neumanc, I did run into a problem with the script. I was processing a block of 180 sentence pairs and it died. It successfully created the 180 "pair*" files and the "mass sentences for interpretation training (L1, L2).mp3" file. But then it crashed during the next section after exactly 90 sentences. I did a screen capture of the "memory error" message:

These are two improvements I had already decided to make myself. I have already implemented the second point, and in such a way that the user can choose how many sentences each GMS-like file should contain (because I think that 50 sentences per file is a little too much). I will post the improved script shortly, hopefully this weekend; I just want to make it even better.

I am also aware of the memory error. I really don't know why the computer should run out of memory because of a few megabytes of audio files. I assume that this is a problem with pydub. However, the problem is now (partly) solved: since the user can now choose how many sentences each GMS-like file should contain, there won't be any shortage of memory for these files. This may explain why the original GMS files were divided into 50 sentence pairs each. I successfully tried out the amended script (which I will post shortly) with 1820 (!) sentence pairs without any memory error.

However, while doing a second test on further sentence pairs, I had a memory error during the creation of one of the GSR-like files. I closed every other program on my computer; then it went through, only to produce a memory error at a later point. This really shouldn't be the case, since the GSR-like files are only 180 sentences long. As rdearman said, I think that running the script on a computer with more memory might be the solution. It certainly doesn't have anything to do with the script itself.

Here's the list of possible improvements I want to implement:
1. Letting the user type in which languages are learned, so that the files will indicate the L1-L2-combination.
2. Splitting the GMS-like files after a certain number of sentence pairs. Letting the user decide how many sentences the GMS-like files should contain (done).
3. Simultaneous creation of GMS-like files (in progress).
4. Letting the user decide which kind of files should be created (GMS-A, GMS-B, GMS-C, and/or GSR-like files). Not everyone needs every kind of file (in progress).
5. Make the program work with uneven numbers of sentences.
6. Adding an "outro" (a short tone, for example) to the GMS- and GSR-like files, so that the user instantly knows when he is done with the file. This is useful if the files are used on a smartphone with an app like "Smart AudioBook Player", which plays the sound files one after the other without any break.
7. Letting the user decide how many new sentences shall be introduced each day.
8. Letting the user decide for how many days the sentences shall be revised.
9. Letting the user decide how often the sentences will be repeated each day (five-, four-, three-, or twofold, or just once).
10. Letting the user decide if he first wants to learn the new sentences and then revise older sentences or the other way around. Letting the user decide if he wants to revise the oldest or the newest sentences first.

The first five modifications will be relatively easy to implement, but the last five will be quite tough. I will do my best, but I don't have much time on my hands to work on this.
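
Point 2 in the list above (splitting the GMS-like files after a user-chosen number of sentence pairs) comes down to a little arithmetic. A minimal sketch, not taken from neumanc's script: the function name `mass_file_ranges` is hypothetical, and it only computes which sentence numbers would go into which file.

```python
def mass_file_ranges(number_of_sentences, sentences_per_file):
    """Return (first, last) sentence numbers for each mass sentence file."""
    if sentences_per_file < 1:
        raise ValueError("sentences_per_file must be at least 1")
    ranges = []
    for first in range(1, number_of_sentences + 1, sentences_per_file):
        # The last file may be shorter when the total is not a multiple.
        last = min(first + sentences_per_file - 1, number_of_sentences)
        ranges.append((first, last))
    return ranges

for first, last in mass_file_ranges(180, 50):
    print("File starting at %d: sentences %d to %d" % (first, first, last))
```

With 180 sentence pairs and 50 per file this gives three full files and a fourth one holding the remaining 30, and only one file's worth of audio has to be held in memory at a time.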

Postby neumanc » Fri May 04, 2018 10:51 pm

I have now implemented modifications number 1, 2 and 4. I did not implement modification number 3, because I fear that it might lead to memory issues, which are to be avoided. Instead, the program now asks you which bit rate the sound files should have. If you type in a low bit rate (e.g. "48k"), this might help with the memory issues, because the script mainly works with the sentence pair files, which are created first, and if these have a low bit rate, less memory should be used.

Searching the Internet for the "MemoryError" (which seems to be a notorious problem with pydub), I found a possible solution (i.e. forcing Python's "garbage collector" to run) at https://github.com/jiaaro/pydub/issues/89, which I have added to the script, too. This won't make the program faster, but let's see if it helps. The current version of the script is the following:

Code:

#Overlearning File Creator V2.0

import pydub
from pydub import AudioSegment

import gc

def create_sentence_pair(filenumber, silence_length):
#Creates one sentence pair from source and target file.
   
   source_filename = L1+"-"+filenumber+".mp3"
   target_filename = L2+"-"+filenumber+".mp3"
         
   source = AudioSegment.from_mp3(source_filename)
   target = AudioSegment.from_mp3(target_filename)
   target_length = len(target)+added_silence
      
   if silence_length == 0:
      silence = AudioSegment.silent(duration=target_length)
   else:
      silence = AudioSegment.silent(duration=silence_length)

   pair = source + silence + target + silence      
   pair_filename = L1+"-"+L2+"-"+filenumber+".mp3"   
   print(pair_filename)
   pair.export(pair_filename, format="mp3", bitrate=bit_rate)

def create_all_sentence_pairs(number_of_sentences, silence_length):
#Creates as many sentence pairs as there are sentences to be processed.
   
   for number in range(1, number_of_sentences+1):
      filenumber=str(number)      
      create_sentence_pair(filenumber, silence_length)

def create_pattern_list (pattern):
#Applies the relevant pattern.

   list = []   
   for number in range(10):   
      list.extend(pattern)
      for counter in range(0, len(pattern)):
         pattern[counter] = (pattern[counter]+1)%10   
         if pattern[counter] == 0:
            pattern[counter] = 10      
   return list   

def add_first_sentence_to_pattern_list (list, first_sentence):
#Determines the precise numbers of the sentence pairs to be added.

   for number in range(0, len(list)):
      list[number] = list[number]+first_sentence
   
def create_sessionfile(sessionnumber):
#Creates a sessionfile, which consists of new sentences, if any, and sentences to be revised, if any.
   
   session_filename = "GSR-DAY"+str(sessionnumber).zfill(3)+".mp3"
   print(session_filename)
      
   reps = 0
      
   audiofile_new_sentences = AudioSegment.empty()
   audiofile_first_revision = AudioSegment.empty()
   audiofile_second_revision = AudioSegment.empty()
   audiofile_third_revision = AudioSegment.empty()
   audiofile_fourth_revision = AudioSegment.empty()      
   
   first_new_sentence = sessionnumber*10-9   
   if first_new_sentence < number_of_sentences:
      new_sentences = create_pattern_list(pattern=[1, 2, 3, 4, 5])
      add_first_sentence_to_pattern_list (new_sentences, first_new_sentence-1)
      print("New sentences to add: ", new_sentences)
      for number in range(len(new_sentences)):
         name_of_audiofile_to_add = L1+"-"+L2+"-"+str(new_sentences[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)         
         audiofile_new_sentences = audiofile_new_sentences + audiofile_to_add
      reps = reps + (len(new_sentences))
   
   first_sentence_of_first_revision = sessionnumber*10-19
   if first_sentence_of_first_revision > 0 and first_sentence_of_first_revision < number_of_sentences:
      first_revision = create_pattern_list(pattern=[1, 2, 4, 7])
      add_first_sentence_to_pattern_list (first_revision, first_sentence_of_first_revision-1)
      print("Sentences to add for first revision: ", first_revision)
      for number in range(len(first_revision)):
         name_of_audiofile_to_add = L1+"-"+L2+"-"+str(first_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_first_revision = audiofile_first_revision + audiofile_to_add
      reps = reps + (len(first_revision))
      
   first_sentence_of_second_revision = sessionnumber*10-29
   if first_sentence_of_second_revision > 0 and first_sentence_of_second_revision < number_of_sentences:
      second_revision = create_pattern_list(pattern=[1, 2, 4, 7])
      add_first_sentence_to_pattern_list (second_revision, first_sentence_of_second_revision-1)
      print("Sentences to add for second revision: ", second_revision)
      for number in range(len(second_revision)):
         name_of_audiofile_to_add = L1+"-"+L2+"-"+str(second_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_second_revision = audiofile_second_revision + audiofile_to_add
      reps = reps + (len(second_revision))

   first_sentence_of_third_revision = sessionnumber*10-39
   if first_sentence_of_third_revision > 0 and first_sentence_of_third_revision < number_of_sentences:
      third_revision = create_pattern_list(pattern=[1, 3, 6])
      add_first_sentence_to_pattern_list (third_revision, first_sentence_of_third_revision-1)
      print("Sentences to add for third revision: ", third_revision)
      for number in range(len(third_revision)):
         name_of_audiofile_to_add = L1+"-"+L2+"-"+str(third_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_third_revision = audiofile_third_revision + audiofile_to_add
      reps = reps + (len(third_revision))
   
   first_sentence_of_fourth_revision = sessionnumber*10-49
   if first_sentence_of_fourth_revision > 0 and first_sentence_of_fourth_revision < number_of_sentences:
      fourth_revision = create_pattern_list(pattern=[1, 4])
      add_first_sentence_to_pattern_list (fourth_revision, first_sentence_of_fourth_revision-1)
      print("Sentences to add for fourth revision: ", fourth_revision)
      for number in range(len(fourth_revision)):
         name_of_audiofile_to_add = L1+"-"+L2+"-"+str(fourth_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_fourth_revision = audiofile_fourth_revision + audiofile_to_add
      reps = reps + (len(fourth_revision))
   
   print("Reps: ", reps)
   print("Writing session file. Please wait.\n")
   session_file = audiofile_new_sentences + audiofile_first_revision + audiofile_second_revision + audiofile_third_revision + audiofile_fourth_revision
   session_file.export(session_filename, format="mp3", bitrate=bit_rate)
   
   del audiofile_new_sentences
   del audiofile_first_revision
   del audiofile_second_revision
   del audiofile_third_revision
   del audiofile_fourth_revision
   del session_file
   gc.collect()
   
#Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Overlearning File Creator!\n")

#Determining source and target language
L1 = input("What is your source language? ")
L2 = input ("What is your target language? ")

#Determining how many sentence pairs there are to be processed, only multiples of 10 will do.
number_of_sentences = int(input("\nHow many sentence pairs are there? "))
if number_of_sentences%10 >0:
   number_of_sentences = (number_of_sentences//10)*10
   if number_of_sentences == 0:
      print("At least 10 sentence pairs are necessary.")
   else:
      print("Only ", number_of_sentences, "sentence pairs can be processed.")

#Processing requires at least 10 sentence pairs.
if number_of_sentences >= 10:

   #Fixed or flexible silence length?
   flexible_length = input("Shall the silence length between sentences correspond to their length (y/n)? ")
   if flexible_length == "y":   
      silence_length = 0   
      added_silence = float(input("Added silence (in seconds)? "))*1000
   else:
      silence_length = float(input("Desired silence length between sentences (in seconds)? "))*1000
      added_silence = float(0)
         
   #Determining which kind of files shall be created
   a = input("\nDo you want mass sentence files (L1, L2, L2) for sentence training (y/n)? ")
   b = input("Do you want mass sentence files (L1, L2) for interpretation training (y/n)? ")
   c = input("Do you want mass sentence files (L2 only) for sentence repetition/shadowing (y/n)? ")
   if a == "y" or b == "y" or c == "y":
      sentences_per_mass_sentences_file = int(input("How many sentences per mass sentences file do you want? "))   
      if number_of_sentences/sentences_per_mass_sentences_file != number_of_sentences//sentences_per_mass_sentences_file:
         number_of_mass_sentences_files = round(number_of_sentences//sentences_per_mass_sentences_file)+1
      else:
         number_of_mass_sentences_files = int(number_of_sentences/sentences_per_mass_sentences_file)   
      print()
   spaced_repetition = input("Do you want overlearning/spaced repetition files (y/n)? ")

   #Determining the bit rate of the sound files to be created
   bit_rate = input("\nWhich bit rate shall the sound files have (e.g. 48k, 192k, etc.)? ")
   if bit_rate[-1] != "k":
      bit_rate = bit_rate + "k"   
   
   #Step 1: Creation of sentence pairs.
   print("\nStep 1: ", number_of_sentences, " sentence pair files (e.g. for shuffle play) will be created.")
   create_all_sentence_pairs(number_of_sentences, silence_length)
   print("Completed.")
   
   #Step 2: Creation of mass sentence files for sentence training (L1, L2, L2)
   if a == "y":
      print("\nStep 2: ", number_of_mass_sentences_files, " file(s) for sentence training (L1, L2, L2) will be created.")   

      for number in range(1, number_of_mass_sentences_files+1):
         
         number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
         sentence_training_filename = ("GMS-A-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(sentence_training_filename)   
         
         sentence_training_file = AudioSegment.empty()
         
         if number*sentences_per_mass_sentences_file > number_of_sentences:
            maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
         else:
            maximum_files_to_add = sentences_per_mass_sentences_file   
         
         for counter in range(maximum_files_to_add):
            
            number_of_audiofile_to_add = number_of_first_audiofile + counter
            name_of_pair_audiofile_to_add = L1+"-"+L2+"-"+str(number_of_audiofile_to_add)+".mp3"
            name_of_target_audiofile_to_add = L2+"-"+str(number_of_audiofile_to_add)+".mp3"
            pair_audiofile_to_add = AudioSegment.from_mp3(name_of_pair_audiofile_to_add)
            target_audiofile_to_add = AudioSegment.from_mp3(name_of_target_audiofile_to_add)   
            
            if silence_length == 0:
               target_length = len(target_audiofile_to_add)+added_silence
               silence = AudioSegment.silent(duration=target_length)
            else:
               silence = AudioSegment.silent(duration=silence_length)
            
            sentence_training_file = sentence_training_file + pair_audiofile_to_add + target_audiofile_to_add + silence
            print("Adding:", name_of_pair_audiofile_to_add, ", ", name_of_target_audiofile_to_add)
            
         print("Writing mass sentence file. Please wait.\n")
         sentence_training_file.export(sentence_training_filename, format="mp3", bitrate=bit_rate)

      del sentence_training_file
      gc.collect()
      print("Completed.")         
   
   else:
      print("\nStep 2: N/A")

   #Step 3: Creation of mass sentence files for interpretation training (L1, L2)   
   if b == "y":
      print("\nStep 3: ", number_of_mass_sentences_files, " file(s) for interpretation training (L1, L2) will be created.")   

      for number in range(1, number_of_mass_sentences_files+1):
         
         number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
         interpretation_filename = ("GMS-B-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(interpretation_filename)   
         
         interpretation_file = AudioSegment.empty()
         
         if number*sentences_per_mass_sentences_file > number_of_sentences:
            maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
         else:
            maximum_files_to_add = sentences_per_mass_sentences_file   
         
         for counter in range(maximum_files_to_add):
            
            number_of_audiofile_to_add = number_of_first_audiofile + counter
            name_of_audiofile_to_add = L1+"-"+L2+"-"+str(number_of_audiofile_to_add)+".mp3"
            audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
            interpretation_file = interpretation_file + audiofile_to_add
            print("Adding:", name_of_audiofile_to_add)
            
         print("Writing mass sentence file. Please wait.\n")
         interpretation_file.export(interpretation_filename, format="mp3", bitrate=bit_rate)

      del interpretation_file
      gc.collect()
      print("Completed.")
   
   else:
      print("\nStep 3: N/A")
      
   #Step 4: Creation of mass sentence files for sentence repetition/shadowing (L2 only)
   if c == "y":
      print("\nStep 4: ", number_of_mass_sentences_files, " file(s) for sentence repetition/shadowing (L2 only) will be created.")   

      for number in range(1, number_of_mass_sentences_files+1):
         
         number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
         repetition_filename = ("GMS-C-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(repetition_filename)   
         
         repetition_file = AudioSegment.empty()
         
         if number*sentences_per_mass_sentences_file > number_of_sentences:
            maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
         else:
            maximum_files_to_add = sentences_per_mass_sentences_file   
         
         for counter in range(maximum_files_to_add):
            
            number_of_audiofile_to_add = number_of_first_audiofile + counter
            name_of_audiofile_to_add = L2+"-"+str(number_of_audiofile_to_add)+".mp3"
            audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
            
            if silence_length == 0:
               target_length = len(audiofile_to_add)+added_silence
               silence = AudioSegment.silent(duration=target_length)
            else:
               silence = AudioSegment.silent(duration=silence_length)
            
            repetition_file = repetition_file + audiofile_to_add + silence
            print("Adding:", name_of_audiofile_to_add)
            
         print("Writing mass sentence file. Please wait.\n")
         repetition_file.export(repetition_filename, format="mp3", bitrate=bit_rate)

      del repetition_file
      gc.collect()
      print("Completed.")   
      
   else:
      print("\nStep 4: N/A")
   
   #Step 5: Creation of sessionfiles
   if spaced_repetition == "y":
      number_of_sessions = int(number_of_sentences/10)+4   
      print("\nStep 5: ", number_of_sessions, " session files will be created. ")
      for number in range(1, number_of_sessions+1):      
         create_sessionfile(number)
      print("Completed.")   
      
   else:
      print("\nStep 5: N/A\n")

input("Press Enter to exit. ")
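
The repetition patterns in the script above also account for the "18 times within 5 learning sessions" mentioned earlier in the thread. The sketch below re-implements the rotation done by `create_pattern_list` (without modifying its argument) and counts how often each sentence of a block of ten is played. The names `expand_pattern` and `repetitions_per_sentence` are my own, not from the script.

```python
from collections import Counter

# Patterns copied from the V2.0 script: new sentences, then revisions 1 to 4.
PATTERNS = [[1, 2, 3, 4, 5], [1, 2, 4, 7], [1, 2, 4, 7], [1, 3, 6], [1, 4]]

def expand_pattern(pattern):
    """Return the play order for one block of 10 sentences."""
    order = []
    for shift in range(10):
        # Move every position on by one per step, wrapping from 10 back to 1.
        order.extend((position - 1 + shift) % 10 + 1 for position in pattern)
    return order

def repetitions_per_sentence():
    """Count how often each sentence of a block is played over all sessions."""
    counts = Counter()
    for pattern in PATTERNS:
        counts.update(expand_pattern(pattern))
    return dict(counts)

print(repetitions_per_sentence())
```

Each pattern plays every sentence as many times as the pattern has entries, so the total per sentence is 5 + 4 + 4 + 3 + 2 = 18. Shortening the patterns is one way the number of repetitions could be reduced.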

Postby MattG » Sat May 05, 2018 1:57 pm

neumanc wrote:
I have now implemented modifications number 1, 2 and 4. I did not implement modification number 3, because I fear that it might lead to memory issues, which are to be avoided. Instead, the program now asks you which bit rate the sound files should have. If you type in a low bit rate (e.g. "48k"), this might help with the memory issues, because the script mainly works with the sentence pair files, which are created first, and if these have a low bit rate, less memory should be used.

Excellent job, neumanc! I ran this updated version of your script with the same set of 180 sentences as before and it worked perfectly. I tried it first with a bit rate of 48k but found that the resulting files had a little too much sibilance. So I re-ran them at 128k and they sound perfect and there were no memory error issues.
This is an awesome tool for people like me who like the overall concept of the "mass sentence" method but who are learning a language (like Hebrew) that wasn't/isn't currently supported by Glossika or who want to customize their learning approach. Thanks!

Postby neumanc » Mon Jun 25, 2018 5:13 pm

Hello everybody,

Lately, I have had some time for coding again, and I have implemented some of the desired features into the Overlearning File Creator. Here is what has been achieved so far:

neumanc wrote:1. Letting the user type in which languages are learned, so that the files will indicate the L1-L2-combination. DONE
2. Splitting the GMS-like files after a certain number of sentence pairs. Letting the user decide how many sentences the GMS-like files should contain. DONE
3. Simultaneous creation of GMS-like files. DONE
4. Letting the user decide which kind of files should be created (GMS-A, GMS-B, GMS-C, and/or GSR-like files). Not everyone needs every kind of file. DONE
5. Make the program work with uneven numbers of sentences.
6. Adding an "outro" (a short tone, for example) to the GMS- and GSR-like files, so that the user instantly knows when he is done with the file. This is useful if the files are used on a smartphone with an app like "Smart AudioBook Player", which plays the sound files one after the other without any break.
7. Letting the user decide how many new sentences shall be introduced each day.
8. Letting the user decide for how many days the sentences shall be revised. DONE
9. Letting the user decide how often the sentences will be repeated each day (five-, four-, three-, or twofold, or just once).
10. Letting the user decide if he first wants to learn the new sentences and then revise older sentences or the other way around. Letting the user decide if he wants to revise the oldest or the newest sentences first. DONE

There are also a couple of other improvements, e.g. the user can now alter several parameters.
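
Points 8 and 10 both concern which blocks of ten sentences end up in a given session file, and in which order. Here is a minimal sketch of that selection, using the same arithmetic as the V2.0 script posted earlier (new sentences start at session*10-9, the first revision at session*10-19, and so on). The function name `blocks_for_session` and the `new_first` switch are hypothetical and only illustrate the idea; they are not the parameters the actual script uses.

```python
def blocks_for_session(session_number, days_of_revision, number_of_sentences,
                       new_first=True):
    """List (revision_number, first_sentence) pairs for one session file.

    revision_number 0 stands for new sentences, k for the k-th revision.
    """
    blocks = []
    for revision in range(days_of_revision + 1):
        # Blocks of 10 sentences; each revision reaches one block further back.
        first_sentence = session_number * 10 - (revision * 10 + 9)
        if 0 < first_sentence < number_of_sentences:
            blocks.append((revision, first_sentence))
    # Reverse the order to revise the oldest sentences first.
    return blocks if new_first else blocks[::-1]

print(blocks_for_session(3, 4, 180))
print(blocks_for_session(3, 4, 180, new_first=False))
```

For session 3 with four days of revision, this selects the new block starting at sentence 21 and the revision blocks starting at sentences 11 and 1; later sessions past the last new block contain revisions only.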

And here's the code of version 2.5:

Code:

# Overlearning File Creator V2.5

import pydub
from pydub import AudioSegment

def create_sentence_pair(filenumber, silence_length):
# Creates one sentence pair from source and target file
   
   source_filename = L1+"-"+filenumber+".mp3"
   target_filename = L2+"-"+filenumber+".mp3"
         
   source = AudioSegment.from_mp3(source_filename)
   target = AudioSegment.from_mp3(target_filename)
         
   if not fixed_silence_length:   
      target_length = len(target)+added_silence
      if target_length > maximum_silence_length: target_length = maximum_silence_length      
      elif target_length < minimum_silence_length: target_length = minimum_silence_length
      silence = AudioSegment.silent(duration=target_length)
   else: silence = AudioSegment.silent(duration=silence_length)

   pair = source + silence + target + silence      
   pair_filename = title+"-"+L1+"-"+L2+"-"+filenumber+".mp3"   
   print(pair_filename)
   pair.export(pair_filename, format="mp3", bitrate=bit_rate)

def create_pattern_list (pattern):
# Applies the relevant pattern.

   list = []   
   for number in range(10):   
      list.extend(pattern)
      for counter in range(0, len(pattern)):
         pattern[counter] = (pattern[counter]+1)%10   
         if pattern[counter] == 0:
            pattern[counter] = 10      
   return list   
   
def add_number_of_first_sentence_to_pattern_list (list, number_of_first_sentence):
# Determines the precise numbers of the sentence pairs to be added.

   for number in range(0, len(list)):
      list[number] = list[number]+number_of_first_sentence
   
def create_list_of_sentences_to_be_added_to_sessionfile (first_sentence, pattern_number):
# Creates a list of sentences to be added to sessionfile
   
   if pattern_number == 0: pattern = [1, 2, 3, 4, 5]
   elif pattern_number == 1: pattern = [1, 2, 4, 7]
   elif pattern_number == 2: pattern = [1, 2, 4, 7]
   elif pattern_number == 3: pattern = [1, 3, 6]
   elif pattern_number == 4: pattern = [1, 4]
   else: pattern = [1]
      
   list_of_sentences = create_pattern_list(pattern)
   add_number_of_first_sentence_to_pattern_list (list_of_sentences, first_sentence)
   
   return list_of_sentences
   
def create_sessionfile(sessionnumber, days_of_revision, first, last, step):
# Creates a sessionfile, which consists of new sentences, if any, and sentences to be revised, if any (or vice versa).

   session_filename = (title+"-"+L1+"-"+L2+"-GSR-DAY")+str(sessionnumber).zfill(3)+".mp3"
   print(session_filename)
   session_file =  AudioSegment.empty()   
   reps = 0
         
   for number in range(first, last, step):

      first_sentence = sessionnumber*10-(number*10+9)
      if first_sentence > 0 and first_sentence < number_of_sentences:
         
         sentences = create_list_of_sentences_to_be_added_to_sessionfile (first_sentence-1, number)
         if number == 0:   print ("New sentences to add: ", sentences)
         else: print ("Sentences to add for revision #", number, ": ", sentences, sep="")
         
         for counter in range(len(sentences)):
            
            source_filename = L1+"-"+str(sentences[counter])+".mp3"
            target_filename = L2+"-"+str(sentences[counter])+".mp3"
            source = AudioSegment.from_mp3(source_filename)
            target = AudioSegment.from_mp3(target_filename)   
         
            if not fixed_silence_length:
               target_length = len(target)+added_silence
               if target_length > maximum_silence_length: target_length = maximum_silence_length
               elif target_length < minimum_silence_length: target_length = minimum_silence_length
               silence = AudioSegment.silent(duration=target_length)         
            else:
               silence = AudioSegment.silent(duration=silence_length)
         
            print("Adding:", source_filename, target_filename)            
            session_file = session_file + source + silence + target + silence
            reps += 1         
      
   print("Reps: ", reps)
   print("Writing session file. Please wait.\n")
   session_file.export(session_filename, format="mp3", bitrate=bit_rate)
   
# Start
# Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Overlearning File Creator!\n")

# Determining project title, source and target language
title = input("What is your project title? ")
L1 = input("What is your source language? ")
L2 = input("What is your target language? ")

# Determining how many sentence pairs there are to be processed, only multiples of 10 will do.
number_of_sentences = int(input("\nHow many sentence pairs are there? "))
if number_of_sentences%10 >0:
   number_of_sentences = (number_of_sentences//10)*10
   if number_of_sentences == 0: print("At least 10 sentence pairs are necessary.")
   else: print("Only", number_of_sentences, "sentence pairs can be processed (only multiples of 10).")

# Processing requires at least 10 sentence pairs.
if number_of_sentences >= 10:

   # Setting standard preferences
   fixed_silence_length = True            # Silence between sentences has fixed lengths   
   silence_length = 2000               # Fixed silence lengths is 2 seconds
   maximum_silence_length = 10000         # Maximum silence lengths of 10 seconds
   minimum_silence_length =  1000          # Minimum silence lengths of 1 second
   added_silence = 0                  # Added silence in case of flexible silence length
   bit_rate = "64k"                  # Bit rate of output soundfiles
   sentence_pair_files = False            # Sentence pair files will not be created
   a_files = True                     # GMS-A-files will be created
   b_files = True                     # GMS-B-files will be created
   c_files = True                     # GMS-C-files will be created
   sentences_per_mass_sentences_file = 50   # GMS-files will have 50 sentences
   spaced_repetition_files = True         # GSR-files will be created
   days_of_revision = 4               # GSR-files will have 4 x 10 old sentences
   revision_first = True               # GSR-files will first present the oldest, then the newest sentences
   first_session_file = 1               # Creation of GSR-files will begin with the file for day one

   # User defined preferences?
   standard_preferences = bool(input("\nDo you want all standard preferences (default) (y/n)? ") =="y")
   if not standard_preferences:   

      # Fixed or flexible silence length?
      fixed_silence_length = bool(input("\nDo you want fixed silence length between sentences (default) (y/n)? ")=="y")
      if fixed_silence_length:   
         two_seconds_of_silence = bool(input("Do you want 2 seconds of silence between sentences (default) (y/n)? ")=="y")
         if not two_seconds_of_silence:
            silence_length = float(input("Desired silence length between sentences (in seconds)? "))*1000   
            # Silence is capped at maximum_silence_length because very long silences caused memory errors with Pydub
            if silence_length > maximum_silence_length: silence_length = maximum_silence_length
            elif silence_length < minimum_silence_length: silence_length = minimum_silence_length
            added_silence = 0
      else:
         print("Silence length shall correspond to the length of the sentences.")
         added_silence = float(input("Added silence (in seconds)? "))*1000

      # Determining the bit rate of the sound files to be created
      standard_bit_rate = bool(input("\nDo you want a bitrate of 64k (default) (y/n)? ")=="y")
      if not standard_bit_rate:
         bit_rate = input("Which bitrate shall the sound files have (e.g. 32k, 48k, 64k, 128k, 192k, etc.)? ")
         if bit_rate[-1] != "k": bit_rate = bit_rate + "k"      

      # Determining which kind of files shall be created      
      sentence_pair_files = bool(input("\nDo you want sentence pair files (e.g. for shuffling) (y/n)? ")=="y")   
      a_files = bool(input("Do you want mass sentence files (L1, L2, L2) for sentence training (default) (y/n)? ")=="y")
      b_files = bool(input("Do you want mass sentence files (L1, L2) for interpretation training (default) (y/n)? ")=="y")
      c_files = bool(input("Do you want mass sentence files (L2 only) for sentence repetition/shadowing (default) (y/n)? ")=="y")
      if a_files or b_files or c_files:
         standard_mass_sentences = bool(input("Do you want 50 sentences per mass sentences file (default) (y/n)? ")=="y")
         if not standard_mass_sentences: sentences_per_mass_sentences_file = int(input("How many sentences per mass sentences file do you want? "))   
         print()
      
      spaced_repetition_files = bool(input("Do you want overlearning/spaced repetition files (default) (y/n)? ")=="y")
      if spaced_repetition_files:      
         
         standard_days_of_revision = bool(input("Do you want 4 days of revision (default) (y/n)? ")=="y")
         if not standard_days_of_revision:
            days_of_revision = int(input("How many days do you want to revise the sentences? "))
         if days_of_revision > 0: revision_first = bool(input("Do you want revisions first (default) (y/n)? ")=="y")
         if not revision_first: print("New sentences shall be presented first, then the revisions.")
         
         start_with_first_day = bool(input("Do you want to begin the process with the first session file (default) (y/n)? ")=="y")
         if not start_with_first_day: first_session_file = int(input("With which session file shall the process begin (1, 2, ...)? "))
   
   # Informing user about specifications of soundfiles
   print("\nSpecifications of soundfiles:")
   print("\nSentence pair files:", sentence_pair_files)
   if sentence_pair_files:
      print("Fixed length:", fixed_silence_length)
      if fixed_silence_length:
         print("Silence length:", silence_length/1000, "seconds")
      else:
         print("Flexible silence length:", True)
         print("Added silence:", added_silence/1000, "seconds")
      print("Bit rate:", bit_rate)
   print("A-files:", a_files)
   print("B-files:", b_files)
   print("C-files:", c_files)
   if a_files or b_files or c_files: print("Sentences per mass sentences file:", sentences_per_mass_sentences_file)
   print("Spaced repetition files:", spaced_repetition_files)
   if spaced_repetition_files:
      print("Days of revision:", days_of_revision)
      if days_of_revision > 0: print("Revision first:", revision_first)   
      print("First session file:", first_session_file)
   
   # Creation of sound files
   print("\nSoundfiles will now be created.")
   
   # Step 1: Creation of sentence pairs.
   if sentence_pair_files:
      print("\nStep 1: ", number_of_sentences, " sentence pair files (e.g. for shuffle play) will be created.")
      for number in range(1, number_of_sentences+1):
         filenumber=str(number)      
         create_sentence_pair(filenumber, silence_length)
      print("Completed.")
   else:
      print("\nStep 1: Creation of sentence pair files: N/A")
   
   # Step 2: Creation of mass sentences files
   if a_files or b_files or c_files:      
      if number_of_sentences/sentences_per_mass_sentences_file != number_of_sentences//sentences_per_mass_sentences_file:
         number_of_mass_sentences_files = round(number_of_sentences//sentences_per_mass_sentences_file)+1
      else:
         number_of_mass_sentences_files = int(number_of_sentences/sentences_per_mass_sentences_file)   
      print("\nStep 2: ", number_of_mass_sentences_files, " (sets of) mass sentences file(s) will be created.")   
         
      for number in range(1, number_of_mass_sentences_files+1):   
      
         number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
         a_file_name = (title+"-"+L1+"-"+L2+"-GMS-A-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         b_file_name = (title+"-"+L1+"-"+L2+"-GMS-B-") + str(number_of_first_audiofile).zfill(4) + ".mp3"         
         c_file_name = (title+"-"+L1+"-"+L2+"-GMS-C-") + str(number_of_first_audiofile).zfill(4) + ".mp3"      
         print(a_file_name, b_file_name, c_file_name)
         
         a_file = AudioSegment.empty()
         b_file = AudioSegment.empty()
         c_file = AudioSegment.empty()

         if number*sentences_per_mass_sentences_file > number_of_sentences:
            maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
         else:
            maximum_files_to_add = sentences_per_mass_sentences_file   
         
         for counter in range(maximum_files_to_add):

            filenumber = number_of_first_audiofile + counter
            source_filename = L1+"-"+str(filenumber)+".mp3"
            target_filename = L2+"-"+str(filenumber)+".mp3"
            source = AudioSegment.from_mp3(source_filename)
            target = AudioSegment.from_mp3(target_filename)   
         
            if not fixed_silence_length:
               target_length = len(target)+added_silence
               if target_length > maximum_silence_length: target_length = maximum_silence_length
               elif target_length < minimum_silence_length: target_length = minimum_silence_length
               silence = AudioSegment.silent(duration=target_length)         
            else:
               silence = AudioSegment.silent(duration=silence_length)

            if a_files:
               a_file = a_file + source + silence + target + silence + target + silence
               print("Adding to A-file:", source_filename, target_filename, target_filename)               
            if b_files:
               b_file = b_file + source + silence + target + silence
               print("Adding to B-file:", source_filename, target_filename)               
            if c_files:
               c_file = c_file + target + silence
               print("Adding to C-file:", target_filename)

         print("Writing mass sentence file(s). Please wait.\n")
         if a_files: a_file.export(a_file_name, format="mp3", bitrate=bit_rate)
         if b_files: b_file.export(b_file_name, format="mp3", bitrate=bit_rate)
         if c_files: c_file.export(c_file_name, format="mp3", bitrate=bit_rate)

      del a_file
      del b_file
      del c_file
      print("Completed.")               
            
   else:
      print("\nStep 2: Creation of mass sentences files: N/A")      
   
   # Step 3: Creation of sessionfiles
   if spaced_repetition_files:
      number_of_sessions = int(number_of_sentences/10) + days_of_revision - first_session_file + 1
      print("\nStep 3: ", number_of_sessions, " session files will be created. ")
      
      if revision_first:
         first = days_of_revision
         last = -1
         step = -1
      else:
         first = 0
         last = days_of_revision+1
         step = +1
      
      for number in range(first_session_file, first_session_file + number_of_sessions):      
         create_sessionfile(number, days_of_revision, first, last, step)
      
      print("Completed.")   
      
   else:
      print("\nStep 3: Creation of spaced repetition files: N/A\n")

os.system('pause' if os.name == 'nt' else 'read')
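
To make the repetition schedule easier to follow, here is a minimal sketch (not part of the script above) that only prints which blocks of 10 sentences end up in which session file, without touching any audio. The function name session_plan and the table PATTERN_LENGTHS are made up for this illustration; the arithmetic is copied from create_sessionfile and from the patterns in create_list_of_sentences_to_be_added_to_sessionfile.

```python
# Repetitions per sentence for each revision number, i.e. the lengths of the
# patterns used in the script: [1,2,3,4,5], [1,2,4,7], [1,2,4,7], [1,3,6], [1,4].
PATTERN_LENGTHS = {0: 5, 1: 4, 2: 4, 3: 3, 4: 2}


def session_plan(session_number, number_of_sentences, days_of_revision=4):
    """Return (revision, first_sentence, last_sentence, reps_per_sentence)
    for every block of 10 sentences in one session file, revisions first."""
    plan = []
    for revision in range(days_of_revision, -1, -1):
        first = session_number * 10 - (revision * 10 + 9)
        if 0 < first < number_of_sentences:
            # Revisions beyond the fourth fall back to a single repetition.
            reps = PATTERN_LENGTHS.get(revision, 1)
            plan.append((revision, first, first + 9, reps))
    return plan


if __name__ == "__main__":
    # Example: 40 sentence pairs with the standard 4 days of revision.
    for session in range(1, 9):
        print("Day", session, session_plan(session, 40))
```

Revision 0 stands for new sentences. With the standard settings, adding up the repetitions of one block over its five sessions gives 5 + 4 + 4 + 3 + 2 = 18, which is the figure discussed earlier in this thread.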


How to use the Overlearning File Creator on Windows 10

1. How to install WinPython (portable version)
a) Go to https://sourceforge.net/projects/winpython/files/WinPython_3.6/3.6.5.0/, then click on "Download Latest Version". At the time of writing, this is WinPython 3.6.5.0Qt5-64bit. You should use the 64bit version in order to prevent the memory issues of Pydub described by MattG.
b) Double-click the downloaded exe-file; this will install WinPython Portable on your system.

2. How to install Libav (Pydub relies on Libav, so this needs to be done)
a) Go to http://builds.libav.org/windows/nightly-lgpl/ to get the newest nightly build of Libav. Choose the newest file at the bottom of the page. At the time of writing, this is libav-x86_64-w64-mingw32-20180108.7z dating from 8th January 2018. Download it by clicking on it. Warning: This is a nightly build, so it may contain errors. The latest release version can be found at http://builds.libav.org/windows/release-lgpl/, but it's already three (!) years old.
b) Extract the downloaded 7z-file somewhere on your computer where it won't be deleted. You will need the freeware "7-zip" for this, which you can download at http://www.7-zip.de.
c) Open the folder that contains the extracted files. Within this folder, open the sub-folder usr, then the sub-sub-folder bin. Click on the address bar and copy the path to this sub-sub-folder (CTRL-C).
d) Still in Windows Explorer, find This PC in the left sidebar and right-click on it. In the context menu that opens, click on Properties.
e) A new window will open where you can "View basic information about your computer". In this window, on the left sidebar, click on Advanced system settings.
f) A new window will open called System Properties, where you must click on Environment Variables.
g) A new window will open called Environment Variables. On the bottom, you will find the System Variables. Click on the system variable Path.
h) A new window will open where you can edit the environment variable. Click on New and paste the path to the above-mentioned sub-sub-folder bin. Save and close everything.
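
To check whether the Path entry from step 2 is picked up, a small sketch like the following can be run from a newly opened Python window. It only uses shutil.which from the standard library. The list of program names is my assumption about what Pydub looks for (avconv/avprobe from Libav, or ffmpeg/ffprobe), and find_converters is a made-up helper name.

```python
import shutil


def find_converters(names=("avconv", "avprobe", "ffmpeg", "ffprobe")):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_converters().items():
        print(name, "->", path or "not found on PATH")
```

If every line says "not found on PATH", the folder from step c) was probably not added correctly, or the window was opened before the variable was saved.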

3. How to install Pydub (without Pydub, the script will not work)
a) Go to https://pypi.org/project/pydub/#files and click on pydub-0.22.1.tar.gz
b) Go to the folder where WinPython is installed and start the program WinPython Control Panel.exe by double-clicking on it.
c) In the tab Install/upgrade packages, click on Add packages.
d) Go to your download folder, click on pydub-0.22.1.tar.gz and then on Open. This will install Pydub.

4. How to get the Overlearning File Creator
a) Create a new text file on your desktop.
b) Rename it properly (e.g. Overlearning File Creator) and change the ending to ".py".
c) Mark all the above code in this post and copy it (CTRL-C).
d) Open the renamed file and paste the code into it (CTRL-V).
e) Save the renamed file.

5. How to prepare the audio
a) Create a new folder on your desktop and name it properly (e.g. "English-French audio").
b) Move the py-file into this folder.
c) Move all the source and target audio (mp3) files into this folder. The source and the target files should contain corresponding sentences (or short passages) and must be numbered accordingly, e.g. "EN-1.mp3" and "FR-1.mp3" and up.
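
Because the script stops with an error as soon as one file is missing or misnumbered, it can save time to check the folder first. The following is only a sketch under the naming scheme described in c); missing_audio_files is a hypothetical helper, not part of the Overlearning File Creator.

```python
from pathlib import Path


def missing_audio_files(folder, source_name, target_name, number_of_sentences):
    """List the expected mp3 files (e.g. EN-1.mp3, FR-1.mp3, ...) that are absent."""
    folder = Path(folder)
    missing = []
    for number in range(1, number_of_sentences + 1):
        for prefix in (source_name, target_name):
            filename = f"{prefix}-{number}.mp3"
            if not (folder / filename).is_file():
                missing.append(filename)
    return missing


if __name__ == "__main__":
    # Example: check the current folder for EN-1.mp3 ... EN-40.mp3 and FR-1.mp3 ... FR-40.mp3.
    problems = missing_audio_files(".", "EN", "FR", 40)
    print("All files found." if not problems else "Missing: " + ", ".join(problems))
```

Note that the numbers must not be zero-padded ("EN-1.mp3", not "EN-01.mp3"), since the script builds the file names from plain numbers.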

6. How to run the script
a) Go to the folder where WinPython is installed and open the program IDLEX (Python GUI).exe by double-clicking on it.
b) Click on the tab File, then on Open. Choose the folder on your desktop containing the script and the audio.
c) Now choose the py-file and click on Open. This will open a new WinPython window where you can read the script.
d) Click on the tab Run and then on Run Module. Voilà, the script should be running on your machine.
e) Answer all the questions the script asks. As project title, you could use the source of your audio; the name of every file the script creates will begin with the project title. Then tell the script the source and the target language (e.g. "EN" and "FR") so that it can find the audio files. If you choose the "standard preferences", you will get mass sentences files and spaced repetition files exactly like those of the old Glossika, except that the silence between sentences is always two seconds instead of one (this gives you one more second to think before speaking along with the audio). Otherwise, set the preferences as you like. Just try it out.
f) Unfortunately, the 64bit version of WinPython runs very slowly (but reliably). If you want to have faster progress and don't shy away from memory errors, you could also use the 32bit version. For this, go to the folder where WinPython is installed and open the program WinPython Command Prompt.exe by double-clicking on it. This will start the Windows command prompt. Use the cd-command to choose the folder containing the audio and the script. Start the script by typing in its name including the extension and pressing "return". Everything else should work exactly the same, but much faster. On my machine, each GSR-like file takes about 2 minutes to create. In order to avoid a memory error, your audio files should be very short. Furthermore, you should choose a short silence length (e.g. 2 seconds).

Important note: If you follow any of the above instructions, you do so at your own risk. I am neither a computer scientist nor a professional programmer, and I do not know enough about any risks that might accompany the installation of WinPython, Libav and Pydub. This is simply what runs on my computer without any apparent damage, and what I wanted to share with you.

What can I do if I don't have bilingual audio?

If you don't have bilingual audio (for example, if all you have are the split mp3 files from Assimil), you can still use the Overlearning File Creator to create Glossika-like files. How so? If you don't have cues in your mother tongue (or any other language) and don't want to create them in a very tedious process, you must look for another kind of cue. While working through my Assimil files by listening, repeating and making heavy use of the pause button, I noticed that I only had to hear the beginning of a sentence to remember the whole sentence. In my opinion, the best cue for a sentence is its beginning! The first one or two words together with their intonation and the speaker's voice are enough. Obviously, this will only work with sentences you can understand just fine (but for some reason can't reproduce fluently enough yet). That's why I had the idea of writing another script that cuts off the beginnings of sentences and saves them as new files that can serve as cues. Voilà:

Code: Select all

#Cue Creator

import pydub
from pydub import AudioSegment

#Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Cue Creator!")

#Gathering necessary information
number_of_sentences = int(input("\nHow many sentences are there? "))
length_of_cue = float(input("Desired cue length (in seconds)? "))*1000
if length_of_cue < 250:
   length_of_cue = 250

#Creating Cues
print("Creating cues:")

for filenumber in range(1, number_of_sentences+1):

   sentence_filename="target-"+str(filenumber)+".mp3"
   sentence = AudioSegment.from_mp3(sentence_filename)
      
   cue_filename="source-"+str(filenumber)+".mp3"      
   cue_raw = sentence[:length_of_cue+250]
   cue = cue_raw.fade_out(250)
   
   cue.export(cue_filename, format="mp3")
   print(cue_filename)
   
print("Completed.")
os.system('pause' if os.name == 'nt' else 'read')


Enjoy!
Last edited by neumanc on Mon Jun 25, 2018 9:12 pm, edited 1 time in total.
4 x

Hashimi
Orange Belt
Posts: 186
Joined: Sun Jan 10, 2016 12:45 pm
x 227

Re: Solved: How to create your own Glossika-like GSR files?

Postby Hashimi » Mon Jun 25, 2018 9:09 pm

Great!

Could you explain in more detail the second script? how does it work? can you show us an example?
0 x

neumanc
Orange Belt
Posts: 114
Joined: Sat Jul 18, 2015 11:19 am
Location: Düsseldorf (Germany)
Languages: Speaks: German (native), English
Studies: French (advanced), Dutch (advanced), Spanish (false beginner)
Mostly forgotten: Italian, Latin
x 344

Re: Solved: How to create your own Glossika-like GSR files?

Postby neumanc » Mon Jun 25, 2018 10:49 pm

Hashimi wrote:Could you explain in more detail the second script? how does it work? can you show us an example?

Sure, I will explain the functioning and the reasoning behind the second script in more detail:

First Step: Use the Cue Creator to create cues which are one to three words long
The Cue Creator expects mp3 files in the same folder as the script. These files should contain single sentences or (very) short passages in your target language only. The files must be named and numbered "target-1.mp3" (and up). After starting, the script will ask you how many sentences there are and how long the cues shall be. If you type in, for example, 0.6 (seconds), the script will work through all the sentences and produce mp3 files named "source-1.mp3" (and up). These "source" files will contain roughly the first one or two words of the sentences they were made from. The cues won't end abruptly after 0.6 seconds: the script keeps another 0.25 seconds of audio and fades it out smoothly. Of course, you can also choose longer cues by typing in higher values; the right value depends on the speaking speed of your sentence files. You really need to hear at least one full word, but it should not be more than three. Together with the intonation and the speaker's voice, this should be enough to give you the chance to recall the sentences from memory. From what I've read here and elsewhere, recalling something is supposed to be especially good for memorizing it.
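
The timing of a cue can be worked out without any audio. The following sketch only mirrors the arithmetic of the Cue Creator (a minimum cue length of 0.25 seconds, plus 0.25 seconds of fade-out appended to the requested length); cue_timing is a made-up name for this illustration.

```python
FADE_MS = 250          # length of the fade-out used by the Cue Creator
MINIMUM_CUE_MS = 250   # shorter requests are raised to this value


def cue_timing(sentence_ms, cue_seconds):
    """Return (total cue length, start of fade-out), both in milliseconds."""
    cue_ms = max(int(round(cue_seconds * 1000)), MINIMUM_CUE_MS)
    # The cue cannot be longer than the sentence it is cut from.
    total_ms = min(sentence_ms, cue_ms + FADE_MS)
    fade_start_ms = max(0, total_ms - FADE_MS)
    return total_ms, fade_start_ms


if __name__ == "__main__":
    print(cue_timing(3000, 0.6))
```

For a 3-second sentence and a requested cue length of 0.6 seconds, the cue file is therefore 850 ms long: 600 ms at full volume followed by 250 ms of fade-out.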

Second Step: Use the Overlearning File Creator to create Glossika-like files which contain cues and sentences in the target language only
In the next step, you can run the Overlearning File Creator (which must be copied into the same folder) and use the "source" files as "source language" and the "target" files as "target language". The end product will be GMS-like mass sentences files and GSR-like spaced repetition files which will keep your thoughts totally immersed in the target language. I have tried this out, and in my opinion it works fine for sentences you can already understand (for example because you have already read them in the corresponding coursebook) but cannot yet produce fluently enough. To achieve optimal results, you may consider deselecting "fixed silence length", which has the effect that the pauses between the cues and the target language sentences correspond to the length of the respective sentences. In order to have some additional time to remember the sentence, you might also consider using "added silence" of at least 0.2 seconds. Please take into account that there will be a pause of equal length after each sentence. This gives you the possibility to recall the sentence and speak it before the audio, then to shadow it with the audio, and then to repeat it after the audio. This makes up to 3 reps for each sentence. Furthermore, I would advise not choosing the default 50 sentences per mass sentences file, but only 10. If you prefer to memorize the sentences with the GMS-like files instead of the GSR-like files, you could set the GMS-like files to loop. After 10 sentences, you have a fair chance of remembering most if not all of them; this is not the case after having heard and spoken 50 different sentences in a row.
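
As a rough illustration of the flexible silence length, this sketch reproduces the script's rule (pause = length of the target sentence + added silence, kept between the minimum of 1 second and the maximum of 10 seconds) and adds up the playing time of one cue/sentence pair. The function names are made up; the default limits are those set in the Overlearning File Creator.

```python
def pause_length(target_ms, added_ms=0, minimum_ms=1000, maximum_ms=10000):
    """Pause in milliseconds that follows both the cue and the sentence."""
    return max(minimum_ms, min(maximum_ms, target_ms + added_ms))


def pair_duration(cue_ms, target_ms, added_ms=0):
    """Playing time of cue + pause + sentence + pause, in milliseconds."""
    pause = pause_length(target_ms, added_ms)
    return cue_ms + pause + target_ms + pause


if __name__ == "__main__":
    # Example: 0.85 s cue, 2.5 s sentence, 0.2 s added silence.
    print(pause_length(2500, 200), pair_duration(850, 2500, 200))
```

In this example each pause is 2.7 seconds, so one pair takes 8.75 seconds in a B-file or a GSR-like file.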

What do you think about this learning method?
Using this approach, you can convert just about any audio resource into Glossika-like files. The only thing you have to do upfront is to cut the audio into single sentences, if that has not already been done for you (as with the split mp3 files of Assimil). Please understand that I have not yet used this learning method long enough and therefore cannot (yet) vouch for it. But I don't see any reason why it shouldn't work. In any case, it's better for memorizing and fluency than just repeating after the audio. For me, it is also better than shadowing alone, because it serves to make me more "aware" of the sentences I want to learn. I would be very much interested in hearing the opinion of experienced learners about the learning method outlined above.
5 x

neumanc
Orange Belt
Posts: 114
Joined: Sat Jul 18, 2015 11:19 am
Location: Düsseldorf (Germany)
Languages: Speaks: German (native), English
Studies: French (advanced), Dutch (advanced), Spanish (false beginner)
Mostly forgotten: Italian, Latin
x 344

Re: Solved: How to create your own Glossika-like GSR files?

Postby neumanc » Tue Jun 26, 2018 11:56 am

Example files produced with the Cue Creator and the Overlearning File Creator

In order to give you all a better understanding of the kind of files that can be produced with the Cue Creator and the Overlearning File Creator, I produced a demonstration for you. For demonstration purposes, I used a public domain Librivox recording as audio source: the first chapter of "Les précieuses ridicules" by Molière, which I downloaded here: https://tinyurl.com/ybtv493s.

First, I cut the audio file into 40 sentences within a few minutes using the freeware "Mp3DirectCut". Then, I renamed the files as "target-1.mp3" up to "target-40.mp3" using the freeware "ReNamer Lite". After that, I started the Cue Creator, which produced 40 cues named "source-1.mp3" up to "source-40.mp3". This was done in one minute. Please see the log: https://tinyurl.com/yc9zbtoc. Then, I started the Overlearning File Creator. I set the following parameters: silence length according to the length of the sentences, added silence of 0.3 seconds, 10 sentences (instead of 50) per GMS-like file, 5 revisions (instead of 4) per GSR-like file, and new sentences first, revisions afterwards. This process took 11 minutes. Please see the log: https://tinyurl.com/ybbdyml4 and the directory listing of all files produced: https://tinyurl.com/ycef5gnu.
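
For anyone who wants to estimate beforehand how many files a run will produce, here is a small sketch based on the formulas in the Overlearning File Creator (one set of GMS files per started group of sentences, and one GSR file per block of 10 sentences plus one per day of revision). expected_file_counts is a made-up helper name; the numbers in the example are the settings described above.

```python
import math


def expected_file_counts(number_of_sentences, sentences_per_gms_file=50,
                         days_of_revision=4, first_session_file=1):
    """Return (sets of GMS files, number of GSR session files)."""
    usable = (number_of_sentences // 10) * 10   # only multiples of 10 are processed
    gms_sets = math.ceil(usable / sentences_per_gms_file)
    gsr_files = usable // 10 + days_of_revision - first_session_file + 1
    return gms_sets, gsr_files


if __name__ == "__main__":
    # 40 sentences, 10 sentences per GMS-like file, 5 revisions.
    print(expected_file_counts(40, sentences_per_gms_file=10, days_of_revision=5))
```

Each set of GMS files consists of up to three files (A, B and C), depending on which kinds were selected.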

The resulting overlearning example files can be listened to or downloaded here: https://tinyurl.com/yb3xogo5.

Please let me know what you think about this file format for (over-)learning purposes. Thanks!
2 x

