Solved: How to create your own Glossika-like GSR files?

Hashimi
Orange Belt
Posts: 186
Joined: Sun Jan 10, 2016 12:45 pm
x 227

Re: Solved: How to create your own Glossika-like GSR files?

Postby Hashimi » Tue Apr 10, 2018 12:34 am

This is really amazing! Success on your first try after 25 years of no programming. Great work!

You are a natural autodidact. I wish I could do the same.
2 x

Axon
Green Belt
Posts: 417
Joined: Thu Jun 16, 2016 12:29 am
Languages: Comfortable: German, Mandarin, Indonesian.
Rusty: Spanish, French, Russian.
Also: Cantonese, Vietnamese, Polish.
Language Log: viewtopic.php?f=15&t=5086
x 1300

Re: Solved: How to create your own Glossika-like GSR files?

Postby Axon » Tue Apr 10, 2018 9:43 am

Oh man.

Neumanc messaged me asking very politely about my project, and I gave him my very messy code, which had been cobbled together over a few weeks of frustration more than a year ago. XKCD has a couple of choice words for code like mine.

Then in a few days he cranks out a solution that's more readable, more complete, and more scalable than mine in every sense. As his first programming project in decades.

Respect!
3 x

neumanc
Orange Belt
Posts: 114
Joined: Sat Jul 18, 2015 11:19 am
Location: Düsseldorf (Germany)
Languages: Speaks: German (native), English
Studies: French (advanced), Dutch (advanced), Spanish (false beginner)
Mostly forgotten: Italian, Latin
x 344

Re: Solved: How to create your own Glossika-like GSR files?

Postby neumanc » Tue Apr 10, 2018 9:58 am

Dear fellow language learners,

After having heard back from the moderators, I hereby post the script for everyone to use for their personal and private use only. It's not elegant or as short as possible, but hey, it's my first Python script ever. Especially the function "create_sessionfile" could have been improved upon, I know. But it's working, so I won't change it for the moment.

You need to have Python 3, pydub, and either ffmpeg or libav installed on your system. This is probably the hardest part of getting the script to run. Then you must save the script as a text file with a ".py" file extension. After that, place all source audio files (titled "source-#.mp3", the "#" meaning the sentence number) and all target audio files (titled "target-#.mp3", the "#" meaning the sentence number) in the same directory as the script. The numbers of your L1 and L2 sentence audio files must correspond to each other, of course. Then double-click on the script. You will be prompted for some settings, then the gsr- and gms-like files will be created in the same directory. The program will continuously tell you which files it is creating at that moment. The gsr files will be those named "session-#.mp3", "#" meaning the number of your learning session. I tested the script only on a Windows computer. If you have another operating system and the script won't run, you could try to delete the lines containing "import os" and "os.system"; they are expendable.
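Since a missing or misnamed file only shows up once pydub tries to load it, a small pre-flight check can save a failed run. The following is a minimal sketch, not part of the posted script: the helper names `missing_audio_files` and `converter_available` are made up for illustration, and it assumes the converter is reachable on the PATH as `ffmpeg` or `avconv` (the libav tool).

```python
# Pre-flight check (hypothetical helpers, not part of the posted script).
import shutil
from pathlib import Path


def missing_audio_files(directory, number_of_sentences):
    """Return the expected source/target file names that do not exist."""
    folder = Path(directory)
    missing = []
    for number in range(1, number_of_sentences + 1):
        for prefix in ("source", "target"):
            name = f"{prefix}-{number}.mp3"
            if not (folder / name).is_file():
                missing.append(name)
    return missing


def converter_available():
    """True if ffmpeg or avconv (libav) can be found on the PATH."""
    return any(shutil.which(tool) is not None for tool in ("ffmpeg", "avconv"))
```

Calling `missing_audio_files(".", 50)` in the script's directory would list every "source-#.mp3" or "target-#.mp3" that is absent for sentences 1 to 50; an empty list means all pairs are present.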

If you have any ideas on how to improve the script, let's discuss them. One improvement could be to let the user choose how many repetitions he wants, because I personally think that repeating each sentence 18 times within 5 learning sessions is sometimes a little bit too much. I tried to implement this, but it hasn't worked so far. Maybe I will implement this later.
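The figure of 18 repetitions follows from the five patterns used in the script below: each sentence is heard once per pattern entry, so 5 + 4 + 4 + 3 + 2 = 18. The sketch below only counts repetitions for a given set of patterns; it reimplements the script's rotation without modifying its argument, and the `LIGHTER_PATTERNS` set is a made-up example of a reduced schedule, not something from the original post.

```python
from collections import Counter


def create_pattern_list(pattern):
    """Rotate the pattern through a block of ten sentences, as the script does."""
    result = []
    current = list(pattern)  # copy, so the caller's list is left untouched
    for _ in range(10):
        result.extend(current)
        current = [(value % 10) + 1 for value in current]  # 10 wraps around to 1
    return result


def repetitions_per_sentence(patterns):
    """Count how often each sentence of a ten-sentence block is played in total."""
    counts = Counter()
    for pattern in patterns:
        counts.update(create_pattern_list(pattern))
    return dict(counts)


# The five patterns hard-coded in the script (new sentences plus four revisions).
SCRIPT_PATTERNS = [[1, 2, 3, 4, 5], [1, 2, 4, 7], [1, 2, 4, 7], [1, 3, 6], [1, 4]]

# Hypothetical lighter schedule, for illustration only.
LIGHTER_PATTERNS = [[1, 2, 3], [1, 4], [1, 4], [1], [1]]

print(repetitions_per_sentence(SCRIPT_PATTERNS))
print(repetitions_per_sentence(LIGHTER_PATTERNS))
```

With distinct entries in each pattern, the total per sentence is simply the sum of the pattern lengths, so choosing shorter patterns is one way to let the user pick the number of repetitions.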

Enjoy!
neumanc

Code: Select all

#Overlearning File Creator

import pydub
from pydub import AudioSegment

def create_sentencepair(filenumber, silence_length):
#Creates one sentencepair from source and target file.
   
   target_filename="target-"+filenumber+".mp3"
   source_filename="source-"+filenumber+".mp3"
      
   target = AudioSegment.from_mp3(target_filename)
   source = AudioSegment.from_mp3(source_filename)
   target_length = len(target)+added_silence
      
   if silence_length == 0:
      silence = AudioSegment.silent(duration=target_length)
   else:
      silence = AudioSegment.silent(duration=silence_length)

   pair = source + silence + target + silence      
   pair_filename="pair-"+filenumber+".mp3"   
   print(pair_filename)
   pair.export(pair_filename, format="mp3")

def create_all_sentencpairs(number_of_sentences, silence_length):
#Creates as many sentencepairs as there are sentences to be processed.
   
   for number in range(1, number_of_sentences+1):
      filenumber=str(number)      
      create_sentencepair(filenumber, silence_length)

def create_pattern_list (pattern):
#Applies the relevant pattern.

   list = []   
   for number in range(10):   
      list.extend(pattern)
      for counter in range(0, len(pattern)):
         pattern[counter] = (pattern[counter]+1)%10   
         if pattern[counter] == 0:
            pattern[counter] = 10      
   return list   

def add_first_sentence_to_pattern_list (list, first_sentence):
#Determines the precise numbers of the sentencepairs to be added.

   for number in range(0, len(list)):
      list[number] = list[number]+first_sentence
   
def create_sessionfile(sessionnumber):
#Creates a sessionfile, which consists of new sentences, if any, and sentences to be revised, if any.
   
   session_filename = "session-"+str(sessionnumber)+".mp3"
   print(session_filename)
      
   reps = 0
      
   audiofile_new_sentences = AudioSegment.empty()
   audiofile_first_revision = AudioSegment.empty()
   audiofile_second_revision = AudioSegment.empty()
   audiofile_third_revision = AudioSegment.empty()
   audiofile_fourth_revision = AudioSegment.empty()      
   
   first_new_sentence = sessionnumber*10-9   
   if first_new_sentence < number_of_sentences:
      new_sentences = create_pattern_list(pattern=[1, 2, 3, 4, 5])
      add_first_sentence_to_pattern_list (new_sentences, first_new_sentence-1)
      print("New sentences to add: ", new_sentences)
      for number in range(len(new_sentences)):
         name_of_audiofile_to_add = "pair-"+str(new_sentences[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)         
         audiofile_new_sentences = audiofile_new_sentences + audiofile_to_add
      reps = reps + (len(new_sentences))
   
   first_sentence_of_first_revision = sessionnumber*10-19
   if first_sentence_of_first_revision > 0 and first_sentence_of_first_revision < number_of_sentences:
      first_revision = create_pattern_list(pattern=[1, 2, 4, 7])
      add_first_sentence_to_pattern_list (first_revision, first_sentence_of_first_revision-1)
      print("Sentences to add for first revision: ", first_revision)
      for number in range(len(first_revision)):
         name_of_audiofile_to_add = "pair-"+str(first_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_first_revision = audiofile_first_revision + audiofile_to_add
      reps = reps + (len(first_revision))
      
   first_sentence_of_second_revision = sessionnumber*10-29
   if first_sentence_of_second_revision > 0 and first_sentence_of_second_revision < number_of_sentences:
      second_revision = create_pattern_list(pattern=[1, 2, 4, 7])
      add_first_sentence_to_pattern_list (second_revision, first_sentence_of_second_revision-1)
      print("Sentences to add for second revision: ", second_revision)
      for number in range(len(second_revision)):
         name_of_audiofile_to_add = "pair-"+str(second_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_second_revision = audiofile_second_revision + audiofile_to_add
      reps = reps + (len(second_revision))

   first_sentence_of_third_revision = sessionnumber*10-39
   if first_sentence_of_third_revision > 0 and first_sentence_of_third_revision < number_of_sentences:
      third_revision = create_pattern_list(pattern=[1, 3, 6])
      add_first_sentence_to_pattern_list (third_revision, first_sentence_of_third_revision-1)
      print("Sentences to add for third revision: ", third_revision)
      for number in range(len(third_revision)):
         name_of_audiofile_to_add = "pair-"+str(third_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_third_revision = audiofile_third_revision + audiofile_to_add
      reps = reps + (len(third_revision))
   
   first_sentence_of_fourth_revision = sessionnumber*10-49
   if first_sentence_of_fourth_revision > 0 and first_sentence_of_fourth_revision < number_of_sentences:
      fourth_revision = create_pattern_list(pattern=[1, 4])
      add_first_sentence_to_pattern_list (fourth_revision, first_sentence_of_fourth_revision-1)
      print("Sentences to add for fourth revision: ", fourth_revision)
      for number in range(len(fourth_revision)):
         name_of_audiofile_to_add = "pair-"+str(fourth_revision[number])+".mp3"
         print("Adding:", name_of_audiofile_to_add)
         audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
         audiofile_fourth_revision = audiofile_fourth_revision + audiofile_to_add
      reps = reps + (len(fourth_revision))
   
   print("Reps: ", reps)
   print("Writing session file. Please wait.\n")
   session_file = audiofile_new_sentences + audiofile_first_revision + audiofile_second_revision + audiofile_third_revision + audiofile_fourth_revision
   session_file.export(session_filename, format="mp3")   
   
#Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Overlearning File Creator!")

#Determining how many sentencepairs there are to be processed, only multiples of 10 will do.
number_of_sentences = int(input("\nHow many sentencepairs are there? "))
if number_of_sentences%10 >0:
   number_of_sentences = (number_of_sentences//10)*10
   if number_of_sentences == 0:
      print("At least 10 sentencepairs are necessary.")
   else:
      print("Only ", number_of_sentences, "sentencepairs can be processed.")

#Processing requires at least 10 sentencepairs.
if number_of_sentences >= 10:

   #Fixed or flexible silence length?
   flexible_length = input("Shall the silence length between sentences correspond to their length (y/n)?")
   if flexible_length.strip().lower() == "y":   #Accepts "y" and "Y"; ("y" or "Y") always evaluates to "y".
      silence_length = 0   
      added_silence = float(input("Added silence (in seconds)?"))*1000   #Seconds to milliseconds, the unit pydub uses.
   else:
      silence_length = float(input("Desired silence length between sentences (in seconds)? "))*1000
      added_silence = float(0)
      
   #Step 1: Creation of sentencepairs.
   print("\nStep 1: ", number_of_sentences, " sentencepairs will be created.")
   create_all_sentencpairs(number_of_sentences, silence_length)
   print("Completed.")
   
   #Step 2: Creation of mass sentence file for interpretation training (L1, L2)
   print("\nStep 2: Mass sentence file for interpretation training (L1, L2) will be created.")
   interpretation_file = AudioSegment.empty()
   for number in range(1, number_of_sentences+1):
      name_of_audiofile_to_add = "pair-"+str(number)+".mp3"
      print("Adding:", name_of_audiofile_to_add)
      audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
      interpretation_file = interpretation_file + audiofile_to_add
   print("Writing mass sentence file. Please wait.")
   interpretation_file.export("mass sentences for interpretation training (L1, L2).mp3", format="mp3")
   print("Completed.")
   
   #Step 3: Creation of mass sentence file for sentence training (L1, L2, L2)
   print("\nStep 3: Mass sentence file for sentence training (L1, L2, L2) will be created.")
   training_file = AudioSegment.empty()
   for number in range(1, number_of_sentences+1):
      name_of_source_audiofile_to_add = "source-"+str(number)+".mp3"
      name_of_target_audiofile_to_add = "target-"+str(number)+".mp3"
      print("Adding:", name_of_source_audiofile_to_add, ", ", name_of_target_audiofile_to_add)
      source_audiofile_to_add = AudioSegment.from_mp3(name_of_source_audiofile_to_add)
      target_audiofile_to_add = AudioSegment.from_mp3(name_of_target_audiofile_to_add)
      if silence_length == 0:
         target_length = len(target_audiofile_to_add)+added_silence
         silence = AudioSegment.silent(duration=target_length)
      else:
         silence = AudioSegment.silent(duration=silence_length)
      training_file = training_file + source_audiofile_to_add + silence + target_audiofile_to_add + silence + target_audiofile_to_add + silence
   print("Writing mass sentence file. Please wait.")
   training_file.export("mass sentences for sentence training (L1, L2, L2).mp3", format="mp3")
   print("Completed.")
      
   #Step 4: Creation of mass sentence file for sentence repetition/shadowing (L2 only)
   print("\nStep 4: Mass sentence file for sentence repetition/shadowing (L2 only) will be created.")
   repetition_file = AudioSegment.empty()
   for number in range(1, number_of_sentences+1):
      name_of_audiofile_to_add = "target-"+str(number)+".mp3"
      print("Adding:", name_of_audiofile_to_add)
      audiofile_to_add = AudioSegment.from_mp3(name_of_audiofile_to_add)
      if silence_length == 0:
         target_length = len(audiofile_to_add)+added_silence
         silence = AudioSegment.silent(duration=target_length)
      else:
         silence = AudioSegment.silent(duration=silence_length)
      repetition_file = repetition_file + audiofile_to_add + silence
   print("Writing mass sentence file. Please wait.")
   repetition_file.export("mass sentences for sentence repetition (L2 only).mp3", format="mp3")
   print("Completed.")
   
   #Step 5: Creation of sessionfiles
   number_of_sessions = int(number_of_sentences/10)+4   
   print("\nStep 5: ", number_of_sessions, " sessionfiles will be created. ")
   for number in range(1, number_of_sessions+1):      
      create_sessionfile(number)
   print("Completed.")
   

os.system('pause' if os.name == 'nt' else 'read')
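As the post says, "create_sessionfile" could be shortened: its five blocks differ only in the pattern and in how many sessions back the sentences were introduced. The following is one possible refactoring sketch, not the author's code. The names `STAGES`, `session_plan` and `build_session_audio` are hypothetical, and the planning part is kept separate from pydub so that it can be checked without any audio files.

```python
# Each stage is (label, pattern); stage k revisits the block introduced k sessions ago.
STAGES = [
    ("new sentences", [1, 2, 3, 4, 5]),
    ("first revision", [1, 2, 4, 7]),
    ("second revision", [1, 2, 4, 7]),
    ("third revision", [1, 3, 6]),
    ("fourth revision", [1, 4]),
]


def create_pattern_list(pattern):
    """Rotate the pattern through a block of ten sentences, as the script does."""
    result = []
    current = list(pattern)
    for _ in range(10):
        result.extend(current)
        current = [(value % 10) + 1 for value in current]  # 10 wraps around to 1
    return result


def session_plan(sessionnumber, number_of_sentences, stages=STAGES):
    """Return the sentence numbers of one session, in playback order."""
    plan = []
    for stage_index, (_label, pattern) in enumerate(stages):
        first_sentence = (sessionnumber - stage_index) * 10 - 9
        # Same bounds as the original if-conditions.
        if 0 < first_sentence < number_of_sentences:
            offset = first_sentence - 1
            plan.extend(number + offset for number in create_pattern_list(pattern))
    return plan


def build_session_audio(plan):
    """Concatenate the pair files named by the plan (requires pydub and the pair files)."""
    from pydub import AudioSegment  # imported here so the planning part runs without pydub

    session = AudioSegment.empty()
    for number in plan:
        session += AudioSegment.from_mp3(f"pair-{number}.mp3")
    return session
```

With 20 sentences, `session_plan(2, 20)` yields 90 entries: 50 for the new sentences 11 to 20, followed by 40 for the first revision of sentences 1 to 10. Keeping the patterns in one list would also make it easier to let the user change the number of repetitions.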
8 x

Andy E
Yellow Belt
Posts: 85
Joined: Sun Jul 19, 2015 8:41 am
Languages: *
Language Log: https://forum.language-learners.org/vie ... =15&t=8001
x 141

Re: Solved: How to create your own Glossika-like GSR files?

Postby Andy E » Wed Apr 11, 2018 6:15 am

Wow! Excellent work....

I've just finished Glossika Spanish via Anki - I got fed up with the mistakes and I also wanted gender-relevant audio. So, I ended up going the Anki route and using Awesome TTS to generate my own audio. I also did the same thing as Jeff and used the "hard" option to get the earlier repeats. Now it's completed, I've suspended all the cards in the deck.

I have plans to revisit my French and German at some point in the future, so this looks like a good alternative.
1 x

neumanc

Re: Solved: How to create your own Glossika-like GSR files?

Postby neumanc » Wed Apr 11, 2018 10:07 pm

Thanks for the compliments, guys! I hope that the program comes in handy for you. Have you tested it already?
Andy E wrote: I've just finished Glossika Spanish via Anki - I got fed up with the mistakes and I also wanted gender-relevant audio. So, I ended up going the Anki route and using Awesome TTS to generate my own audio. I also did the same thing as Jeff and used the "hard" option to get the earlier repeats. Now it's completed, I've suspended all the cards in the deck.

I have plans to revisit my French and German at some point in the future, so this looks like a good alternative.

I can very much relate to your disappointment with Glossika. Tonight, Glossika wanted to make me believe that "She hasn't got a key" would be "Ze heeft geen auto" in Dutch. I really wonder how errors of this kind could have been overlooked. And I doubt that I can bear listening to them for long, especially because each one comes up 18 times over 5 days in a row.

What one could do is correct those mistakes oneself. All you would have to do is load all GMS-B files into Audacity, then cut off the intro and the outro. After that, use Sound Finder and export the sentences as multiple files. All files with odd numbers would have to be renamed as source files, all files with even numbers as target files. This can be done in an instant with free software like "Renamer". Then you could re-record the fifty (or hundred?) or so faulty L2 sentences with your own voice or, if you prefer, re-record the L1 translation instead. Then you could recompile the whole set of sentences into GSR-like files within, say, 30 minutes, depending on the speed of your computer's CPU. This way, you could keep most of Glossika's very natural-sounding audio. Incidentally, you could adjust the silence length to the duration of the L2 sentences, so that you won't have to use your pause button any more. Wouldn't that be wonderful? I'm really thinking about doing this with my Glossika Dutch course.
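The renaming step could also be done with a few lines of Python instead of a separate tool. This is a rough sketch under assumptions: the exported file names (for example "sound-01.mp3") are hypothetical, the list must be given in playback order, and the function names are made up for illustration.

```python
from pathlib import Path


def source_target_names(exported_names):
    """Map exported files (in playback order) to source-#.mp3 / target-#.mp3."""
    mapping = {}
    for position, old_name in enumerate(exported_names, start=1):
        sentence_number = (position + 1) // 2  # positions 1 and 2 form sentence 1
        prefix = "source" if position % 2 == 1 else "target"
        mapping[old_name] = f"{prefix}-{sentence_number}.mp3"
    return mapping


def rename_exported_files(directory, exported_names):
    """Apply the mapping on disk; the files must exist in `directory`."""
    folder = Path(directory)
    for old_name, new_name in source_target_names(exported_names).items():
        (folder / old_name).rename(folder / new_name)
```

Note that a plain alphabetical sort puts "sound-10.mp3" before "sound-2.mp3", so the names should either be zero-padded or sorted numerically before they are passed in.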

By the way, there's more audio that is ready to be compiled in the same manner. Think of Penton Overseas' "Learn in your car" or Language/30 or other phrasebooks, such as the "Rough Guide" phrasebooks, which have free audio. Having these "overlearned" would make for a functioning tourist, I would say.
4 x

tommus
Blue Belt
Posts: 633
Joined: Sat Jul 04, 2015 3:59 pm
Location: Kingston, ON, Canada
Languages: English (N), French (B2), Dutch (B2), German (A1), Spanish (A1), Esperanto (A1), Mandarin (beginner)
x 966

Re: Solved: How to create your own Glossika-like GSR files?

Postby tommus » Wed Apr 11, 2018 10:28 pm

neumanc wrote:I can very much relate to your disappointment with Glossika.

I tried Glossika for the first time during the last couple of days, on Dutch. I liked the material and the approach. But I was very surprised by the large number of very obvious mistakes and glitches in the application. I first thought there was something wrong with my browser or my computer. I made quite a few comments to Glossika via their red flag system, but there was no reply. Finally the system hung up so I emailed them. They came back with the suggestion that I try it again in "incognito mode", whatever that is. I thought first it was an application feature but I couldn't find it. Their menu system is very confusing. They didn't answer my second email. So I may follow your plan to develop my own material.
2 x

neumanc

Re: Solved: How to create your own Glossika-like GSR files?

Postby neumanc » Wed Apr 11, 2018 11:21 pm

tommus wrote:
neumanc wrote:I can very much relate to your disappointment with Glossika.

I tried Glossika for the first time during the last couple of days, on Dutch. I liked the material and the approach. But I was very surprised by the large number of very obvious mistakes and glitches in the application. I first thought there was something wrong with my browser or my computer. I made quite a few comments to Glossika via their red flag system, but there was no reply. Finally the system hung up so I emailed them. They came back with the suggestion that I try it again in "incognito mode", whatever that is. I thought first it was an application feature but I couldn't find it. Their menu system is very confusing. They didn't answer my second email. So I may follow your plan to develop my own material.

Well, I was speaking about the old format, but apparently they haven't corrected the mistakes yet. I don't even understand why they would need a red flag system. An attentive proofreader should be capable of going through all 3,000 sentences within one day. That cannot be such a big problem. Furthermore, the new platform seems to suffer from various technical glitches, as you say. I'm not a Facebook member, but I read through the comments on Glossika's public Facebook page. I had the impression that there were literally hundreds of users complaining about technical problems.

I think nothing beats a quality publisher like Assimil, Linguaphone, Langenscheidt, etc. The problem is that they don't provide a system for internalizing their material effectively. That's where overlearning sentences could come into play. Unfortunately, recording the cues and splitting and matching the audio would be quite time-consuming. But it would reap a great benefit in the end, I think. At least that's what my experience with Glossika is telling me. The idea of Glossika is good, but the implementation is lacking. It's questionable whether you really need sentences designed for pattern practice, as Glossika asserts. Maybe it would be even more beneficial to "overlearn" the very authentic sentences a quality publisher provides. For quite a long time I have been looking for a means to internalize their material more effectively. I have tried various techniques like shadowing, Anki and the Schliemann method, but nothing was really as effective as I had wished. Let's hope that this is it!
5 x

sfuqua
Blue Belt
Posts: 998
Joined: Sun Jul 19, 2015 5:05 am
Location: san jose, california
Languages: English(N),
Irish(beginner, studying),
Samoan(FSI 4+, rusty),
Tagalog (use daily),
Spanish (rusting)
French (rusting)
Language Log: https://forum.language-learners.org/vie ... =15&t=9248
x 2394

Re: Solved: How to create your own Glossika-like GSR files?

Postby sfuqua » Thu Apr 12, 2018 3:09 am

Awesome, simply awesome!

I did a very different solution using tts voices last year. You take a file of the sentences you want to learn and use google translate to produce the L1 equivalent. Then, using a text editor and a spreadsheet, you produce a file which includes all of the voice changes needed to produce this file with a tts reader. Finally, you use the spreadsheet to push the sentences out into increasing intervals.

When you run it through a tts generator with the right voices installed, you get something similar to a GSR file.

I've used subtitle files with this, and they work much the same way glossika does.
If you can stand tts voices, there are many options for using native media in a drill format like glossika.

Here are a couple of lines from one of my files:
{{Pause=1}} <voice required="name = IVONA 2 Amy OEM"> "Do you accompany me to Rome?" {{Pause=1}} <voice required="name = Vocalizer Expressive Aurelie Premium High 22kHz"> -Tu m'accompagnes à Rome ?
{{Pause=1}} <voice required="name = IVONA 2 Amy OEM"> Yes I love you. {{Pause=1}} <voice required="name = Scansoft Sebastien_Full_22kHz"> Oui, je t'aime.

If you leave out the English voices, the file structure can be much simpler.
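Lines in this format could also be generated by a short script instead of a spreadsheet. The sketch below only mirrors the two sample lines above; the function name `tts_line` and its parameters are made up for illustration, and whether a given tts reader accepts exactly this markup is an assumption.

```python
def tts_line(l1_text, l2_text, l1_voice, l2_voice, pause=1):
    """Build one drill line: pause, L1 voice and text, pause, L2 voice and text."""
    pause_tag = "{{Pause=" + str(pause) + "}}"
    return (
        f'{pause_tag} <voice required="name = {l1_voice}"> {l1_text} '
        f'{pause_tag} <voice required="name = {l2_voice}"> {l2_text}'
    )


line = tts_line(
    "Yes I love you.",
    "Oui, je t'aime.",
    l1_voice="IVONA 2 Amy OEM",
    l2_voice="Scansoft Sebastien_Full_22kHz",
)
print(line)
```

Repeating such lines according to a list of sentence numbers, in the order in which they should be heard, would give the "increasing intervals" part.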
3 x

neumanc

Re: Solved: How to create your own Glossika-like GSR files?

Postby neumanc » Sat Apr 14, 2018 12:45 pm

sfuqua wrote: I did a very different solution using tts voices last year. You take a file of the sentences you want to learn and use google translate to produce the L1 equivalent. Then, using a text editor and a spreadsheet, you produce a file which includes all of the voice changes needed to produce this file with a tts reader. Finally, you use the spreadsheet to push the sentences out into increasing intervals.

When you run it through a tts generator with the right voices installed, you get something similar to a GSR file.

I've used subtitle files with this, and they work much the same way glossika does.
If you can stand tts voices, there are many options for using native media in a drill format like glossika.

Thank you, sfuqua, for making me aware that a spreadsheet could also be used to produce a gsr-like experience. What a clever idea! This could be especially helpful if one doesn't have native-quality audio files but only text. Furthermore, I have the impression that this could also be useful for automatically generating audio files with tts voices, e.g. the L1 translations needed for use with my little Python script, so that one wouldn't have to record these oneself.

Unfortunately, I have never worked with a spreadsheet (e.g. Microsoft Excel or the corresponding programs in LibreOffice or OpenOffice). I would therefore be very grateful if you could elaborate a little bit on how exactly to use a spreadsheet for this. Or maybe you could make available a link to a good description on the Internet? That would be wonderful. I appreciate any help you can provide.
0 x

crush
Green Belt
Posts: 311
Joined: Mon Nov 30, 2015 3:35 pm
Languages: :
Speak:
--English, Spanish, Mandarin
Study:
--Basque, Cantonese
x 431

Re: Solved: How to create your own Glossika-like GSR files?

Postby crush » Sun Apr 29, 2018 4:29 am

I wish i'd have seen this earlier. I've just spent the last month or so working on precisely this same thing:
https://forum.language-learners.org/vie ... =19&t=7989

I came up with a set of scripts to split the audio + pdfs into individual sentences and load them onto my Android phone. In the app i can create my own schedules similar to the GSR files except with a bit more control, e.g. i can pick how many days i want to review each sentence and how many times each day to review them, as well as how many sentences/day to learn.

I also used pydub, but i used Audacity to create timings for each sentence. I set up a keyboard shortcut for the Sound Finder, set the silence level to 60, minimum duration to 1.5 seconds (they seem to put 2 seconds in between, but i did 1.5 just to play it safe), and then add .05 seconds before/after. I made another keyboard shortcut for the export tags option. So to do one file, i silence the intro/final Glossika announcements, run the sound finder with my shortcut, make sure there are 100 sentences (50 base, 50 target), export tags with my shortcut, and do the next one. It takes about 5 minutes or so to get through a full book of 1,000 sentences.
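The label-based splitting described above could look roughly like this. It is a sketch under assumptions, not crush's actual scripts: it assumes the exported Audacity label file has one region per line as "start<TAB>end<TAB>label" with times in seconds, and that the regions alternate base sentence, target sentence. The function names are hypothetical.

```python
def parse_audacity_labels(text):
    """Return (start_ms, end_ms) tuples from an exported Audacity label track."""
    regions = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and the extra backslash lines Audacity may write.
        if len(fields) < 2 or line.startswith("\\"):
            continue
        start_s, end_s = float(fields[0]), float(fields[1])
        regions.append((round(start_s * 1000), round(end_s * 1000)))
    return regions


def pair_regions(regions):
    """Group consecutive regions as (source, target) pairs."""
    if len(regions) % 2:
        raise ValueError("Expected an even number of labelled regions")
    return list(zip(regions[0::2], regions[1::2]))


def export_pairs(audio_path, label_text):
    """Cut the audio into source-#.mp3 / target-#.mp3 files (requires pydub)."""
    from pydub import AudioSegment  # imported here so the parsing runs without pydub

    audio = AudioSegment.from_mp3(audio_path)
    pairs = pair_regions(parse_audacity_labels(label_text))
    for number, (source, target) in enumerate(pairs, start=1):
        # pydub slices are in milliseconds.
        audio[source[0]:source[1]].export(f"source-{number}.mp3", format="mp3")
        audio[target[0]:target[1]].export(f"target-{number}.mp3", format="mp3")
```

The even-count check corresponds to the manual step of making sure there are 100 regions (50 base, 50 target) before exporting.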
3 x

