Solved: How to create your own Glossika-like GSR files?

neumanc
Orange Belt
Posts: 134
Joined: Sat Jul 18, 2015 11:19 am
Location: Düsseldorf (Germany)
Languages: Speaks: German (native), English, Dutch
Studies: French (advanced), Spanish (false beginner)
Mostly forgotten: Italian, Latin
x 441


Postby neumanc » Fri Jul 06, 2018 12:04 pm

In the meantime I found some time for programming again and implemented everything I had originally planned.
Here is the list of improvements I wanted to implement:
1. Letting the user type in which languages are learned, so that the files will indicate the L1-L2-combination. DONE
2. Splitting the GMS-like files after a certain number of sentence pairs. Letting the user decide how many sentences the GMS-like files should contain. DONE
3. Simultaneous creation of GMS-like files. DONE
4. Letting the user decide which kinds of files should be created (GMS-A, GMS-B, GMS-C, and/or GSR-like files). Not everyone needs every kind of file. DONE
5. Make the program work with uneven numbers of sentences. DONE
6. Adding an "outtro" (a short tone for example) to the GMS- and GSR-like files, so that the user instantly knows when he is done with the file. This is useful if the files are used on a smartphone with an app like "Smart AudioBook Player" which plays the sound files one after the other without any break. DONE
7. Letting the user decide how many new sentences shall be introduced each day. DONE
8. Letting the user decide for how many days the sentences shall be revised. DONE
9. Letting the user decide how often the sentences will be repeated each day (five-, four-, three-, or twofold, or just once). DONE
10. Letting the user decide if he first wants to learn the new sentences and then revise older sentences or the other way around. Letting the user decide if he wants to revise the oldest or the newest sentences first. DONE
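
Items 7, 8 and 10 together decide which batches of sentences end up in which daily file. A minimal sketch of that scheduling, assuming the same arithmetic as the script: the function name `batches_for_session` and its return format are hypothetical and only serve as illustration.

```python
import math

def batches_for_session(session, total_sentences, new_per_day=10,
                        days_of_revision=4, revision_first=True):
    """Return (revision_number, first_sentence, last_sentence) for one daily file.

    Revision number 0 stands for the new sentences of that day.
    """
    # First sentence of the last batch that exists at all.
    last_batch_start = (math.ceil(total_sentences / new_per_day) - 1) * new_per_day + 1
    if revision_first:
        revisions = range(days_of_revision, -1, -1)   # oldest batch first
    else:
        revisions = range(days_of_revision + 1)       # new sentences first
    batches = []
    for revision in revisions:
        first = (session - revision - 1) * new_per_day + 1
        if 1 <= first <= last_batch_start:
            last = min(first + new_per_day - 1, total_sentences)
            batches.append((revision, first, last))
    return batches

# Day 3 of a 25-sentence project: two revision batches, then the new sentences.
print(batches_for_session(3, 25))
```

With 25 sentences and the default settings, day 3 revises sentences 1-10 and 11-20 before introducing 21-25, and the last file (day 7) contains only the final revision of 21-25.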
Here is the code of version 3:

Code:

# Overlearning File Creator V3.0

import pydub
from pydub import AudioSegment

def create_sentence_pair(filenumber, silence_length):
# Creates one sentence pair from source and target file
       
   source_filename = L1+"-"+filenumber+".mp3"
   target_filename = L2+"-"+filenumber+".mp3"
               
   source = AudioSegment.from_file(source_filename)
   target = AudioSegment.from_file(target_filename)
               
   if not fixed_silence_length:   
      target_length = len(target)+added_silence
      if target_length > maximum_silence_length: target_length = maximum_silence_length               
      elif target_length < minimum_silence_length: target_length = minimum_silence_length
      silence = AudioSegment.silent(duration=target_length)
   else: silence = AudioSegment.silent(duration=silence_length)

   pair = source + silence + target + silence             
   pair_filename = title+"-"+L1+"-"+L2+"-"+filenumber+".mp3"       
   print(pair_filename)
   if embedded_cover:
      pair.export("GSP/"+pair_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": pair_filename, "track": filenumber}, cover="cover.png")
   else:
      pair.export("GSP/"+pair_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": pair_filename, "track": filenumber})

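def create_pattern_list (pattern):
# Applies the relevant pattern: the pattern is emitted once per sentence of the day,
# shifted by one position each time and wrapping around after the last sentence.

   pattern = pattern[:]   # Work on a copy so that the global standard patterns stay unchanged
   pattern_list = []
   for number in range(new_senteces_per_day):
      pattern_list.extend(pattern)
      for counter in range(len(pattern)):
         pattern[counter] = (pattern[counter]+1)%new_senteces_per_day
         if pattern[counter] == 0:
            pattern[counter] = new_senteces_per_day
   return pattern_list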
       
def add_number_of_first_sentence_to_pattern_list (list, number_of_first_sentence):
# Determines the precise numbers of the sentence pairs to be added.

   for number in range(len(list)):
      list[number] = list[number]+number_of_first_sentence
       
def create_list_of_sentences_to_be_added_to_sessionfile (first_sentence, pattern_number):
# Creates a list of sentences to be added to sessionfile
       
   if pattern_number > 5: pattern_number = 5
   pattern = patterns[pattern_number]         
   list_of_sentences = create_pattern_list(pattern)
   add_number_of_first_sentence_to_pattern_list (list_of_sentences, first_sentence)   
   return list_of_sentences
       
def create_sessionfile(sessionnumber, days_of_revision, first, last, step):
# Creates a sessionfile, which consists of new sentences, if any, and sentences to be revised, if any (or vice versa).

   session_filename = (title+"-"+L1+"-"+L2+"-GSR-DAY")+str(sessionnumber).zfill(3)+".mp3"
   print(session_filename)
   
   if intro:
      session_file = AudioSegment.from_file("intro.mp3")
   else:   
      session_file =  AudioSegment.empty()   

   reps = 0
               
   for number_of_revision in range(first, last, step):

      first_sentence = sessionnumber*new_senteces_per_day-(number_of_revision*new_senteces_per_day+(new_senteces_per_day-1))
      if first_sentence > 0 and first_sentence <= number_of_necessary_sentences:   # "<=" so that the last batch is also added with 1 new sentence per day
            
         sentences = create_list_of_sentences_to_be_added_to_sessionfile (first_sentence-1, number_of_revision)
         if number_of_revision == 0: print ("New sentences to add: ", sentences)
         else: print ("Sentences to add for revision #", number_of_revision, ": ", sentences, sep="")
         
         for counter in range(len(sentences)):
               
            if not sentences[counter] > number_of_sentences:               
               
               source_filename = L1+"-"+str(sentences[counter])+".mp3"
               target_filename = L2+"-"+str(sentences[counter])+".mp3"
               source = AudioSegment.from_file(source_filename)
               target = AudioSegment.from_file(target_filename)
         
               if not fixed_silence_length:
                  target_length = len(target)+added_silence
                  if target_length > maximum_silence_length: target_length = maximum_silence_length
                  elif target_length < minimum_silence_length: target_length = minimum_silence_length
                  silence = AudioSegment.silent(duration=target_length)                   
               else:
                  silence = AudioSegment.silent(duration=silence_length)
         
               if number_of_revision == 0 and not new_sentences_once:
                  print("Adding:", source_filename, target_filename, target_filename)
                  session_file = session_file + source + silence + target + silence + target + silence
                  reps += 2
               else:                                                                   
                  print("Adding:", source_filename, target_filename)                                                             
                  session_file = session_file + source + silence + target + silence
                  reps += 1                       

   if outtro:
      session_file = session_file + AudioSegment.from_file("outtro.mp3")                           
   
   print("Reps: ", reps)
   print("Writing session file. Please wait.\n")
   if embedded_cover:
      session_file.export("GSR/"+session_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": session_filename, "track": sessionnumber}, cover="cover.png")
   else:
      session_file.export("GSR/"+session_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": session_filename, "track": sessionnumber})
       
# Start

# Standard preferences
sentence_pair_files = False                # Sentence pair files will not be created
a_files = True                             # GMS-A-files will be created
b_files = True                             # GMS-B-files will be created
c_files = True                             # GMS-C-files will be created
spaced_repetition_files = True             # GSR-files will be created

sentences_per_mass_sentences_file = 50     # GMS-files will have 50 sentences
first_mass_sentences_file = 1            # Creation of mass sentences files will begin with the first file

new_senteces_per_day = 10               # 10 new sentences per day
new_sentences_once = True                   # New sentences will only be presented once
days_of_revision = 4                       # GSR-files will have 4 x 10 old sentences
revision_first = True                      # GSR-files will first present the oldest, then the newest sentences
patterns = [[1, 2, 3, 4, 5], [1, 2, 4, 7], [1, 2, 4, 7], [1, 3, 6], [1, 4], [1]] # Standard patterns
number_of_different_revision_patterns = days_of_revision   # 4 different patterns for revision
first_session_file = 1                     # Creation of GSR-files will begin with the file for day one

bit_rate = "64k"                           # Bit rate of output soundfiles
embedded_cover = False                  # Soundfiles will have no cover
intro = False                        # No intro
outtro = False                        # No outtro

fixed_silence_length = True                # Silence between sentences has fixed length   
silence_length = 2000                        # Fixed silence length is 2 seconds
maximum_silence_length = 10000             # Maximum silence length of 10 seconds
minimum_silence_length =  1000             # Minimum silence length of 1 second
added_silence = 0                          # Added silence in case of flexible silence length

# Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Overlearning File Creator!")

# Gathering basic information about project
print("\nGathering basic information about project:")
title = input("\n1. What is your project title? ")
L1 = input("2. What is your source language? ")
L2 = input("3. What is your target language? ")
number_of_sentences = int(input("4. How many sentence pairs are there? "))
standard_preferences = bool(input("5. Do you want all standard preferences (y/n)? ") =="y")

# Gathering information about user defined preferences
if not standard_preferences:   
   print("\nGathering information about user defined preferences:")

   # Which kind of files shall be created?
   print("\n1. Types of files to be created")
   sentence_pair_files = bool(input("1.1 Do you want sentence pair files for shuffling (y/n)? ")=="y")
   a_files = bool(input("1.2 Do you want sentence training mass sentences files (L1, L2, L2) (y/n)? ")=="y")
   b_files = bool(input("1.3 Do you want interpretation training mass sentences files (L1, L2) (y/n)? ")=="y")
   c_files = bool(input("1.4 Do you want repetition/shadowing mass sentences files (L2 only) (y/n)? ")=="y")
   spaced_repetition_files = bool(input("1.5 Do you want overlearning/spaced repetition files (y/n)? ")=="y")
   
   # Specifications of mass sentences files
   print("\n2. Specifications of mass sentences files")
   if a_files or b_files or c_files:
      standard_mass_sentences = bool(input("2.1 Do you want 50 sentences per file (y/n)? ")=="y")
      if not standard_mass_sentences: sentences_per_mass_sentences_file = int(input("    How many sentences per file do you want? "))     
      start_with_first_mass_sentences_file = bool(input("2.2 Do you want to begin with the first (set of) mass sentences file(s) (y/n)? ")=="y")
      if not start_with_first_mass_sentences_file: first_mass_sentences_file = int(input("    With which (set of) mass sentences file(s) do you want to begin? "))
   else:
      print("N/A")
   
   # Specifications of overlearning/spaced repetition files
   print("\n3. Specifications of overlearning/spaced repetition files")
   if spaced_repetition_files:             
            
      ten_new_sentences_per_day = bool(input("3.1 Do you want 10 new sentences per day (y/n)? ")=="y")
      if not ten_new_sentences_per_day:
         new_senteces_per_day = int(input("    How many new sentences per day do you want? "))

      new_sentences_once = bool(input("3.2 Do you want new sentences to be presented once (otherwise twice) (y/n)? ")=="y")
      if not new_sentences_once:          
         print("    New sentences shall be presented twice.")

      standard_days_of_revision = bool(input("3.3 Do you want 4 days of revision (y/n)? ")=="y")
      if not standard_days_of_revision:
         days_of_revision = int(input("    How many days do you want to revise the sentences? "))

      revision_first = bool(input("3.4 Do you want revisions first (otherwise new sentences first) (y/n)? ")=="y")
      if not revision_first: print("    New sentences shall be presented first, then the revisions.")

      standard_patterns = bool(input("3.5 Do you want to use the standard patterns (y/n)? ")=="y")
      if not standard_patterns:
         patterns_according_to_number_of_presentations = [[1], [1, 4], [1, 3, 6], [1, 2, 4, 7], [1, 2, 3, 4, 5]]         
         if days_of_revision > 5: number_of_different_revision_patterns = 5
         else: number_of_different_revision_patterns = days_of_revision
         for counter in range(number_of_different_revision_patterns+1):
            if counter == 0:
               number_of_presentations = int(input("    How often shall new sentences be presented (1-5)? "))-1
            elif counter == 5 and days_of_revision > number_of_different_revision_patterns:
               number_of_presentations = int(input("    How often shall sentences of revision #5 and up be presented (1-5)? "))-1
            else:
               print("    How often shall sentences of revision #", counter, " be presented (1-5)? ", end='', sep='')
               number_of_presentations = int(input())-1
            patterns[counter] = patterns_according_to_number_of_presentations[number_of_presentations]
            if counter == 0:
               print("    Pattern for new sentences:", patterns[counter])
            elif counter == 5 and days_of_revision > number_of_different_revision_patterns:
               print("    Pattern for revision # 5 and up:", patterns[counter])
            else:
               print("    Pattern for revision #", counter, ":", patterns[counter])
                  
      start_with_first_day = bool(input("3.6 Do you want to begin with the first session file (y/n)? ")=="y")
      if not start_with_first_day: first_session_file = int(input("    With which session file do you want to begin? "))

   else:
      print("N/A")   
   
   # General Specifications
   print("\n4. General specifications")
   
   if sentence_pair_files or a_files or b_files or c_files or spaced_repetition_files:

      standard_bit_rate = bool(input("4.1 Do you want the standard bitrate of 64k (y/n)? ")=="y")
      if not standard_bit_rate:
         bit_rate = input("    Which bitrate do you prefer (at least 32k)? ")
         if bit_rate[-1] != "k": bit_rate = bit_rate + "k"

      embedded_cover = bool(input("4.2 Do you want an embedded cover (y/n)? ")=="y")
      
      if a_files or b_files or c_files or spaced_repetition_files:
         intro = bool(input("4.3 Do you want to include an intro (y/n)? ")=="y")
         outtro = bool(input("4.4 Do you want to include an outtro (y/n)? ")=="y")
         
         fixed_silence_length = bool(input("4.5 Do you want fixed silence length between sentences (y/n)? ")=="y")
         if fixed_silence_length:       
            two_seconds_of_silence = bool(input("    Do you want 2 seconds of silence between sentences (y/n)? ")=="y")
            if not two_seconds_of_silence:
               silence_length = float(input("    Desired silence length between sentences (in seconds)? "))*1000   
               if silence_length > maximum_silence_length: silence_length = maximum_silence_length
               elif silence_length < minimum_silence_length: silence_length = minimum_silence_length
               added_silence = 0
         else:
            print("    Silence length shall correspond to the length of the respective sentence.")
            added_silence = float(input("    Added silence (in seconds)? "))*1000
   else:
      print("N/A")      
            
# Informing user about set specifications of project
print("\nSet specifications of your project:")

print("\n1. Types of files to be created")
print("1.1 Sentence pair files:", sentence_pair_files)
print("1.2 Sentence training mass sentences files:", a_files)
print("1.3 Interpretation training mass sentences files:", b_files)
print("1.4 Repetition/shadowing mass sentences files:", c_files)
print("1.5 Overlearning/spaced repetition files:", spaced_repetition_files)

print("\n2. Specifications of mass sentences files")
if a_files or b_files or c_files:    
   print("2.1 Sentences per mass sentences file:", sentences_per_mass_sentences_file)
   print("2.2 First mass sentences file:", first_mass_sentences_file)
else:
   print("N/A")

print("\n3. Specifications of overlearning/spaced repetition files")
if spaced_repetition_files:
   print("3.1 New sentences per day:", new_senteces_per_day)
   print("3.2 New sentences presented once:", new_sentences_once)
   print("3.3 Days of revision:", days_of_revision)
   print("3.4 Revision first:", revision_first)       
   for counter in range(number_of_different_revision_patterns+1):
      if counter == 0:
         print("3.5 Pattern for new sentences:", patterns[counter])
      elif counter == 5 and days_of_revision > number_of_different_revision_patterns:
         print("    Pattern for revision # 5 and up:", patterns[counter])
      else:
         print("    Pattern for revision #", counter, ":", patterns[counter])
   print("3.6 First session file:", first_session_file)
else:
   print("N/A")

print("\n4. General Specifications")
if sentence_pair_files or a_files or b_files or c_files or spaced_repetition_files:
   print("4.1 Bitrate:", bit_rate)   
   print("4.2 Embedded cover:", embedded_cover)
   
   if a_files or b_files or c_files or spaced_repetition_files:
      print("4.3 Intro:", intro)
      print("4.4 Outtro:", outtro)
      print("4.5 Fixed length:", fixed_silence_length)
      if fixed_silence_length:
         print("    Silence length:", silence_length/1000, "seconds")
      else:
         print("    Flexible silence length:", True)
         print("    Added silence:", added_silence/1000, "seconds")
else:
   print("N/A")
         
# Creation of sound files
print("\nSoundfiles will now be created.")

import time
import math

# Step 1: Creation of sentence pairs.
if sentence_pair_files:

   start_time = time.time()
      
   if not os.path.exists("GSP"):
      os.makedirs("GSP")
         
   print("\nStep 1: ", number_of_sentences, " sentence pair files (e.g. for shuffle play) will be created.")
   for number in range(1, number_of_sentences+1):
      filenumber=str(number)         
      create_sentence_pair(filenumber, silence_length)

   end_time = time.time()
   elapsed_time = math.ceil((end_time-start_time)/60)
   print("Completed in ", str(elapsed_time//60).zfill(2), ":", str(elapsed_time%60).zfill(2), " hours.", sep='')
   
else:
   print("\nStep 1: Creation of sentence pair files: N/A")

# Step 2: Creation of mass sentences files
if a_files or b_files or c_files:               

   start_time = time.time()
   if a_files:
      if not os.path.exists("GMS/GMS-A"):
         os.makedirs("GMS/GMS-A")
   if b_files:
      if not os.path.exists("GMS/GMS-B"):
         os.makedirs("GMS/GMS-B")
   if c_files:
      if not os.path.exists("GMS/GMS-C"):
         os.makedirs("GMS/GMS-C")
   
   # Round up so that a last, shorter file holds the remaining sentences
   number_of_mass_sentences_files = math.ceil(number_of_sentences/sentences_per_mass_sentences_file)
   print("\nStep 2: ", number_of_mass_sentences_files-first_mass_sentences_file+1, " (sets of) mass sentences file(s) will be created.")       
         
   for number in range(first_mass_sentences_file, number_of_mass_sentences_files+1):       
   
      number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
      if a_files:
         a_file_name = (title+"-"+L1+"-"+L2+"-GMS-A-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(a_file_name)
      if b_files:
         b_file_name = (title+"-"+L1+"-"+L2+"-GMS-B-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(b_file_name)
      if c_files:
         c_file_name = (title+"-"+L1+"-"+L2+"-GMS-C-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(c_file_name)
            
      if intro:
         a_file = AudioSegment.from_file("intro.mp3")         
         b_file = AudioSegment.from_file("intro.mp3")
         c_file = AudioSegment.from_file("intro.mp3")
      else:
         a_file = AudioSegment.empty()
         b_file = AudioSegment.empty()
         c_file = AudioSegment.empty()
      
      if number*sentences_per_mass_sentences_file > number_of_sentences:
         maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
      else:
         maximum_files_to_add = sentences_per_mass_sentences_file       
      
      for counter in range(maximum_files_to_add):

         filenumber = number_of_first_audiofile + counter
         source_filename = L1+"-"+str(filenumber)+".mp3"
         target_filename = L2+"-"+str(filenumber)+".mp3"
         source = AudioSegment.from_file(source_filename)
         target = AudioSegment.from_file(target_filename)
   
         if not fixed_silence_length:
            target_length = len(target)+added_silence
            if target_length > maximum_silence_length: target_length = maximum_silence_length
            elif target_length < minimum_silence_length: target_length = minimum_silence_length
            silence = AudioSegment.silent(duration=target_length)                   
         else:
            silence = AudioSegment.silent(duration=silence_length)

         if a_files:
            a_file = a_file + source + silence + target + silence + target + silence
            print("Adding to A-file:", source_filename, target_filename, target_filename)                                   
         if b_files:
            b_file = b_file + source + silence + target + silence
            print("Adding to B-file:", source_filename, target_filename)                                   
         if c_files:
            c_file = c_file + target + silence
            print("Adding to C-file:", target_filename)

      if outtro:
         a_file = a_file + AudioSegment.from_file("outtro.mp3")         
         b_file = b_file + AudioSegment.from_file("outtro.mp3")
         c_file = c_file + AudioSegment.from_file("outtro.mp3")
         
      print("Writing mass sentences file(s). Please wait.\n")
      if a_files:       
         if embedded_cover:
            a_file.export("GMS/GMS-A/"+a_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": a_file_name, "track": number}, cover="cover.png")
         else:
            a_file.export("GMS/GMS-A/"+a_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": a_file_name, "track": number})
      if b_files:
         if embedded_cover:
            b_file.export("GMS/GMS-B/"+b_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": b_file_name, "track": number}, cover="cover.png")
         else:         
            b_file.export("GMS/GMS-B/"+b_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": b_file_name, "track": number})
      if c_files:
         if embedded_cover:
            c_file.export("GMS/GMS-C/"+c_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": c_file_name, "track": number}, cover="cover.png")
         else:
            c_file.export("GMS/GMS-C/"+c_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": c_file_name, "track": number})

   end_time = time.time()
   elapsed_time = math.ceil((end_time-start_time)/60)
   print("Completed in ", str(elapsed_time//60).zfill(2), ":", str(elapsed_time%60).zfill(2), " hours.", sep='')
               
else:
   print("\nStep 2: Creation of mass sentences files: N/A")               

# Step 3: Creation of sessionfiles
if spaced_repetition_files:

   start_time = time.time()

   if not os.path.exists("GSR"):
      os.makedirs("GSR")
         
   number_of_sessions = math.ceil(number_of_sentences/new_senteces_per_day) + days_of_revision - first_session_file + 1            
   print("\nStep 3: ", number_of_sessions, "session files will be created. ")

   number_of_necessary_sentences = math.ceil(number_of_sentences/new_senteces_per_day)*new_senteces_per_day

   if revision_first:
      first = days_of_revision
      last = -1
      step = -1
   else:
      first = 0
      last = days_of_revision+1
      step = +1
   
   for number in range(first_session_file, first_session_file + number_of_sessions):               
      create_sessionfile(number, days_of_revision, first, last, step)
   
   end_time = time.time()
   elapsed_time = math.ceil((end_time-start_time)/60)
   print("Completed in ", str(elapsed_time//60).zfill(2), ":", str(elapsed_time%60).zfill(2), " hours.", sep='')
      
else:
   print("\nStep 3: Creation of spaced repetition files: N/A\n")

input("\nPress Enter to exit.")   # Works on every platform; 'pause' and 'read' depend on the shell in use
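
The way a pattern such as [1, 4] is spread over the sentences of one day can be hard to read off the loops in create_pattern_list. The following is a minimal sketch of the same idea written without in-place changes. The name `expand_pattern` is hypothetical, and unlike the script it also wraps pattern values that are larger than the number of new sentences per day.

```python
from collections import Counter

def expand_pattern(pattern, new_per_day):
    """Return the play order of one batch as 1-based positions within the batch."""
    order = []
    for shift in range(new_per_day):
        for position in pattern:
            # Shift the position and wrap around so it stays within 1..new_per_day.
            order.append((position - 1 + shift) % new_per_day + 1)
    return order

order = expand_pattern([1, 4], 5)
print(order)           # [1, 4, 2, 5, 3, 1, 4, 2, 5, 3]
print(Counter(order))  # every sentence is played len(pattern) times
```

Adding the number of the first sentence of the batch minus one to every entry then gives the actual file numbers, which is what add_number_of_first_sentence_to_pattern_list does in the script.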

Here you can get an impression of the settings the program now offers:
[Attached screenshot: Screen Capture_Overlearning Files Creator.png]
As you can see, I used the program with 1854 sentences from an Assimil course, and it worked perfectly, avoiding Pydub's notorious memory error. The MP3 files produced add up to more than 2.4 gigabytes. The whole process took about 10 hours. The log file can be viewed here: https://tinyurl.com/ya594o3s.
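
For orientation, the number of files such a run produces follows from the formulas in the script. This is a small sketch that assumes the standard preferences (10 new sentences per day, 4 days of revision, 50 sentences per mass sentences file); the settings of the actual run are only visible in the screenshot, and the function name is hypothetical.

```python
import math

def expected_file_counts(total_sentences, new_per_day=10,
                         days_of_revision=4, per_mass_file=50):
    """Return (number of GSR session files, number of GMS file sets)."""
    sessions = math.ceil(total_sentences / new_per_day) + days_of_revision
    mass_sets = math.ceil(total_sentences / per_mass_file)
    return sessions, mass_sets

print(expected_file_counts(1854))  # (190, 38)
```

Under these assumptions, 1854 sentences give 190 daily session files and 38 sets of mass sentences files (one A, B and C file each).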

I already have ideas about future improvements, which I might implement if I have time to do so:
1. The user shall have the possibility to choose between Overlearning Files and Spaced Repetition Files. Depending on this, he shall have the possibility to determine the number and the timing of the repetitions. This should be quite easy to implement. In this way, this tool also becomes an alternative to Gradint.
2. Possibility to learn not the same number of new sentences every day, but depending on the number of sentences in each lesson of a course. This would be particularly useful, for example, to learn all sentences from an Assimil lesson every day and at the same time to repeat all sentences from certain earlier Assimil lessons. However, this will have a major impact on the algorithm, which is why it will be very difficult to implement.
3. Integration of the Cue Creator into the Overlearning File Creator, so that the user has the possibility to choose if he wants to use already existing cue sound files or have them created.
4. Possibility to also use a comma-separated values file (CSV file) containing all the cues in written form and have them converted to sound files with integrated text-to-speech software.
5. Converting the Python script into a Windows executable file to make the program accessible to the user who does not want or is not able to install Python and Libav first. In this context, the following changes must also be implemented:
- Check if Libav is installed. If not, Libav should be installed by the program without any further action on the part of the user.
- Query the folder containing the sound files. Automatic detection of the number of sound files. Check if all relevant files are present or skip non-existent files to prevent the program from crashing.
- Insert a subroutine that checks whether the user has given the correct input to prevent the program from crashing.
- Insert a help file that explains all settings and possible uses to the user.
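
The input check and the file check from the list above could look roughly like this. It is only a sketch under assumptions: the helper names `ask_int`, `ask_yes_no` and `missing_sound_files` are hypothetical, and the file names follow the `<language>-<number>.mp3` convention the script already uses.

```python
import os

def ask_int(prompt, minimum=1, maximum=None, ask=input):
    """Ask until the reply is a whole number within the allowed range."""
    while True:
        reply = ask(prompt).strip()
        try:
            value = int(reply)
        except ValueError:
            print("Please enter a whole number.")
            continue
        if value < minimum or (maximum is not None and value > maximum):
            print("This value is out of range.")
            continue
        return value

def ask_yes_no(prompt, ask=input):
    """Ask until the reply is 'y' or 'n' and return True for 'y'."""
    while True:
        reply = ask(prompt).strip().lower()
        if reply in ("y", "n"):
            return reply == "y"
        print("Please answer with y or n.")

def missing_sound_files(source_lang, target_lang, number_of_sentences, folder="."):
    """List the expected source and target files that do not exist in folder."""
    missing = []
    for number in range(1, number_of_sentences + 1):
        for language in (source_lang, target_lang):
            name = "{}-{}.mp3".format(language, number)
            if not os.path.isfile(os.path.join(folder, name)):
                missing.append(name)
    return missing
```

The `ask` parameter exists only so that the helpers can be tried out without typing; in the script, a call like `ask_int("4. How many sentence pairs are there? ")` would replace the bare `int(input(...))`.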

Did I say I enjoy programming? It's almost as enjoyable as learning languages ...

Axon
Blue Belt
Posts: 775
Joined: Thu Jun 16, 2016 12:29 am
Location: California
Languages: Native English, in order of comfort: Mandarin, German, Indonesian, Spanish, French, Russian, Cantonese, Vietnamese, Polish.
Language Log: viewtopic.php?f=15&t=5086
x 3288


Postby Axon » Sat Jul 07, 2018 4:07 am

This is great! I have it working with the dialogue files from my favorite video game.

I had a strange bug where it said there was a syntax error on line 77 right at the last =. But when I used Anaconda to launch it, there was no problem.

The settings are wonderful and exhaustive, though I've made the sentence pair file creation a default. Can't wait until there's a nice pretty GUI.

I believe pydub also supports speeding up mp3 files. It'd be cool to crank up the speed for your native language to make the files go by faster.
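
pydub does ship a `speedup` effect in `pydub.effects`. A minimal sketch of how it might be applied to the native-language clip, assuming pydub and its audio backend are installed; the wrapper name `speed_up_source` and the factor 1.3 are made up for illustration, and the audio quality of the result would need to be checked by ear.

```python
def speed_up_source(segment, factor=1.3):
    """Return a faster version of a pydub AudioSegment (factor > 1 means faster)."""
    if factor <= 1.0:
        # Nothing to do; pydub's speedup is only meant for factors above 1.
        return segment
    # Imported here so that the helper can be defined even if pydub is missing.
    from pydub.effects import speedup
    return speedup(segment, playback_speed=factor)

# Possible use inside the script, right after the source file has been loaded:
# source = speed_up_source(AudioSegment.from_file(source_filename), 1.3)
```

Only the source clip is changed in this sketch, so the target-language audio and the pauses keep their original length.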

I'm really, really impressed with what you've done. Thank you for this!

Postby neumanc » Sat Jul 07, 2018 11:57 pm

Axon wrote: This is great! I have it working with the dialogue files from my favorite video game.

I had a strange bug where it said there was a syntax error on line 77 right at the last =. But when I used Anaconda to launch it, there was no problem.

The settings are wonderful and exhaustive, though I've made the sentence pair file creation a default. Can't wait until there's a nice pretty GUI.

I believe pydub also supports speeding up mp3 files. It'd be cool to crank up the speed for your native language to make the files go by faster.

I'm really, really impressed with what you've done. Thank you for this!

Hello Axon, thank you very much! I am glad that the script is useful for you. This would not have been possible without your input.

I cannot see any reason why you got an error message when you first tried the script. I tested it intensively with WinPython without any error message. As I see it, line 77 is perfectly fine. But as you know, computer programs are always a delicate matter. Some mistake will always be found under certain circumstances, I'm afraid.
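
A possible explanation, which is an assumption and was not confirmed in this thread: counting from the "# Overlearning File Creator V3.0" comment, line 77 of the script is a print call ending in sep="", and its last "=" belongs to that keyword. Python 2 reports this as a syntax error because print is a statement there, while Anaconda would typically launch Python 3. Since a syntax error stops the script before any of its code runs, a version check has to live in a separate small launcher. A sketch, with a hypothetical script file name:

```python
import runpy
import sys

def python_is_supported(version_info=None):
    """Return True for Python 3 and later."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (3, 0)

def launch(script_path):
    """Run the script only if the interpreter can handle its syntax."""
    if not python_is_supported():
        sys.exit("This program needs Python 3. You are running Python %d.%d."
                 % tuple(sys.version_info[:2]))
    runpy.run_path(script_path, run_name="__main__")

# Possible use (the file name is an assumption):
# launch("overlearning_file_creator.py")
```

The launcher itself uses only syntax that both Python 2.7 and Python 3 accept, so it can print a readable message instead of a syntax error.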

I will look into speeding up the files and also into implementing a GUI. But I am still very new to Python; I just got my first instruction handbook for it. Programming a GUI would really be a very big challenge for me. But I can see that this would make the program much more user-friendly, so that more language learning enthusiasts could benefit from it. I will do my best.

The next improvement step, however, will be the implementation of a true spaced repetition algorithm. This is also very important to me personally. I would like the user to be able to practice the sentences, for example, five times on the first day, four times on the following day, four times again on the fourth day, three times on the seventh day, twice on the 14th day, once on the 30th day and finally once again on the 90th day. With ten new sentences per day, the GSR-like files would be only slightly longer than before, 200 "reps" per day instead of 180. However, the learning effect should be even greater, because it would be a combination of overlearning and spaced repetition, so that there would be no need to revise the sentences with the GMS-like files at the right time in order not to forget them. I already have kind of an idea how something like this could be implemented into the script. I hope it doesn't take all Sunday ...
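
The figures of 200 versus 180 reps can be checked with a short sketch. Both tables below are illustrative encodings: the first follows the schedule described above, the second is read off the lengths of the standard patterns with four days of revision. The names are hypothetical, and the calculation assumes a day far enough into the course that every batch in the schedule exists.

```python
# Day of practice (1 = the day a sentence is new) -> presentations on that day.
PROPOSED_SCHEDULE = {1: 5, 2: 4, 4: 4, 7: 3, 14: 2, 30: 1, 90: 1}
CURRENT_SCHEDULE = {1: 5, 2: 4, 3: 4, 4: 3, 5: 2}

def reps_in_session(session, schedule, total_sentences, new_per_day=10):
    """Count the sentence presentations in one daily file."""
    reps = 0
    for day, presentations in schedule.items():
        # First sentence of the batch that is on its day-th day of practice.
        first = (session - day) * new_per_day + 1
        if first < 1 or first > total_sentences:
            continue
        batch_size = min(new_per_day, total_sentences - first + 1)
        reps += batch_size * presentations
    return reps

print(reps_in_session(100, CURRENT_SCHEDULE, 1854))   # 180
print(reps_in_session(100, PROPOSED_SCHEDULE, 1854))  # 200
```

During the first 89 days the proposed schedule produces fewer reps than 200, because the late revisions have nothing to revise yet.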

Postby neumanc » Sun Jul 08, 2018 8:36 pm

I already have ideas about future improvements, which I might implement if I have time to do so:
1. The user shall have the possibility to choose between Overlearning Files and Spaced Repetition Files. Depending on this, he shall have the possibility to determine the number and the timing of the repetitions. This should be quite easy to implement. In this way, this tool also becomes an alternative to Gradint. DONE
2. Possibility to learn not the same number of new sentences every day, but depending on the number of sentences in each lesson of a course. This would be particularly useful, for example, to learn all sentences from an Assimil lesson every day and at the same time to repeat all sentences from certain earlier Assimil lessons. However, this will have a major impact on the algorithm, which is why it will be very difficult to implement.
3. Integration of the Cue Creator into the Overlearning File Creator, so that the user has the possibility to choose if he wants to use already existing cue sound files or have them created. DONE
4. Possibility to also use a comma-separated values file (CSV file) containing all the cues in written form and have them converted to sound files with integrated text-to-speech software.
5. Converting the Python script into a Windows executable file to make the program accessible to the user who does not want or is not able to install Python and Libav first. In this context, the following changes must also be implemented:
- Check if Libav is installed. If not, Libav should be installed by the program without any further action on the part of the user.
- Query the folder containing the sound files. Automatic detection of the number of sound files. Check if all relevant files are present or skip non-existent files to prevent the program from crashing. DONE
- Insert a subroutine that checks whether the user has given the correct input to prevent the program from crashing.
- Insert a help file that explains all settings and possible uses to the user.
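The planned input check could look roughly like this. It is just a sketch of one possible approach, not part of the script: the helper names (parse_yes_no, parse_int_in_range, ask) are hypothetical, and the "read" parameter only exists so that the helper can be tried out without typing.

```python
def parse_yes_no(answer):
    """Return True/False for y/n answers, or None if not recognised."""
    answer = answer.strip().lower()
    if answer in ("y", "yes"):
        return True
    if answer in ("n", "no"):
        return False
    return None

def parse_int_in_range(answer, minimum, maximum):
    """Return the number if it is a whole number within the limits, else None."""
    try:
        value = int(answer.strip())
    except ValueError:
        return None
    return value if minimum <= value <= maximum else None

def ask(prompt, parser, read=input):
    """Repeat the question until the parser accepts the answer."""
    while True:
        result = parser(read(prompt))
        if result is not None:
            return result
        print("   Invalid input, please try again.")

# Example with canned answers instead of keyboard input:
answers = iter(["maybe", "Y"])
wants_intro = ask("Do you want an intro (y/n)? ", parse_yes_no,
                  read=lambda prompt: next(answers))
print(wants_intro)
```

With something like this, a typo at a prompt would lead to the question being asked again instead of a crash in int().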
Well, it took all Sunday, as expected. The most important point for me is now implemented, namely the possibility to create true Spaced Repetition Files instead of Overlearning Files. That means that this script is now also an alternative to Gradint. If you choose "standard spaced repetition files", you will (over-)learn the sentences 5 times on the 1st day, 4 times on the 2nd day, 4 times on the 4th day, three times on the 7th day, twice on the 14th day, and once on the 30th, 50th and 90th day, respectively.
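To see what this means for a single session file, here is a small sketch that lists which batches of sentences are due on a given day under the standard schedule. It assumes that "batch k" means the sentences introduced on day k, and it ignores the end of the course (batches that do not exist are not filtered out). The function name is made up for this example.

```python
# Study day of a batch -> number of presentations on that day.
STANDARD_DAYS = {1: 5, 2: 4, 4: 4, 7: 3, 14: 2, 30: 1, 50: 1, 90: 1}

def batches_due(day, schedule=STANDARD_DAYS):
    """Return (batch, presentations) pairs for the session file of one day.

    A batch introduced on day k is on its d-th study day when day == k + d - 1.
    """
    due = []
    for study_day, presentations in sorted(schedule.items()):
        batch = day - study_day + 1
        if batch >= 1:
            due.append((batch, presentations))
    return due

print(batches_due(7))
```

For day 7 this gives the new batch 7 (five times), batch 6 (four times), batch 4 (four times) and batch 1 (three times).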

In addition, it is possible to freely select the days on which you want to learn/revise the sentences and also the number of repetitions. Apart from that I have made some other improvements, including Axon's proposal to give the user the possibility to speed up the sound files. This could be useful, for example, if you want to use this script with Assimil's split mp3 files, which are often too slow even if you are not very advanced in your target language.
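The speed factor is limited before it is used. A minimal sketch of that check, using the limits from the script (factors below 1.1 mean no speedup, 1.5 is the maximum); the function name is hypothetical, and the result would then be handed to pydub's speedup() as the script does:

```python
MIN_SPEED = 1.1   # below this the audio is left unchanged
MAX_SPEED = 1.5   # upper limit for the speed factor

def clamp_speed(factor):
    """Return a usable speed factor: 1.0 (no speedup) or a value in [1.1, 1.5]."""
    if factor < MIN_SPEED:
        return 1.0
    return min(factor, MAX_SPEED)

for requested in (1.0, 1.05, 1.25, 2.0):
    print(requested, "->", clamp_speed(requested))
```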

These were the simplest changes. All the others that I have set out to do will require much deeper work on the algorithm, so it may take some time before I can present a new version (if ever).

Here you can get an impression of the settings the program now offers:
Screenshot_1.png
Screenshot_2.png
The code of the current version 3.2 can be downloaded here:


# Overlearning File Creator V3.2

import pydub
from pydub import AudioSegment

def create_sentence_pair(filenumber, silence_length):
# Creates one sentence pair from source and target file
       
   source_filename = L1+"-"+filenumber+".mp3"
   target_filename = L2+"-"+filenumber+".mp3"
               
   source = AudioSegment.from_file(source_filename)   
   target = AudioSegment.from_file(target_filename)
   if speed_up:
      if speed_source >= 1.1: source = source.speedup(speed_source)
      if speed_target >= 1.1: target = target.speedup(speed_target)
               
   if not fixed_silence_length:   
      target_length = len(target)+added_silence
      if target_length > maximum_silence_length: target_length = maximum_silence_length               
      elif target_length < minimum_silence_length: target_length = minimum_silence_length
      silence = AudioSegment.silent(duration=target_length)
   else: silence = AudioSegment.silent(duration=silence_length)

   pair = source + silence + target + silence             
   pair_filename = title+"-"+L1+"-"+L2+"-"+filenumber+".mp3"       
   print(pair_filename)
   if embedded_cover:
      pair.export("GSP/"+pair_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": pair_filename, "track": filenumber}, cover=cover_file)
   else:
      pair.export("GSP/"+pair_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": pair_filename, "track": filenumber})

def create_pattern_list (pattern):
# Applies the relevant pattern.

   list = []       
   for number in range(new_senteces_per_day):       
      list.extend(pattern)
      for counter in range(len(pattern)):
         pattern[counter] = (pattern[counter]+1)%new_senteces_per_day     
         if pattern[counter] == 0:
            pattern[counter] = new_senteces_per_day           
   return list     
       
def add_number_of_first_sentence_to_pattern_list (list, number_of_first_sentence):
# Determines the precise numbers of the sentence pairs to be added.

   for number in range(len(list)):
      list[number] = list[number]+number_of_first_sentence
       
def create_list_of_sentences_to_be_added_to_sessionfile (first_sentence, pattern_number):
# Creates a list of sentences to be added to sessionfile
       
   pattern = patterns[pattern_number]         
   list_of_sentences = create_pattern_list(pattern)
   add_number_of_first_sentence_to_pattern_list (list_of_sentences, first_sentence)   
   return list_of_sentences
       
def create_sessionfile(day, days_of_revision, first, last, step):
# Creates a sessionfile, which consists of new sentences, if any, and sentences to be revised, if any (or vice versa).

   session_filename = (title+"-"+L1+"-"+L2+"-GSR-DAY")+str(day).zfill(3)+".mp3"
   print(session_filename)
   
   if intro:
      session_file = AudioSegment.from_file("intro.mp3")
   else:   
      session_file =  AudioSegment.silent(duration=100)   

   reps = 0
               
   for number_of_revision in range(first, last, step):

      first_sentence = day*new_senteces_per_day-(number_of_revision*new_senteces_per_day+(new_senteces_per_day-1))
      if first_sentence > 0 and first_sentence < number_of_necessary_sentences:
            
         sentences = create_list_of_sentences_to_be_added_to_sessionfile (first_sentence-1, number_of_revision)
         if number_of_revision == 0: print ("New sentences to add: ", sentences)
         else: print ("Sentences to add for revision #", number_of_revision, ": ", sentences, sep="")
         
         for counter in range(len(sentences)):
               
            if not sentences[counter] > number_of_sentences:               
               
               source_filename = L1+"-"+str(sentences[counter])+".mp3"
               target_filename = L2+"-"+str(sentences[counter])+".mp3"
               source = AudioSegment.from_file(source_filename)
               target = AudioSegment.from_file(target_filename)
               if speed_up:
                  if speed_source >= 1.1: source = source.speedup(speed_source)
                  if speed_target >= 1.1: target = target.speedup(speed_target)
               
               if not fixed_silence_length:
                  target_length = len(target)+added_silence
                  if target_length > maximum_silence_length: target_length = maximum_silence_length
                  elif target_length < minimum_silence_length: target_length = minimum_silence_length
                  silence = AudioSegment.silent(duration=target_length)                   
               else:
                  silence = AudioSegment.silent(duration=silence_length)
         
               if number_of_revision == 0 and not new_sentences_once:
                  print("Adding:", source_filename, target_filename, target_filename)
                  session_file = session_file + source + silence + target + silence + target + silence
                  reps += 2
               else:                                                                   
                  print("Adding:", source_filename, target_filename)                                                             
                  session_file = session_file + source + silence + target + silence
                  reps += 1                       

   if outtro:
      session_file = session_file + AudioSegment.from_file("outtro.mp3")                           
   
   print("Reps: ", reps)
   print("Writing session file. Please wait.\n")
   if embedded_cover:
      session_file.export("GSR/"+session_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": session_filename, "track": day}, cover=cover_file)
   else:
      session_file.export("GSR/"+session_filename, format="mp3", bitrate=bit_rate, tags={"album": title, "title": session_filename, "track": day})
       
def time_elapsed(start_time):
   end_time = time()
   elapsed_time = ceil((end_time-start_time)/60)
   print("Completed in ", str(elapsed_time//60).zfill(2), ":", str(elapsed_time%60).zfill(2), " hours.", sep='')
      
# Start

# Standard preferences
sentence_pair_files = True                   # Sentence pair files will be created
a_files = True                             # GMS-A-files will be created
b_files = True                             # GMS-B-files will be created
c_files = True                             # GMS-C-files will be created
overlearning_files = True                   # GSR-files will be created
overlearning = True                     # Overlearning instead of spaced repetition
sentences_per_mass_sentences_file = 50     # GMS-files will have 50 sentences
first_mass_sentences_file = 1            # Creation of mass sentences files will begin with the first file

new_senteces_per_day = 10               # 10 new sentences per day
new_sentences_once = True                   # New sentences will only be presented once
days_of_revision = 4                       # GSR-files will have 4 x 10 old sentences
revision_first = True                      # GSR-files will first present the oldest, then the newest sentences
patterns = [[1, 2, 3, 4, 5], [1, 2, 4, 7],
[1, 2, 4, 7], [1, 3, 6], [1, 4], [1]]       # Standard patterns
first_session_file = 1                     # Creation of GSR-files will begin with the file for day one

bit_rate = "64k"                           # Bit rate of output soundfiles
speed_up = False                     # No speedup of source or target audio
embedded_cover = False                  # Soundfiles will have no cover
intro = False                        # No intro
outtro = False                        # No outtro

fixed_silence_length = True                # Silence between sentences has fixed length   
silence_length = 2000                        # Fixed silence length is 2 seconds
maximum_silence_length = 10000             # Maximum silence length of 10 seconds
minimum_silence_length =  1000             # Minimum silence length of 1 second
added_silence = 0                          # Added silence in case of flexible silence length

# Welcoming the user.
import os
os.system('cls' if os.name == 'nt' else 'clear')
print("Welcome to the Overlearning File Creator!")

# Gathering basic information about project
print("\nGathering basic information about project:")
title = input("\n1. What is your project title? ")
L1 = input("2. What is your source language? ")
L2 = input("3. What is your target language? ")
number_of_sentences = int(input("4. How many sentence pairs do you want to process? "))
cues = bool(input("5. Do you have ready-made cues (or else shall they be created) (y/n)? ")=="y")
if not cues:
   length_of_cue = int(float(input("   Desired cue length (in seconds)? "))*1000)
   if length_of_cue < 250: length_of_cue = 250
   length_of_fade_out = int(float(input("   Desired length of additional fade-out (in seconds)? "))*1000)
   if length_of_fade_out > 2000: length_of_fade_out = 2000
current_path_ok = bool(input("6. Is the raw data stored in the current folder (y/n)? ")=="y")
if not current_path_ok:      
   new_path = input("   Where else is the raw data stored? ")
   if os.path.exists(new_path):
      os.chdir(new_path)      
   else:
      print("   No such path. Exiting ...")
      exit()
if cues:
   for counter in range(1, number_of_sentences+1):
      if not os.path.exists(L1+"-"+str(counter)+".mp3") or not os.path.exists(L2+"-"+str(counter)+".mp3"):
         print("   Missing data for sentence pair #", counter, sep='')      
         number_of_sentences = counter-1
         print("   Only", number_of_sentences, "sentence pairs will be processed.")
         break
   if number_of_sentences == 0:
      print("\nNothing to create. Exiting ...")
      exit()
else:
   for counter in range(1, number_of_sentences+1):
      if not os.path.exists(L2+"-"+str(counter)+".mp3"):
         print("   Missing data for sentence pair #", counter, sep='')      
         number_of_sentences = counter-1
         print("   Only", number_of_sentences, "sentence pairs will be processed.")
         break
   if number_of_sentences == 0:
      print("\nNothing to create. Exiting ...")
      exit()
standard_preferences = bool(input("7. Do you want all standard preferences (y/n)? ") =="y")

# Gathering information about user defined preferences
if not standard_preferences:   
   print("\nGathering information about user defined preferences:")

   # Which kind of files shall be created?
   print("\n1. Types of files to be created")
   sentence_pair_files = bool(input("1.1 Do you want sentence pair files for shuffling (y/n)? ")=="y")       
   a_files = bool(input("1.2 Do you want sentence training mass sentences files (L1, L2, L2) (y/n)? ")=="y")
   b_files = bool(input("1.3 Do you want interpretation training mass sentences files (L1, L2) (y/n)? ")=="y")
   c_files = bool(input("1.4 Do you want repetition/shadowing mass sentences files (L2 only) (y/n)? ")=="y")
   overlearning_files = bool(input("1.5 Do you want overlearning/spaced repetition files (y/n)? ")=="y")
   
   # Specifications of mass sentences files
   print("\n2. Specifications of mass sentences files")
   if a_files or b_files or c_files:
      standard_mass_sentences = bool(input("2.1 Do you want 50 sentences per file (y/n)? ")=="y")
      if not standard_mass_sentences: sentences_per_mass_sentences_file = int(input("    How many sentences per file do you want? "))     
      start_with_first_mass_sentences_file = bool(input("2.2 Do you want to begin with the first (set of) mass sentences file(s) (y/n)? ")=="y")
      if not start_with_first_mass_sentences_file: first_mass_sentences_file = int(input("    With which (set of) mass sentences file(s) do you want to begin? "))
   else:
      print("N/A")
   
   # Specifications of overlearning/spaced repetition files
   print("\n3. Specifications of overlearning/spaced repetition files")
   if overlearning_files:             
            
      ten_new_sentences_per_day = bool(input("3.1 Do you want 10 new sentences per day (y/n)? ")=="y")
      if not ten_new_sentences_per_day:
         new_senteces_per_day = int(input("    How many new sentences per day do you want? "))

      new_sentences_once = bool(input("3.2 Do you want new sentences to be presented once (otherwise twice) (y/n)? ")=="y")
      if not new_sentences_once:          
         print("    New sentences shall be presented twice.")

      revision_first = bool(input("3.3 Do you want revisions first (otherwise new sentences first) (y/n)? ")=="y")
      if not revision_first: print("    New sentences shall be presented first, then the revisions.")
         
      overlearning = bool(input("3.4 Do you want overlearning files (or else spaced repetition files) (y/n)? ")=="y")
      if overlearning:
      
         standard_preferences_for_overlearning_files = bool(input("3.5 Do you want standard overlearning files (y/n)? ") =="y")
         if not standard_preferences_for_overlearning_files:      
            days_of_revision = int(input("    How many days do you want to revise the sentences? "))
            extension_of_patterns = [[] for counter in range(days_of_revision-5)]
            patterns.extend(extension_of_patterns)            
            patterns_according_to_number_of_presentations = ([], [1], [1, 4], [1, 3, 6], [1, 2, 4, 7], [1, 2, 3, 4, 5])
            for counter in range(days_of_revision+1):
               if counter == 0:
                  number_of_presentations = int(input("    How often shall new sentences be presented (0-5)? "))
                  patterns[counter] = patterns_according_to_number_of_presentations[number_of_presentations]
                  print("    Pattern for new sentences:", patterns[counter])   
               else:
                  print("    How often shall sentences of revision #", counter, " be presented (0-5)? ", end='', sep='')
                  number_of_presentations = int(input())
                  patterns[counter] = patterns_according_to_number_of_presentations[number_of_presentations]
                  print("    Pattern for revision #", counter, ":", patterns[counter])         

      else:
      
         print("    Spaced repetition files shall be created.")         
         standard_preferences_for_spaced_repetition_files = bool(input("3.5 Do you want standard spaced repetition files (y/n)? ") =="y")
         if standard_preferences_for_spaced_repetition_files:                                          
            revision_dates = [1, 2, 4, 7, 14, 30, 50, 90]            # Standard learning/revision days for spaced repetition files         
            days_of_revision = revision_dates[len(revision_dates)-1]
            patterns = [[] for counter in range(days_of_revision)]                  
            standard_revision_patterns = [[1, 2, 3, 4, 5], [1,2, 4, 7], # Corresponding standard patterns for spaced repetition files
            [1, 2, 4, 7], [1, 3, 6], [1, 4], [1], [1], [1]]   
            for counter in range(len(revision_dates)):
               patterns[revision_dates[counter]-1] = standard_revision_patterns[counter]
         else:
            revision_dates = input("    Learning/revision days (e.g. 1, 2, 4, 7, etc.)? ").split(sep=",")
            for counter in range(len(revision_dates)):
               revision_dates[counter]=int(revision_dates[counter])
            if not revision_dates[0] == 1:
               revision_dates.insert(0, 1)
               print("    New sentences on first day included.")
            days_of_revision = revision_dates[len(revision_dates)-1]
            patterns = [[] for counter in range(days_of_revision)]
            patterns_according_to_number_of_presentations = ([], [1], [1, 4], [1, 3, 6], [1, 2, 4, 7], [1, 2, 3, 4, 5])
            for counter in range(len(revision_dates)):
               print("    How often shall the sentences be presented on day ", revision_dates[counter], " (0-5)? ", end='', sep='')
               number_of_presentations = int(input())
               patterns[revision_dates[counter]-1] = patterns_according_to_number_of_presentations[number_of_presentations]
               print("    Learning pattern for day", revision_dates[counter], ":", patterns[revision_dates[counter]-1])         
         days_of_revision -= 1   

      start_with_first_day = bool(input("3.6 Do you want to begin with the first session file (y/n)? ")=="y")
      if not start_with_first_day: first_session_file = int(input("    With which session file do you want to begin? "))         
         
   else:
      print("N/A")   
   
   # General Specifications
   print("\n4. General specifications")
   
   if sentence_pair_files or a_files or b_files or c_files or overlearning_files:

      standard_bit_rate = bool(input("4.1 Do you want the standard bitrate of 64k (y/n)? ")=="y")
      if not standard_bit_rate:
         bit_rate = input("    Which bitrate do you prefer (at least 32k)? ")
         if bit_rate[-1] != "k": bit_rate = bit_rate + "k"

      no_speed_up = bool(input("4.2 Do you want to keep the speed of the source and target audio (y/n)? ") == "y")
      if not no_speed_up:
         speed_up = True
         speed_source = float(input("    Speed factor of source audio (e.g. 1.25)? "))         
         # Factors below 1.1 (other than exactly 1) are treated as "no speedup"
         if speed_source < 1.1 and speed_source != 1:
            print("    Speed factor too small. No speed up of source audio.")
            speed_source = 1
         elif speed_source > 1.5:
            print("    Speed factor too big. Set to maximum value of 1.5.")
            speed_source = 1.5
         speed_target = float(input("    Speed factor of target audio (e.g. 1.25)? "))
         if speed_target < 1.1 and speed_target != 1:
            print("    Speed factor too small. No speed up of target audio.")
            speed_target = 1
         elif speed_target > 1.5:
            print("    Speed factor too big. Set to maximum value of 1.5.")
            speed_target = 1.5
         if speed_source == 1 and speed_target == 1: speed_up = False      

      embedded_cover = bool(input("4.3 Do you want an embedded cover (y/n)? ")=="y")
      if embedded_cover:
         if os.path.exists("cover.png"): cover_file = "cover.png"
         elif os.path.exists("cover.jpg"): cover_file = "cover.jpg"
         else:
            print("    No \"cover.png\" or \"cover.jpg\" found. Cover will not be embedded.")
            embedded_cover = False   
      
      if a_files or b_files or c_files or overlearning_files:

         intro = bool(input("4.4 Do you want to include an intro (y/n)? ")=="y")
         if intro and not os.path.exists("intro.mp3"):
            print("    No \"intro.mp3\" found. Intro will not be included.")
            intro = False         
         
         outtro = bool(input("4.5 Do you want to include an outtro (y/n)? ")=="y")
         # The file name must match the "outtro.mp3" that is loaded later on
         if outtro and not os.path.exists("outtro.mp3"):
            print("    No \"outtro.mp3\" found. Outtro will not be included.")
            outtro = False                  
         
         fixed_silence_length = bool(input("4.6 Do you want fixed silence length between sentences (y/n)? ")=="y")
         if fixed_silence_length:       
            two_seconds_of_silence = bool(input("    Do you want 2 seconds of silence between sentences (y/n)? ")=="y")
            if not two_seconds_of_silence:
               silence_length = float(input("    Desired silence length between sentences (in seconds)? "))*1000   
               if silence_length > maximum_silence_length: silence_length = maximum_silence_length
               elif silence_length < minimum_silence_length: silence_length = minimum_silence_length
               added_silence = 0
         else:
            print("    Silence length shall correspond to the length of the respective sentence.")
            added_silence = float(input("    Added silence (in seconds)? "))*1000
   else:
      print("N/A")      
            
# Informing user about set specifications of project
print("\nSet specifications of your project:")

print("\n1. Types of files to be created")
print("1.1 Sentence pair files:", sentence_pair_files)
print("1.2 Sentence training mass sentences files:", a_files)
print("1.3 Interpretation training mass sentences files:", b_files)
print("1.4 Repetition/shadowing mass sentences files:", c_files)
print("1.5 Overlearning/spaced repetition files:", overlearning_files)

print("\n2. Specifications of mass sentences files")
if a_files or b_files or c_files:    
   print("2.1 Sentences per mass sentences file:", sentences_per_mass_sentences_file)
   print("2.2 First mass sentences file:", first_mass_sentences_file)
else:
   print("N/A")

print("\n3. Specifications of overlearning/spaced repetition files")
if overlearning_files:
   print("3.1 New sentences per day:", new_senteces_per_day)
   print("3.2 New sentences presented once:", new_sentences_once)   
   print("3.3 Revision first:", revision_first)
   if overlearning:   
      print("3.4 Type of files: overlearning")
      print("3.5 Days of revision:", days_of_revision)      
      for counter in range(days_of_revision+1):
         if counter == 0:
            print("    Pattern for new sentences:", patterns[counter])
         else:
            print("    Pattern for revision #", counter, ":", patterns[counter])      
   else:
      print("3.4 Type of files: spaced repetition")
      print("3.5 Learning/revision days:", revision_dates)
      for counter in range(len(revision_dates)):
         print("    Learning pattern for day", revision_dates[counter], ":", patterns[revision_dates[counter]-1])
   print("3.6 First session file:", first_session_file)
else:
   print("N/A")

print("\n4. General Specifications")
if sentence_pair_files or a_files or b_files or c_files or overlearning_files:
   print("4.1 Bitrate:", bit_rate)   
   print("4.2 Speedup:", speed_up)
   if speed_up:
      print("    Speed of source audio:", speed_source)
      print("    Speed of target audio:", speed_target)
   print("4.3 Embedded cover:", embedded_cover)
   
   if a_files or b_files or c_files or overlearning_files:
      print("4.4 Intro:", intro)
      print("4.5 Outtro:", outtro)
      print("4.6 Fixed length:", fixed_silence_length)
      if fixed_silence_length:
         print("    Silence length:", silence_length/1000, "seconds")
      else:
         print("    Flexible silence length:", True)
         print("    Added silence:", added_silence/1000, "seconds")
         
   create = bool(input("\nStart file creation process (y/n)? ")=="y")
   if not create:
      print("Nothing to create. Exiting ...")
      exit()
else:
   print("N/A")
   print("\nNothing to create. Exiting ...")
   exit()   
         
# Creation of sound files
print("\nSoundfiles will now be created.")

from time import time
from math import ceil

# Preliminary step: Creation of cues
if not cues:
   
   start_time = time()
   
   print("\nPreliminary step: ", number_of_sentences, " cue files will be created.")
   for counter in range(1, number_of_sentences+1):
      sentence_filename = L2+"-"+str(counter)+".mp3"
      sentence = AudioSegment.from_file(sentence_filename)
      cue_filename = L1+"-"+str(counter)+".mp3"      
      cue_raw = sentence[:length_of_cue+length_of_fade_out]
      cue = cue_raw.fade_out(length_of_fade_out)
      cue.export(cue_filename, format="mp3")
      print("Creating:", cue_filename)

   time_elapsed(start_time)

# Step 1: Creation of sentence pairs.
if sentence_pair_files:

   start_time = time()
      
   if not os.path.exists("GSP"):
      os.makedirs("GSP")
         
   print("\nStep 1: ", number_of_sentences, " sentence pair files (e.g. for shuffle play) will be created.")
   for number in range(1, number_of_sentences+1):
      filenumber=str(number)         
      create_sentence_pair(filenumber, silence_length)

   time_elapsed(start_time)
   
else:
   print("\nStep 1: Creation of sentence pair files: N/A")

# Step 2: Creation of mass sentences files
if a_files or b_files or c_files:               

   start_time = time()
   if a_files:
      if not os.path.exists("GMS/GMS-A"):
         os.makedirs("GMS/GMS-A")
   if b_files:
      if not os.path.exists("GMS/GMS-B"):
         os.makedirs("GMS/GMS-B")
   if c_files:
      if not os.path.exists("GMS/GMS-C"):
         os.makedirs("GMS/GMS-C")
   
   if number_of_sentences/sentences_per_mass_sentences_file != number_of_sentences//sentences_per_mass_sentences_file:
      number_of_mass_sentences_files = round(number_of_sentences//sentences_per_mass_sentences_file)+1
   else:
      number_of_mass_sentences_files = int(number_of_sentences/sentences_per_mass_sentences_file)     
   print("\nStep 2: ", number_of_mass_sentences_files-first_mass_sentences_file+1, " (sets of) mass sentences file(s) will be created.")       
         
   for number in range(first_mass_sentences_file, number_of_mass_sentences_files+1):       
   
      number_of_first_audiofile = (number-1)*sentences_per_mass_sentences_file+1
      if a_files:
         a_file_name = (title+"-"+L1+"-"+L2+"-GMS-A-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(a_file_name)
      if b_files:
         b_file_name = (title+"-"+L1+"-"+L2+"-GMS-B-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(b_file_name)
      if c_files:
         c_file_name = (title+"-"+L1+"-"+L2+"-GMS-C-") + str(number_of_first_audiofile).zfill(4) + ".mp3"
         print(c_file_name)
            
      if intro:
         a_file = AudioSegment.from_file("intro.mp3")         
         b_file = AudioSegment.from_file("intro.mp3")
         c_file = AudioSegment.from_file("intro.mp3")
      else:
         a_file = AudioSegment.empty()
         b_file = AudioSegment.empty()
         c_file = AudioSegment.empty()
      
      if number*sentences_per_mass_sentences_file > number_of_sentences:
         maximum_files_to_add = number_of_sentences - (number-1)*sentences_per_mass_sentences_file
      else:
         maximum_files_to_add = sentences_per_mass_sentences_file       
      
      for counter in range(maximum_files_to_add):

         filenumber = number_of_first_audiofile + counter
         source_filename = L1+"-"+str(filenumber)+".mp3"
         target_filename = L2+"-"+str(filenumber)+".mp3"
         source = AudioSegment.from_file(source_filename)
         target = AudioSegment.from_file(target_filename)
         if speed_up:
            if speed_source >= 1.1: source = source.speedup(speed_source)
            if speed_target >= 1.1: target = target.speedup(speed_target)
         
         if not fixed_silence_length:
            target_length = len(target)+added_silence
            if target_length > maximum_silence_length: target_length = maximum_silence_length
            elif target_length < minimum_silence_length: target_length = minimum_silence_length
            silence = AudioSegment.silent(duration=target_length)                   
         else:
            silence = AudioSegment.silent(duration=silence_length)

         if a_files:
            a_file = a_file + source + silence + target + silence + target + silence
            print("Adding to A-file:", source_filename, target_filename, target_filename)                                   
         if b_files:
            b_file = b_file + source + silence + target + silence
            print("Adding to B-file:", source_filename, target_filename)                                   
         if c_files:
            c_file = c_file + target + silence
            print("Adding to C-file:", target_filename)

      if outtro:
         a_file = a_file + AudioSegment.from_file("outtro.mp3")         
         b_file = b_file + AudioSegment.from_file("outtro.mp3")
         c_file = c_file + AudioSegment.from_file("outtro.mp3")
         
      print("Writing mass sentences file(s). Please wait.\n")
      if a_files:       
         if embedded_cover:
            a_file.export("GMS/GMS-A/"+a_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": a_file_name, "track": number}, cover=cover_file)
         else:
            a_file.export("GMS/GMS-A/"+a_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": a_file_name, "track": number})
      if b_files:
         if embedded_cover:
            b_file.export("GMS/GMS-B/"+b_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": b_file_name, "track": number}, cover=cover_file)
         else:         
            b_file.export("GMS/GMS-B/"+b_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": b_file_name, "track": number})
      if c_files:
         if embedded_cover:
            c_file.export("GMS/GMS-C/"+c_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": c_file_name, "track": number}, cover=cover_file)
         else:
            c_file.export("GMS/GMS-C/"+c_file_name, format="mp3", bitrate=bit_rate, tags={"album": title, "title": c_file_name, "track": number})

   time_elapsed(start_time)
               
else:
   print("\nStep 2: Creation of mass sentences files: N/A")               

# Step 3: Creation of sessionfiles
if overlearning_files:

   start_time = time()

   if not os.path.exists("GSR"):
      os.makedirs("GSR")
         
   number_of_days = ceil(number_of_sentences/new_senteces_per_day) + days_of_revision - first_session_file + 1            
   print("\nStep 3: ", number_of_days, "session files will be created. ")

   number_of_necessary_sentences = ceil(number_of_sentences/new_senteces_per_day)*new_senteces_per_day

   if revision_first:
      first = days_of_revision
      last = -1
      step = -1
   else:
      first = 0
      last = days_of_revision+1
      step = +1
   
   for day in range(first_session_file, first_session_file + number_of_days):               
      create_sessionfile(day, days_of_revision, first, last, step)
   
   time_elapsed(start_time)
      
else:
   print("\nStep 3: Creation of spaced repetition files: N/A\n")

# Wait for Enter before closing the console window (works on every platform)
input("Press Enter to exit.")
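
For anyone who wants to check the Step 3 arithmetic before starting a long run, here is a minimal sketch of the same calculation. The function name `session_plan`, the corrected parameter spelling and the sample numbers are assumptions made for illustration; they are not part of the script itself.

```python
from math import ceil

def session_plan(number_of_sentences, new_sentences_per_day,
                 days_of_revision, first_session_file=1):
    """Return (number_of_days, number_of_necessary_sentences) as in Step 3."""
    # Days needed to introduce every sentence, rounding the last day up
    intro_days = ceil(number_of_sentences / new_sentences_per_day)
    # Session files still to be written, counting the trailing revision days
    number_of_days = intro_days + days_of_revision - first_session_file + 1
    # Sentence count padded up to a whole number of days
    number_of_necessary_sentences = intro_days * new_sentences_per_day
    return number_of_days, number_of_necessary_sentences

print(session_plan(1000, 20, 4))
print(session_plan(95, 10, 4))
```

With 95 sentences at 10 per day, the last day is only half full, which is why the padded sentence count comes out as 100.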
5 x

rdearman
Site Admin
Posts: 7231
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836
x 23123
Contact:

Re: Solved: How to create your own Glossika-like GSR files?

Postby rdearman » Mon Jul 09, 2018 12:11 pm

neumanc wrote: The code of the current version 3.2 can be downloaded here:


You should consider putting the code on GitHub so that others can contribute to it and extend it.
0 x

I post on this forum with mobile devices, so excuse short msgs and typos.

Jim
Orange Belt
Posts: 142
Joined: Sat Apr 22, 2017 3:18 pm
Languages: English (N), French (intermediate), Spanish (frosty intermediate), Russian (beginner)
Language Log: viewtopic.php?f=15&t=5766
x 254

Re: Solved: How to create your own Glossika-like GSR files?

Postby Jim » Thu Jul 26, 2018 9:52 pm

neumanc wrote:
Axon wrote:...

I had a strange bug where it said there was a syntax error on line 77 right at the last =. But when I used Anaconda to launch it, there was no problem.

...


...

I cannot see any reason why you got an error message as you first tried the script. I tested it intensively with WinPython without any error message. As I see it, Line 77 is perfectly fine. But as you know, computer programs are always a delicate matter. Some mistake will always be found under certain circumstances, I'm afraid.

...

I got this error too. It turns out I was trying to run the script with Python 2, not Python 3 (a Mac problem, I think). Apparently the print arguments "sep" and "end" don't exist in Python 2, where print is a statement rather than a function.

I'm completely new to Python, so I've been struggling a bit to get the modules installed, but this does exactly what I'm looking for. Thanks so much, neumanc.
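
In case it saves someone else the same confusion, here is a small sketch of a version guard that turns the Python 2 mix-up into a readable message. The helper name `require_python3` and the file name `script.py` in the message are made up for this example.

```python
import sys

def require_python3(version_info=None):
    """Stop with a readable message when the interpreter is older than Python 3."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        raise SystemExit("This script needs Python 3; try 'python3 script.py'.")

require_python3()

# Python 3 only: print is a function and accepts the sep and end keywords
print("Step 3: ", 5, " session files", sep="", end="\n")
```

One caveat: Python 2 rejects `print(..., sep=...)` while it is still parsing the file, before any code runs. A guard like this therefore only helps if it lives in a separate launcher file that contains no such print calls.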
1 x
La hora más oscura es la que viene antes del nacimiento del sol

Merci de corriger mes erreurs !

Re: Solved: How to create your own Glossika-like GSR files?

Postby Jim » Sun Jul 29, 2018 6:41 am

I got this working last night and it really is a fantastic piece of software. In case it helps anyone else, when running the code I received the following error:
Automatic encoder selection failed for output stream #0:0. Default encoder for format mp3 (codec mp3) is probably disabled. Please choose an encoder manually.
I haven't a clue how to fix this properly (again, it may be a Mac issue), but I worked around it by changing the code so that the output format is "wav" and the output file names end in ".wav". I can then convert the WAV files to MP3 in iTunes with no bother.

As a suggestion, you might extend the supported input file types to others such as m4a or aiff, and let the user choose a different output file type. That said, the script now works well for what I wanted to do. Many thanks again.
0 x

juman
White Belt
Posts: 24
Joined: Sat Nov 14, 2015 4:36 pm
x 33

Re: Solved: How to create your own Glossika-like GSR files?

Postby juman » Sun Jul 29, 2018 7:50 pm

Hi,

I'm trying to test this out but must have misunderstood something. I have the script in a folder with audio pairs (L1 = EN, L2 = FR) named EN-1.mp3, FR-1.mp3, EN-2.mp3, FR-2.mp3, etc. However, I run into a problem when I run the script with mostly default parameters:

Code:

1. What is your project title? Test
2. What is your source language? EN
3. What is your target language? FR
4. How many sentence pairs do you want to process? 5
5. Do you have ready-made cues (or else shall they be created) (y/n)? n
   Desired cue length (in seconds)? 0.5
   Desired length of additional fade-out (in seconds)? 0.2
6. Is the raw data stored in the current folder (y/n)? y
7. Do you want all standard preferences (y/n)? y


When it starts, it begins with the following:

Code:

Soundfiles will now be created.

Preliminary step:  5  cue files will be created.
Creating: EN-1.mp3
Creating: EN-2.mp3
Creating: EN-3.mp3
Creating: EN-4.mp3
Creating: EN-5.mp3
Completed in 00:01 hours.

Step 1:  5  sentence pair files (e.g. for shuffle play) will be created.
Test-EN-FR-1.mp3
Test-EN-FR-2.mp3
Test-EN-FR-3.mp3
Test-EN-FR-4.mp3
Test-EN-FR-5.mp3
Completed in 00:01 hours.


When I then check my original files, the EN-x.mp3 files have been overwritten with the FR-x.mp3 files, and the resulting files are only in French?
0 x

Re: Solved: How to create your own Glossika-like GSR files?

Postby rdearman » Mon Jul 30, 2018 8:53 am

I know nothing about this, but given it is asking if your source files are in another folder, you might want to put them in another folder so they don't get overwritten?
0 x

Re: Solved: How to create your own Glossika-like GSR files?

Postby juman » Mon Jul 30, 2018 9:39 am

Found the answer to my own problem. The files were overwritten because I had this segment in my setup:

Code:

5. Do you have ready-made cues (or else shall they be created) (y/n)? n
   Desired cue length (in seconds)? 0.5
   Desired length of additional fade-out (in seconds)? 0.2


With "n", the script generates cue files named EN-x.mp3, which overwrite the English files.
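
To avoid losing source recordings this way, you could copy any existing file aside before the script writes over it. Below is a minimal sketch; `back_up_if_exists` and the folder name `backup` are my own assumptions, not part of neumanc's script.

```python
import os
import shutil

def back_up_if_exists(path, backup_dir="backup"):
    """Copy an existing file into a backup folder next to it.

    Returns the backup path, or None when there was nothing to back up.
    """
    if not os.path.isfile(path):
        return None
    # Keep the backup folder alongside the file that is about to be replaced
    target_dir = os.path.join(os.path.dirname(path), backup_dir)
    os.makedirs(target_dir, exist_ok=True)
    backup_path = os.path.join(target_dir, os.path.basename(path))
    shutil.copy2(path, backup_path)  # copy2 also preserves timestamps
    return backup_path
```

Calling `back_up_if_exists("EN-1.mp3")` just before the cue for sentence 1 is written would leave the original English recording in `backup/EN-1.mp3`. The simplest fix is still to answer "y" to question 5 when the EN files are real recordings rather than cues.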
0 x

