rdearman wrote: Cool, it works. I had some issues with the text files because messing around with them on Windows seems to have confused the file encoding. So I'll start again, download the text to my Linux box, do all the hunalign, split, etc. there, and then run the forced alignment tool.
I played it in Subtitle Edit and it was perfectly aligned for the 3 different srt files I tried. So, fix the encoding and it is a done deal. I had to compile hunalign, but since I'm modifying the file to have one sentence per line, it also did a pretty good job on the FR-to-EN alignment. Actually it makes for a good LR system. You can Listen & Read all day long.
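For anyone wanting to try the one-sentence-per-line step before running hunalign, here is a rough sketch in Python. It is not the script used above (that one is in Perl), and the function name `one_sentence_per_line` is made up for illustration. The splitting rule is deliberately naive: it breaks after `.`, `!`, `?` or `…` whenever whitespace follows, so abbreviations like "M. Dupont" will be split in the wrong place.

```python
import re

# Naive sentence boundary: whitespace that follows ., !, ? or …
# (an assumption for illustration; real text needs abbreviation handling)
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def one_sentence_per_line(text: str) -> str:
    """Return `text` reflowed so that each sentence sits on its own line."""
    # Collapse hard line wraps and repeated spaces into single spaces first.
    flat = " ".join(text.split())
    # Split at the naive boundaries and drop any empty pieces.
    sentences = [s for s in _SENTENCE_END.split(flat) if s]
    return "\n".join(sentences)


if __name__ == "__main__":
    sample = "Bonjour. Comment\nallez-vous ? Très bien!"
    print(one_sentence_per_line(sample))
```

The French habit of putting a space before `?` and `!` is fine here, because the split only happens on whitespace that comes after the punctuation mark.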
OK, Windows wasn't the culprit for the encoding problem. It was the dipstick in charge of the programming (me), who forgot that Perl needs to be told that it is getting UTF-8 input and that you want UTF-8 output. Once I fixed that by adding a couple of lines to the Perl script, it output the text files and the srt files correctly.
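The actual fix was a couple of lines in the Perl script, which aren't reproduced here. The same principle applies in other languages, so as a hedged illustration only, this is what "say UTF-8 on the way in and on the way out" looks like in Python. The helper name `copy_as_utf8` and the file paths are hypothetical.

```python
from pathlib import Path


def copy_as_utf8(src: str, dst: str) -> None:
    """Read `src` as UTF-8 and write the same text to `dst` as UTF-8."""
    # Decode explicitly instead of relying on the platform default encoding,
    # which may differ between Windows and Linux.
    text = Path(src).read_text(encoding="utf-8")
    # Encode explicitly on output too, so accented characters survive.
    Path(dst).write_text(text, encoding="utf-8")
```

Leaving out either `encoding="utf-8"` makes the result depend on the machine's locale settings, which is the kind of thing that gets blamed on the operating system.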
Now for me, I'm actually cool with just having subtitle files in French, and I think I could work with this as a complete French card. However, the sentences are really long. I had to look at them in Subtitle Edit because even VLC will not display subtitles for an mp3 by default (and I thought it did everything!). But a little digging showed that if you do the following:
Enable visualizations (Audio > Visualizations > Spectrometer) => subtitles are autodetected and play fine (srt)
Then it will show the subtitles, even for the really long sentences. So this is great for LR. I can listen and read along in French. Sweet.
I will align the English and generate a deck to share on AnkiWeb at some point. But just for the LR, this will already be great for me.