2 weeks to build a language app (a music-style language trainer)

ryanheise · Postby **ryanheise** » Thu Sep 10, 2020 2:03 pm

Cainntear wrote:
ryanheise wrote:On an unrelated note, the file format will be similar to what Cainntear proposed, but with room for multiple translations and transcripts.

Code:
{
  "translationLanguages": ["en"], // ISO 639-*
  "segment": {
    "translations": ["..."],
  ]
  }
}

Personally I'd favour using a dictionary for the translations rather than parallel arrays. That would save having a whole load of null values floating around while a translation is work in progress. It also avoids having to add in all the nulls as soon as a new translation is started, if the format is used as a live datatype rather than just the archival format. (For example, if someone built a Vue app to work off the same data or to browse it online, JSON would obviously be the native format.)
That said, you might have solid architectural reasons for your choice.

I'm not tied to that decision apart from the fact that I want the transcripts to be ordered with the primary transcript appearing first (e.g. in dropdown menus). The translations don't particularly need to be ordered (Well, Netflix happens to order available subtitles and audio tracks, but there's not really any inherent or universal order here so translations could be a dictionary with the display order being decided in some other way).
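To make the two layouts under discussion concrete, here is a minimal sketch in Python. Only "translationLanguages", "segment" and "translations" come from the snippet above; the sample strings and the helper names `to_keyed` and `to_parallel` are made up for illustration and are not part of any agreed file format.

```python
# Parallel-array layout: position i in "translations" lines up with
# position i in "translationLanguages", so unfinished work needs a null.
parallel = {
    "translationLanguages": ["en", "fr"],
    "segment": {"translations": ["Hello", None]},  # None = French not written yet
}

# Dictionary layout: a missing key simply means "no translation yet".
keyed = {
    "segment": {"translations": {"en": "Hello"}},
}


def to_keyed(languages, translations):
    """Convert parallel arrays to a dictionary, dropping null placeholders."""
    return {lang: text
            for lang, text in zip(languages, translations)
            if text is not None}


def to_parallel(languages, translations_by_lang):
    """Convert a dictionary back to an array aligned with `languages`."""
    return [translations_by_lang.get(lang) for lang in languages]


print(to_keyed(parallel["translationLanguages"],
               parallel["segment"]["translations"]))
```

Since the two forms convert into each other this easily, the choice mostly comes down to which one is nicer to edit by hand and to consume from other tools.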

start/end/transcripts/translations can be omitted from parent segments and they will be automatically inferred from the child segments.

"Can be" as in "optional", or "can be" as in "will be...?"

What I'm referring to here is that if the node "AB" lacks a transcript but its children "A" and "B" both have transcripts, then the transcript of "AB" can be inferred by concatenating "A" + "B". Who does the inferring? It is up to whichever program opens the file whether it wants to do that. The same applies to translations. In some cases, it may be useful to insert a special translation for "AB" which is different from the simple concatenation of "A" and "B", particularly if the translation language is very different from the source language with a different word order. The natural translation of "AB" may involve placing words at different ends of the sentence that could not be reconstructed through simple concatenation. The start/end times can easily be inferred. The start time of "AB" comes from the start time of "A" and the end time of "AB" comes from the end time of "B", and this is advantageous since if you edit the timestamps of "A" or "B" they will automatically propagate up the tree.
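The inference described above could look roughly like the following Python sketch. The `Segment` class and the function names are hypothetical, and joining child transcripts with a single space is an assumption (the post only says "concatenating"). A program reading the file is free to do this differently or not at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    # Every field is optional on a parent node; leaves are expected to set them.
    start: Optional[float] = None
    end: Optional[float] = None
    transcript: Optional[str] = None
    children: List["Segment"] = field(default_factory=list)


def transcript_of(seg):
    """Use the explicit transcript if present, else concatenate the children."""
    if seg.transcript is not None:
        return seg.transcript
    parts = [transcript_of(child) for child in seg.children]
    if not parts or any(part is None for part in parts):
        return None
    return " ".join(parts)  # assumption: a space is the right separator


def start_of(seg):
    """Explicit start time, else the start time of the first child."""
    if seg.start is not None:
        return seg.start
    return start_of(seg.children[0]) if seg.children else None


def end_of(seg):
    """Explicit end time, else the end time of the last child."""
    if seg.end is not None:
        return seg.end
    return end_of(seg.children[-1]) if seg.children else None


a = Segment(start=0.0, end=1.5, transcript="Hello")
b = Segment(start=1.5, end=3.0, transcript="world")
ab = Segment(children=[a, b])  # nothing stored on the parent

print(transcript_of(ab), start_of(ab), end_of(ab))
a.start = 0.2                  # an edit to a child propagates up the tree
print(start_of(ab))
```

A parent with its own `transcript` (for example a hand-written translation with a different word order) simply overrides the concatenation.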

I'm thinking about materials specifically designed for learners with a pause for response. In certain use cases, the gaps may be considered part of the data, whereas in others they might not, and in certain situations that will correspond to the levels of grouping. Or maybe that can be obviated by automatic silence insertion in your app.

The plan is that there will be gaps, but they will be part of the training routine (which is a separate data structure) rather than baked into the content. The training routine will allow you to vary the gaps depending on the maturity of the segment, for example.

ryanheise · Postby **ryanheise** » Thu Sep 10, 2020 6:46 pm

Day 12: Goal achieved (although it's now 4:45 in the morning...) Published an update to Google Play. (Since I haven't written the audio analyser yet, this just cuts up the audio randomly, but it then uses the proper algorithm to merge those segments simulating what would happen if the segments were correctly cut. Only one training routine type has been implemented for now, but I spent most of the day actually sketching on paper how all of the other ones would work.)

ryanheise · Postby **ryanheise** » Fri Sep 11, 2020 6:12 pm

Day 13: I've built a rudimentary audio segment splitter and published to Google Play (with lots of bugs). One more day to go...

I was going to create a video of the thing in operation, but it's getting late again, so here is a sketch from yesterday's brainstorming session on how the app's review system is going to work:

I wonder if anyone can figure out what it means? (After I wake up tomorrow, I wonder if *I* can figure out what it means?)

ryanheise · Postby **ryanheise** » Sat Sep 12, 2020 6:53 pm

Day 14. Drum roll... Tada!

Success! Well, it's probably not ready to release to the public, and the UI needs a complete redesign (so that it doesn't look like it was designed by a programmer), but I'm happy with what I was able to build during these two weeks, and I'm quite pleased with how the automated segment splitting worked out. I managed to get a lot done on the last day, probably because it was the first day of the weekend.

As you can see in the video, I have implemented my first attempt at some different training sequences in the top-right menu, but these will need some tweaking. We won't really know how these will feel in practice until we use them.

And to that end, my focus will now turn to the following:

1. Get the training session to run on auto-play, where it can be programmed to repeat each segment a certain number of times before continuing (I'm pretty sure I can finish this tomorrow).
2. Get the review system working. I expect I'll be able to start using the app within a few days, but it's through using it over several days that I'll notice if anything in the review system needs to be fixed.
3. Get this out on iPhone, too.

After that, I'm going to keep improving it until it's a lot easier to use.

Adrianslont · Postby **Adrianslont** » Sun Sep 13, 2020 8:46 am

Congratulations, Ryan. I’m eagerly awaiting its arrival on iPhone.

ryanheise · Postby **ryanheise** » Mon Sep 14, 2020 4:27 pm

Thanks! I will take a rest from coding for a couple of days since I'm feeling a bit burnt out.

Instead, I've recently been thinking about this thing which I've drawn up below:

While coding the app, I made some rushed decisions on how I wanted to structure the app and came up with the concept of a "routine". I knew that I wanted to use the app to study Peppa Pig intensively, i.e. listening and repeating over and over again for a long time. But I also wanted to do my listening attention span-stretching exercises where there is still a lot of repetition, but the daily throughput is higher.

That was when I came to the realisation of the above chart. Those 3 pipelines are what I was earlier referring to as routines. The top pipeline is the material I'm studying intensively at a slow pace. Each circle represents material I'm studying (e.g. a new Peppa Pig episode). As it passes through the pipeline from left to right, the days pass, and eventually it will come out the other side where it will be retired and I'll move on to the next material I want to study.

The middle pipeline has a higher throughput rate but essentially works the same way. This is about the rate at which I would study audio stories where I'm not trying to memorise, I'm just trying to understand it and maybe shadow or echo it, while also maybe practicing my listening ability as the review segments will be merged into longer and longer segments as they make their way towards the right exit of the pipeline.

The bottom pipeline is even faster where maybe what you listen to passes through the pipeline completely within a day and you never hear it again, sort of like extensive listening (if that's a word).

I think this more accurately reflects how I would want to design my personal study routines. I'd want to have a good mix of extensive and intensive listening, and something in-between, and I'd like to use different preferences for each pipeline for how repetition, merging and reviewing would work within that pipeline.

Cainntear · Postby **Cainntear** » Tue Sep 15, 2020 4:34 pm

ryanheise wrote:The bottom pipeline is even faster where maybe what you listen to passes through the pipeline completely within a day and you never hear it again, sort of like extensive listening (if that's a word).

Yes, it's a word. Extensive listening isn't a true opposite to intensive listening, although they are in many ways opposites in a practical sense. Intensive listening is about listening to something closely and deliberately, whereas extensive listening is focused on listening to lots of different material. Listening to lots of material necessitates listening at a decent pace, clearly...

So listening at full speed isn't extensive listening in and of itself, and within the SLA community, you'd probably find some disagreement over whether repeated relistening to something counts as extensive listening at all, or if it has to be new material to count.

TL;DR: probably best to avoid the word. ;-)

Anyway, good going there. I'm going to copy you and set myself public goals here. I've been noodling around with my FSI project prototypes recently, so I'll try to get something up this week about that.

Looking forward to seeing how your tools develop!

ryanheise · Postby **ryanheise** » Tue Sep 15, 2020 5:15 pm

Cainntear wrote:
ryanheise wrote:The bottom pipeline is even faster where maybe what you listen to passes through the pipeline completely within a day and you never hear it again, sort of like extensive listening (if that's a word).

Yes, it's a word. Extensive listening isn't a true opposite to intensive listening, although they are in many ways opposites in a practical sense. Intensive listening is about listening to something closely and deliberately, whereas extensive listening is focused on listening to lots of different material. Listening to lots of material necessitates listening at a decent pace, clearly...

So listening at full speed isn't extensive listening in and of itself, and within the SLA community, you'd probably find some disagreement over whether repeated relistening to something counts as extensive listening at all, or if it has to be new material to count.

I suspected as much, and I'll take your advice and try to use different words.

Anyway, good going there. I'm going to copy you and set myself public goals here. I've been noodling around with my FSI project prototypes recently, so I'll try to get something up this week about that.

Looking forward to seeing how your tools develop!

Thanks! And good luck with your own goals. Sometimes you need to push yourself a bit to make things happen. I was SO intent on trying to build something in two weeks that I ended up going to sleep at around 4:30am for the last several nights in a row. I'm not sure if that's such a good thing! However, I DID get it done in the end, just in the nick of time, and I'm happy about that because the hardest part is now over. I can now relax a little, start using the app, and just tweak it as needed.

ryanheise · Postby **ryanheise** » Fri Sep 25, 2020 3:45 am

Update:

I have unfortunately been sick recently, so I am trying to hold myself back from working on this app until I've recovered my health.

I know I shouldn't overexert myself, but one loophole I just thought of is that, although thinking about some of the app design decisions hurts my brain (and I shouldn't do that right now), maybe I can still just share those brain strainers here, and see if I can tap into other people's brains instead of my own (muahahah).

Earlier the problem was discussed that an audio file can't always be perfectly arranged into a binary tree. I came up with this concept of a "unit" (I am also tossing up between calling it a "unit" or a "section"). A unit consists of around 6 to 12 atomic segments and is subdivided into a tree. If the unit length is 8, then it can be divided into a perfect binary tree. E.g.

Unit: ABCDEFGH
Halves: ABCD EFGH
Clumps: AB CD EF GH
Atoms: A B C D E F G H
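For reference, the perfect-binary-tree split shown above can be generated by repeated halving. This is only an illustrative Python sketch; the function name `binary_levels` is made up here and nothing in it is taken from the app.

```python
def binary_levels(atoms):
    """Return the levels of a perfect binary tree over `atoms`, root first."""
    n = len(atoms)
    if n == 0 or n & (n - 1):
        raise ValueError("a perfect binary tree needs a power-of-two number of atoms")
    levels = []
    size = n
    while size >= 1:
        # Cut the atom list into consecutive chunks of `size` atoms each.
        levels.append(["".join(atoms[i:i + size]) for i in range(0, n, size)])
        size //= 2
    return levels


for name, level in zip(["Unit", "Halves", "Clumps", "Atoms"],
                       binary_levels(list("ABCDEFGH"))):
    print(f"{name}: {' '.join(level)}")
# Unit: ABCDEFGH
# Halves: ABCD EFGH
# Clumps: AB CD EF GH
# Atoms: A B C D E F G H
```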

A unit is also the unit of review, and will be scheduled for review in future days. Units will mature, and as they mature, adjacent units will also be joined together. But at the sub-unit levels, a key point is that a clump can have between 1 and 3 atoms, which provides some flexibility in how the tree can adapt to the natural clumps of connected phrases. So we could have this arrangement:

Unit: ABCDEFGHIJ
Halves: ABCD EFGHIJ
Clumps: AB CD EFG HIJ
Atoms: A B C D E F G H I J
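One way to picture the flexible version is to build the tree bottom-up from the clumps instead of top-down by halving. The Python sketch below assumes the clump boundaries are already known (how they are chosen is not covered here), and `build_unit` is a hypothetical name. Letting an odd clump carry up unpaired is also an assumption.

```python
def build_unit(clumps):
    """Build the levels of a unit (root first) from clumps of 1 to 3 atoms."""
    for clump in clumps:
        if not 1 <= len(clump) <= 3:
            raise ValueError("each clump must contain between 1 and 3 atoms")
    atoms = [atom for clump in clumps for atom in clump]
    level = ["".join(clump) for clump in clumps]
    levels = [atoms, level]
    while len(level) > 1:
        # Join neighbours in pairs; a leftover odd node moves up unchanged.
        level = ["".join(level[i:i + 2]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels[::-1]


clumps = [["A", "B"], ["C", "D"], ["E", "F", "G"], ["H", "I", "J"]]
for level in build_unit(clumps):
    print(" ".join(level))
# ABCDEFGHIJ
# ABCD EFGHIJ
# AB CD EFG HIJ
# A B C D E F G H I J
```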

I'm happy with this solution, and the algorithms are all working great. But the next challenge is how to handle the super-unit levels. Let's say that our audio file has 8 units. In this case, it is possible to merge these into a perfect binary tree:

U1 U2 U3 U4 U5 U6 U7 U8
U1-2 U3-4 U5-6 U7-8
U1-4 U5-8
U1-8

Because each unit can flex between 6 and 12 atoms, this gives me the flexibility to rebalance any tree into a perfect binary tree. But my latest thought is that I don't necessarily want to do this because it may mean that unit boundaries will not be placed in the most ideal places. There will be natural sections of audio which may be thematic, and I'd ideally like the unit boundaries to be inserted there, and this may result in a total number of units that is not a power of 2 (required to create a perfect binary tree).

So my current plan is to give up on the idea of connecting all units into one massive binary tree with a single root. If the audio file was long, say 64 units, then, at the first super-unit level, those 64 units would be merged in pairs into 32 super-units, and then at the second super-unit level, those would be merged into 16 super-super-units, at which point I think we've gone far enough. The user should be able to at least choose how many extra levels they want to merge beyond the unit level, but for long files it doesn't make sense to keep on merging beyond the second level.

One benefit of this is that we don't need to constrain the number of units in the file to being a perfect power of 2 (e.g. 8, 16, 32, 64). If the user only wants to merge up to one super-unit level, we end up with pairs of units, and so that constrains it to having an even number of units which is a much easier constraint to meet. If we go one more level, then the file would need to have a total number of units divisible by 4.
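The divisibility constraint can be checked with a few lines of Python. This is a sketch under the assumption that merging always joins adjacent pairs; the function name `merge_super_units` is invented for the example.

```python
def merge_super_units(units, levels):
    """Merge adjacent units in pairs, `levels` times.

    Returns one list per super-unit level; each entry is the list of
    units that make up one super-unit at that level.
    """
    group = 2 ** levels
    if len(units) % group:
        raise ValueError(
            f"{len(units)} units cannot be merged {levels} level(s) deep: "
            f"the count must be divisible by {group}")
    current = [[unit] for unit in units]
    result = []
    for _ in range(levels):
        current = [current[i] + current[i + 1]
                   for i in range(0, len(current), 2)]
        result.append(current)
    return result


units = [f"U{i}" for i in range(1, 65)]       # a long file with 64 units
first, second = merge_super_units(units, 2)
print(len(first), len(second))                # 32 16
print(second[0])                              # ['U1', 'U2', 'U3', 'U4']
```

With 6 units, one level of merging works (3 pairs), but two levels raise an error because 6 is not divisible by 4.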

Anyway, here's the brain strain part. For convenience, I don't want the user to have to look at the whole audio file in advance and check that all the unit boundaries are the way they want them to be. That would be too much of a burden up front (a burden that could alternatively be eliminated by pre-prepared content, but I still want to solve this problem for self-prepared content). Instead, I would want the user to open the app, and just take each unit as it comes. E.g. one unit each day. By just looking at the current unit, they can make a decision on whether they want to adjust the length of that unit by adding or removing atoms from the end of it, to create a more natural unit boundary (if the algorithm didn't already do a good enough job).

If we do it this way, there is a risk that the user will not end up with an optimum number of units for merging. E.g. if merging to two super-unit levels, we want to have groups of 4 units. Let's see how this would pan out:

U1 U2 U3 U4 | U5 U6 U7 U8 | U9 U10 U11 U12 | U13 (U14) A A A A A A A A A A A A |

If today you're preparing the next unit U14 and there are only 12 more atoms left in the file (enough for 2 small units on their own), the challenge here is to let the user know not to add more atoms onto the end of U14. Of course, I can communicate this to the user, but then what if 5 of those atoms are actually filler outros that you don't want to actually include in your training routine? So maybe before we start training the file at all, we should ask the user to just take a look at the beginning and end of the file and trim off any atoms from the intro/outro that they want to eliminate. That could solve it.
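The arithmetic behind that warning can be sketched as follows. The function names are hypothetical, and the only rule used is the minimum of 6 atoms per unit mentioned earlier; the app may well apply a different check.

```python
MIN_ATOMS_PER_UNIT = 6  # lower end of the 6-to-12 range discussed in the thread


def units_needed(units_so_far, group_size=4):
    """How many more units are needed to complete the current group."""
    return (-units_so_far) % group_size


def spare_atoms(remaining_atoms, units_so_far, group_size=4):
    """Atoms the user may still append to the current unit.

    A negative result means the rest of the file is already too short
    to complete the group with minimum-size units.
    """
    needed = units_needed(units_so_far, group_size)
    return remaining_atoms - needed * MIN_ATOMS_PER_UNIT


# U14 is being prepared and 12 atoms remain: U15 and U16 need 6 each.
print(spare_atoms(12, 14))   # 0, so nothing can be added to U14
# If 5 of those atoms turn out to be outro filler, only 7 usable atoms remain.
print(spare_atoms(7, 14))    # -5, the group of 4 can no longer be completed
```

This is why trimming the intro and outro before training starts matters: the count of usable remaining atoms has to be known for the check to mean anything.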

But there are also some alternatives that I have considered:

1. Allow super units (and super super units) to have an imperfect length. E.g. allow the last super super unit to have 3 units instead of 4. I am already adding this sort of flex capability at the clump level (clumps can have any length ranging from 1 to 3); I just need to figure out a training algorithm that can make sense of this. Back chaining could actually work at this level. E.g. review U15, then U14U15, then U13U14U15.

2. If you look at my sketch in a previous post where training routines are viewed as pipelines where new content/files are constantly being fed into the pipeline, then we could actually just continue on with the next file. So, U16 could be the start of a new file. I don't really like this approach, but it could make sense if you have an audio story that was split into multiple files and they really are all part of the same story.
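Alternative 1 amounts to grouping the units into fours and simply accepting a shorter final group. A minimal Python sketch, with `group_units` as a made-up name; how a training algorithm should then treat the short group is left open, as in the post.

```python
def group_units(units, group_size=4):
    """Split units into consecutive groups; the last one may be shorter."""
    return [units[i:i + group_size] for i in range(0, len(units), group_size)]


units = [f"U{i}" for i in range(1, 16)]   # a file that ends at U15
groups = group_units(units)
print(len(groups))    # 4
print(groups[-1])     # ['U13', 'U14', 'U15']
```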

So that's the big issue that's been hurting my brain. I need to get this sorted out before I do a next release because changing the database structure afterwards would be a bit complicated.

Aside from that, another feature I want to get right at the beginning (also because of its effect on the database design) is tracking statistics. I am already tracking how many times you listen to each segment, but what statistics and reports would actually be useful?

On this topic of statistics, one interesting fact is that the back chaining approach will result in a very uneven listen time for different parts of the audio. If you listen to:

C
BC
ABC

Then you're listening to C 3 times, B 2 times, and A only once. The other two algorithms give a more even number of reps for each segment.
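The uneven listen counts are easy to verify with a short Python sketch. `back_chain` and `listen_counts` are illustrative names, and the code only models the C, BC, ABC sequence shown above, not the app's other algorithms.

```python
from collections import Counter


def back_chain(segments):
    """Yield the back-chaining steps: last segment first, then grow leftwards."""
    n = len(segments)
    return [segments[i:] for i in range(n - 1, -1, -1)]


def listen_counts(steps):
    """Count how many times each segment is heard across all steps."""
    return Counter(segment for step in steps for segment in step)


steps = back_chain(["A", "B", "C"])
print(["".join(step) for step in steps])   # ['C', 'BC', 'ABC']
print(dict(listen_counts(steps)))          # {'C': 3, 'B': 2, 'A': 1}
```

In general, with n segments the last one is heard n times and the first only once, so any per-segment statistics would need to take the chosen algorithm into account.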

P.S. Some good news: a week ago, before I got sick, I finally got the iOS side working. But before I do a release, I want to make a decision on the above.

ryanheise · Postby **ryanheise** » Thu Oct 08, 2020 3:36 pm

Progress update

This past week I have been feeling better, and decided to work on the segment editor:

This allows you to adjust the segment boundaries, and split/join segments if needed. I'm quite happy with the way it turned out, although the only thing I don't like is that probably the colour scheme doesn't really make sense. I just went off my original drawing with those "highlighter" colours painted over the top, but anyway, I can change the colours later. Also, I think it would be useful to have a different colour on the sound wave sweeping from left to right as it is being played back, so you know where it's up to.

Anyway, my new plan is this (and it's not going to be some intense deadline that I'll force myself to meet, since that's probably what got me sick in the first place):

- Add a screen to let you trim the unwanted parts from the beginning and end of the file.
- Add UI options to allow you to switch between different playback algorithms, how many reps per segment, how long to pause between reps, audio playback speed, etc.
- Release the iOS beta.
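As a rough illustration of what the playback options in the list above might control, here is a Python sketch. Every name in it (`PlaybackSettings`, its fields, `autoplay_plan`, the "sequential" label) is an assumption made for the example and does not describe the app's real settings or code.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSettings:
    algorithm: str = "sequential"   # hypothetical label for a playback algorithm
    reps_per_segment: int = 3       # how many times each segment is repeated
    pause_seconds: float = 1.0      # gap inserted after each repetition
    speed: float = 1.0              # playback speed multiplier


def autoplay_plan(segments, settings):
    """Expand segments into a flat list of ("play", ...) and ("pause", ...) steps."""
    plan = []
    for segment in segments:
        for _ in range(settings.reps_per_segment):
            plan.append(("play", segment))
            plan.append(("pause", settings.pause_seconds))
    return plan


settings = PlaybackSettings(reps_per_segment=2, pause_seconds=0.5)
for step in autoplay_plan(["A", "B"], settings):
    print(step)
```

Keeping the gaps in the plan rather than in the audio matches the earlier point that pauses belong to the training routine, not to the content.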

By the way, I'm not sure if I mentioned it yet, but I did get the auto-play feature working and the review system working (I believe, but I haven't tested it yet). So I shouldn't be that far off from a beta release that meets Apple's standards.
