Postby ryanheise » Fri Sep 25, 2020 3:45 am
Update:
I have unfortunately been sick recently, so I am trying to hold myself back from working on this app until I've recovered my health.
I know I shouldn't overexert myself, but one loophole I just thought of is that, although thinking about some of the app design decisions hurts my brain (and I shouldn't do that right now), maybe I can still just share those brain strainers here, and see if I can tap into other people's brains instead of my own (muahahah).
Earlier we discussed the problem that an audio file can't always be perfectly arranged into a binary tree. I came up with this concept of a "unit" (I am still tossing up between calling it a "unit" or a "section"). A unit consists of around 6 to 12 atomic segments and is subdivided into a tree. If the unit length is 8, then it can be divided into a perfect binary tree. E.g.
Unit: ABCDEFGH
Halves: ABCD EFGH
Clumps: AB CD EF GH
Atoms: A B C D E F G H
A unit is also the unit of review, and will be scheduled for review in future days. Units will mature, and as they mature, adjacent units will also be joined together. But at the sub-unit levels, a key point is that a clump can have between 1 and 3 atoms, which provides some flexibility in how the tree can adapt to the natural clumps of connected phrases. So we could have this arrangement:
Unit: ABCDEFGHIJ
Halves: ABCD EFGHIJ
Clumps: AB CD EFG HIJ
Atoms: A B C D E F G H I J
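To make the flexible clump idea concrete, here is a minimal Python sketch (not the app's actual code). It assumes a unit always has exactly four clumps, which matches both examples above, and that the clump sizes are supplied by hand or by some boundary detector. The name build_unit is hypothetical.

```python
def build_unit(atoms, clump_sizes):
    """Split a unit's atoms into clumps, then pair the clumps into halves."""
    # Assumption: exactly 4 clumps per unit, as in both examples in the post.
    if len(clump_sizes) != 4:
        raise ValueError("this sketch assumes exactly 4 clumps per unit")
    if any(not 1 <= size <= 3 for size in clump_sizes):
        raise ValueError("each clump must hold 1 to 3 atoms")
    if sum(clump_sizes) != len(atoms):
        raise ValueError("clump sizes must cover every atom exactly once")

    # Cut the atom sequence at the requested clump boundaries.
    clumps, start = [], 0
    for size in clump_sizes:
        clumps.append(atoms[start:start + size])
        start += size

    # Each half is two adjacent clumps joined together.
    halves = [clumps[0] + clumps[1], clumps[2] + clumps[3]]
    return {"unit": atoms, "halves": halves, "clumps": clumps, "atoms": list(atoms)}


tree = build_unit("ABCDEFGHIJ", [2, 2, 3, 3])
print(tree["halves"])  # ['ABCD', 'EFGHIJ']
print(tree["clumps"])  # ['AB', 'CD', 'EFG', 'HIJ']
```

With four clumps of up to 3 atoms each, the largest unit this shape can hold is 12 atoms, which lines up with the upper end of the 6 to 12 range.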
I'm happy with this solution, and the algorithms are all working great. But the next challenge is how to handle the super-unit levels. Let's say that our audio file has 8 units. In this case, it is possible to merge these into a perfect binary tree:
U1 U2 U3 U4 U5 U6 U7 U8
U1-2 U3-4 U5-6 U7-8
U1-4 U5-8
U1-8
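The pairwise merging above can be sketched in Python as follows. This is only an illustration: units are represented as hypothetical (first, last) index spans, and merge_levels and label are made-up names. The levels parameter is how many super-unit levels to build on top of the plain units.

```python
def merge_levels(units, levels):
    """Return one row per level, starting with the plain units."""
    rows = [list(units)]
    for _ in range(levels):
        current = rows[-1]
        if len(current) % 2 != 0:
            raise ValueError("cannot pair an odd number of groups")
        # Join each adjacent pair of spans into one wider span.
        rows.append([(current[i][0], current[i + 1][1])
                     for i in range(0, len(current), 2)])
    return rows


def label(span):
    first, last = span
    return f"U{first}" if first == last else f"U{first}-{last}"


rows = merge_levels([(n, n) for n in range(1, 9)], levels=3)
for row in rows:
    print(" ".join(label(span) for span in row))
# U1 U2 U3 U4 U5 U6 U7 U8
# U1-2 U3-4 U5-6 U7-8
# U1-4 U5-8
# U1-8
```

Merging succeeds for a given number of levels only when the unit count is divisible by 2 raised to that number, so 6 units can be merged one level but not two.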
Because each unit can flex between 6 and 12 atoms, this gives me the flexibility to rebalance any tree into a perfect binary tree. But my latest thought is that I don't necessarily want to do this because it may mean that unit boundaries will not be placed in the most ideal places. There will be natural sections of audio which may be thematic, and I'd ideally like the unit boundaries to be inserted there, and this may result in a total number of units that is not a power of 2 (required to create a perfect binary tree).
So my current plan is to give up on the idea of connecting all units into one massive binary tree with a single root. If the audio file was long, say 64 units, then, at the first super-unit level, those 64 units would be merged in pairs into 32 super-units, and then at the second super-unit level, those would be merged into 16 super-super units, at which point I think we've gone far enough. The user should be able to at least choose how many extra levels they want to merge beyond the unit level, but for long files it doesn't make sense to keep on merging beyond the second level.
One benefit of this is that we don't need to constrain the number of units in the file to being a perfect power of 2 (e.g. 8, 16, 32, 64). If the user only wants to merge up to one super-unit level, we end up with pairs of units, and so that constrains it to having an even number of units which is a much easier constraint to meet. If we go one more level, then the file would need to have a total number of units divisible by 4.
Anyway, here's the brain strain part. For convenience, I don't want the user to have to look at the whole audio file in advance and check that all the unit boundaries are the way they want them to be. That would be too much of a burden up front (a burden that could alternatively be eliminated by pre-prepared content, but I still want to solve this problem for self-prepared content). Instead, I would want the user to open the app, and just take each unit as it comes. E.g. one unit each day. By just looking at the current unit, they can make a decision on whether they want to adjust the length of that unit by adding or removing atoms from the end of it, to create a more natural unit boundary (if the algorithm didn't already do a good enough job).
If we do it this way, there is a risk that the user will not end up with an optimum number of units for merging. E.g. if merging to two super-unit levels, we want to have groups of 4 units. Let's see how this would pan out:
U1 U2 U3 U4 | U5 U6 U7 U8 | U9 U10 U11 U12 | U13 (U14) A A A A A A A A A A A A |
If today you're preparing the next unit U14 and there are only 12 more atoms left in the file (enough for 2 small units on their own), the challenge here is to let the user know not to add more atoms onto the end of U14. Of course, I can communicate this to the user, but then what if 5 of those atoms are actually filler outros that you don't want to actually include in your training routine? So maybe before we start training the file at all, we should ask the user to just take a look at the beginning and end of the file and trim off any atoms from the intro/outro that they want to eliminate. That could solve it.
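The arithmetic behind that warning can be sketched like this. It is a rough Python illustration, not the app's logic: spare_atoms is a hypothetical name, and it assumes the file ends inside the group currently being filled and that every remaining unit needs at least 6 atoms.

```python
MIN_UNIT = 6  # smallest unit size mentioned in the post


def spare_atoms(units_so_far, atoms_left, group_size=4):
    """How many more atoms the unit being prepared could absorb.

    units_so_far counts finished units plus the one being prepared.
    atoms_left counts the atoms after the end of the current unit.
    Assumption: the file ends within the current group of units.
    """
    # Units still required to complete the current group.
    units_needed = (-units_so_far) % group_size
    # Whatever is not reserved for those units is free to hand out.
    return atoms_left - units_needed * MIN_UNIT


print(spare_atoms(14, 12))  # 0: U14 cannot grow at all
print(spare_atoms(14, 7))   # -5: after trimming 5 outro atoms, U15 and U16 no longer fit
```

A negative result means the group can't be completed as things stand, which is the case where trimming the intro/outro up front would have helped.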
But there are also some alternatives that I have considered:
1. Allow super units (and super super units) to have an imperfect length. E.g. allow the last super super unit to have 3 units instead of 4. I am already adding this sort of flex capability at the clump level (clumps can have any length ranging from 1 to 3), I just need to figure out a training algorithm that can make sense of this. Back chaining could actually work at this level. E.g. review U15, then U14U15, then U13U14U15.
2. If you look at my sketch in a previous post where training routines are viewed as pipelines where new content/files are constantly being fed into the pipeline, then we could actually just continue on with the next file. So, U16 could be the start of a new file. I don't really like this approach, but it could make sense if you have an audio story that was split into multiple files and they really are all part of the same story.
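For alternative 1, here is a small Python sketch of what an imperfect last group plus back chaining might look like. The function names group_units and back_chain are hypothetical, and the group size of 4 corresponds to merging two super-unit levels.

```python
def group_units(units, group_size=4):
    """Chunk units into groups; the last group may come up short."""
    return [units[i:i + group_size] for i in range(0, len(units), group_size)]


def back_chain(group):
    """Review order: the last item alone, then grow the span backwards."""
    return [group[i:] for i in range(len(group) - 1, -1, -1)]


units = [f"U{n}" for n in range(1, 16)]  # 15 units, not divisible by 4
last_group = group_units(units)[-1]
print(last_group)  # ['U13', 'U14', 'U15']

for step in back_chain(last_group):
    print("".join(step))
# U15
# U14U15
# U13U14U15
```

Back chaining doesn't care whether the group has 3 or 4 members, which is why it copes with the short final group without any special casing.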
So that's the big issue that's been hurting my brain. I need to get this sorted out before I do a next release because changing the database structure afterwards would be a bit complicated.
Aside from that, another feature I want to get right at the beginning (also because of its effect on the database design) is tracking statistics. I am already tracking how many times you listen to each segment, but what statistics and reports would actually be useful?
On this topic of statistics, one interesting fact is that the back chaining approach will result in a very uneven listen time for different parts of the audio. If you listen to:
C
BC
ABC
Then you're listening to C 3 times, B 2 times, and A only once. The other two algorithms give a more even number of reps for each segment.
P.S. Some good news: about a week ago, before I got sick, I finally got the iOS side working. But before I do a release, I want to make a decision on the above.
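The unevenness is easy to tally. Here is a short Python sketch (back_chain_listens is a hypothetical name) that counts listens per segment for one back-chaining pass:

```python
from collections import Counter


def back_chain_listens(segments):
    """Count how often each segment is heard in one back-chaining pass."""
    counts = Counter()
    # Play the last segment, then the last two, and so on up to the whole span.
    for start in range(len(segments) - 1, -1, -1):
        counts.update(segments[start:])
    return counts


counts = back_chain_listens("ABC")
print(counts["A"], counts["B"], counts["C"])  # 1 2 3
```

In general, the segment at position i (counting from 1) in a span of n segments is heard i times, so the final segment gets n times as many reps as the first.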