Studying languages to test my multilingual keyboard layout

Deinonysus
Brown Belt
Posts: 1221
Joined: Tue Sep 13, 2016 6:06 pm
Location: MA, USA
Languages:  
• Native: English
• Advanced: French
• Intermediate: German,
   Spanish, Hebrew
• Beginner: Italian,
   Arabic
x 4635

Postby Deinonysus » Thu Mar 05, 2020 5:03 pm

About two weeks ago I posted my QUFLXÜ multilingual keyboard layout, which I have been using with minor changes for the past year or so, in the Language Programs and Resources section. I guess I should have waited a bit longer because I have made an enormous number of changes since then and I think it is greatly improved (or maybe posting gave me the inspiration for the changes).

Before I go into the changes, here are a few explanations about terms I will use:
  • Dedicated key: A letter has a dedicated key if you just press that key (without needing to type any additional keys) to type the letter. On most Latin keyboard layouts, there are dedicated keys for most letters and numbers, and there will be a variable number of dedicated keys for punctuation.
  • AltGr key: On many keyboard layouts where the number of dedicated keys is not enough to type all the symbols that the language needs, the Right Alt key will often become the AltGr (Alternate Graphic) key, which can turn a dedicated key into an alternate symbol. For example, on the French AZERTY layout, the AltGr key is required to type many symbols such as the # or @ signs. On the QWERTY International and Colemak layouts, the AltGr key turns any vowel aeiou into the acute accented version áéíóú.
  • Deadkey: If you press a deadkey, the next key you type will be changed in a certain way. The Greek keyboard layout has an acute accent deadkey that will give the next vowel you type an acute accent. I have many deadkeys in the number row that can be accessed with the AltGr key, and they can be used to type a wide variety of precomposed Unicode letters with 1-2 diacritic marks.
  • Primary support: Using my deadkeys means that it will take several keystrokes to type one letter, as well as moving your hand all the way up to the number row. This means that using deadkeys can seriously slow down your typing. When I say that I have primary support for a language, I mean that the language can be typed without using any deadkeys except for punctuation or extremely rare letters (such as ÿ in French, which only occurs in a few proper nouns).
  • Secondary support: I use this to mean that a language can be typed using deadkeys.
  • No support: If a language has any letters that a layout has no way of typing, that language is not supported.
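
To make the deadkey idea concrete, here is a minimal Python sketch of how a deadkey could be modeled: the deadkey only stores a pending diacritic, and the next letter is combined with it. The key names in DEAD_KEYS are made up for illustration and are not the actual bindings of my layout; the composition itself relies on Unicode NFC normalization from the standard library.

```python
import unicodedata

# Hypothetical deadkey names (not the real bindings of the layout),
# each mapped to a Unicode combining mark.
DEAD_KEYS = {
    "acute": "\u0301",       # combining acute accent
    "circumflex": "\u0302",  # combining circumflex accent
    "diaeresis": "\u0308",   # combining diaeresis
}


def type_keys(keys):
    """Turn a sequence of key names into text, applying pending deadkeys."""
    out = []
    pending = ""  # combining marks waiting for the next ordinary key
    for key in keys:
        if key in DEAD_KEYS:
            pending += DEAD_KEYS[key]
        else:
            # NFC merges the base letter and its marks into a precomposed
            # letter when Unicode has one.
            out.append(unicodedata.normalize("NFC", key + pending))
            pending = ""
    return "".join(out)


print(type_keys(["acute", "e"]))                          # é
print(type_keys(["n", "a", "diaeresis", "i", "v", "e"]))  # naïve
print(type_keys(["circumflex", "acute", "a"]))            # ấ
```

The last line shows why stacking two deadkeys is enough for secondary support of a language like Vietnamese: two diacritics plus a base letter still collapse into a single precomposed letter.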

First, I decided that it was a mistake to require the AltGr key to type an apostrophe and I needed to put it back on a dedicated key. That got me to rethink the goals of my layout, and I decided that rather than being nebulously optimized for assorted Western and Northern European languages that are popular for Americans to learn, the focus should be to make typing as easy as possible for the most people in their native languages. I saw that I would be able to provide secondary support for Vietnamese if I freed up a couple of keys (previously, there were several Vietnamese letters that could not be typed at all on QUFLXÜ).

Then, I realized that if I dedicated three AltGr keys to the Turkish letters ı, ğ, and ş, I would have primary support for ten of the eleven most spoken Latin Script languages in the world by total number of speakers (or nine of the ten most spoken by native speakers, which is the same list but with Swahili removed and Indonesian generalized to Malay). There were three keys left over so I also added ə to the layout (used in Azerbaijani, 23m native speakers), and retained ø (used in Danish and Norwegian, 11 million native speakers combined) and ý (used in many languages including Czech, Kazakh, and Vietnamese), but I got rid of AltGr keys for ð and þ, although they can still be typed with the symbol deadkey.

The eleven most spoken Latin script languages by total speakers are:
  1. English (1.132b speakers)
  2. Spanish (534.3m)
  3. French (279.8m)
  4. Portuguese (234.1m)
  5. Indonesian (198.7m)
  6. German (132.1m)
  7. Swahili (98.3m)
  8. Turkish (79.7m)
  9. Vietnamese (76.9m)
  10. Javanese (68.2m)
  11. Italian (67.8m)
source

And the ten most spoken by native speakers are:
  1. Spanish (390m)
  2. English (365m)
  3. Portuguese (205m)
  4. German (92m)
  5. Javanese (82m)
  6. Malay (77m)
  7. Vietnamese (75m)
  8. French (75m)
  9. Turkish (63m)
  10. Italian (59m)
source

Unfortunately, the 11th most spoken Latin Script language by native speakers, Polish, would have required eight extra AltGr keys and the 12th most spoken by total speakers, Hausa, would have required five, but I only had three spare AltGr keys for letters.

This is what the layout looks like at the moment:

2020-03-05 DeinonML.png

A couple of unintuitive deadkeys are:
  • ¤ (shift+4) - Currency symbols (eg $, €, ₽)
  • ¿ (AltGr+1) - symbols and some letters (eg ‽ … ™ © ð þ)
  • º (AltGr+2) - the masculine ordinal, plus superscript letters and numbers
  • ᵃ (AltGr+shift+1) - the feminine ordinal, plus letters featured in many African alphabets, such as ɔ and ɓ.
  • ̨ (AltGr+shift+5) - vowels with ogonek (ą ę į ǫ ų) and consonants with comma below (ș, ț)
  • « (AltGr + 9) - some left quotation marks as well as a double grave accent, as in ȁ
  • » (AltGr + 0) - some right quotation marks as well as a double acute accent, as in ő, ű
  • α (AltGr+shift+8) - The Greek alphabet, α β γ δ ε etc.

I think this keyboard layout is pretty good, but I can't say so for sure until I have tested it out with a large amount of typing in the primary support languages. So that is why I have so many languages as the focus for this log. I'm saying "practice" rather than "learn" because it would be a ridiculous task to get this many unrelated languages up to a high working level. I am only aiming to familiarize myself with the ones I don't know as well, and practice typing in the ones I know better.

I've had low energy for language learning lately, so my best bet for making progress is with my classic beginner resource combo of Pimsleur, Duolingo, and Assimil, which don't require much creative thought. As it happens, all of the languages I'm focusing on for my keyboard layout are available in all three of these resources, except for Javanese. I do have Pronunciator Javanese available through my library, though, and thanks to Speakeasy's recent Javanese Resources thread I can see that there are also some good books available.

Most of the Assimil books for these languages are not available in English, but my limited French is enough for me to use the French editions. I have gone through a bit of Assimil Indonesian, Italian, and Yiddish in French, and there is the occasional word that I don't understand but I can usually figure it out from context.

I am starting with Turkish. I have been working on that for the past week or so. I haven't received my copy of Assimil Le turc yet, but I have been doing Duolingo and I just got Pimsleur from the library.

Unfortunately, the Pimsleur courses for Turkish, Vietnamese, Indonesian, and Swahili only have one unit of 30 lessons each, which will each take me six weeks to get through. I might move on to the next language once I've finished each Pimsleur course, without trying to complete Duolingo or Assimil. Or I might try to do FSI French in between Pimsleur sessions if I have the energy to finish FSI French Phonology at home and then memorize the dialogues for FSI Basic French.

I think Vietnamese will be next. It will probably be the most challenging of these languages but luckily the FSI course has a nice pronunciation section I can do in the car. The FSI course is for Southern Vietnamese, and the other resources are all for Northern Vietnamese, but I think it should still help a lot.

I will create a series of spinoff keyboard layouts for languages that only get secondary support from my main keyboard layout, and I think I found a very cool solution for my Vietnamese layout. I did need to make some major changes, but it is still close enough to the main layout that it is easy to learn if you know the other, and it is not too tough to switch back and forth.

I think my solution is a big improvement over the standard QWERTY-based Vietnamese layout, where you need to move your hands all over the keyboard in order to type tones and the special letters, while letters that aren't even used in Vietnamese take up prime real estate.

This is what my Vietnamese layout currently looks like:
2020-03-05 DeinViet.png
Last edited by Deinonysus on Sat Nov 20, 2021 6:41 pm, edited 2 times in total.
/daɪ.nə.ˈnaɪ.səs/

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby Deinonysus » Fri Mar 06, 2020 6:12 pm

Turkish is going well. I'm trying to take it slow but making good progress with Pimsleur and Duolingo.

Turkish, Vietnamese, Indonesian, and Swahili are the main languages that I want to focus on for this project. With PFIGS I will stick to Duolingo, except that I'll keep trying to get started with FSI French. With Javanese I'm not sure whether I want to get a book or just do a bit of Pronunciator.

Those four main languages each only have a one-level Pimsleur course available. It takes me about six weeks to get through a Pimsleur level, so I'm looking at 24 weeks to complete all four. Let's call it 26 weeks because I want to do FSI's Vietnamese phonology drills before I start Pimsleur Vietnamese. That will also give me some good study time with Assimil and, more importantly, typing practice with Duolingo. And then after I am done with those introductions, I can decide whether I want to keep going with any of them, or focus on French, Hebrew, or even Russian.

Another way I could play it is that once I finish Pimsleur in a language, I keep going with Assimil and Duolingo and do FSI French in the car (assuming I have the energy for it). This would certainly get me to a higher level, but it would keep me tied up for two years, which is much more than I want to commit to (especially considering that I tend to start new projects fairly often). So I'm leaning towards the first plan.

I think it will make sense to study Indonesian after Turkish. I already have some experience with it, and I already own all of the materials so I won't need to spend any money on it. After that, I think it will make sense to do Swahili, and then Vietnamese last, because it will be the most challenging.

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby Deinonysus » Sat Mar 07, 2020 7:22 pm

I must say, Turkish is an absolutely delightful language. I haven't spent much time on it but it's already one of my favorites.

I think the vowel system is very elegant and I love the symmetry, especially for the closed vowels which can be not only front or back, but rounded or unrounded. I also love how Latin letters for these sounds are featural (I assume this was intentional but either way I love it). The front vowels i and ü both have dots, and the back vowels ı and u do not. And the rounded vowels ü and u have a curved shape, while the unrounded vowels i and ı are just straight lines.

The vowel harmony system is also quite elegant. I was intimidated by it at first, but then I realized that there are only two rules that are very easy to remember if you're familiar with the three variables of height, backness, and rounding, which are visualized in the IPA vowel chart:

Image
The three Turkish vowel letters that don't match their IPA symbol are:
  • /y/ is written as <ü>
  • /œ/ is written as <ö>
  • /ɯ/ is written as <ı>
It might not be immediately clear from the chart, but ignore /a/ for a second. The left half of the chart contains the front vowels (unrounded on the left, rounded on the right), and the right half contains the back vowels (again, unrounded on the left, rounded on the right). /a/ is a bit out of position, but of the two unrounded low vowels it is the farther back, so you can pretend that it goes to the right of œ and right under ɯ.

There are two kinds of vowel harmony: two-way (where the vowel can be e or a, as in the plural suffix) and four-way (where the vowel can be any of the four symmetrical closed vowels, as in the definite accusative suffix). For each kind of harmony, there is only one rule to remember.
  • Four-way: All vowels are raised to the closed vowel with the same roundness and backness.
  • Two-way: All vowels are unrounded, and then lowered to /e/ or /a/ depending on their backness.
The harmony is based on the last vowel of the root. So for example, take the words gün (day) and balık (fish). Let's try to add a plural suffix, which uses two-way harmony. We might want to repeat the same vowel and make the plurals günlür and balıklır, but since this is two-way harmony these need to be lowered and unrounded, so the front vowel ü is lowered and unrounded and we get günler; the back vowel is already unrounded, so we simply lower it and get balıklar.

And now instead let's add the definite accusative ending. This ending uses four-way harmony, so we raise the vowel. In this case the vowels are already closed, so you keep them the same. Gün becomes günü, and balık becomes balığı (the suffix vowel is ı, just as the rule predicts; the final k also softens to ğ before a vowel, but that is a separate consonant rule). But look at the words domates (tomato) and adam (man, a loanword from Arabic); these vowels are low so the accusative can't be domatese or adama, you need to raise them. E is a front unrounded vowel so it raises to i, and the accusative is domatesi. A is a back unrounded vowel so it raises to ı, and the accusative is adamı.
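
The two harmony rules are simple enough to write down as a lookup on backness and rounding. This is only a sketch under a few assumptions: the input is lowercase, the root follows regular harmony, and only the suffix vowel is computed (consonant changes and buffer consonants are ignored). The function names are my own.

```python
# Each Turkish vowel described as (is_back, is_rounded, is_closed).
FEATURES = {
    "i": (False, False, True),  "ü": (False, True, True),
    "ı": (True, False, True),   "u": (True, True, True),
    "e": (False, False, False), "ö": (False, True, False),
    "a": (True, False, False),  "o": (True, True, False),
}

# The four closed vowels, indexed by (is_back, is_rounded).
CLOSED = {(back, rnd): v for v, (back, rnd, closed) in FEATURES.items() if closed}


def last_vowel(root):
    """Return the last vowel of the root, which controls the harmony."""
    for ch in reversed(root):
        if ch in FEATURES:
            return ch
    raise ValueError(f"no vowel found in {root!r}")


def four_way(root):
    """Raise to the closed vowel with the same backness and rounding."""
    back, rounded, _ = FEATURES[last_vowel(root)]
    return CLOSED[(back, rounded)]


def two_way(root):
    """Unround and lower: a after back vowels, e after front vowels."""
    back, _, _ = FEATURES[last_vowel(root)]
    return "a" if back else "e"


def plural(root):
    """Plural suffix -ler/-lar, which uses two-way harmony."""
    return root + "l" + two_way(root) + "r"


print(plural("gün"), plural("balık"))          # günler balıklar
print(four_way("domates"), four_way("adam"))   # i ı
print(four_way("gün"), four_way("limon"))      # ü u
```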

Turkish only marks the accusative on definite nouns. So, if you say "I eat the lemon", the word for lemon (limon) is marked with the accusative suffix: "limonu yerim". But if you just say "I eat a lemon," that is "limon yerim". This is interesting to me because Hebrew behaves exactly the same way:
"I eat the lemon":
אני אוכל את הלימון.‏ /ani oxel et halimon/.

But, "I eat a lemon":
אני אוכל לימון.‏ /ani oxel limon/

The particle את (et) only shows up with "the lemon", not "a lemon". That is an interesting parallel between those two unrelated languages.

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby Deinonysus » Wed Mar 11, 2020 5:46 pm

The last couple of days were pretty crazy for my keyboard layout. I was about to post my updated version, but I was reading up on layout optimization and I saw that one common criterion was keeping common keys out of the bottom row. I tried a new configuration that was completely different from any previous version, and after some testing I realized that it was worse after the changes. The tradeoffs of taking b and c out of the bottom row were not worth it and the layout is much better with them in.

However, the exercise did give me some good ideas, so I moved a bunch of keys around in the right hand, and I think I fixed a couple of suboptimal things, like a two row jump with the same finger to type "my". But moving m from its QWERTY spot will take me a while to get used to. I adjusted to the other changes fairly quickly.

I also reanalyzed the letter frequencies based on the ten languages that I'm trying to focus my support on. I don't think that changed the letter priority much; the main difference is that since Turkish is now weighted the same as all of the other languages (except for English, which is weighted higher), the frequency of the special characters used in Turkish went way up, since Turkish uses its special characters much more than the other languages. Now, five of the top six special characters are used in Turkish (ı, ü, é, ş, ç, and ö in order). I ended up swapping out the ö key for a ç key. Although ı and ş are used more based on the pure average, they are only used in Turkish while ç is also used in French and Portuguese. Since this is supposed to be a multilingual layout, I thought it made sense to give ç the key even though two other Turkish letters beat it in raw numbers.
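
For anyone curious how a weighted average like this can be computed, here is a rough Python sketch. The sample sentences and the weight of 2.0 for English are placeholders I made up for illustration; they are not the corpora or the weights behind my actual numbers.

```python
from collections import Counter


def letter_frequencies(text):
    """Relative frequency of each letter in a text (expects lowercase-safe input)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}


def weighted_average(freqs_by_language, weights):
    """Average per-language letter frequencies, weighting each language."""
    total_weight = sum(weights[lang] for lang in freqs_by_language)
    combined = Counter()
    for lang, freqs in freqs_by_language.items():
        for ch, freq in freqs.items():
            combined[ch] += weights[lang] * freq / total_weight
    return dict(combined)


# Toy samples only; a real analysis needs large texts for every language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog",
    "tr": "ışık ve güneş çok güzel",
    "fr": "le garçon a été très fatigué",
}
WEIGHTS = {"en": 2.0, "tr": 1.0, "fr": 1.0}  # assumed: English counts double

FREQS = {lang: letter_frequencies(text) for lang, text in SAMPLES.items()}
average = weighted_average(FREQS, WEIGHTS)

# Rank the special (non-ASCII) letters by their weighted frequency.
special = sorted((ch for ch in average if not ch.isascii()),
                 key=average.get, reverse=True)
print(special)
```

Because every language is normalized before averaging, a language that leans heavily on its special characters (like Turkish) pushes them up the ranking no matter how large its sample text is.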

Ü does beat é in raw numbers, but I am keeping é as the most accessible dedicated key for a special character. Of the three, it is the most likely to be used in English because of the large number of French loanwords that we use, such as résumé, entrée, and fiancé(e). I think it would be much harder to convince anglophones of this layout's merits if it were ü in that spot instead.

Here is what the layout looks like now, with the changed letters circled:

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby Deinonysus » Fri Mar 13, 2020 3:39 pm

It's been another crazy couple of days for the layout! I had heard of the Workman layout before, but I had never investigated it. I looked into it and I think it has a brilliant way of ranking the keys based on where the fingers prefer to stretch, if at all. In particular, the middle and ring fingers want to stretch up, but the index fingers prefer to curl down. Here are the rankings:
Image

I reassigned a bunch of keys this morning based on the rankings (ignoring the 1.5 rating for the pinky home keys) and everything seemed to fall into place! Here is my latest layout with the numbers I used to generate it.
2020-03-13 DeinonML.png
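
As a rough illustration of what "reassigning keys based on the rankings" can mean, here is a greedy Python sketch: the most frequent letters go to the best-ranked positions. Two loud assumptions: the RANKS grid is transcribed from memory of the Workman chart and should be checked against the original, and the real assignment also involved manual judgment (for example, I ignored the 1.5 rating for the pinky home keys, which this sketch does not do).

```python
# ASSUMPTION: effort ranks per key position (top, home, bottom row; ten
# columns each), transcribed from memory of the Workman chart. Lower is better.
RANKS = [
    [4, 2, 2, 3, 4, 5, 3, 2, 2, 4],
    [1.5, 1, 1, 1, 3, 3, 1, 1, 1, 1.5],
    [4, 4, 3, 2, 5, 3, 2, 3, 4, 4],
]


def assign_by_rank(letters_by_frequency, ranks=RANKS):
    """Greedily place letters (most frequent first) on the best-ranked slots.

    Ties between equally ranked slots are broken by row, then by column.
    Slots that receive no letter stay None.
    """
    slots = sorted(
        (rank, r, c)
        for r, row in enumerate(ranks)
        for c, rank in enumerate(row)
    )
    layout = [[None] * len(row) for row in ranks]
    for letter, (_, r, c) in zip(letters_by_frequency, slots):
        layout[r][c] = letter
    return layout


# Hypothetical frequency order, only to show the mechanics.
for row in assign_by_rank(list("etaoinsr")):
    print(row)
```

A greedy pass like this only decides which letters deserve the good spots; it says nothing about which letters should sit next to each other, which is where bigram analysis comes in.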

I think this layout feels much smoother than before, but I can't be completely sure because it's so radically different from QUFLXÜ that it's taking me a while to learn it. In fact, the only keys that haven't changed are z, ü, the eight home keys, and most of the punctuation!

I have been using keybr.com to learn the new layout. It's slow going but I'm making good progress.

My actual language learning has been on hold between all the work I've been doing on my layout and all the pandemic craziness, but my copy of Assimil Le turc came yesterday and I'm hoping to get started on it tonight.

crush
Blue Belt
Posts: 514
Joined: Mon Nov 30, 2015 3:35 pm
Languages: EN (N), ES, ZH
Maintain: EUS, YUE, JP, HAW
Study: TGL, SV
On Hold: RU
x 953

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby crush » Sat Mar 14, 2020 4:32 pm

I've been using a slightly modified Colemak layout (with extra layers for numbers/symbols) on a custom keyboard so i could limit pinky usage as modifier keys were causing a lot of stress in my pinky/hand and moving modifier keys to the thumb made a huge difference. Is there any preference given towards inward rolls? That's one of the things that i enjoy the most about Colemak (in English). Are you aiming to optimize typing for these ten languages or just make it "possible" from within your layout?

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby Deinonysus » Sun Mar 15, 2020 2:36 am

crush wrote:I've been using a slightly modified Colemak layout (with extra layers for numbers/symbols) on a custom keyboard so i could limit pinky usage as modifier keys were causing a lot of stress in my pinky/hand and moving modifier keys to the thumb made a huge difference. Is there any preference given towards inward rolls? That's one of the things that i enjoy the most about Colemak (in English). Are you aiming to optimize typing for these ten languages or just make it "possible" from within your layout?

I have a huge preference for inward rolls in my layout, maybe to its detriment. I just discovered a layout analyzer from the creator of the DH mod version of Colemak and as it turns out, the version of the layout that I posted yesterday has a very high same-finger bigram rate (which is something to avoid because it's uncomfortable to type consecutive different letters with the same finger). The rate is around 5%, which is much higher than Colemak's rate which is only 1.5%, and also worse than Dvorak (2.5%) and Workman (3%). It's almost as bad as QWERTY, which has around a 6.5% rate.

But, many of the most common letter combinations are very, very smooth in my layout, so maybe the tradeoff is worth it. I played around with a few letter swaps (b-y, s-n, e-a, and d-u) and I got the same-finger bigram rate below 4% with this layout:

j l c h f ü y u k q
s n t r g p i e a o
z v b m x w d , . /

That's still higher than most alternative layouts but a bit less bad. However, it takes away the nice inward rolls for the combinations "ea" and "eau". It does take care of 'a' and 'd' being on the same finger, which is troublesome, but this is at the expense of putting 'e' and 'u' on the same finger, which is pretty rough for French, Spanish, and German.

Skipping the d-u and e-a swaps raises the bigram rate back up to around 4.3% for English, but that version does have a lower bigram rate for French, German, and Spanish and it feels smoother, so I think that's the version I like for now:

j l c h f ü y d k q
s n t r g p i a e o
z v b m x w u , . /
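
For reference, a same-finger bigram rate can be computed along these lines. This is a minimal sketch, not the analyzer I used: it assumes standard touch-typing finger assignments for the ten columns shown, only looks at the three letter rows, and skips any bigram that contains a character outside them (spaces, digits, AltGr letters).

```python
# ASSUMPTION: standard touch typing. Column index -> finger id, where
# 0-3 are left pinky..index and 4-7 are right index..pinky.
COLUMN_FINGER = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]

# The layout from the block above, one string per row.
LAYOUT = [
    "jlchfüydkq",
    "sntrgpiaeo",
    "zvbmxwu,./",
]


def finger_map(layout):
    """Map every key on the layout to the finger that types it."""
    return {key: COLUMN_FINGER[col]
            for row in layout
            for col, key in enumerate(row)}


def same_finger_bigram_rate(text, layout=LAYOUT):
    """Share of bigrams whose two different keys use the same finger."""
    fingers = finger_map(layout)
    text = text.lower()
    total = same = 0
    for a, b in zip(text, text[1:]):
        if a not in fingers or b not in fingers:
            continue  # skip spaces and anything not on these three rows
        total += 1
        if a != b and fingers[a] == fingers[b]:
            same += 1  # a doubled letter is not counted as a same-finger bigram
    return same / total if total else 0.0


sample = "a sad day had already made the road muddy"
print(f"{same_finger_bigram_rate(sample):.1%}")
```

The sample sentence is deliberately full of "ad" and "da", which share a finger on this version of the layout, so it will score far worse than ordinary English text would.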

My layout is optimized for the ten languages in three ways. First, the best spots are assigned based on the average frequency for all ten languages. Second, every common letter in these languages has a direct AltGr shortcut with no need for deadkeys (so you almost never need more than two keystrokes to type any lowercase letter in these languages). And third, I considered these languages while I was thinking about which bigrams and trigrams I wanted to work well. For instance, 'j' couldn't be on the same finger as an apostrophe (which would make French typing a nightmare), and 'm' and 'b' can't be on the same finger because that is a common combination in Swahili. But it is not actually machine optimized.

I am considering creating a truly optimized keyboard layout with carpalx, using training texts from all ten languages. Unfortunately, carpalx doesn't have a way of evaluating AltGr keys, which are very important in every iteration of my layout. And also, I need to set up a Linux VM to get it to work because I can't get carpalx to work on Windows. But I'll see where that gets me. A computer generated layout may be missing the human touch and may not take stylistic things like inward rolls into account. But it would at least give me some good ideas.

Edit: I found two other letter swaps that drop my English same-finger bigram rate down to 3.7% and also help my bigram rate in German and (very slightly) in French, although Spanish is slightly worse.

Here is the revised layout:

f l c h j ü y d k q
s n t r b p i a e o
z v g m x w u , . /


And here are the comparative same-finger bigram rates:

Layout       EN      FR      DE      ES      MEAN
Deinonysus   3.72%   3.44%   4.69%   4.68%   4.13%
Colemak      1.52%   3.60%   4.98%   3.25%   3.34%
Dvorak       2.54%   3.57%   3.49%   2.48%   3.02%
Workman      2.97%   4.41%   4.15%   4.66%   4.05%

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby crush » Sun Mar 15, 2020 11:30 am

As for machine-optimized layouts, have you looked into the BEAKL layout? I've been considering trying that out for a while but again i believe it's optimized for English. The main hindrance has been me being too lazy to program it onto my keyboard (changed shifted punctuation makes it a bit of a pain) and my keyboard doesn't have as many keys (though there are more keys available to the thumbs).

It would also be interesting to see layouts optimized either for specific languages or language groups (e.g. Romance languages, Germanic/Scandinavian languages, etc.).

Re: Practicing Turkish, Vietnamese, Indonesian, Swahili, Javanese, & PFIGS to test my improved multilingual Keyboard lay

Postby Deinonysus » Mon Mar 16, 2020 8:54 pm

I spun up an Ubuntu VM and got carpalx to work, but my first impression wasn't stellar. I tried running the very basic optimization from the tutorial on my layout and it just messed around with all the keys without really improving the score. I'm not confident that I can use it to improve my layout but I think it is worth investigating further. Maybe a more advanced user can get more out of it.

My carpalx total effort score was 2.068, which is not great but not terrible. It's worse than Colemak (1.842, which is the best score of any layout that wasn't specifically optimized for carpalx), but it beats Dvorak (2.098) by a very narrow margin, and loses to Workman (1.993) by about half the margin by which Workman loses to Colemak. Of course, all of the alternatives beat QWERTY, which has a score of 3.000.

I ended up reverting the b-g swap. Although it brings my English same-finger bigram rate back up to 4% and also raises my 4-language average by a very small amount (0.03%), the swap had put m and b on the same finger, and as I had mentioned before, that's really bad for Swahili. So my current layout (first 10 columns) is:

f l c h j ü y d k q
s n t r g p i a e o
z v b m x w u , . /


The s-n swap is taking me a very long time to get used to. It may take me several more days at least to stop mixing them up.

crush wrote:As for machine-optimized layouts, have you looked into the BEAKL layout? I've been considering trying that out for a while but again i believe it's optimized for English. The main hindrance has been me being too lazy to program it onto my keyboard (changed shifted punctuation makes it a bit of a pain) and my keyboard doesn't have as many keys (though there are more keys available to the thumbs).

It would also be interesting to see layouts optimized either for specific languages or language groups (e.g. Romance languages, Germanic/Scandinavian languages, etc.).

Cool, I wasn't aware of it. It heavily penalizes the home pinky key, which I don't know if I agree with. Most models avoid pinky stretches, but I think that it's better to use the pinky home key than to stretch another finger. Unless minimizing pinky typing is important to you I think Colemak is better by nearly every metric.

Re: Learning Castilian Spanish and working on my multilingual keyboard layout

Postby Deinonysus » Fri May 01, 2020 9:00 pm

TL;DR: Quarantine derailed my Turkish learning. I spent most of my time machine-optimizing and then relearning my keyboard layout. Over the past week I've gotten into Castilian Spanish.

Español
I decided that I might have an easier time mastering French if I can achieve a good level of Spanish, because several grammatical features that are vestigial in French (only used in literature) are fully used in Spanish, particularly the preterite (simple past) tense. They seem fairly similar: compare the French preterite of "to be" (être: fus, fus, fut, fûmes, fûtes, furent) with Spanish (ser: fui, fuiste, fue, fuimos, fuisteis, fueron).

Spanish also should be easier than French because it is more regular, and listening comprehension in particular should be much easier because Spanish has a much smaller phonemic inventory, doesn't have French's large number of homophones due to consonant deletion and vowel mergers, and has a stress accent which should make it easier to parse word boundaries in the spoken language. Also, Spanish has the most phonemic writing system of the Romance languages, while French has the least (although at least it's not as bad as English). Due to these factors, I'm expecting to have a much smoother ride with Spanish, possibly comparable to the drop I experienced switching from German to French.

I am starting with European Spanish, not just because I like the way it sounds but also because I think it would be easier to switch from European to Latin American than the reverse, because you can just drop the distinción and second person plural conjugation rather than having to learn new grammar rules and unlearn a sound merger.

Resources:
  • Pimsleur: This is my go-to. Unfortunately, Pimsleur only has two levels of European Spanish, compared to five levels of Latin American Spanish. I generally do Pimsleur lessons on my commute so it was hard for me to get back into a routine now that my commute is gone, but I've been successfully doing a lesson almost every day while I do my household chores.
  • Assimil: I had previously purchased the Spanish course in Italian, because I was planning on learning Italian first, but that didn't pan out. Fortunately the English version uses the exact same dialogues. I ordered the English book and it just came in the mail, so I'll start it tonight.
  • Babbel: Since Duolingo is only available for Latin American Spanish, I thought I'd try Babbel. It seems quite good so far, but not as easy to drill as Duolingo. It seems more like an app version of Assimil than a direct Duolingo clone. I think I prefer Duolingo's less structured format that's basically just drills with some optional notes, but I'll keep going with Babbel and see how it goes. My progress has been much less consistent than with Pimsleur and I have only been doing a couple of lessons a week.
  • Teach Yourself: I haven't started it yet but it should be useful.
  • FSI Head Start: The basic course seems to be based on Latin American Spanish, but there is a shorter course for Castilian Spanish. I probably won't get into it until I have finished some of my other resources.

Keyboard layout

It's been over a month since the last time I posted in this log. I was lamenting that my layout wasn't machine optimized and didn't know where I would ever find an optimization tool I liked when I realized: hey, I know some Python! Why can't I just machine optimize on my own! It doesn't have to be a complicated app like Carpalx which is made up of dozens of files and requires a tutorial. It can be something simple.

So I ended up with a small Python class (just under 100 lines) that allows me to optimize groups of up to eight keys at a time using brute force rather than machine learning. I am able to look at only 6-8 keys at once because I am strictly keeping the keys in four different groups, based on frequency. The groups correspond to the Workman rankings (see a previous post for the chart):
  1. Workman ranks 1 and 1.5 (but keeping the most frequent keys out of the rank 1.5 spots)
  2. Workman rank 2
  3. Workman rank 3
  4. Workman ranks 4 & 5 (but keeping the most frequent keys out of the rank 5 spots)
I might theoretically be able to do a bit more than eight keys at a time, but that already takes me about an hour to run (vs a minute or two for six keys). I have three different criteria that I measure:
  • Same-finger bigrams
  • Outward roll bigrams
  • Double-row jumps with the same finger, with two adjacent non-index fingers, or from the middle finger bottom row up to the index finger top row (the other direction is fine).
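
To give an idea of what brute force means here, this is a stripped-down sketch and not my actual class. It tries every arrangement of one group of keys over that group's slots and keeps the arrangement with the lowest penalty. It only scores same-finger bigrams (the other two criteria are left out), and the function names, finger assignments, and toy bigram counts are all assumptions for illustration. Six keys means 720 arrangements and eight keys means 40,320, which is why the run time climbs so quickly.

```python
from itertools import permutations

# ASSUMPTION: standard touch-typing columns -> finger ids.
COLUMN_FINGER = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]


def penalty(placement, bigram_counts):
    """Frequency-weighted count of same-finger bigrams.

    placement maps key -> (row, col); bigram_counts maps (a, b) -> count.
    """
    score = 0
    for (a, b), count in bigram_counts.items():
        if a == b or a not in placement or b not in placement:
            continue
        if COLUMN_FINGER[placement[a][1]] == COLUMN_FINGER[placement[b][1]]:
            score += count
    return score


def brute_force(keys, slots, fixed, bigram_counts):
    """Try every assignment of keys to slots; keep the lowest penalty.

    fixed holds keys from other groups that are not allowed to move.
    """
    best, best_score = None, float("inf")
    for perm in permutations(slots):
        placement = dict(fixed)
        placement.update(zip(keys, perm))
        score = penalty(placement, bigram_counts)
        if score < best_score:
            best, best_score = placement, score
    return best, best_score


# Toy example: three keys, three slots, two of which share a finger.
keys = ["e", "a", "d"]
slots = [(1, 7), (1, 8), (0, 7)]
counts = {("e", "a"): 5, ("a", "d"): 3, ("e", "d"): 4, ("d", "e"): 6}
best, score = brute_force(keys, slots, fixed={}, bigram_counts=counts)
print(best, score)
```

In the toy example two of the three keys are forced onto the same finger, so the search ends up pairing the two letters that follow each other least often.
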
I gave myself the option of using one of two frequency sets for the rankings: one was the set based on all ten languages that I posted earlier, and the other is based only on English, French, Spanish, and German. I ended up with two candidate layouts: a "balanced" layout that wasn't great in any stat but also wasn't bad in any, and a "dump stat" layout with amazing stats except for the outward rolls which are abysmal. I ended up preferring the balanced layout and I learned it up to a speed of around 50 WPM.

I think I would get much better results if I got into machine learning, but before I go down that rabbit hole I want to create one more brute-force layout with some changes to my formula. As I was learning the new layout I realized that not all same-finger bigrams are equally bad. Same-finger bigrams for the index finger that are in different columns are very easy to type with alternate fingering: you just move your hand inward and type it with your index and ring fingers. So in my next version those bigrams will only get half the penalty.
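
The half-penalty idea could look something like this. It is a sketch of the planned change rather than finished code: positions are (row, column) pairs on a ten-column grid, the finger assignments are the standard touch-typing ones, and the function name is my own.

```python
# ASSUMPTION: standard touch-typing columns -> finger ids
# (3 = left index, 4 = right index; each covers two columns).
COLUMN_FINGER = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]
INDEX_FINGERS = {3, 4}


def same_finger_penalty(pos_a, pos_b):
    """Penalty for typing the key at pos_a and then the key at pos_b.

    0.0 if the keys use different fingers (or are the same key),
    0.5 if both belong to an index finger but sit in different columns,
    1.0 for every other same-finger bigram.
    """
    if pos_a == pos_b:
        return 0.0  # the same key twice is not a same-finger bigram
    col_a, col_b = pos_a[1], pos_b[1]
    if COLUMN_FINGER[col_a] != COLUMN_FINGER[col_b]:
        return 0.0
    if COLUMN_FINGER[col_a] in INDEX_FINGERS and col_a != col_b:
        return 0.5  # easy to type with alternate fingering
    return 1.0


print(same_finger_penalty((1, 3), (1, 4)))  # 0.5: index finger, two columns
print(same_finger_penalty((0, 3), (2, 3)))  # 1.0: index finger, same column
print(same_finger_penalty((0, 7), (1, 7)))  # 1.0: middle finger
```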

Another change is that I want to penalize trigrams on the same hand that switch direction. So for example, I have S, T, and R on the left hand and they are an inward roll. I think an outward roll would also be fine. But if you typed them with your pinky, index, and then middle fingers, that would be very awkward so it should be penalized.
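
Detecting a trigram that switches direction is mostly a matter of comparing finger order. Here is a small sketch under the same assumptions as before (ten columns, standard finger assignments, left hand on columns 0-4); is_redirect is a name I made up.

```python
# ASSUMPTION: standard touch-typing columns -> finger ids, numbered from
# left pinky (0) to right pinky (7).
COLUMN_FINGER = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]


def hand(col):
    """Left hand covers columns 0-4, right hand columns 5-9."""
    return "L" if col < 5 else "R"


def is_redirect(cols):
    """True if three keys on the same hand change direction.

    cols holds the column of each key in the trigram. Trigrams that
    repeat a finger are left to the same-finger penalty instead.
    """
    a, b, c = cols
    if not (hand(a) == hand(b) == hand(c)):
        return False
    fa, fb, fc = (COLUMN_FINGER[col] for col in cols)
    # The two steps have opposite signs exactly when the direction flips.
    return (fb - fa) * (fc - fb) < 0


print(is_redirect((0, 2, 3)))  # False: pinky, middle, index is one smooth roll
print(is_redirect((0, 3, 2)))  # True: pinky, index, then back to middle
print(is_redirect((0, 3, 7)))  # False: the trigram spans both hands
```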

Another change is that I want to try optimizing for just English, French, and Spanish, not German. They are the three official UN languages that use the Latin alphabet, and over half of the world's countries have one of them as an official or majority language. Since the three of them share a massive amount of vocabulary, I think I will get better results if I leave out German. Once I do get around to optimizing all ten languages, it will be very useful to see how much is sacrificed by adding in a bunch of completely unrelated languages. If it isn't much worse then it would be great to be able to optimize for all ten languages, but if it doesn't end up being better than a completely unoptimized layout, it could make sense to only optimize for English, Spanish, and French.