lavengro wrote:Putting aside what appears to be a gathering consensus that we should arm ourselves with pitchforks and chase this scoundrel Wyner out of the village (and certainly no point in waiting to do so until beta testing is available),
Don't mind me -- I'm just angry at myself that I'm not willing to ask for squillions of dollars for a very simple idea masquerading as something revolutionary.
I was surprised to hear these comments about the current quality of text-to-speech. I wind up using quite a bit of text-to-speech for learning, and while it has very noticeably improved recently, I would have thought most learners would still favour native speaker recordings over a Hal descendant, given the choice.
Well there are several points here.
First up, one of the points Wyner pushes hard on his website, in his book and even in
the pitch video for this software is how important it is to build your own deck. But then he sells you premade decks... hardly consistent with his message. The biggest bonus of the premade deck is the accompanying audio -- very, very few Anki users go to the lengths of getting a native speaker to record all their cards for them. Using text-to-speech makes it possible to have a fully customisable and personalisable deck
and have audio for every card -- you really can have your cake and eat it.
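For anyone wondering about the mechanics: Anki plays audio whenever a field contains a `[sound:file.mp3]` tag, so a fully custom deck with audio on every card only needs an importable TSV plus one mp3 per card, synthesized by whatever TTS engine you like. Here's a rough sketch (the field layout and filenames are my own invention, nothing official):

```python
import csv
import hashlib
from pathlib import Path

def build_deck(sentences, out_dir="deck"):
    """Write an Anki-importable TSV where each row carries a [sound:...] tag.

    The mp3 files are NOT created here -- feed each (text, filename) pair
    this returns to your TTS engine of choice, then drop the files into
    Anki's collection.media folder.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    to_synthesize = []
    with open(out / "cards.tsv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for text in sentences:
            # Stable filename derived from the text, so re-running the
            # script doesn't scatter duplicate media files around.
            name = "tts-" + hashlib.md5(text.encode("utf-8")).hexdigest()[:12] + ".mp3"
            writer.writerow([text, f"[sound:{name}]"])
            to_synthesize.append((text, name))
    return to_synthesize

jobs = build_deck(["Das Wetter ist schön.", "Ich hätte gern einen Kaffee."])
# Then synthesize, e.g. with gTTS (pip install gtts, needs a connection):
# from gtts import gTTS
# for text, name in jobs:
#     gTTS(text, lang="de").save(f"deck/{name}")
```

The point is that the audio step is completely decoupled from the deck itself, which is exactly why you can keep Wyner's "build your own deck" advice and still have sound on every card.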
Secondly, while native speakers are good
in theory, quite often what you get out of them in the studio is rather unnatural. As a teacher, I've worked with various recordings that are utterly painful to listen to, because the actors hired for them tend to distort the language in an effort to make it clearer and easier to understand -- and it becomes easier to understand precisely by being less like natural language. For example, the weak schwa sound in English is often replaced with a clear vowel (in my head, I'm hearing the word ex-cit-ing being enthusiastically overpronounced by a drama school graduate as I type). In fact, the first time I was in the studio recording for a language course, I was actively directed to overpronounce (my voice appears in two English courses).
Text-to-speech is designed to be easy to understand
for native speakers, often in sub-optimal conditions (e.g. using a voice menu across a bad line or interacting with an app while sitting on a busy train). This means that synthesized voices exaggerate the key characteristics of the phonemes -- what you might call their "salient features". One of the biggest obstacles learners face in speaking and listening is getting stuck in the trap of processing the language through the sound system of their native language. I remember reading about a study claiming that exaggerating the salient features of phonemes in input led to learners noticing those features, which resulted in better comprehension and production of spoken language.
So if you have a choice between a native speaker who exaggerates the
wrong things and a computer that exaggerates the
right things, which one are you going to choose?
In theory, a native-speaking voice actor and a director who both understand phonology would be the best thing, but there aren't that many of either, and you definitely need both.
Plus, again, that restricts you to a list of preselected sentences, which is diametrically opposed to Wyner's original philosophy.