I posted
some thoughts on AI and language learning in the original thread.
emk wrote:Now, in practical terms, I can use GPT-3.5-Turbo to translate 22 minutes of easy television in 3 minutes of server time. As far as anyone can figure out, the early, unoptimized versions of GPT-3.5 ran on about US$80,000 worth of hardware, but they've reduced that with the GPT-3.5-Turbo models. Translating an episode costs me about $0.03. The speech-to-text, which is also pretty good, costs me about $0.15/episode using Whisper-1. Again, not flawless, but more than good enough for my purposes.
I'm pretty sure I can also get GPT to handle requests of the form, "Explain what the words '...' mean in this sentence, and give me two examples of how to use them." Although in this case, I might need to cough up the money for GPT-4-Turbo. (Migaku actually has this working surprisingly well in their flash card creator—about 50% better than I would expect GPT-3.5-Turbo to do without very clear instructions.)
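If anyone wants to see what that looks like in practice, here's a minimal Python sketch of the pipeline from the quote above: Whisper-1 for speech-to-text, then GPT-3.5-Turbo for translation. This isn't my actual tooling; the file name and the prompt are illustrative, and a real tool would split a 22-minute transcript into chunks to fit the model's context window.

[code]
# Minimal sketch: Whisper-1 transcribes, GPT-3.5-Turbo translates.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: transcribe the episode's audio track straight to SRT.
with open("episode01.mp3", "rb") as audio:
    subtitles = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
        response_format="srt",  # timestamped subtitles as plain text
    )

# Step 2: translate with a cheap chat model. (A real tool would split
# long SRT files into chunks to stay inside the context window.)
reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Translate these SRT subtitles from Spanish into "
                    "English. Keep the numbering and timestamps."},
        {"role": "user", "content": subtitles},
    ],
)
print(reply.choices[0].message.content)
[/code]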
For actual practical demos (and some examples of where the tech fails), see
my "How not to learn Spanish" log. This pipeline gives me better subtitles than the average French DVD publisher provides (not hard), and perfectly reasonable translations (certainly better than Google Translate).
Cainntear wrote:Deep Blue wasn't doing anything new -- it was just a computer with enough memory and processing speed to work through a number of probable states that were scored by how likely each player was to win.
Deep Blue is ancient history at this point. It's like comparing the Wright brothers' airplane to a modern air-superiority fighter. They both fly, and they're both heavier than air, but trying to use one to draw inferences about the other is likely to mislead more than it helps.
One modern system for deterministic board games is AlphaGo Zero. Go can't be tackled with brute-force search like chess, because the board is much bigger and the stones all have equal weight, which makes positions hard to score. But AlphaGo Zero taught itself the game from scratch, with no examples of human play as input. Within three days of self-play, it could beat the version of AlphaGo that had defeated the human world champion.
But Go is still too easy, because players can see the whole board. Current research focuses on strategy games where players have imperfect information.
LLMs (Large Language Models) like GPT are a weird offshoot. They are essentially trained to be "improv actors", able to play different characters and write in different styles. But someone told them, "Hey, I want you to play the role of a helpful assistant who follows instructions." (And gave them plenty of examples.) And suddenly the model started beating state-of-the-art systems on a wide variety of tasks that previously needed specialized tools. This was a pretty shocking development.
The free GPT-3.5 is pretty easy to break, and when it breaks, it falls back on sheer improv. GPT-4 is noticeably more robust, and it does much better on college-level exams. But even GPT-4 cannot reliably plan and execute multi-step tasks. And it has no memory and no real internal monologue. And it can't learn new tricks by interacting with the world, because the underlying model is read-only. It learned about the world by reading books and looking at photos. Frankly, given these limitations, it performs pretty well.
Here is a lashed-up mix of several different AI models trying to interact with the world and follow instructions from a human. This is about as good as these models get if they need to plan and interact with reality, and I'm sure that this video is the best run of 10 (at least).
But all of these limitations are being worked on. Thousands of very smart academics have just overhauled their research programs, we're seeing ludicrous investments in specialized chips, and an entire industry of well-funded companies is trying to catch up to OpenAI.
Cainntear wrote:And this is why AI is very, very dangerous.
I do not truly fear an AI that makes lots of easily-spotted mistakes. Instead, I fear the first AI that
doesn't. If we're talking
Dune, I have a lot of sympathy for the Butlerians.
We can use the existing tools of politics and government to deal with unreliable AIs, if we get our act together. We are not ready, however, to deal with an AI that could flawlessly perform complex tasks and carry out goals. I would strongly prefer we not build one until we've gotten a lot wiser, and carefully thought through the consequences.
Cainntear wrote:But as subtitles go AI, the worst affected people will be the deaf and partially deaf, because transcriptions are a poor alternative to carefully crafted subtitles.
Honestly, as a student of French, I don't buy this argument. When I was learning to listen to French, the majority of French DVDs had no subtitles at all. And when I did get subtitles, they normally bore a very loose relationship to the spoken dialog. As recently as 12 years ago, few French publishers seemed to care.
My Whisper-1 results with intermediate Spanish TV are much more accurate than the subtitles that all but 4 of the episodes originally came with. I'll take a few errors here and there over hand-crafted subtitles that don't match the audio at all.
It's not like I was going to hire translators just to produce subs. Especially if I wasn't even allowed to share them with other people.
And if I were deaf, I'd be trying to get Whisper to transcribe real-life lectures and display them on the inside of my glasses. I'd trade in France's mediocre attempts at subtitles for semi-reliable real-time transcription in a heartbeat, I'm pretty sure.
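The crude version of that idea is only about fifteen lines of Python, using the open-source whisper package and a microphone library. A sketch under big assumptions, not a product: real captioning glasses would need proper streaming rather than fixed five-second chunks.

[code]
# Crude chunk-by-chunk "live" transcription with a local Whisper model.
# Assumes: pip install openai-whisper sounddevice
import sounddevice as sd
import whisper

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 5      # latency vs. accuracy trade-off

model = whisper.load_model("base")  # small enough for a laptop CPU

while True:
    # Record one chunk from the default microphone.
    chunk = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()  # block until the recording finishes
    result = model.transcribe(chunk.flatten(), language="fr", fp16=False)
    print(result["text"].strip())
[/code]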
Cainntear wrote:Exactly. AI is a tool that needs a skilled operator, because the operator needs to at least know enough to be able to identify when the computer is wrong. This is, perhaps surprisingly, a problem that gets worse as AIs get better. People will become increasingly confident in the AI's output and will become less and less capable of analysing it.
This observation is exactly correct, at least until the output gets good enough that nobody cares about the errors.
In the stuff that I am doing, I am largely working around this by focusing on approaches where an observant learner will be able to identify and ignore most of the errors, and where the sheer volume of correct examples will outweigh any mistakes. I'm building command-line tools for language-learning hobbyists doing extensive watching. Not selling courses to schools.
I've watched a number of programmers using GitHub Copilot, an AI coding assistant. It's interesting how the results are affected by skill level:
- People who can barely code at all can now glue together dodgy programs that let them automate things. I actually see this as a win. It's buggy but empowering.
- Junior programmers can sometimes get lost in a maze of subtly broken code, when perhaps they could be learning to be precise and accurate instead.
- Skilled programmers write a short comment, wait half a second for Copilot to implement the function, and then spot most errors at a glance. Those errors get pruned, and Copilot gets asked to generate something better. At full speed, it's impressive (there's a sketch of the workflow after this list). But the bottleneck is proofreading and designing automated QA.
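To make the comment-first workflow concrete: you type the comment, Copilot drafts something like the function below (illustrative; its real suggestions vary), and you proofread the result at a glance.

[code]
# Parse an SRT timestamp like "01:02:03,456" into seconds.
def srt_timestamp_to_seconds(stamp: str) -> float:
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000
[/code]

The comment and the review are the human's contribution: a swapped field or an off-by-one in code like this is exactly the kind of error a skilled reader catches in half a second.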
If I could communicate one idea: This stuff is very real, if not especially reliable at the moment. And it's going to get better. If we're clever, we can do some neat tricks with it right now. But we need to extrapolate ahead 10-20 years and really start thinking about the larger issues. These are conversations that the broad public should get some informed say in, not just a few billionaire tech execs.