Iversen wrote:So AI boxes already write better essays than human pupils,
Under a great many circumstances, yes. That essay really is quite a work of art, in a very specific way. It includes a brief discussion of environmental issues, which is a favorite trick of students who take these exams. The examiners like students who talk about the environment. And detouring to talk about the environment offers the test taker a chance to build "islands" on a predictable subject. So the wise test taker practices making plausible connections between the question asked, and a topic they feel more comfortable talking about.
But the phrase "En adoptant un cadre réglementaire adapté et en impliquant l'ensemble de la société dans le processus décisionnel" ("by adopting an appropriate regulatory framework and involving all of society in the decision-making process") is really quite exquisite. You could use it in almost any essay of this sort. And it shows off some lovely academic vocabulary. If you can master a hundred stock phrases like that, you will make your C2 (or baccalauréat) examiner quite happy.
And the essay has a clear structure
and goes about 10% over the minimum word count. None of the early sections are too long or too short, and the conclusion arrives when it should, referring back to points made earlier. It's not the best possible essay you could write about robots. Rather, it's finely-tuned to guarantee good marks on a generic exam.
These models are trained by quite literally doing thousands of "cloze" exercises. And so their greatest strength is creating a plausible-sounding text in a specified style. But in a final phase of training, the models are also trained to be as helpful and inoffensive as possible. So the models would rather write optimistic, well-socialized essays. If you ask them to write a scene from a
film noir, they will gently balk at all the smoking and drinking. Generated stories will wrap up nicely, with reconciliations and new-found wisdom all around, and with the smallest plausible number of dead bodies. No Montagues or Capulets will be harmed in a street brawl if there's a way to avoid it.
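If you want to see that "guess the next word" objective in the raw, here is a tiny sketch using the Hugging Face transformers library and the small public GPT-2 model. Both are my choices purely for illustration; the actual training pipeline behind GPT-3.5 and GPT-4 is not public, but the underlying exercise is the same kind of fill-in-the-blank guessing.

    # A toy illustration of next-word prediction. GPT-2 and the transformers
    # library are stand-ins; the real GPT-4 training setup is not public.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The robots of the future will"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits  # a score for every vocabulary word, at every position
    next_token_id = logits[0, -1].argmax().item()  # the single most probable next word
    print(tokenizer.decode(next_token_id))

Training consists of nudging those scores, billions of times over, so that the word the model would have guessed matches the word that actually came next in the text.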
Iversen wrote:and if they come up with blatant errors that's the result of being fed humanmade nonsense and lies (which basically also is something humanmade humans are prone to do).
Oh, no, these models will happily come up with errors of their own. They're extremely gifted at providing plausible answers, but less skilled at providing correct ones. So they fall back on something called "hallucination," which you could equally well call "confabulation." GPT-3.5 hallucinates constantly. GPT-4 cuts the hallucination rate roughly in half, I believe.
This is an advantage of working with a skilled human tutor, rather than using the model as a tutor! The human tutor is unlikely, one would hope, to make up plausible nonsense.
Iversen wrote:We say that the boxes don't understand what they utter,
Some researchers recently did
a remarkable experiment where they trained a much smaller GPT model on written sequences of Othello moves, recorded in standard notation. And from nothing but those written transcripts, the model actually learned to play. And a very clever examination of the model revealed a set of "neurons" which actually recorded the state of an Othello board. During training, the model had realized that it was playing a game on a square board. And it inferred the size of the board, the spatial relationships, and the legal moves. (Though not perfectly.) And it learned all of this by filling in "clozes" and guessing the next word in a text. Clozes, as it turns out, are far more powerful than anyone ever imagined.
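For the technically curious, that "clever examination" is usually called probing: you freeze the trained model, collect its internal activations while it reads game transcripts, and train a small separate classifier to see whether, say, the contents of one particular board square can be read straight out of those activations. A rough sketch of the idea, with random placeholder arrays standing in for the real activations and labels:

    # Sketch of a probing classifier, assuming numpy and scikit-learn.
    # The arrays are random placeholders; in the real experiment they would be
    # the model's hidden activations and the true contents of one board square.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    hidden_states = rng.normal(size=(5000, 512))     # one activation vector per move in the transcripts
    square_contents = rng.integers(0, 3, size=5000)  # 0 = empty, 1 = black, 2 = white

    probe = LogisticRegression(max_iter=1000).fit(hidden_states, square_contents)
    print("probe accuracy:", probe.score(hidden_states, square_contents))
    # With real activations, high accuracy suggests the network really does
    # track that square; with these placeholders the number means nothing.

If a simple classifier can recover the board from the activations, then the board is, in a meaningful sense, represented inside the network.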
A far more sophisticated model like GPT-4 can actually "reason". Not always well, and not always consistently. But when a father and son
tried to teach it the son's conlang, it inferred quite a few grammatical rules from a short parallel corpus, and produced a few attempts at translations to and from the conlang. The conlang was linguistically exotic, so nothing in the model's training could have taught it to "parrot" the answers. Rather, the model was actually listening to instructions, and trying to follow them by doing a translation "manually."
The harder one pushes the models to perform genuine reasoning, the more obvious it becomes that they still have significant weaknesses. Certainly, a completely novel task like translating to and from a brand new conlang will produce results that are clearly better than mere "parroting" of the training text (which after all included no examples of the conlang). But the results are still worse than what a diligent and capable human could do. Of course, a
typical human would also do poorly translating a conlang. Humans, too, are better at regurgitation than genuine reasoning.
Iversen wrote:Which reminds me about Voltaire's dictum at the end of Candide: "Il faut cultiver son jardin". And now I have got one, and for the time being it eats up half my allotted study time. It looks like a sinister plan..
Well, there was always someone out there who learned languages faster than I could, and someone who wrote wittier essays, and someone with a better grasp of math. There are even programmers who consistently make me realize that I am still the barest novice after decades in the field. I was never bothered that Deep Blue could beat me at chess, any more than I was bothered by the fact that Kasparov could.
If the process is enjoyable in itself, there's no need to be world class. And yes, I could use Google Translate to speak to my in-laws, if they'd sit still for such a barbarity, but I rather prefer being able to do it myself.
But to return to the topic, human tutors have little to fear from ChatGPT. Except, of course, for the ease with which dishonest students can now cheat.
It is certainly possible to get ChatGPT to act as a tutor. But actually getting good results requires quite a bit of cleverness, as well as a keen eye for misleading or incomplete answers. Some students will be able to figure out how to do it on their own, just like some people walk into a gym and teach themselves weightlifting with nothing but books and YouTube videos. And yet, there's still a robust market for personal trainers and coaches to hold people's hand, to provide motivation, and to (very occasionally) provide true expert insight into what's subtly wrong with someone's deadlift. Automated tutoring with current tools will no more replace human tutors than a well-stocked library ever replaced schools and professors. The number of students who would walk into a library and teach themselves was only ever a tiny fraction of the population.
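To give a flavour of the cleverness involved, here is a minimal sketch of asking a chat model to behave like a tutor, using OpenAI's Python client. The model name, the prompt wording, and the student's sentence are all just my assumptions for illustration; a serious setup would need far more care, and every answer would still want a sceptical human eye.

    # Minimal sketch of a "tutor" prompt via the OpenAI Python client.
    # Assumes the OPENAI_API_KEY environment variable is set; the model name
    # and the prompt are illustrative, not a recommendation.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {"role": "system",
         "content": "You are a patient French tutor. Correct the student's sentence, "
                    "explain each correction briefly in English, then ask one short "
                    "follow-up question in French at the same level."},
        {"role": "user",
         "content": "Hier je suis allé au marché et j'ai acheté des pommes verts."},
    ]

    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    print(reply.choices[0].message.content)

Even a bare-bones prompt like this can be useful for drilling, but the answers are exactly the sort of thing that needs the keen eye mentioned above.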
And by the time that these models
can entirely replace a teacher, well, we'll have bigger issues to worry about. Perhaps we ought to take ChatGPT's advice, and
adopter un cadre réglementaire adapté et impliquer l'ensemble de la société dans le processus décisionnel. Preferably
before someone builds a model a lot more clever than this one.