Most of the Scots Wikipedia written poorly by non-speaker

Cenwalh
Green Belt
Posts: 267
Joined: Thu Mar 28, 2019 9:14 am
Location: UK
Languages: English (N), Spanish (C1), Catalan (B2).
Language Log: https://forum.language-learners.org/vie ... 15&t=12467
x 849

Postby Cenwalh » Wed Aug 26, 2020 8:52 am

I happened upon this post on Reddit yesterday, which demonstrates that a huge part of the Scots Wikipedia (>20,000 articles out of 58,000 in total) was written by an American non-speaker of Scots, and that real native speakers consider it to be very poorly written. It does not use Scots grammar or non-English cognates, and its spelling is completely off, sometimes using orthography invented by this person with complete disregard for traditional spelling.

It appears to me to be a good-faith attempt to add information in an endangered language by someone who didn't realise the consequences of their actions. However, I think it's noteworthy here because of the potential damage it's done. Scots is often viewed with little regard because people consider it to be badly spelt/spoken English, and then they might quote the Scots Wikipedia to prove their point because most of the Wikipedia is just badly spelt English! I myself have been on Scots Wiki pages and remarked on how similar it is to English, whereas Scots I've seen elsewhere has been quite distinct.

I thought it might also be worth sharing this tweet, where a native speaker of Scots calls on other native speakers, or people who've genuinely learnt it properly, to help clean up the Wikipedia: https://twitter.com/Cobradile94/status/ ... 5111943168

Edit: I originally posted this with the title and post saying "non-native speaker". I have changed that to "non-speaker" because the person apparently just doesn't speak Scots at all.
Last edited by Cenwalh on Wed Aug 26, 2020 1:55 pm, edited 1 time in total.

annelions
White Belt
Posts: 42
Joined: Wed Aug 19, 2020 12:31 am
Languages: English (N), Spanish (A1), German (A1), Italian (A1), Croatian (Beginner)
Language Log: https://forum.language-learners.org/vie ... 15&t=15851
x 72

Re: Most of the Scots Wikipedia written poorly by non-native speaker

Postby annelions » Wed Aug 26, 2020 9:26 am

According to that Twitter user, it looks like the original writer isn't very happy that his contributions are being called into question. I can't blame him; 20,000 articles is an awful lot of work. Even if each one is only a stub article that took 5 minutes to put together, that represents close to 1,700 hours of work.

Doesn't Google Translate sometimes draw on Wikipedia for texts to train the AI on, especially when it's a rarer language that doesn't have many texts available? If that's the case, who knows how much of an issue this poor guy has accidentally caused. I'm sure he meant well and wanted to help, but it seems to have turned out more like a very young child dumping several kilos/pounds of flour into the pot while "helping" their parent to cook. To know that would be absolutely devastating.

devilyoudont
Blue Belt
Posts: 571
Joined: Tue Jun 26, 2018 1:34 am
Location: Philadelphia
Languages: EN (N), EO (C), JA (B), ES (A)
Language Log: https://forum.language-learners.org/vie ... 15&t=16424
x 1829

Re: Most of the Scots Wikipedia written poorly by non-native speaker

Postby devilyoudont » Wed Aug 26, 2020 11:48 am

Some AI systems do use Wikipedia as a corpus; I'm not sure about Google Translate.

A disturbing realization is that Wikipedias for most languages may have similar systemic issues. r/linguistics indicates that only between 3% and 9% of Wikipedias have a sufficiently large community to prevent the wiki from being dominated by a clique. This thread and some searching turn up other Wikipedias with systemic issues: Cebuano (almost entirely written by a bot) and Croatian (a neo-Nazi clique has successfully taken over the administration of this small wiki).

Iversen
Black Belt - 4th Dan
Posts: 4787
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027
x 15040

Re: Most of the Scots Wikipedia written poorly by non-native speaker

Postby Iversen » Wed Aug 26, 2020 1:17 pm

When I saw this my first reaction was to check some of the articles in the Scots Wikipedia and then write something about it in my own thread - probably in Scots as rotten as that of AmaryllisGardener, but at least I didn't make myself an administrator and write thousands of articles in the Wikipedia for a foreign language, so the actual harm done by me to society will surely be minimal. Not so with a version of Wikipedia - if it weren't for the fact that most articles there are short and not very interesting I might also have been using it as a source for study texts. It seems that Laeland Scots hasn't been included among the languages offered by Google Translate - otherwise you might also have seen the errors pop up in the translations offered there.

The sad thing about this is that the Scots themselves don't seem to care much about the written version of their language - if they even try to keep it as a written language. As I wrote in my log I scoured the bookstores of both Stirling and Edinburgh during my last visit in the hope that I might find some books in Scots (never mind which dialect - just anything in Scots), but apart from one comic book in Scots I found absolutely nothing. I own two paper dictionaries - the minuscule Collins Gem and the Essential Scots - but I bought both in London.

If the natives don't care about having a decent Wikipedia in their own language then less qualified outsiders can take over without any whimper from the locals.

chove
Green Belt
Posts: 374
Joined: Sun Jul 19, 2015 10:42 pm
Location: Scotland
Languages: English (N), Spanish (intermediate), German (intermediate), Polish (some).
Language Log: https://forum.language-learners.org/vie ... =15&t=9355
x 920

Re: Most of the Scots Wikipedia written poorly by non-native speaker

Postby chove » Wed Aug 26, 2020 1:47 pm

Iversen wrote: The sad thing about this is that the Scots themselves don't seem to care much about the written version of their language - if they even try to keep it as a written language. As I wrote in my log I scoured the bookstores of both Stirling and Edinburgh during my last visit in the hope that I might find some books in Scots (never mind which dialect - just anything in Scots), but apart from one comic book in Scots I found absolutely nothing. I own two paper dictionaries - the minuscule Collins Gem and the Essential Scots - but I bought both in London.

If the natives don't care about having a decent Wikipedia in their own language then less qualified outsiders can take over without any whimper from the locals.


Might be that it's not a prestige dialect -- a lot of Scots think of the way they talk as "bad English", which isn't helped by the relatively niche role Scots (of any sort) plays in the local media, and when you're in school there's always a teacher to tell you to "speak properly", i.e. in Standard English. Which has left a fair few of us Scots unsure if we even speak it any more -- there was a census question a few (10?) years ago of "Do you speak Scots?" and Holyrood had to set up a website where you could read texts in Modern Scots to check if you could understand them. Personally, I'd have said 'no' but I could understand the West of Scotland Scots on the website so I ticked 'yes' for that one. I *had* been thinking "well I can't read Burns very well" but that's miles and centuries away, isn't it?

Scots joke time: there was a man in court as a witness in Glasgow and he said "I was sitting watching TV and there was a chap at the door" and the (English) judge stopped him and said "What was this chap's name?" :lol:

Cenwalh

Re: Most of the Scots Wikipedia written poorly by non-speaker

Postby Cenwalh » Wed Aug 26, 2020 2:35 pm

Iversen wrote: The sad thing about this is that the Scots themselves don't seem to care much about the written version of their language - if they even try to keep it as a written language.


I suppose that's because there isn't much of a tradition of it. There was little education until the 18th-19th centuries, meaning most Scots were illiterate in their native tongue; then, when Scots became literate, they learnt to read and write in English. As you say, there isn't really much literature out there, which would possibly make the author of these Wikipedia articles one of the most prolific authors of published "Scots" works for hundreds of years. If each article created was the size of one page of a book, that's like 100 books.

A lot of genuine Scots speakers are calling for it to just be deleted, which I think is a bit sad because if it were fixed it'd be a great corpus of written Scots. How to fix it, on the other hand, is quite the challenge.

annelions

Re: Most of the Scots Wikipedia written poorly by non-speaker

Postby annelions » Wed Aug 26, 2020 3:31 pm

I hope it gets saved and not deleted. Double-check the research and citations to make sure that they're accurate, but (IMO) the research & citations are the hard part. Translation isn't exactly easy, but it should be easier than having to completely start over from scratch.

Either way, it's a monumental undertaking to fix everything.

Cainntear
Black Belt - 3rd Dan
Posts: 3531
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 8806

Re: Most of the Scots Wikipedia written poorly by non-speaker

Postby Cainntear » Wed Aug 26, 2020 5:04 pm

Cenwalh wrote: A lot of genuine Scots speakers are calling for it to just be deleted, which I think is a bit sad because if it were fixed it'd be a great corpus of written Scots. How to fix it, on the other hand, is quite the challenge.

All of the articles should be deleted, because it's easier to write something from scratch than to decode and reconstruct something so badly broken. Also, as they were all search-and-replace substitutions from English articles, you'd be working from a bad mangling of an old version rather than the source of the current version.

And fixing rather than deleting also means that you're leaving up loads of bad material until and unless someone gets round to editing it, which means you are deliberately leaving bad material up there indefinitely, which means the AI language programs trained on Wikipedia will still be exposed to it.

It needs to be cleaned out as soon as humanly possible, so that people can trust it and see some value in it.

guyome
Blue Belt
Posts: 604
Joined: Wed Jan 01, 2020 1:41 pm
Languages: French (N)
x 2436

Re: Most of the Scots Wikipedia written poorly by non-speaker

Postby guyome » Wed Aug 26, 2020 5:14 pm

As someone who's learning endangered/minority languages and learnt a bit of Scots last year (read a couple of books too), it pains me to see that. These languages (or dialects or whatever) are under enough pressure without having to deal with that kind of bullsh...

I hope it gets deleted soon. It would take far more time and energy to correct something this bad than it would take to write something from scratch.

If you're interested in Scots, have a look at Wilson's Luath Scots Language Learner and https://www.scotslanguage.com/.

Adrianslont
Blue Belt
Posts: 827
Joined: Sun Aug 16, 2015 10:39 am
Location: Australia
Languages: English (N), Learning Indonesian and French
x 1936

Re: Most of the Scots Wikipedia written poorly by non-speaker

Postby Adrianslont » Wed Aug 26, 2020 11:39 pm

I recommend reading the reddit thread linked to in the OP.

The main takeaway for me is that the rogue article writer started when he was 12 and is now 19 years old. Clearly an enthusiastic, likely autistic, kid. I think we should consider that when commenting on this subject.

While this is obviously a complete mess, I can’t help but think about the opportunities that something like Wikipedia offers for preservation of languages. It would be great to see institutions that teach or promote Scots harness their learners to build a robust Scots Wikipedia. Off the top of my head, setting assessments (for advanced students) that involve writing articles for the wiki, which would then be edited by academic staff before posting? I think this would be really motivating for those involved.