arthaey wrote:
aokoye wrote:
I'm also wondering what corpora are being used by people who aren't working within academia but who are making frequency lists.
This thread has made me think about a corpus based on tweets. Since you could search by "lang:ru" (or whichever L2), it wouldn't even be that difficult.
Whether it would be useful remains to be seen.
You're right, it is not that difficult. I got a nice list, and the first item on it was a translation of a quote from comic Jim Carrey. Since I don't know Russian, it was not much use to me at all, but with lang:fr (French) I got another nice list, and a comprehensible one, too. The lack of continuity from one Tweet to the next disconcerts me, however.
No swear words, however, drat the luck!
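For anyone who wants to go from a pile of Tweets to an actual frequency list, here is a rough sketch in Python. It assumes you have already copied the Tweet texts out of a lang:fr search by hand; it does not talk to Twitter at all. The function name frequency_list and the sample Tweets are made up for illustration, and the tokenizing is deliberately crude (it just drops links and @mentions and keeps runs of letters).

```python
import re
from collections import Counter

# Links and @mentions are noise for a frequency list, so strip them first.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# A "word" here is a run of letters, optionally joined by an apostrophe or
# hyphen (so "n'est" stays one token). This is a simplification.
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def frequency_list(tweets, top_n=None):
    """Return (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in tweets:
        cleaned = URL_OR_MENTION.sub(" ", text.lower())
        counts.update(WORD.findall(cleaned))
    return counts.most_common(top_n)


# Hypothetical sample data, standing in for Tweets copied from a search.
sample_tweets = [
    "Le chat dort sur le canapé https://example.com/x",
    "@ami le chat n'est pas là",
    "J'aime le café #matin",
]

for word, count in frequency_list(sample_tweets, top_n=5):
    print(word, count)
```

With a few thousand Tweets instead of three, the top of the list will mostly be function words, which is about what you would expect from any frequency list.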
Didn't emk start a thread on the use of Tweets on HTLAL? I'm going to take a gander and will report back if I find something.
Edit: emk did a post about Twitter on this forum: https://forum.language-learners.org/vie ... emk#p13689