
Re: Opinions of Readlang

Posted: Wed Sep 30, 2020 1:37 am
by mcthulhu
Gordafarin2, I think you may mean lemmatization (getting back to the dictionary headword) rather than stemming, which just strips suffixes off and may not even result in an actual word. Stemming is easier to do, but plenty of packages provide lemmatization these days, so it's not all that pie-in-the-sky. I've been using TreeTagger, which supports a lot of languages, to convert inflected words to their lemmas before doing dictionary lookups. It isn't perfect and misses now and then, but it reportedly has around 95% accuracy, which is much better than typing in headwords manually. Unfortunately, TreeTagger doesn't support Persian, but other packages such as CST's lemmatizer do, and Hazm, produced in Iran, definitely does. I'll be working on incorporating those as well. Anyway, this kind of de-inflection support is fairly easy to provide, and there are a lot of options available.
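To make the stemming vs. lemmatization difference concrete, here's a toy sketch of de-inflecting a word before a dictionary lookup. Everything in it is hypothetical: the suffix list, the `LEMMAS` table (standing in for a real lemmatizer), and the `DICTIONARY` glosses are made up for illustration. It is not how TreeTagger, Hazm, or CST's lemmatizer work internally; those use trained models and much larger lexicons.

```python
# Toy illustration only: naive suffix-stripping "stemmer" vs. a lookup-table
# "lemmatizer" used before a dictionary lookup. The suffixes, lemma table and
# glosses below are hypothetical, not any real package's data or API.
from typing import Optional

SUFFIXES = ("ies", "es", "ed", "ing", "s")

def naive_stem(word: str) -> str:
    """Strip the first matching suffix; the result need not be a real word."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters of stem so short words survive.
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

# Hypothetical inflected-form -> headword table standing in for a lemmatizer.
LEMMAS = {
    "studies": "study",
    "studied": "study",
    "went": "go",
    "mice": "mouse",
}

# Hypothetical bilingual dictionary keyed by headword.
DICTIONARY = {
    "study": "estudiar",
    "go": "ir",
    "mouse": "raton",
}

def lemmatize(word: str) -> str:
    """Map an inflected form to its headword, falling back to the form itself."""
    w = word.lower()
    return LEMMAS.get(w, w)

def look_up(word: str) -> Optional[str]:
    """De-inflect first, then consult the dictionary; None if not found."""
    return DICTIONARY.get(lemmatize(word))
```

With these tables, `naive_stem("studies")` gives `"stud"` and `naive_stem("studied")` gives `"studi"`, neither of which is a headword, so a lookup on the stem fails. `look_up("Studied")` goes through the lemma `"study"` and finds `"estudiar"`, and `look_up("went")` reaches `"go"` even though no suffix rule could have produced it.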

Of course, this discussion is all academic now, since development of Readlang stopped years ago.

Regarding your last point, does allow you to search reading or listening passages by level of difficulty, though in terms of ILR (Interagency Language Roundtable) levels rather than the user's estimated vocabulary size.