Hi everyone,
I'm making a word list from a French book I have in text form. Before trying to read the book, I want to use Anki to learn the couple thousand most frequent words in it that I don't yet know. I already have a list of unique words, but some are conjugated. Does anyone know of a tool that takes a list of conjugated French words as input and returns the infinitive forms?
EDIT: I do already know about websites where I can enter each word individually, but I'm looking for something that I can upload a list to and get a list back from, rather than doing all that manual work.
Tool to convert conjugated words to the infinitive form
-
- White Belt
- Posts: 34
- Joined: Sun Nov 15, 2020 7:29 pm
- Languages: English (N)
Learning: French (beginner)
Dabbled: Korean
School classes long forgotten: Spanish - x 68
- Adrianslont
- Blue Belt
- Posts: 827
- Joined: Sun Aug 16, 2015 10:39 am
- Location: Australia
- Languages: English (N), Learning Indonesian and French
- x 1936
Re: Tool to convert conjugated words to the infinitive form
I believe you are looking for a lemmatiser. Have a google for French lemmatiser and you might find one.
Sorry I'm not being more helpful, but I have never used one and I'm not attracted to this approach to learning. I just noticed no one had chimed in with a specific suggestion.
I’d be interested to hear details about your experience later, though.
0 x
-
- Green Belt
- Posts: 404
- Joined: Sat Jul 18, 2015 6:21 pm
- Languages: German (N)
- x 806
Re: Tool to convert conjugated words to the infinitive form
mverse wrote: EDIT: I do already know about websites where I can enter each word individually, but I'm looking for something that I can upload a list to and get a list back from, rather than doing all that manual work.
I don't think that such a tool exists, but if you have basic Python programming skills or are interested in acquiring them, you might be able to write a custom NLTK 3 script for this task.
There are also a couple of stand-alone Python libraries that you could use, for example:
Pattern (supports Dutch, English, Spanish, German, French and Italian.)
You also might find the French Verb Conjugation Rules library helpful, which comes with a stand-alone Windows conjugation app. (It's in the FrenchVerbWorkshop\FrenchVerbWorkshop\bin\Debug folder.)
There's also a website with inflection lists for German, English, Spanish, French, Italian, Portuguese and Russian that you might be able to use.
For more links, also see the list of natural language processing resources and tools topic.
3 x
-
- White Belt
- Posts: 34
- Joined: Sun Nov 15, 2020 7:29 pm
- Languages: English (N)
Learning: French (beginner)
Dabbled: Korean
School classes long forgotten: Spanish - x 68
Re: Tool to convert conjugated words to the infinitive form
Thank you for the help, Doitsujin and Adrianslont!
For future readers, I also found this library: https://mlconjug3.readthedocs.io/en/latest/.
0 x
-
- Orange Belt
- Posts: 242
- Joined: Wed Mar 21, 2018 6:54 pm
- Languages: English, Portuguese, Spanish, Catalan, French, Persian, Arabic, Mandarin, Japanese.
- x 444
Re: Tool to convert conjugated words to the infinitive form
There's http://lexique.org, which you can query online.
There are also some lemmatization dictionaries here:
https://github.com/michmech/lemmatization-lists
Using them in Python is trivial:
Code: Select all
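# encoding: utf8
# Build an inflected-form -> lemma lookup from one of the lists above.
# Each line of the file is "lemma<TAB>inflected form".
# The Spanish list is used here; swap in the French list for French.
lemmaDict = {}
with open('lemmatization-es.txt', 'rb') as f:
    data = f.read().decode('utf8').replace(u'\r', u'').split(u'\n')
data = [a.split(u'\t') for a in data]
for a in data:
    if len(a) > 1:
        lemmaDict[a[1]] = a[0]


def lemmatize(word):
    # Words missing from the list come back unchanged with a trailing '*'.
    return lemmaDict.get(word, word + u'*')


def test():
    for a in [u'salió', u'usuarios', u'abofeteéis', u'diferenciando', u'diferenciándola']:
        print(lemmatize(a))


test()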
If you can't find your way around it, paste your word list at https://pastebin.com/ and I can lemmatize it for you.
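For the original question (a whole French word list in, a list of infinitives out), the same dictionary idea could be wrapped up roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration, and the few "lemma<TAB>form" lines are a hand-written stand-in for a real lemmatization list file, so that the example runs on its own.

```python
import io


def load_lemma_dict(lines):
    """Build an inflected-form -> lemma lookup from 'lemma<TAB>form' lines."""
    lookup = {}
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) > 1:
            # For an ambiguous form, keep the first lemma listed.
            lookup.setdefault(parts[1], parts[0])
    return lookup


def lemmatize_words(words, lookup):
    """Return (unique lemmas in first-seen order, words not found)."""
    known_lemmas = set(lookup.values())
    lemmas, unknown, seen = [], [], set()
    for word in words:
        key = word.strip().lower()
        if not key:
            continue
        # A word that is already a base form counts as its own lemma.
        lemma = lookup.get(key, key if key in known_lemmas else None)
        if lemma is None:
            unknown.append(key)
        elif lemma not in seen:
            seen.add(lemma)
            lemmas.append(lemma)
    return lemmas, unknown


# Hypothetical sample data standing in for a real list file.
sample_list = io.StringIO(
    "aller\tvais\naller\tallons\nêtre\tsuis\nsuivre\tsuis\nmanger\tmangeons\n"
)
lookup = load_lemma_dict(sample_list)
lemmas, unknown = lemmatize_words(
    ["Vais", "allons", "suis", "mangeons", "manger", "chat"], lookup
)
print(lemmas)   # unique lemmas, in order of first appearance
print(unknown)  # words to check by hand
```

With a real list, `load_lemma_dict` would be given an open file instead of the `io.StringIO` sample. Keeping the unknown words in a separate list, instead of marking them with `*`, makes it easier to review them before building Anki cards. Note that a form like "suis" belongs to more than one verb, so a plain lookup can only pick one lemma for it.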
4 x
-
- White Belt
- Posts: 34
- Joined: Sun Nov 15, 2020 7:29 pm
- Languages: English (N)
Learning: French (beginner)
Dabbled: Korean
School classes long forgotten: Spanish - x 68
Re: Tool to convert conjugated words to the infinitive form
Amazing! Thank you, 白田龍! That is exactly what I was looking for. I've used Python before, so I'm good to go.
0 x
-
- Orange Belt
- Posts: 228
- Joined: Sun Feb 26, 2017 4:01 pm
- Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
- x 590
Re: Tool to convert conjugated words to the infinitive form
I've been using the Stanza Python library from Stanford NLP lately for lemmatization in Jorkens. I'm sending a chapter's worth of words at a time to be lemmatized, and the response time is pretty good. FWIW, Jorkens can generate a lemmatized word frequency list for the current book and save it in .csv format.
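As a rough illustration of what a lemmatized word frequency list in .csv format involves, here is a minimal standard-library sketch. It is not how Jorkens or Stanza work internally: the `lemmatize` argument is a placeholder for whatever lemmatizer is used, the toy dictionary is hypothetical, and the tokenizer is a deliberately simple one that treats every run of letters as a word (so "l'homme" becomes "l" and "homme").

```python
import csv
import io
import re
from collections import Counter

# Runs of letters only; digits, punctuation and apostrophes split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def lemma_frequencies(text, lemmatize):
    """Count how often each lemma occurs in the text."""
    counts = Counter()
    for token in WORD_RE.findall(text.lower()):
        counts[lemmatize(token)] += 1
    return counts


def write_frequency_csv(counts, out):
    """Write 'lemma,count' rows, most frequent first, ties alphabetical."""
    writer = csv.writer(out)
    writer.writerow(["lemma", "count"])
    for lemma, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([lemma, count])


# Hypothetical toy lemmatizer; a real one would be plugged in here.
toy_lemmas = {
    "mange": "manger",
    "mangeons": "manger",
    "mangent": "manger",
    "pommes": "pomme",
}
text = "Je mange une pomme. Nous mangeons des pommes, ils mangent."
counts = lemma_frequencies(text, lambda w: toy_lemmas.get(w, w))

buffer = io.StringIO()
write_frequency_csv(counts, buffer)
print(buffer.getvalue())
```

To save to disk, `write_frequency_csv` could be given a file opened with `open(path, "w", newline="", encoding="utf-8")` instead of the in-memory buffer.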
There's also TreeTagger, which supports French and can be used as a stand-alone program without Python. It can handle bulk input, though you have to put each word on a separate line. You can also find at least one JavaScript library on GitHub that is for French lemmatization: https://github.com/bastienbot/nlp-js-tools-french, which gave me pretty good results.
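If you go the TreeTagger route, the Python side can be limited to preparing the input and reading the result. The sketch below makes several assumptions: that the output has three tab-separated columns (token, tag, lemma), that unknown words are reported with the lemma `<unknown>`, and that ambiguous lemmas are joined with `|`. Check these against your own output, since the layout depends on how TreeTagger is invoked. The sample output is hand-written for illustration, not real TreeTagger output, and the function names are made up.

```python
def to_one_per_line(words):
    """Format a word list as one word per line for bulk input."""
    cleaned = [w.strip() for w in words if w.strip()]
    return "\n".join(cleaned) + "\n"


def parse_tagger_output(output):
    """Turn 'token<TAB>tag<TAB>lemma' lines into (token, lemma) pairs."""
    pairs = []
    for line in output.splitlines():
        cols = line.split("\t")
        if len(cols) != 3:
            continue  # skip blank or malformed lines
        token, _tag, lemma = cols
        if lemma == "<unknown>":
            lemma = token  # assumed marker: fall back to the word itself
        else:
            lemma = lemma.split("|")[0]  # assumed separator: keep first lemma
        pairs.append((token, lemma))
    return pairs


print(to_one_per_line(["mangeons", " suis ", ""]), end="")

# Hand-written sample in the assumed three-column layout.
sample_output = (
    "mangeons\tVER:pres\tmanger\n"
    "suis\tVER:pres\tsuivre|être\n"
    "xyzzy\tNOM\t<unknown>\n"
)
pairs = parse_tagger_output(sample_output)
print(pairs)
```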
Another way to do it is with a finite state transducer (software with rules to convert a set of strings into another set of strings, basically). See https://sourceforge.net/projects/hfst/f ... ansducers/, which includes one for French.
As you can see from the other responses, there are a lot of ways to do this.
3 x