Software for detecting English words in a French text

zamine
Posts: 1
Joined: Wed Dec 13, 2017 6:54 pm
Languages: French, English (beginner)

Software for detecting English words in a French text

Post by zamine » Wed Dec 13, 2017 7:09 pm

Hello,
I'm looking for software with which I can automatically detect all the English words used in a French text.
I'm not familiar with linguistics or advanced NLP tools, and my own searching has not turned up anything, so I'm hoping that with your help I can find software that allows this detection (as simply as possible).
Thanks in advance.

mcthulhu
Orange Belt
Posts: 228
Joined: Sun Feb 26, 2017 4:01 pm
Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish

Re: Software for detecting English words in a French text

Post by mcthulhu » Wed Dec 13, 2017 10:16 pm

You might try installing the Helsinki Finite State Technology (HFST) software from https://sourceforge.net/projects/hfst/f ... ansducers/ along with the French transducer available at the same site. (A transducer, if you're not familiar with the term, turns one set of strings into another set of strings, in this case by adding morphological information.) You didn't say what operating system you are using, but the HFST lookup tool is available in a Java version, so it should be able to run on Windows, Linux, etc. It should be fast enough to handle a fairly high volume of data.

The idea here is that a transducer that knows only French words will analyze French words correctly but will fail on non-French words. It will also fail on valid French words that don't happen to be in its dictionary, of course, but I don't think any software can avoid that. So if you run it in interactive mode with "hfst-optimized-lookup french.hfst.ol" and then type in individual words (note: in practical use you'd probably want to redirect input from a file instead), you will get output like

Code:

laissera        laisser+verb+singular+indicative+future+thirdPerson
canard  canard+commonNoun+masculine+singular


for valid French words, but output like

Code:

dog     +?
systematic      +?


for unrecognized words that the transducer can't parse. So, if you can filter on the "+?" (e.g. with grep, if you're using Linux), you should have a list of words that the FST doesn't think are French.
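
If grep isn't available (on Windows, say), the same filtering can be sketched in a few lines of Python. This is only a rough illustration, not part of HFST: the function names `tokenize` and `unrecognized_words` are made up for the example, and it assumes the lookup output has the word in the first whitespace-separated column and an analysis ending in "+?" in the second, as in the sample above. The exact output format may differ between HFST versions, so check it against what your installation prints.

```python
import re

# Failure marker, as shown in the sample output above.
UNKNOWN_MARKER = "+?"


def tokenize(text):
    """Split raw text into runs of letters (accented letters included).

    Hyphens and apostrophes act as separators, so "week-end" yields
    two tokens. That is a simplification chosen for this sketch.
    """
    return re.findall(r"[^\W\d_]+", text)


def unrecognized_words(lookup_output):
    """Return words whose analysis ends in "+?", without duplicates,
    in the order they first appear in the lookup output."""
    seen = set()
    result = []
    for line in lookup_output.splitlines():
        fields = line.split()
        # Expect at least "<word> <analysis>"; skip blank or odd lines.
        if len(fields) < 2:
            continue
        word, analysis = fields[0], fields[1]
        if analysis.endswith(UNKNOWN_MARKER) and word not in seen:
            seen.add(word)
            result.append(word)
    return result


if __name__ == "__main__":
    # Sample copied from the output shown in this post.
    sample_output = (
        "laissera\tlaisser+verb+singular+indicative+future+thirdPerson\n"
        "canard\tcanard+commonNoun+masculine+singular\n"
        "dog\t+?\n"
        "systematic\t+?\n"
    )
    for word in unrecognized_words(sample_output):
        print(word)
```

`tokenize` is there to help prepare the input file: joining its result with newlines gives one word per line to feed to the lookup tool. Keep in mind that an English word that is also a valid French word (such as "pain" or "chat") will be analyzed as French and will not show up in the list.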

It might also be possible to do this with a French spelling checker, if there's one you can use from the command line. An FST should be more capable, though.
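
To give a feel for the spelling-checker idea, here is a minimal Python sketch that flags every word not found in a French word list. It does not call any real spelling checker: `words_not_in_lexicon` is a hypothetical helper, and the tiny inline word list is purely illustrative. In practice you would load a full French word list from a file, one word per line.

```python
import re


def words_not_in_lexicon(text, lexicon):
    """Return the lowercased words of `text` that are missing from `lexicon`.

    Duplicates are dropped and first-seen order is kept. The comparison
    is case-insensitive.
    """
    known = {word.lower() for word in lexicon}
    flagged = []
    for token in re.findall(r"[^\W\d_]+", text):
        lowered = token.lower()
        if lowered not in known and lowered not in flagged:
            flagged.append(lowered)
    return flagged


if __name__ == "__main__":
    # Illustrative word list only; a real one would hold many thousands of forms.
    french_words = {"le", "canard", "mange", "du", "pain"}
    text = "Le canard mange du bread, du Bread."
    print(words_not_in_lexicon(text, french_words))
```

A plain word list like this only matches the exact forms it contains, so an inflected form that is missing from the list gets flagged even though it is perfectly good French. That is one reason the FST, which analyzes morphology, should do better.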

