using a custom corpus to master specialized vocabulary


Postby mcthulhu » Mon May 29, 2017 1:55 pm

http://www.translationtribulations.com/ ... study.html is an interesting post on a professional translator's blog about how he used a corpus-linguistics approach to master the technical vocabulary of a specialized field in Portuguese, starting by building his own corpus from online forums devoted to that field. The approach seems general enough to work for other languages and fields.
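To make the first step concrete: once forum posts on the target field have been saved as plain-text files, a frequency-ranked word list is a natural place to start. Below is a minimal Python sketch, assuming UTF-8 .txt files in a single directory and a deliberately crude regex tokenizer. The function names are hypothetical, and this is not the blog author's own workflow.

```python
import re
from collections import Counter
from pathlib import Path

# Crude tokenizer (an assumption, not a real linguistic tokenizer): runs of
# letters, including accented ones, with internal hyphens or apostrophes allowed.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())

def load_corpus(corpus_dir: str) -> dict[str, str]:
    """Read every .txt file in corpus_dir into a {filename: text} dict."""
    return {
        p.name: p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(Path(corpus_dir).glob("*.txt"))
    }

def word_frequencies(texts) -> Counter:
    """Count tokens across an iterable of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts
```

For example, `word_frequencies(load_corpus("my_forum_posts").values()).most_common(50)` would list the 50 most frequent words, where `my_forum_posts` is a placeholder directory name. Expect function words to dominate the top of a raw frequency list.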

You have to admire someone who prepares this thoroughly for a test.

http://www.translationtribulations.com/ ... -2016.html is another post of his on the use of corpora, including a very positive review of AntConc for this purpose. (I agree AntConc ought to be in everyone's toolbox.)
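AntConc's Keyword List compares the word frequencies of your own corpus with those of a reference corpus, so that field-specific terms rise to the top instead of common words. Here is a minimal Python sketch of that idea using Dunning's log-likelihood (G²), one common keyness statistic. It is not AntConc's code; it assumes both corpora have already been reduced to `collections.Counter` token counts, and the function names are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a: int, b: int, size1: int, size2: int) -> float:
    """Dunning's G2 for a word seen a times in corpus 1 (size1 tokens)
    and b times in corpus 2 (size2 tokens)."""
    total = a + b
    e1 = size1 * total / (size1 + size2)  # expected count in corpus 1
    e2 = size2 * total / (size1 + size2)  # expected count in corpus 2
    g2 = 0.0
    # Terms with a zero observed count contribute 0 (0 * log 0 taken as 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def keywords(study: Counter, reference: Counter, top: int = 20) -> list[tuple[str, float]]:
    """Words over-represented in the study corpus, ranked by G2."""
    size1, size2 = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        # Keep only positive keywords: relatively more frequent in the study corpus.
        if a / size1 > b / size2:
            scored.append((log_likelihood(a, b, size1, size2), word))
    scored.sort(reverse=True)
    return [(word, round(score, 2)) for score, word in scored[:top]]
```

Words that are relatively rarer in the study corpus are skipped, so the result contains only positive keywords. Both corpora must be non-empty.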

Postby rdearman » Mon May 29, 2017 5:18 pm

Interesting!!!

Postby aokoye » Mon May 29, 2017 5:47 pm

That first blog post was really interesting. I might try this with the material for TestDaF.

Postby MorkTheFiddle » Mon May 29, 2017 6:21 pm

AntConc is useful, among other things, for finding examples of how phrases are used and of subjunctive forms.

It helps to put several texts in one directory and to save them as .txt files. The suite of tools AntConc belongs to has a handy converter from PDF (and other formats) to TXT. With the core program, AntConc.exe, you can then search the text files for specific words or phrases. Here are some results for French.

The French texts are several long works, mostly novels: Les Misérables by Victor Hugo, Les Thibault by Roger Martin du Gard, Le Comte de Monte-Cristo by Alexandre Dumas, La Comtesse de Rudolstadt by George Sand, and Vie de Jeanne d'Arc by Anatole France. The list originally included À la recherche du temps perdu, but its hits always swamped those from the other texts.

A search for 'disions' (I was working on the subjunctive, though 'disions' is also the imperfect indicative, which accounts for most of the hits) gave the following results:

1 | firent-ils? Il faut bien que nous le disions, puisque ceci est de l'histoire. Tandis qu | Les Mis 4.txt | 3 | 1
2 | rs la chaleur. « Depuis des années, Antoine, nous disions : “Quand on sera l’élève de Jalicourt…” No | Les Thibault - Tome 5 - La Sore - Roger Martin Du Gard.txt | 10 | 1
3 | semble étrange de mêler à ce que nous disions une question d'argent, eh! bien, mon ami, | montecristo_2_pg17989.txt | 12 | 1
4 | du globe, cette quantité d'arsenic que nous disions tout à l'heure. C'est là réellement | montecristo_2_pg17989.txt | 12 | 2
5 | mais j'en reviens à ce que nous disions. Eh bien, si ma mère pouvait savoir cette | montecristo_2_pg17989.txt | 12 | 3
6 | fille. Ainsi, récapitulez: Villefort, comme nous disions, perdant toute sa famille d'une façon étra | montecristo_4_pg17989.txt | 14 | 1
7 | , dans une sorte de ravissement, et nous nous disions par nos regards qu'il y avait là | sand_comptesse.txt | 15 | 1

Interpreting the first result, you get:
a) the word in context: firent-ils? Il faut bien que nous le disions, puisque ceci est de l'histoire. Tandis qu
b) the source file: Les Mis 4.txt, i.e. Les Misérables, Book 4
c) the last number is the running count of the hit within its file: it restarts at 1 for each file and reaches 3 for the three hits in montecristo_2_pg17989.txt, so the highest value for a file is the number of occurrences of 'disions' in that text.
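For anyone curious about what a concordance search does under the hood, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not AntConc's implementation. It assumes the texts are already loaded into a {filename: text} dict, matches the search term as a whole word and case-insensitively, and returns rows of hit number, left context, match, right context, file name, and hit number within that file. The names `kwic` and `format_row` are hypothetical.

```python
import re

def kwic(texts: dict[str, str], term: str, width: int = 40) -> list[tuple]:
    """Keyword-in-context search over a {filename: text} dict.

    Returns rows of (hit_no, left, match, right, filename, hit_in_file);
    hit_no runs over all files, hit_in_file restarts at 1 for each file.
    """
    # \b...\b keeps 'disions' from matching inside a longer word such as
    # 'redisions'; re.escape lets a multi-word phrase be searched literally.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    rows = []
    for filename, text in texts.items():
        for hit_in_file, m in enumerate(pattern.finditer(text), start=1):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            rows.append((len(rows) + 1, left, m.group(), right, filename, hit_in_file))
    return rows

def format_row(row: tuple) -> str:
    """Render one row as 'hit | context | file | hit-in-file' on a single line."""
    hit_no, left, match, right, filename, hit_in_file = row
    context = (left + match + right).replace("\n", " ")  # keep each hit on one line
    return f"{hit_no} | {context} | {filename} | {hit_in_file}"
```

Because `hit_in_file` restarts at 1 for each file, its highest value for a given file is that file's total number of hits. A term that begins or ends with a non-word character (such as l') would need a different boundary rule than `\b`.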

AntConc is fairly simple to use and gives very useful results. My examples only scratch the surface.