question on useful level of detail

mcthulhu
Orange Belt
Posts: 228
Joined: Sun Feb 26, 2017 4:01 pm
Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
x 590

Postby mcthulhu » Sun Jul 02, 2017 3:41 pm

I'm curious about the level of detail that people would find desirable in the information provided about a given chunk of text. The trade-off is between ease of use and level of detail. Right now I'm having Jorkens automatically process the sentence you're hovering the mouse over. For now, that processing is limited to a glossary/dictionary lookup for every word in the sentence, followed by a list of whatever definitions are found in the local glossary. This is working pretty well, and is easier than having to hover over individual words. The output can be pretty concise, assuming your personal glossary entries are just one or a few translations each (possibly a couple of primary meanings taken from a page-long, verbose dictionary entry).
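A minimal sketch of that per-sentence lookup, for illustration only. It is not Jorkens' actual code; the `GLOSSARY` contents, the tokenizing regex, and the function name `sentence_glossary` are all assumptions made for the example.

```python
import re

# Hypothetical personal glossary: lowercased word -> short list of translations.
GLOSSARY = {
    "augen": ["eyes"],
    "schaukel": ["swing"],
    "qualen": ["torments", "agonies"],
}

# Runs of letters only (Unicode-aware), so punctuation and digits are skipped.
WORD_RE = re.compile(r"[^\W\d_]+")


def sentence_glossary(sentence, glossary):
    """Return (found, missing) for one sentence.

    found   -- list of (word, translations) in sentence order
    missing -- words that have no glossary entry
    Repeated words are reported only once.
    """
    found, missing, seen = [], [], set()
    for word in WORD_RE.findall(sentence):
        key = word.lower()
        if key in seen:
            continue
        seen.add(key)
        if key in glossary:
            found.append((word, glossary[key]))
        else:
            missing.append(word)
    return found, missing


found, missing = sentence_glossary(
    "Meine Augen wurden von der Schaukel angezogen.", GLOSSARY
)
```

Keeping the not-found words in a separate list leaves the decision about what to do with them (ignore, look up manually, look up online) to a later step.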

It would also be possible to automate an online dictionary lookup of any not-found words, or even mash together the information from multiple online dictionaries and show it all together. In that case the sentence's mini-glossary could easily run to pages of information, if that's what comes back. So far I'm not looking up not-found words automatically; I'm leaving that to the user to do manually. That's more work for the user, of course, especially if there are many new words. Another consideration is that you might not even want to look up every word that's missing from your personal glossary, e.g. if its meaning is obvious to you, or if it's a personal or place name, in which case it would just take up space in the search results. Prepositions, articles, pronouns, etc. tend to just pad out search results (unless you are very new to a language), so I leave them out of my local glossary. I also wouldn't want them to show up in automated dictionary searches.
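The filtering described above could be sketched roughly as follows. The stop-word set here is a tiny hand-picked sample, and `words_to_look_up` and its `skip` parameter (for names or words the reader already knows) are hypothetical names used only for this example.

```python
# Small illustrative sample of German function words, not a complete list.
STOP_WORDS = frozenset(
    {"der", "die", "das", "ein", "von", "ich", "mir", "als", "aber", "nicht"}
)


def words_to_look_up(missing_words, stop_words=STOP_WORDS, skip=frozenset()):
    """Filter not-found words before any automated dictionary search.

    missing_words -- words with no entry in the personal glossary
    stop_words    -- function words that only pad out the results
    skip          -- user-maintained set (names, obvious words), lowercased
    """
    result = []
    for word in missing_words:
        key = word.lower()
        if key in stop_words or key in skip:
            continue
        result.append(word)
    return result


candidates = words_to_look_up(
    ["Ich", "versuchte", "nicht", "mehr", "von", "der", "Schaukel"],
    skip={"mehr"},
)
```

Only the words that survive this filter would be sent to an online dictionary, which keeps both the number of requests and the size of the displayed results down.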

I'm also leaving for manual searches the retrieval of frequency information, sample sentences from online corpora or the local translation memory, etc., all of which could be displayed automatically instead. My assumption is that while displaying glossary search results by default is desirable, this additional information is not (because it would probably be displayed for everything).

The main thing I'm pondering at this point, however, is how to show a user some morphological analysis information, possibly merged with the glossary results. I'm going to try to add automatic morphological analysis to the default sentence processing, in addition to the glossary searches. It's useful information, especially if you're trying to read a new language; but how much could be shown by default without having it be too much of a good thing? The lemma would be useful, of course, and probably the part of speech; but morphological parsers tend to list all possible parses and parts of speech. Even if stop words were removed (probably desirable), it would still be a lot of information. Doing it for a single word at a time, on demand, would be a possibility; but doing it automatically for a reasonable chunk of text (like a sentence) would be less work for a reader. Another consideration, of course, is that people's needs differ; beginners struggling with grammar need more help than someone who's already reading with ease. See the note about common or unwanted words above, and the space taken up by the parses for "der" and "die" in the example below.

One idea might be to alter the display of the current sentence, e.g. colorizing parts of speech (make all verbs green), or inserting lemmas in superscript or title attributes (mouseover) after inflected forms. The less work I have to do to make use of morphological information the better.
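As a rough illustration of the title-attribute and colorizing idea, here is a sketch that wraps each analyzed word in a `<span>`. The color mapping, the `annotate` function, and the `(surface, lemma, pos)` tuple layout are assumptions for the example, not anything Jorkens currently does.

```python
from html import escape

# Hypothetical mapping from part-of-speech tag to display color.
POS_COLORS = {"V": "green", "NN": "blue"}


def annotate(tokens):
    """Render tokens as HTML with the lemma shown on mouseover.

    tokens -- list of (surface, lemma, pos); lemma and pos are None
              for words that should be left unannotated.
    """
    out = []
    for surface, lemma, pos in tokens:
        text = escape(surface)
        if lemma is None:
            out.append(text)
            continue
        attrs = f' title="{escape(lemma)}"'
        color = POS_COLORS.get(pos)
        if color:
            attrs += f' style="color: {color}"'
        out.append(f"<span{attrs}>{text}</span>")
    return " ".join(out)


html_line = annotate(
    [("Augen", "Auge", "NN"), ("wurden", "werden", "V"), ("von", None, None)]
)
```

Because the lemma lives in the `title` attribute, it costs no screen space until the reader hovers over the word.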

Or I could show none of it by default. The Vocabulary tab is a frequency-sorted word list for the current chapter, and I could add the information there. The user would then have to switch to that tab and search for a word in the table, which might get old fast if you needed to do it often.

Maybe I should add some preference settings for level of detail. I'd still have to decide what's a reasonable default.
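One possible shape for such settings, purely as a sketch: the field names and the defaults below are assumptions, and picking the right defaults is exactly the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetailPrefs:
    """Hypothetical per-user switches for how much analysis to display."""
    show_lemma: bool = True
    show_pos: bool = True
    show_features: bool = False  # person, case, number, gender, ...


def format_parse(surface, lemma, pos, features, prefs=DetailPrefs()):
    """Build one display line for a word according to the preferences."""
    parts = [surface]
    if prefs.show_lemma:
        parts.append(lemma)
    if prefs.show_pos:
        parts.append(pos)
    if prefs.show_features:
        parts.append(" ".join(features))
    return " | ".join(parts)


brief = format_parse("Qualen", "Qual", "NN", ["Fem", "Akk", "Pl"])
full = format_parse(
    "Qualen", "Qual", "NN", ["Fem", "Akk", "Pl"],
    prefs=DetailPrefs(show_features=True),
)
```

A beginner could switch `show_features` on, while a fluent reader could switch everything but the lemma off.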

Anything is possible; it's just a matter of how much work it is to do it.

Here's an example of morphological output for a sentence from a book I'm reading in Jorkens, which might illustrate the problem better:

Ich versuchte, nicht mehr hinzusehen, aber meine Augen wurden magisch von der Schaukel angezogen … fast, als würde ein Teil von mir die Qualen genießen.

The morphological parser's raw output for that, which I probably would not want to display unfiltered, is:

Ich <CAP>ich<+PPRO><pers><1><Sg><NoGend><Nom>

versuchte ver<PREF>suchen<+V><1><Sg><Past><Ind>
versuchte ver<PREF>suchen<+V><1><Sg><Past><Konj>
versuchte ver<PREF>suchen<+V><3><Sg><Past><Ind>
versuchte ver<PREF>suchen<+V><3><Sg><Past><Konj>
versuchte versuchen<+V><1><Sg><Past><Ind>
versuchte versuchen<+V><1><Sg><Past><Konj>
versuchte versuchen<+V><3><Sg><Past><Ind>
versuchte versuchen<+V><3><Sg><Past><Konj>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><Masc><Nom><Sg><Sw>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><Fem><Akk><Sg>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><Fem><Nom><Sg>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><NoGend><Akk><Pl><St>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><NoGend><Nom><Pl><St>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><Neut><Akk><Sg><Sw>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><Neut><Nom><Sg><Sw>

nicht nicht<+PTKL><Neg>

mehr mehr<+ADV>
mehr mehren<+V><Imp><Sg>
mehr mehr<+INDEF><pro><oD>

hinzusehen hin<PREF>sehen<+V><Inf><zu>
hinzusehen hinzu<PREF>sehen<+V><1><Pl><Pres><Ind>
hinzusehen hinzu<PREF>sehen<+V><1><Pl><Pres><Konj>
hinzusehen hinzu<PREF>sehen<+V><3><Pl><Pres><Ind>
hinzusehen hinzu<PREF>sehen<+V><3><Pl><Pres><Konj>
hinzusehen hinzu<PREF>sehen<+V><Inf>

aber aber<+ADV>

meine meinen<+V><1><Sg><Pres><Ind>
meine meinen<+V><1><Sg><Pres><Konj>
meine meinen<+V><3><Sg><Pres><Konj>
meine mein<+POSS><pro><Fem><Akk><Sg>
meine mein<+POSS><pro><Fem><Nom><Sg>
meine mein<+POSS><pro><NoGend><Akk><Pl>
meine mein<+POSS><pro><NoGend><Nom><Pl>

Augen Auge<NN>en<+NN><Masc><Akk><Sg>
Augen Auge<NN>en<+NN><Masc><Nom><Sg>
Augen Auge<NN>en<+NN><Masc><Dat><Sg>
Augen Auge<+NN><Neut><Akk><Pl>
Augen Auge<+NN><Neut><Dat><Pl>
Augen Auge<+NN><Neut><Gen><Pl>
Augen Auge<+NN><Neut><Nom><Pl>
Augen Augen<+NN><Masc><Akk><Sg>
Augen Augen<+NN><Masc><Dat><Sg>
Augen Augen<+NN><Masc><Nom><Sg>

wurden werden<+V><3><Pl><Past><Ind>
wurden werden<+V><1><Pl><Past><Ind>

magisch Magen<NN>isch<SUFF><+ADJ><Pos><Pred>
magisch Magen<NN>isch<SUFF><+ADJ><Pos><Adv>
magisch magisch<+ADJ><Pos><Pred>
magisch magisch<+ADJ><Pos><Adv>

von von<+PREP><Dat>

der die<+REL><subst><Fem><Dat><Sg>
der die<+ART><Def><NoGend><Gen><Pl>
der die<+ART><Def><Fem><Dat><Sg>
der die<+ART><Def><Fem><Gen><Sg>
der die<+DEM><subst><Fem><Dat><Sg>
der die<+DEM><subst><Fem><Gen><Sg>
der der<+DEM><subst><Masc><Nom><Sg>
der der<+REL><subst><Masc><Nom><Sg>
der der<+ART><Def><Masc><Nom><Sg>

Schaukel <CAP>schaukeln<+V><Imp><Sg>
Schaukel Schaukel<+NN><Fem><Akk><Sg>
Schaukel Schaukel<+NN><Fem><Dat><Sg>
Schaukel Schaukel<+NN><Fem><Gen><Sg>
Schaukel Schaukel<+NN><Fem><Nom><Sg>

angezogen an<PREF>ziehen<+V><PPast>
angezogen an<PREF>ziehen<V><PPast><SUFF><+ADJ><Pos><Pred>
angezogen an<PREF>ziehen<V><PPast><SUFF><+ADJ><Pos><Adv>

fast fast<+ADV>

als als<+KONJ><Vgl>

würde werden<+V><3><Sg><Past><Konj>
würde werden<+V><1><Sg><Past><Konj>

ein ein<+ART><Indef><Neut><Akk><Sg>
ein ein<+ART><Indef><Neut><Nom><Sg>
ein ein<+ART><Indef><Masc><Nom><Sg>

Teil <CAP>teilen<+V><Imp><Sg>
Teil Teil<+NN><Neut><Akk><Sg>
Teil Teil<+NN><Neut><Nom><Sg>
Teil Teil<+NN><Neut><Dat><Sg>

von von<+PREP><Dat>

mir ich<+PPRO><prfl><1><Sg><NoGend><Dat>

die die<+DEM><subst><Fem><Akk><Sg>
die die<+DEM><subst><Fem><Nom><Sg>
die die<+DEM><subst><NoGend><Akk><Pl>
die die<+DEM><subst><NoGend><Nom><Pl>
die die<+REL><subst><Fem><Akk><Sg>
die die<+REL><subst><Fem><Nom><Sg>
die die<+REL><subst><NoGend><Akk><Pl>
die die<+REL><subst><NoGend><Nom><Pl>
die die<+ART><Def><Fem><Akk><Sg>
die die<+ART><Def><Fem><Nom><Sg>
die die<+ART><Def><NoGend><Akk><Pl>
die die<+ART><Def><NoGend><Nom><Pl>

Qualen Qual<+NN><Fem><Akk><Pl>
Qualen Qual<+NN><Fem><Dat><Pl>
Qualen Qual<+NN><Fem><Gen><Pl>
Qualen Qual<+NN><Fem><Nom><Pl>

genießen genießen<+V><1><Pl><Pres><Ind>
genießen genießen<+V><1><Pl><Pres><Konj>
genießen genießen<+V><3><Pl><Pres><Ind>
genießen genießen<+V><3><Pl><Pres><Konj>
genießen genießen<+V><Inf>
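One way to cut this down is to keep only the distinct lemma and part-of-speech pairs per word. The sketch below assumes the line format shown above (surface form, whitespace, then an analysis whose part of speech is the first tag starting with `+`). The function name `condense` is hypothetical, and the lemma is rebuilt naively by concatenating the morphemes before the part-of-speech tag, so compound or derived analyses can yield odd lemmas (e.g. `Auge<NN>en` becomes "Augeen").

```python
import re

TAG_RE = re.compile(r"<[^<>]+>")        # any tag, e.g. <PREF> or <CAP>
POS_RE = re.compile(r"<\+([^<>]+)>")    # part-of-speech tag, e.g. <+V>


def condense(raw_output):
    """Map each surface form to its distinct (lemma, POS) pairs.

    Pairs are kept in first-seen order; the person/case/number tags
    that follow the part of speech are dropped.
    """
    summary = {}
    for line in raw_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        surface, analysis = parts
        pos_match = POS_RE.search(analysis)
        if pos_match is None:
            continue  # no part-of-speech tag found
        # Naive lemma: strip tags from everything before the POS tag.
        lemma = TAG_RE.sub("", analysis[:pos_match.start()])
        pair = (lemma, pos_match.group(1))
        pairs = summary.setdefault(surface, [])
        if pair not in pairs:
            pairs.append(pair)
    return summary


sample = """\
Ich <CAP>ich<+PPRO><pers><1><Sg><NoGend><Nom>
versuchte ver<PREF>suchen<+V><1><Sg><Past><Ind>
versuchte versuchen<+V><3><Sg><Past><Konj>
versuchte versuchen<V><PPast><SUFF><+ADJ><Pos><Fem><Nom><Sg>
mehr mehr<+ADV>
mehr mehren<+V><Imp><Sg>
"""
summary = condense(sample)
```

On this sample, the three "versuchte" lines collapse to two pairs (verb and adjective), which is closer to what a reader could take in at a glance.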


coldrainwater
Blue Belt
Posts: 687
Joined: Sun Jan 01, 2017 4:53 am
Location: Magnolia, TX
Languages: EN(N), ES(rusty), DE(), FR(studies)
Language Log: https://forum.language-learners.org/vie ... =15&t=7636
x 2392

useful level of detail

Postby coldrainwater » Sun Jul 02, 2017 9:08 pm

A couple of ideas.

For L2 work, I would probably like something that breaks very long sentences into smaller, more digestible chunks (phrases or shorter sentences) while losing as little information as possible. That could be triggered by sentences beyond some length (set by the user in configuration), and it would connect portions of the sentence that are related grammatically (e.g. 'the dog flies' embedded in a sentence that goes 'the dog ... [followed by 14 lines of deep descriptive detail with surprisingly obscure diction] ... flies'). For those 14-line descriptive sections, I might want all the words in a dictionary or list structure, with entries shown for particularly low-frequency words and omitted for higher-frequency words that you can assume the reader already knows.
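A very rough sketch of the length-triggered part of this idea. It only splits at punctuation once a sentence exceeds a word limit; it does not attempt the harder step of reconnecting grammatically related portions. The function name `chunk_sentence` and the default `max_words` value are assumptions.

```python
import re

# Split at whitespace that follows a comma, semicolon, colon, or ellipsis.
BREAK_RE = re.compile(r"(?<=[,;:…])\s+")


def chunk_sentence(sentence, max_words=12):
    """Return the sentence unchanged if short, else punctuation-delimited chunks."""
    if len(sentence.split()) <= max_words:
        return [sentence]
    return [piece for piece in BREAK_RE.split(sentence) if piece]


chunks = chunk_sentence(
    "Ich versuchte, nicht mehr hinzusehen, aber meine Augen wurden magisch "
    "von der Schaukel angezogen … fast, als würde ein Teil von mir die "
    "Qualen genießen."
)
```

With the example sentence from the first post this yields five chunks, starting with "Ich versuchte," and ending with "als würde ein Teil von mir die Qualen genießen."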

Personally, I may fail to understand a sentence when it is worded in a particular manner, but would succeed easily if the sentence were rewritten smoothly in (say) three or four other ways. The only point I would make here is that the other variants of the sentence would not necessarily need to be simplifications; rather, they would use modified diction and grammar. My odds of being able to figure it out go up dramatically with the rewordings.

So far I have made a couple of minuscule points in a much larger arena, but hopefully they will be of some assistance. What I have in mind is essentially a much-improved version of the Chrome extensions that read whole phrases and sentences correctly using Google Translate. It can be annoying for me as an end user to have to chunk text out myself, and I would probably want a tool that splits up the level of detail rather than reducing it.

Another idea is that I would probably make hotkeys (or a right-click context menu) available for choosing the type of translation and examples provided. For example, Alt+key might bring up morphological information should I need it.

