My new subset of the study of grammar. Thoughts?

Iversen
Black Belt - 4th Dan
Posts: 4787
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027
x 15040

Re: My new subset of the study of grammar. Thoughts?

Postby Iversen » Mon Jan 11, 2021 1:35 pm

Quote from Wikipedia (so that people can judge the whole context):

Strong verbs: past tense formed by changing the vowel of the stem, past participle in -en

Class 1: pattern ij-ee-ee
Class 2: pattern ie-oo-oo or ui-oo-oo
Class 3: pattern i-o-o or e-o-o
Class 4: pattern ee-a/aa-oo
Class 5: pattern ee-a/aa-ee or i-a/aa-ee
Class 6: pattern aa-oe-aa
Class 7: pattern X-ie-X (specifically, oo-ie-oo, a-ie-a, a-i-a, ou-iel-ou, aa-ie-aa or oe-ie-oe)
Other strong verbs, which don't follow any of the above patterns


If you know a wee bit about Dutch verbal morphology then the overview in the English Wikipedia isn't bad. If not, then you're lost. However the point is that there are strong verbs that change their stem vowels in different tempora, and there are weak ones that don't. And that's a common feature for all Germanic languages I know of, including English, so the authors have apparently assumed that they didn't need to explain the system.

To illustrate the formula X-ie-X, take a verb like blazen (to blow): hij blaast, hij blies, hij heeft geblazen (he blows, he blew, he has blown). You could have added separate lines for each of the cases listed in the parentheses, but that would have taken up more space. So X here is just meant as a convenient dummy sign whose interpretation should be clear from the context; it is not meant to be used beyond that, i.e. it isn't a proposal for a new type of element or something like that. My main gripe with that line is that there is one item with -i- as past tense stem vowel and another one with -iel-, where the rest have -ie-. Those two items should have formed their own classes since they aren't covered by the preceding formula.
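The objection above can be checked mechanically. The following is only a minimal sketch: the helper name `fits_x_ie_x` is made up for illustration, and the vowel triples are copied from the Class 7 line of the quoted overview.

```python
# Hypothetical helper: does a (present, past, participle) vowel triple fit "X-ie-X"?
# X stands for any vowel, but it must be the same in the first and third slot.
def fits_x_ie_x(present, past, participle):
    return present == participle and past == "ie"

# Vowel triples listed for Class 7 in the quoted overview.
CLASS_7 = [
    ("oo", "ie", "oo"),
    ("a", "ie", "a"),
    ("a", "i", "a"),
    ("ou", "iel", "ou"),
    ("aa", "ie", "aa"),
    ("oe", "ie", "oe"),
]

# Collect the listed items that the formula does not actually cover.
exceptions = [triple for triple in CLASS_7 if not fits_x_ie_x(*triple)]
print(exceptions)  # [('a', 'i', 'a'), ('ou', 'iel', 'ou')]
```

The two triples printed are exactly the -i- and -iel- items singled out above.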

Actually some verbs are even more irregular, like gaan (to go): hij gaat, hij ging, hij is gegaan (he goes, he went, he has (is) gone), but their patterns are relegated to the following line where anything can happen.

You could try to have a look at the corresponding article in the Dutch Wikipedia, where the abstraction has gone too far in my opinion and become symbolism for symbolism's sake.
2 x

AcademiaNut
White Belt
Posts: 47
Joined: Mon Jan 04, 2021 9:54 pm
Location: U.S.A.
Languages: English (N).
Spanish (beginner), French (beginner).
Medium interest: Latin, Dutch, German.
Mild interest: Japanese, Danish, Swedish, Portuguese, Greek, Hawaiian.
x 32

Re: My new subset of the study of grammar. Thoughts?

Postby AcademiaNut » Tue Jan 12, 2021 12:40 am

Cainntear wrote:You're also stuck in the outdated flat-text paradigm, as also demonstrated by your use of SAMPA rather than IPA.


And you're stuck on my mention of SAMPA. I'm not sure where we're not communicating, and I'm not sure it's worth my time to write up a lot of examples and justifications. Maybe you need to see a large set of examples to go along with a given X-Y notation example. I don't mind: I'll write up a sizeable Spanish example just because I'm curious myself, but it will take some time, maybe 1-2 days, especially since I have a school course starting today that will keep me extremely busy for the next two months.

Somewhere along the way it seemed you were trying to make my notation much more difficult than it is. My notation is based on an extremely simple concept, so obvious that I didn't even think about it when I began to use it. I suspect it's used everywhere, probably even in genetics. I'll try to make some basics clear:

    This system has nothing whatsoever to do with SAMPA, IPA, or the differences between SAMPA and IPA.
    This system has nothing whatsoever to do with flat text, Unicode, or the differences between flat text and Unicode.
    This system has nothing to do with context-free grammars (which are Type-2 in the Chomsky Hierarchy).

https://en.wikipedia.org/wiki/Chomsky_hierarchy

(However, if I added the "*" and "+" operators to the X's, that would turn it into a (probabilistic) regular expression, which is equivalent to a weaker grammar in the Chomsky Hierarchy called Type-3. The reason I don't do that is because I typically use the X notation for very simple patterns, so even regular expressions are far more complicated than I need.)
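To make the regular-expression remark concrete, here is a rough sketch of what treating each variable as "one or more characters" could look like. The function name `x_pattern_to_regex` and the choice of `.+` for every variable are assumptions made for illustration; they are not part of the notation as described in this thread.

```python
import re

VARIABLES = {"X", "Y"}  # assumed variable letters; each may appear once per pattern

def x_pattern_to_regex(pattern):
    """Turn an X-notation pattern such as 'Xc' into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch in VARIABLES:
            # A variable becomes a named group matching one or more characters.
            parts.append(f"(?P<{ch}>.+)")
        else:
            # Everything else must match literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

match = x_pattern_to_regex("Xc").fullmatch("parc")
print(match.group("X"))  # par
```

A pattern that repeats a variable would raise an error here, because the sketch deliberately avoids backreferences, which go beyond regular expressions proper.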

I'll repeat the original problem I was trying to solve, and the conditions I encountered that others may encounter: I want to be able to figure out how to write a new word when it does not exist in any dictionary I have, or in any readily available form online. These conditions existed when I began learning several new languages (simultaneously) years ago. I had a number of foreign pen pals and I was trying to learn their languages. That was before the Internet was common, so all that any of us had were foreign dictionaries or maybe textbooks to help us learn the other's language. One female, Portuguese-speaking pen pal in particular was so weak in English that if I had simply written the English word I couldn't translate ("cat food" is one example), she might well not have known what either of those two words meant, so I was the one who had to figure out how to write that so we could communicate. Eventually I realized after I began using the X notation that it could also be used to describe patterns in phonetics, spelling, grammar, and more.

Now let's talk about representation, which is what this X notation thread is about. Suppose that the general way to represent a pair of adjacent nouns in Portuguese, the first one of which is acting as an adjective, is to reverse the order that English uses, and to put a "de" between them. Here are the only ways I can think of how this concept could be represented:

    Describe it in text: "reverse the order that English uses, and to put a 'de' between them".
    Draw a picture, with two boxes being swapped, and the motion represented as arrows. (The upload options in this forum don't allow me to post such a photo since it doesn't allow uploads of photos from one's own computer.)
    Describe it with descriptive variables, as the 10 English sentence patterns do, with N1 and N2 representing the nouns.
    Do the above, but use subscripts on the variables: N1, N2. (The formatting options in this forum don't allow me to write that here.)
    Do the above, but use different letters for the variables: X, Y.
    Use Backus-Naur Form, which uses full-word variables: <noun1> <noun2>

Ultimately all the above representations contain the same information, so the decision of which is best will come down to issues like readability, compactness, ease of production, universality of transmission, universality of keyboard, etc., and ultimately a person's taste. Now suppose in this age of the Internet a native Japanese speaker wants to learn Portuguese, and the only book he can find on the topic is a book written in English, a language he barely knows. When he comes to the textual description "reverse the order that English uses, and to put a 'de' between them" he will not be able to understand that representation easily, but a photo or mathematical expression would be understandable to him. Now suppose he wants to send that knowledge via e-mail to a pen pal of his in Kenya, who speaks Swahili, and is also weak in English. Suppose the guy in Kenya lives in an impoverished area with an extremely slow computer and extremely slow Internet, and perhaps extreme censorship of e-mail in his school so that photo attachments are filtered out. That rules out the pictorial representation, and probably the textual description, too. Suppose that the Kenya school computer also does not have MS Word or Adobe to read any files other than text. All these conditions repeatedly create pressure for all communicating parties to use text, and to avoid subscripts. That leaves only two options above, one of which is the X notation. This is where I'm coming from: universality of communication, ease of production, and optimal understanding under difficulties of not having high-quality computer hardware and high-quality software.
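For comparison with the text-only options above, the same noun-noun rule can also be written as a few lines of code. This is just a sketch under stated assumptions: the two-entry `LEXICON` and both function names are hypothetical, and the output is only what the rule produces, not a claim that it is always idiomatic Portuguese.

```python
# Hypothetical two-word lexicon, standing in for a paper dictionary.
LEXICON = {"cat": "gato", "food": "comida"}

def apply_noun_noun_rule(n1, n2, linker="de"):
    """N1 N2 [English] = N2 de N1 [Portuguese]: reverse the nouns, insert the linker."""
    return f"{n2} {linker} {n1}"

def translate_compound(english):
    # Assumes the input is exactly two nouns separated by a space.
    n1, n2 = english.split()
    return apply_noun_noun_rule(LEXICON[n1], LEXICON[n2])

print(translate_compound("cat food"))  # comida de gato
```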

P.S.--I just found another site that is using X as a root word, though they use hyphens as in "non-X-philia":
https://www.facebook.com/etymonline/pos ... 047380077/
Last edited by AcademiaNut on Wed Jan 13, 2021 2:51 am, edited 4 times in total.
0 x

Querneus
Blue Belt
Posts: 841
Joined: Thu Dec 01, 2016 5:28 am
Location: Vancouver, Canada
Languages: Speaks: Spanish (N), English
Studying: Latin, French, Mandarin
x 2287

Re: My new subset of the study of grammar. Thoughts?

Postby Querneus » Tue Jan 12, 2021 10:04 pm

AcademiaNut wrote:Somewhere along the way it seemed you were trying to make my notation much more difficult than it is. My notation is based on an extremely simple concept, so obvious that I didn't even think about it when I began to use it.

The conversation started going that way after I mentioned your notation is pretty reminiscent of that used to define programming languages, or in the handling of strings in programming languages. I didn't mention Chomsky, and it isn't Chomskyan.

I wouldn't call your notation simple and obvious at all, and I also said it's something that someone who's done programming or computer science would understand best. It's an okay niche, but to appeal more broadly (if you're interested in that at all), simple prose and examples are the way to go. I mean, nothing here is mutually exclusive, you can do the notation cheatsheet AND the prose cheatsheet / summary AND examples...

And for what it's worth, yes I got from the start that you're talking about common grammar differences between languages. Like how written Arabic ʔɪðaa + imperfect/perfect verb can map to 'if sb does, if sb really did' (real conditions) but law + perfect verb maps to 'if sb did in the future, if sb was doing it now, if sb had done it' (unreal conditions). The equivalences aren't perfect, but are useful rules of thumb.

When I focus on your notation, that's because that's the part of interest as I feel you're overestimating your readers with it (not everyone knows about programming languages' string formats). Again I'm just saying your work would appeal more broadly if you drop that notation as it'll usually prove unintuitive, but you decide what you wanna do...

I mean, I understand your notation just fine, but when you write things like:
/Xc/ [French] = {83% /Xk/, 17% /X/} [SAMPA]
I'm telling you you're gonna lose most of your readers. No offence intended, seriously.

Cainntear wrote:You're also stuck in the outdated flat-text paradigm, as also demonstrated by your use of SAMPA rather than IPA.

By the way, I'm still a happy user of X-SAMPA, when chatting with those who don't mind it. So much easier to type.
baI D@ "weI || aIm stIl @ "h{pi "juzɚ @v "Eks "s{mp@ || wEn ˈtS{tIN wIT "DoUz hu "doUnt "maInd It || "soU mVtS "izi@` tu "taIp

But I do use IPA in e.g. public forums, after simply copy-pasting it from this X-SAMPA-to-IPA converter, so even then I'm just typing X-SAMPA. But yeah, I know there are good reasons why people don't like reading or dealing with (X-)SAMPA: it is harder to read.
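A converter of the kind mentioned here can be sketched as a lookup table plus longest-match scanning. The table below is a small assumed subset of X-SAMPA, enough for a few words of the transcription above; it is not the converter linked in the post, and `xsampa_to_ipa` is a name chosen for illustration.

```python
# Partial X-SAMPA to IPA table (an illustrative subset, not the full standard).
XSAMPA_TO_IPA = {
    "@`": "ɚ",
    "tS": "tʃ",
    '"': "ˈ",
    "@": "ə",
    "I": "ɪ",
    "D": "ð",
    "T": "θ",
    "N": "ŋ",
    "S": "ʃ",
    "{": "æ",
    "V": "ʌ",
    "E": "ɛ",
    "U": "ʊ",
}

def xsampa_to_ipa(text):
    # Try longer symbols first so that "@`" wins over "@" and "tS" over "S".
    keys = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            # Symbols missing from the table (e.g. plain "b", "a") pass through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

print(xsampa_to_ipa('baI D@ "weI'))  # baɪ ðə ˈweɪ
```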
0 x


Re: My new subset of the study of grammar. Thoughts?

Postby AcademiaNut » Wed Jan 13, 2021 1:50 am

Querneus wrote:I mean, I understand your notation just fine, but when you write things like:
/Xc/ [French] = {83% /Xk/, 17% /X/} [SAMPA]


Thanks for your polite response. Actually the notation I posted may not have even been SAMPA. At one point I may have developed my own phonetic notation, for some reason, but since I was rushed to provide an example as a response, I just pulled text from files that were 20+ years old. Remember, I'm just getting back into language learning again, and I haven't had a chance to go over all my crib sheet files to see where I left off. Again, the specific phonetic alphabet name is not an important issue--just change "[SAMPA]" to "[ANAN]" (AcademiaNut's Asinine Notation) if you like, or put in the IPA equivalent--since the point I was trying to illustrate is more general than such details. Also, my own crib sheets never used that {...} percentage notation I posted, and my sheets didn't even have percentages calculated, but instead they used indented lines of lists with numerous supporting examples. As I mentioned, what I posted is a form I hadn't quite used before. That said, I still believe the brace-percentage notation I posted would be the best notation to use to condense all those lines of text into a quick summary... For anyone who needed it enough to persist at deciphering it.
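One way to read the brace-percentage line quoted above is as a small table of weighted outcomes. The sketch below is only an interpretation: the percentages come from the quoted line, while the data layout, the function name `candidate_pronunciations`, and the placeholder "par" for the pronunciation of X are assumptions.

```python
# /Xc/ [French] = {83% /Xk/, 17% /X/}
# Each outcome: (what the final "c" contributes to the pronunciation, percentage).
FINAL_C_OUTCOMES = [("k", 83), ("", 17)]

def candidate_pronunciations(x_sound):
    """Return the possible pronunciations of X + 'c', most likely first."""
    ranked = sorted(FINAL_C_OUTCOMES, key=lambda outcome: outcome[1], reverse=True)
    return [(x_sound + ending, pct) for ending, pct in ranked]

print(candidate_pronunciations("par"))  # [('park', 83), ('par', 17)]
```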

There *is* an issue regarding convenient production of foreign characters, the issue to which Cainntear kept alluding, but that issue is not relevant to the topic of this thread. If I get time I want to start a thread on that foreign character production issue, too.

Anyway, at least I got a little feedback on the notation, which is what I wanted from this thread. 2 votes against, 0 votes for. As a result I'll be trying to think of more understandable alternatives.

By the way, here's another person using X and Y to represent words, in this case when discussing plurals in English:
https://fandom-grammar.livejournal.com/21593.html
0 x

Cainntear
Black Belt - 3rd Dan
Posts: 3531
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 8806

Re: My new subset of the study of grammar. Thoughts?

Postby Cainntear » Wed Jan 13, 2021 2:38 pm

AcademiaNut wrote:I'm not sure where we're not communicating, and I'm not sure it's worth my time to write up a lot of examples and justifications.

First up, I just want to apologise if I'm coming across as rude. I'm not the sort of person who's great at social niceties at the best of times, and I'm far worse in a text medium. In particular, when things are not clear, I'm pretty much incapable of being anything other than completely direct, as indirection often loses clarity.

But yes; until you've given enough data to illustrate and disambiguate, I will not understand you. The important point to consider here is that I am one of the users of this site best equipped to understand you, having coded rule-based language processing apps before, but here I am, confused and attempting to fill in the gaps in your explanations, and failing to do so.
But the more explanation I need, the more explanation average readers will need.
AcademiaNut wrote:Somewhere along the way it seemed you were trying to make my notation much more difficult than it is.

I'm simply confused, and everything is difficult when you don't understand it.

AcademiaNut wrote:I'll repeat the original problem I was trying to solve, and the conditions I encountered that others may encounter: I want to be able to figure out how to write a new word when it does not exist in any dictionary I have,

OK, here's a specific problem in your explanation so far: you have jumped between different problems throughout.
You say I fixated on your mention of SAMPA, fine; but it was something that you mentioned but which has nothing to do with "how to write a new word". When you were talking about naming it "formulaics", you said this:
AcademiaNut wrote:I did a lot of thinking last night about what to call this equation method. I tentatively decided on the name "formulaics," where it would be called "formulaic grammar" when applied to grammar and "formulaic phonetics" when applied to phonetics.

You have literally not been talking about "how to write a word", so how was I to know that was what you were intending to write about?
AcademiaNut wrote:Now let's talk about representation, which is what this X notation thread is about. Suppose that the general way to represent a pair of adjacent nouns in Portuguese, the first one of which is acting as an adjective, is to reverse the order that English uses, and to put a "de" between them. Here are the only ways I can think of how this concept could be represented:

    Describe it in text: "reverse the order that English uses, and to put a 'de' between them".
    Draw a picture, with two boxes being swapped, and the motion represented as arrows. (The upload options in this forum don't allow me to post such a photo since it doesn't allow uploads of photos from one's own computer.)
    Describe it with descriptive variables, as the 10 English sentence patterns does, with N1 and N2 representing the nouns.
    Do the above, but use subscripts on the variables: N1, N2. (The formatting options in this forum don't allow me to write that here.)
    Do the above, but use different letters for the variables: X, Y.
    Use Backus-Naur Form, which are full-word variables: <noun1> <noun2>

Ultimately all the above representations contain the same information, so the decision of which is best will come down to issues like readability, compactness, ease of production, universality of transmission, universality of keyboard, etc., and ultimately a person's taste.

Yes. However, this leaves a very important elephant in the room: none of these contain all the information. Whatever notation you use, you've taken the trivially easy part that most learners will intuit for themselves. The difficult part is how you choose the rule to apply. The reason your rule failed in the example of dog food is because you didn't pick the correct rule in the first place, which was why I brought up first-class functions -- selecting the right rule from the set is part and parcel of identifying the right variables.

But again, you have reverted to describing something which is a transformation, after telling me that this isn't about transformations. Can you really blame me for being confused?
AcademiaNut wrote:...and the only book he can find on the topic is a book written in English, a language he barely knows. When he comes to the textual description "reverse the order that English uses, and to put a 'de' between them" he will not be able to understand that representation easily, but a photo or mathematical expression would be understandable to him.

A comparison to English word order would be every bit as meaningless to him if he doesn't speak English, though, wouldn't it?

There are two ways to describe language -- either by comparison to a known language, or in terms of linguistics. The more a system becomes independent of language, the more it relies on knowledge of linguistics. Attempts to make formal or algebraic notations have failed in the past, because most people come to language learning with lots of knowledge of their own language, and little knowledge of linguistics.
AcademiaNut wrote:P.S.--I just found another site that is using X as a root word, though they use hyphens as in "non-X-philia":
https://www.facebook.com/etymonline/pos ... 047380077/

Querneus is quite right to say:
Querneus wrote:nothing here is mutually exclusive, you can do the notation cheatsheet AND the prose cheatsheet / summary AND examples...

...and I've already said as much. No number of examples of people doing what Querneus and I have both already said is going to prove that I'm wrong. Instead, it works against your argument.

We've tried formal* grammars alone, and they don't work, because human brains don't work that way. The grammar of English I saw in BNF had been checked out of the library about 6 times in its entire existence, because no-one could use it.
[* Note that I am using formal grammars in a wider semantic sense of grammatical descriptions that are formally defined, not in the more restricted sense of CFGs or FSAs]
If you think your system is different and will work better, you need to show other people why and how it doesn't have the same failings as the systems that came before it.
0 x

Deinonysus
Brown Belt
Posts: 1222
Joined: Tue Sep 13, 2016 6:06 pm
Location: MA, USA
Languages:
• Native: English
• Advanced: French
• Intermediate: German, Spanish, Hebrew
• Beginner: Italian, Arabic
x 4636

Re: My new subset of the study of grammar. Thoughts?

Postby Deinonysus » Wed Jan 13, 2021 6:23 pm

AcademiaNut wrote:Math easily solves the description length problem you are discussing. It does this by conventions about what variables mean. For example, math uses variables a, b, c... for parameters or scalars, and uses variables x, y, z... for unknowns or vectors...

https://en.wikipedia.org/wiki/Variable_(mathematics)#:~:text=In%20mathematics%2C%20the%20variables%20are,and%20even%20a%20mathematical%20expression

Therefore a set of naming conventions set up in advance such as...
W = weak verb
L = vowel
C = consonant
N = noun
V = verb
A = adjective
etc.

...would greatly reduce the length of my description you are complaining about. Surely you are aware that more complex systems use such preliminary definitions to simplify things later on? That's why I don't use CSS when writing HTML code: CSS is too much overhead for the short web pages I tend to write, so the extra overhead is not worth it. In a larger HTML document, though, the CSS would in fact reduce the length. The same holds for object-oriented programming (OOP): I normally won't use OOP for short computational programs because the OOP requires setting up classes and methods that together are too much overhead for a short program. For a long simulation program, however, OOP saves length. The same holds for math proofs: At the beginning of the proof it states which variables are elements of R (known in advance to mean real numbers), Z (known in advance to mean integers), and so on, before stating a short conclusion about all those entities. The same used to apply to FORTRAN programs, where variables with names that start with letters from I through N were understood to be integers, the rest were understood to be real numbers. This is just a general principle that applies to all fields, not just math or computers.

https://www.physicsforums.com/threads/f ... rs.756287/

You're correct that my thread was not only about calques, but more general, dealing with my shorthand notation where the variables could take on root words to form calques, but that my notation could do a lot more.

Suppose there were a word order convention that looked like...

X's very A M-N [English] = N-M A-issimus Xi [language L]

If all the variables were already defined, could you write a description that was shorter of how to make the needed modifications? I believe that would be very difficult. I can see at a glance the exact transformation that is made with that rule, but it would take me at least several seconds to read through a textual description of the same transformation. My notation is shorter, easier to understand at a glance, easier to find in a text search, would be *much* easier to standardize as a grammar description across all languages for all readers, and it generalizes greatly. Those are a lot of benefits to have at once.
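The two-sided rule quoted above can be sketched as a pattern for the English side plus a template for the other side. Everything in the sketch is illustrative: "language L" is the invented language of the quoted example, the names `ENGLISH_SIDE`, `TARGET_SIDE` and `apply_rule` are made up, and the words are left untranslated because the rule as given only describes word order and endings.

```python
import re

# X's very A M-N [English] = N-M A-issimus Xi [language L]
# Left side: each variable is a named group matching one word.
ENGLISH_SIDE = re.compile(r"(?P<X>\w+)'s very (?P<A>\w+) (?P<M>\w+)-(?P<N>\w+)")
# Right side: the same variable names used as placeholders.
TARGET_SIDE = "{N}-{M} {A}-issimus {X}i"

def apply_rule(phrase):
    """Rearrange an English phrase according to the rule, or return None if it does not fit."""
    match = ENGLISH_SIDE.fullmatch(phrase)
    if match is None:
        return None
    return TARGET_SIDE.format(**match.groupdict())

print(apply_rule("Anna's very old farm-house"))  # house-farm old-issimus Annai
```

Note that the code only applies a rule once it has been chosen; deciding which rule fits a given phrase is a separate problem.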

Are you familiar with gloss notation? It is a bit similar to where you are heading with preassigned abbreviations, and there are a lot of them.

What makes this notation so effective is that it is designed for linguists and expects them to take the time to learn the system and all of its intricacies. Therefore, nothing needs to be defined on the spot. Any compromise to make it readable to the uninitiated would destroy its compactness. Also, it is for one thing and one thing only: an exact description of the grammatical function of each word in a sentence. Also note that it generally uses a romanization rather than a phonetic transcription. On Reddit's r/conlangs community, snippets of text in a conlang are conventionally given along with separate lines for a romanization, a gloss, and a phonemic transcription, and the phonemic transcriptions are generally given in IPA between forward slashes.

Of course, it is not uncommon to use some limited mathematical notation to explain morphological rules to non-experts. See this excerpt from G. Mauger's Cours de langue et de civilisation françaises (1953):
Mauger formula.png

Note that it is perfectly readable to write "FEMININE = MASCULINE + E" and "PLURAL = SINGULAR + S". There was no need to say, "Let F = FEMININE, M = MASCULINE, P = PLURAL, and S = SINGULAR. F = M + e; P = S + s". This would be less visually clear and compact than simply using the words themselves as variables.

If you want to create a notation system that is useful to others, I think you will need to think hard about a lot of questions, figure out exactly what it's for, and maybe compare to existing successful systems such as gloss notation.

And if you do figure out precisely what you want your system to do, you need to work on your elevator pitch. It was really hard to determine what your first post was even about, and after several pages I'm still not 100% sure what you're trying to convince us of.
3 x
/daɪ.nə.ˈnaɪ.səs/


Re: My new subset of the study of grammar. Thoughts?

Postby AcademiaNut » Thu Jan 14, 2021 1:11 am

Deinonysus wrote:Are you familiar with gloss notation?


Awesome. No, I haven't heard that term or seen the notation, but the concept is very familiar to me because that's exactly what Barry Farber calls a "middle language" in his (great) book "How to Learn Any Language." I bought that book and browsed it many times.

https://www.amazon.com/How-Learn-Langua ... 1567315437

I'm fascinated by that topic, too, especially since that's another topic that foreign language learning textbooks never seem to mention, at least none that I've ever seen. I was thinking about starting a thread on that topic, too. Through introspection I've noticed that when I want to convert a sentence from one language to another, the very first thing I do is to convert it to middle language, and only then do I try to convert each word. That's a very good example of where the X notation would be useful, but the X notation wouldn't cover everything, since part of that translation to middle language is just knowing specific knowledge of how the target language expresses concepts. I tend to use the X notation for simpler, more specific things like <color adjective> <noun> [English] = <noun> <color adjective> [Spanish].

Example: The English sentence "Mister Pérez lives in the white house" gets converted to the middle language "The Mister Pérez lives in the house white" before being translated to Spanish as "El Señor Pérez vive en la casa blanca."

Example: The English sentence "Is that a black cat?" gets converted to the middle language "That is a cat black?" before being translated to French as "C'est un chat noir?"
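The color-adjective step of that "middle language" conversion can be sketched in a few lines. The sketch makes loud assumptions: the word list `COLOR_ADJECTIVES` and the name `to_middle_language` are hypothetical, the word after a color adjective is assumed to be a noun, and punctuation and the other changes in the examples (the added article, the question word order) are not handled.

```python
# Hypothetical word list; a real one would be larger.
COLOR_ADJECTIVES = {"white", "black", "red", "green", "blue"}

def to_middle_language(sentence):
    """<color adjective> <noun> [English] = <noun> <color adjective> [Spanish]."""
    words = sentence.split()
    out = []
    i = 0
    while i < len(words):
        is_color = words[i].lower() in COLOR_ADJECTIVES
        if is_color and i + 1 < len(words):
            # Swap the adjective with the word that follows it (assumed to be a noun).
            out.extend([words[i + 1], words[i]])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)

print(to_middle_language("Mister Pérez lives in the white house"))
# Mister Pérez lives in the house white
```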
0 x

