This morning, I got some questions from David asking me to go into more detail about what I know about Linguistics.

The problem with the field is that a lot of it is pretty unscientific, so it’s hard to give definitive answers. Certain subfields like Phonetics are very scientific and measurable, but others like Syntax, Morphology, and Discourse Analysis tend to be more of an armchair science. Linguists like Noam Chomsky (the man behind most of modern Linguistics) made observations based on a small sample of languages and then generalized them into some abstract “universal grammar,” disregarding outliers and evidence that their methods do not hold up to experiments.

A turning point for me happened during my senior year of college, when Dr. Patrick Juola from Duquesne University came to speak in one of my classes. Dr. Juola had recently become famous for making a discovery about the author J. K. Rowling. Using software he had developed to identify authorship, he correctly attributed a book she had written under a pseudonym back to her. The software worked by measuring the distribution of things like word length, word pairings, and function words (prepositions, conjunctions, and articles). Authors tend to write very consistently according to these stylistic attributes.
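
To give a rough sense of what such a stylistic “fingerprint” looks like, here is a minimal Python sketch (my own illustration, not Dr. Juola’s actual software) that computes function-word frequencies, a word-length distribution, and the most common word pairs from a text:

```python
from collections import Counter
import re

# Hypothetical list of function words to track; a real system would use a much longer one.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "but", "or", "of", "to", "in",
    "on", "at", "by", "with", "for", "from", "as", "that",
}

def fingerprint(text):
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1

    # Relative frequency of each function word.
    func_freq = {w: words.count(w) / total for w in FUNCTION_WORDS}

    # Distribution of word lengths (what fraction of words have 1, 2, 3... letters).
    length_counts = Counter(len(w) for w in words)
    length_freq = {n: c / total for n, c in length_counts.items()}

    # Most common adjacent word pairs (bigrams).
    bigrams = Counter(zip(words, words[1:]))

    return func_freq, length_freq, bigrams.most_common(20)
```

To attribute a disputed text, you would compare these numbers against samples from candidate authors, for example with a simple distance measure; the real software uses many more features and proper statistics, but the principle is the same.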

Dr. Juola had used the same methods to identify the authors of historical documents, unmask scammers, and trace anonymous cyberbullying posts. His stories fascinated me, so I asked him how traditional linguistics factored into his software’s technology.

He explained that whenever he had tried using traditional linguistic analysis and hiring linguists to help improve the accuracy of his software, the results actually got worse. When he worked with people trained in statistical analysis, the results improved. Thus, the Theory of Syntax was completely useless in real-world applications.

He went on to explain how translation technologies originally attempted to use Chomsky’s idea of a Universal Grammar, converting words from one language into some universal intermediate representation and then translating that into the target language. These attempts never worked. It wasn’t until they switched to statistical methods, simply pulling the most common translations for words and phrases from a large database, that these technologies developed the incredible accuracy that programs like Google Translate have today.
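
As a toy illustration of that statistical idea (my own simplification, not how Google Translate actually works), imagine a table of the most common translations for short phrases, learned from a large parallel corpus, and a translator that just looks phrases up:

```python
# Hypothetical phrase table; a real system would learn millions of these,
# with probabilities, from aligned sentence pairs.
PHRASE_TABLE = {
    ("good", "morning"): "buenos días",
    ("thank", "you"): "gracias",
    ("the", "cat"): "el gato",
    ("is", "sleeping"): "está durmiendo",
}

def translate(sentence, max_phrase_len=3):
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        # Greedily match the longest known phrase starting at position i.
        for n in range(max_phrase_len, 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            out.append(words[i])  # no translation known; pass the word through
            i += 1
    return " ".join(out)

print(translate("the cat is sleeping"))  # "el gato está durmiendo"
```

Real statistical systems weighted competing translations by probability and reordered words, but the core idea was the same: frequency counts from data rather than grammatical theory.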

Needless to say, that conversation was somewhat of a shock for me. I realized that people had known for decades that much of traditional Linguistics was unempirical and largely useless in practice. Yet somehow, the field still hasn’t acknowledged it. Professors still receive thousands of dollars in grant funding to research such pressing topics as critiques of past research on the syntax of the word “the” (an actual topic that one of my instructors was researching).

Perhaps now you can understand my disillusionment with the subject a little. I would never say that my time was all a waste. I learned some valuable skills while I was at it. Certain linguistic theories helped me speak foreign languages better. If nothing else, I learned how to learn—as well as a lot of random facts that I can share at parties during awkward silences: Did you know that some Americans pronounce “cot” and “caught” differently? Did you know that the silent “gh” in many English words used to be a sound, and that the correct plural of “octopus” is “octopodes”?

One thought on “Octopodes”

  1. Wow, if authors have a stylistic “fingerprint” then at some point that statistical software might be good enough to undermine a lot of anonymity from the internet. Awesome post.
