Here’s an intriguing thought. I have a super-intelligent friend (one of those whose advisor is a Turing Award winner) who works on NLP. We have occasional long phone calls where one topic or another comes up for discussion. This time, it was worth blogging about.
Quick overview – languages have rules, structures, etc. Sometimes the rules become too complex, or they are so specialized that they turn into a look-up table (i.e., not a lot of generalization). Whenever you can’t generalize, you add entropy. Setting aside, for a moment, the poetic beauty of a language and the art of elocution, many rules are redundant.
Consider language as simply a tool, a means to an end rather than the end in itself, designed to express a thought. If so, the less ambiguity it has, the easier it is to use, and the better it serves its purpose. When people first begin to learn computer languages, or even when they think about “parsing” English, every single person I know goes through the same thought process: why not just have a language that isn’t as nuanced? Why not design a simpler language? Esperanto certainly came out of such a need, but building a completely new language may not have been the solution. It appears the need is already being met by modifications to English itself.
I am beginning to believe that the very efficiency computational linguists want in a simple-to-parse language is also the kind of simplicity the human brain wants. There is a certain idea you want to express. The nuance of whether I will do something, as opposed to whether something will be done by me, while undoubtedly helpful, may not be as necessary as we think. Facebook and Twitter are helping reinforce that idea. If you look at most non-proofread contemporary writing, it almost feels like a context-free language. It appears that what NLP wants, NLP may end up getting, simply because what makes NLP so hard is also what makes language itself so hard for most people.
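To make the “context-free” intuition concrete, here is a minimal sketch using NLTK. The toy grammar is entirely my own invention for illustration: a “post” is just one or more content words, with no agreement, case, or punctuation rules.

```python
import nltk  # pip install nltk

# Hypothetical toy grammar: a POST is just one or more WORDs.
# Every rule expands the same way no matter what surrounds it,
# which is the defining property of a context-free grammar.
grammar = nltk.CFG.fromstring("""
    POST -> WORD | WORD POST
    WORD -> 'movies' | 'redmond' | 'today'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("movies redmond today".split()):
    print(tree)
# (POST (WORD movies) (POST (WORD redmond) (POST (WORD today))))
```

Note how flat the grammar is; there is nothing like subject-verb agreement that would force a rule to behave differently depending on its surroundings.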
Texting is the classic, blatant example. Most texts are simply a gathering of words put together. There is a certain amount of context and syntax present to avoid ambiguity, but overall, the tools used to eliminate ambiguity are the ones that can do it in as blatant a way as possible, with as little subtlety as possible. Similarly, few FB/Twitter posts are carefully crafted treatises; most are just words that present an idea. The less context necessary for the idea to be parsed, the better it is communicated. Five years ago, a lot of “old school” people, including me, would complain about the utter lack of punctuation in sentences. Instead of learning to punctuate correctly, I found that people learnt to phrase their text in such a way that adding commas and full stops became unnecessary. A modern FB post is as decipherable without punctuation as it is with it. That’s some creative adaptation, right there.
Another reason for this is search engines. You very rarely search for something like, “Give me movie times for this evening in Redmond.”
The same idea is expressed as simply as, “Movies redmond today.”
Over time, it is not difficult to imagine that this is how I might begin communicating with a friend. Even the verb is implied rather than explicitly stated! The parsing rules for this language are ridiculously simple – tokenize the sentence, and you know what it’s saying.
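As a sketch of just how trivial “tokenize the sentence, and you know what it’s saying” can be, here is what such a parser might look like. The slot names and vocabularies are hypothetical stand-ins, not how any real search engine works:

```python
# Hypothetical vocabularies for a toy query "parser".
KNOWN_CATEGORIES = {"movies", "restaurants", "weather"}
KNOWN_PLACES = {"redmond", "seattle", "bellevue"}
KNOWN_TIMES = {"today", "tomorrow", "tonight"}

def parse_query(query: str) -> dict:
    """No grammar, no word order: just look each token up."""
    slots = {}
    for token in query.lower().split():
        if token in KNOWN_CATEGORIES:
            slots["category"] = token
        elif token in KNOWN_PLACES:
            slots["place"] = token
        elif token in KNOWN_TIMES:
            slots["time"] = token
    return slots

print(parse_query("Movies redmond today"))
# {'category': 'movies', 'place': 'redmond', 'time': 'today'}

# Word order carries no meaning at all:
print(parse_query("today redmond movies") == parse_query("Movies redmond today"))
# True
```

The entire “syntax” of the language has collapsed into vocabulary lookup; there is nothing left to parse.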
Then again, I’m not blaming the internet or machines for this phenomenon. I think this is simply the first time that a large portion of the earth’s population is literate (even 30 years ago, when I was born, I knew plenty of people who couldn’t read or write). Written language was, no matter how many people may dislike this, an elite privilege – and to some extent, an end in itself. When you are a club of a handful of people, you can end up in an ego-pissing match. What we might call spoken ‘peasant’ language was always utterly simple and efficient (though I find there are a lot of ideas I cannot express in it, for lack of a vocabulary that conveys subtle differences).
I’m not advocating anything here, but we have to admit that any large, complex system tends towards reducing entropy over time. That does not mean literary art will go unappreciated, but it is an interesting thought, and a hypothesis that would be worth testing, if only for its academic validity. Is modern human language finding a path towards reducing the energy and ambiguity required to express an idea? Is it a two-way feedback loop, where NLP systems get better with feedback from us, but also drive certain generalizations back into the human world?