The bit rate of human languages


Readers!
 
For many reasons, mostly political but partly ethical, I do not use Google, Facebook, or Twitter. They practice corrupt business policies while targeting conservative websites for censorship, facts repeatedly confirmed by news stories and by my sense that Facebook has taken action to prevent my readers from recommending Behind the Black to their friends.
 
Thus, I must have your direct support to keep this webpage alive. Not only does the money pay the bills, it gives me the freedom to speak honestly about science and culture, instead of being forced to write it as others demand.

 

Please consider donating by giving either a one-time contribution or a regular subscription, as outlined in the tip jar below.


 

Regular readers can support Behind The Black with a one-time contribution via PayPal, or with a subscription of regular donations from a PayPal or credit card account.

If PayPal doesn't work for you, you can support Behind The Black directly by sending your donation by check, payable to Robert Zimmerman, to
 
Behind The Black
c/o Robert Zimmerman
P.O. Box 1262
Cortaro, AZ 85652

 

You can also support me by buying one of my books, as noted in the boxes interspersed throughout the webpage. And if you buy the books through the ebookit links, I get a larger cut and I get it sooner.

Language scientists think they have determined that the universal bit rate for transmitting information across multiple languages is about 39 bits per second.

Scientists started with written texts from 17 languages, including English, Italian, Japanese, and Vietnamese. They calculated the information density of each language in bits—the same unit that describes how quickly your cellphone, laptop, or computer modem transmits information. They found that Japanese, which has only 643 syllables, had an information density of about 5 bits per syllable, whereas English, with its 6949 syllables, had a density of just over 7 bits per syllable. Vietnamese, with its complex system of six tones (each of which can further differentiate a syllable), topped the charts at 8 bits per syllable.

Next, the researchers spent 3 years recruiting and recording 10 speakers—five men and five women—from 14 of their 17 languages. (They used previous recordings for the other three languages.) Each participant read aloud 15 identical passages that had been translated into their mother tongue. After noting how long the speakers took to get through their readings, the researchers calculated an average speech rate per language, measured in syllables/second.

Some languages were clearly faster than others: no surprise there. But when the researchers took their final step—multiplying this rate by the bit rate to find out how much information moved per second—they were shocked by the consistency of their results. No matter how fast or slow, how simple or complex, each language gravitated toward an average rate of 39.15 bits per second, they report today in Science Advances. In comparison, the world’s first computer modem (which came out in 1959) had a transfer rate of 110 bits per second, and the average home internet connection today has a transfer rate of 100 megabits per second (or 100 million bits). [emphasis mine]
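The final step in the quoted study comes down to one multiplication: information rate (bits per second) equals information density (bits per syllable) times speech rate (syllables per second). As a rough sketch, the Python below runs that arithmetic backwards. It uses only the rounded densities quoted above (5, 7, and 8 bits per syllable, with "just over 7" rounded down to 7) to show what speech rate each language would need to land on the reported 39.15 bits per second. These implied rates are illustrations derived from the quoted figures, not the speech rates the researchers actually measured, and the helper names are hypothetical.

```python
# Information rate (bits/s) = information density (bits/syllable) * speech rate (syllables/s).
# Densities are the rounded figures quoted in the article, not the study's exact values.
TARGET_RATE = 39.15  # average bits per second reported in the quoted article

DENSITIES = {"Japanese": 5.0, "English": 7.0, "Vietnamese": 8.0}  # bits per syllable


def information_rate(bits_per_syllable: float, syllables_per_second: float) -> float:
    """Bits transmitted per second at a given density and speaking speed."""
    return bits_per_syllable * syllables_per_second


def implied_speech_rate(bits_per_syllable: float, target: float = TARGET_RATE) -> float:
    """Hypothetical helper: syllables per second needed to reach the target bit rate."""
    return target / bits_per_syllable


for name, density in DENSITIES.items():
    rate = implied_speech_rate(density)
    print(f"{name}: {rate:.2f} syllables/s -> {information_rate(density, rate):.2f} bits/s")

# Scale of the modem comparison made in the quote.
print(f"1959 modem (110 bits/s): {110 / TARGET_RATE:.1f}x speech")
print(f"100 Mbit/s home link: {100e6 / TARGET_RATE:,.0f}x speech")
```

Run this way, Japanese comes out at about 7.83 syllables per second, English at about 5.59, and Vietnamese at about 4.89. A language that packs fewer bits into each syllable has to be spoken faster to keep pace. The same arithmetic puts the 1959 modem at only about 2.8 times the rate of speech.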

When I first went to Russia in 1995, I used to joke with my caving friends there that the real reason the U.S. won the Cold War was that English routinely used one syllable where Russian required three, so we could get things done in one third the time. I would start listing comparable words (for example “good” vs “khah-rah-shoh” and “please” vs “pa-ZHAL-sta”) and challenge them to come up with any example where the Russian word had fewer syllables. It drove them crazy because they couldn’t do it.

I was joking, of course. It makes sense that the information rates should be roughly the same, as this study suggests. However, the highlighted words suggest that the subtle differences should not be ignored either.
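One way to see what "bits per syllable" measures: if a language's N syllables were all equally likely and chosen independently, each syllable would carry log2(N) bits. That is the most a syllable drawn from that inventory can carry on average. The minimal Python sketch below compares that ceiling with the densities quoted above for Japanese and English. The quote gives no syllable count for Vietnamese, so it is left out. This is not the researchers' method, which this post does not describe. It only shows that real speech carries well under the ceiling, because some syllables are far more common than others and context makes the next one partly predictable.

```python
import math

# Syllable inventories and approximate densities (bits/syllable) as quoted in the article.
# "Just over 7" for English is rounded down to 7.0.
LANGUAGES = {
    "Japanese": {"syllables": 643, "quoted_bits": 5.0},
    "English": {"syllables": 6949, "quoted_bits": 7.0},
}


def uniform_ceiling(num_syllables: int) -> float:
    """Bits per syllable if every syllable were equally likely and independent."""
    return math.log2(num_syllables)


for name, info in LANGUAGES.items():
    ceiling = uniform_ceiling(info["syllables"])
    share = info["quoted_bits"] / ceiling
    print(f"{name}: ceiling {ceiling:.2f} bits, quoted ~{info['quoted_bits']:.0f} bits ({share:.0%} of ceiling)")
```

The ceilings come out at about 9.33 bits for Japanese and 12.76 bits for English. In both cases the quoted density is a bit over half of that.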


9 comments

  • Garry

    This part

    Japanese, which has only 643 syllables

    confuses me. There are roughly 50 syllables in Japanese, each represented by one hiragana character (used if the word is of Japanese origin) and one katakana character (used if the word is of foreign origin). They must mean something different than what is conventionally considered a syllable.

  • Imagine a new language that got concepts (e.g., words) across using the fewest possible syllables. By increasing the number of bits per syllable and reducing the number of syllables per word, more information could be transmitted in the shortest period of time.

    It’s probably an impractical concept because learning a new language is not easy and a new language has few others that speak it to make it worth learning.

  • mike shupp

    I seem to recall — I don’t speak the language actually — that Russian leaves out the definite (“the”) and indefinite (“a”, “an”) articles which are fixtures in most English speech. Presumably context supplies non-speech equivalences. “I’ll have big red apple” works as well as “I’ll have that big red apple.” So that would increase the efficiency of Russian.

  • mike shupp: You are correct. Russian does not have articles like “a,” “an,” and “the.” However, there is a big difference between asking for “an apple” and “the apple.” Lacking the article means you have to use a lot more words to refer to that specific apple. Thus, the efficiency you refer to doesn’t really help much.

    I often wonder if the stereotype of Russian bluntness comes from this lack. When Russians learn other languages, they are not used to articles, so they routinely leave them out. And to an English ear, speech without articles always sounds blunt and brutish.

  • Diane E Wilson

    Russian also lacks some forms of “to be,” especially in the present tense, so Hamlet’s quandary, “to be, or not to be?”, became an existential question: to exist or not to exist. The “Hamlet Question” became an ethical question among minor Russian nobility stuck in the army, pondering whether one had an obligation to commit suicide. Russia is certainly a fascinating culture.

    Semi-related, I read A Clockwork Orange a few years ago, after seeing the movie when it first came out. The thug gangs use a lot of slang that is nearly impenetrable unless you’ve studied a bit of Russian and realize that virtually all of it is simple substitution of Russian words for English. One fascinating exception is a bilingual rhyming slang, transforming “khorosho” (good) into “horror show” (also meaning good).

  • Edward

    If the bit rate is similar for most or all languages, it seems to me that there must be an optimal rate for comprehension by the listener. Has anyone else heard Ben Shapiro speak? How about those rapid-fire disclaimers on radio?

  • Previously noted: best forum on the ‘Net.

    And the thread earned a Diane E Wilson comment.

  • mike shupp

    Edward —

    Nice points. Thinking about it, I’d suggest that an awful lot of rapid-fire speech isn’t intended for effectively conveying data. The disclaimers that come at the end of pharmaceutical commercials is mostly boiler plate; people skim over this unless they catch a word or phrase that reverberates with them. A lot of political speech or similar commentary is designed to get the listener’s acquiescence rather than communicate details — if I’m trying to persuade an audience that Germany must expand to the East, the last thing I want is some pedant in the back arguing that my interpretation of medieval Hungarian land ownership patterns is incorrect.

    Thinking further, I’ve been watching a lot of television recently, after years of generally ignoring it, and one of the changes I notice is that a lot of newscasters speak really quickly these days.

  • Edward

    mike shupp wrote: “The disclaimers that come at the end of pharmaceutical commercials is mostly boiler plate; people skim over this unless they catch a word or phrase that reverberates with them.”

    This reminds me of the time I was listening to the radio (but not very carefully to the ad that was playing), and when the disclaimer came, I heard the phrase, “including death.” That got my attention, but I wasn’t quite sure what the drug’s name was, and I never heard the ad again. Since then, I have been avoiding all medications that begin with or contain the syllables “vita-” because that is what I (mis)remember hearing from the ad.

    the last thing I want is some pedant in the back arguing that my interpretation of medieval Hungarian land ownership patterns is incorrect.

    Wait, mike shupp, you think that your interpretation of medieval Hungarian land ownership patterns is not incorrect? Or did I miss your point, and I should have read that so fast that I didn’t catch that part to get all pedantic about it?

    I’ve been watching a lot of television recently, after years of generally ignoring it

    I’ve been generally ignoring television recently, after years of watching a lot of it. Could it be that the newscasters speak really quickly these days because they are just excited about the prospects of overthrowing — er — de-electing Trump (e.g. impeachment, Twenty-fifth Amendment, Mueller’s report, Strzok’s insurance policy, Comey’s coup, whatever-this-week’s-scandal-is, etc.)?
