Introns, short for "intragenic regions", are stretches of DNA often lumped together with so-called "junk DNA." They earned the nickname because they do not appear to code for anything. Unlike exons, the regions of DNA that are transcribed and ultimately translated into proteins, introns are spliced out of the RNA transcript before translation. Ironically, although they have no obvious function, introns make up a large portion of many genomes. Four types of introns are known: nuclear introns, which are spliced out by spliceosomes, and group I, II, and III introns, which are "self-splicing."
Several theories exist about the origin and function of introns. The two main competing theories are the Introns-Early model and the Introns-Late model. The Introns-Early model suggests that introns are ancient and were present in the earliest prokaryotes; over evolutionary time they were lost so that organisms could replicate more efficiently. In this view, early introns facilitated the recombination of exons, which produced new proteins and, eventually, new genes. The Introns-Late model suggests that introns emerged from parasitic transposons after eukaryotes and prokaryotes diverged. Separately, Simon Shepherd of the University of Bradford has proposed that introns may act as an error-correcting system that helps fix mistakes made during DNA replication.
The debate over intron origin and function is ongoing, and scientists have not reached a firm conclusion. However, one study has demonstrated an intriguing characteristic of introns: a collaboration between doctors, physicists, and linguists found that the "junk" (non-coding) regions of DNA appear to obey Zipf's Law, a statistical law that applies to all human languages.
Zipf's Law says that the frequency of a word is inversely proportional to its popularity rank: the most frequent word occurs about twice as often as the second most frequent word, three times as often as the third, and so on. If each word's rank in a book is plotted against the number of times the word appears, with both axes on a logarithmic scale, the points fall on a roughly straight line. For example, "the", the most frequently used word in the English language, makes up about 7% of all words used, while "of", the second most frequent word, makes up about 3.5%. Linguists have claimed that Zipf's Law governs all human languages.
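In symbols (my own restatement, not taken from any of the articles): if f(r) is the frequency of the word with rank r, Zipf's Law says

    f(r) = \frac{C}{r^{s}}, \quad s \approx 1
    \quad\Longrightarrow\quad
    \log f(r) = \log C - s \log r

so plotting log-frequency against log-rank gives a straight line with a slope of about -1, which is the "straight line" these graphs show.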
The scientists divided DNA into "words" of nucleotide sequences of varying lengths. When each "word's" rank was plotted against the number of times it appeared, the graph again yielded a straight line on logarithmic axes. The scientists therefore claimed that the structure of DNA abides by Zipf's Law.
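Here is a minimal sketch, in Python, of the kind of analysis described. The sequence, the word length k = 6, and the use of overlapping windows are my own illustrative assumptions, not details from the study; a uniformly random stand-in sequence will not itself show a Zipf pattern, so this only demonstrates the procedure.

    import math
    import random
    from collections import Counter

    random.seed(0)
    # Stand-in for real non-coding DNA; the Zipf-like pattern was reported
    # for actual intron sequences, not for random strings like this one.
    dna = "".join(random.choice("ACGT") for _ in range(100000))

    k = 6  # one illustrative "word" length; several lengths could be tried
    words = [dna[i:i + k] for i in range(len(dna) - k + 1)]  # overlapping windows
    counts = Counter(words)

    # Zipf's signature: log(rank) vs. log(frequency) falls on a straight line.
    ranked = counts.most_common()
    for rank in (1, 2, 4, 8, 16, 32):
        word, freq = ranked[rank - 1]
        print(rank, word, freq, round(math.log(rank), 2), round(math.log(freq), 2))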
Later studies of Zipf's Law take away some of its magic. G.A. Miller's "monkey typing on a keyboard" argument showed that a monkey typing randomly at a typewriter with more than one letter key and a space bar would generate the same Zipf pattern. Most psychologists and linguists now set Zipf's Law aside, seeing it as a statistical inevitability with no inherent significance.
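The random-typing argument is easy to simulate. Below is a small Python sketch under my own assumptions (a five-letter "keyboard" plus a space bar, with uniformly random keystrokes); shorter words come out exponentially more often while longer words are exponentially more numerous, and together that produces a Zipf-like rank-frequency curve.

    import random
    from collections import Counter

    random.seed(1)
    keys = "abcde "  # five letter keys plus a space bar
    text = "".join(random.choice(keys) for _ in range(200000))
    words = text.split()  # the space bar ends a "word"; split() drops empties

    ranked = Counter(words).most_common()

    # Frequencies drop off roughly as a power of rank, the same straight-line
    # pattern on log-log axes that Zipf's Law predicts for real languages.
    for rank in (1, 2, 4, 8, 16, 32, 64):
        word, freq = ranked[rank - 1]
        print(rank, word, freq)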
Although most of the articles I used to follow up on this concept were extremely complex mathematically (and I did not really understand them), I thought the idea of language having a mathematical pattern was interesting. At first it made language seem more universal, since all languages are supposed to abide by Zipf's Law. But if it is true that a monkey typing randomly on a keyboard, forming words of random letters, gets the same results, then Zipf's Law doesn't seem very important at all in analyzing the structure of language. Studies seem to suggest that any string of written symbols will follow Zipf's Law, so it would make sense that if scientists treated DNA as a language, it would do the same. What I found most interesting was that someone actually discovered this pattern. I was reminded of Joe's comment a few weeks ago about how humans (especially in Western culture) have a need to categorize and explain everything. However, regardless of its validity or usefulness, this article was really interesting because of how interdisciplinary the topic was: it combined biology, linguistics, and psychology in examining the "language" of DNA.
Links:
1. Kruszelnicki, Karl. "Language in Junk DNA." ABC. Retrieved from http://www.abc.net.au/science/k2/moments/s133634.htm on November 24, 2007.
2. http://ieeexplore.ieee.org/iel5/18/29003/01306541.pdf
3. http://www.jstor.org/view/00029556/ap050317/05a00180/0
4. http://en.wikipedia.org/wiki/Zipf's_law
Sunday, November 25, 2007
1 comment:
Nice post! I should point out that spoken languages actually do possess a wealth of statistical regularities that help language learners (like babies) parse and learn the language. For instance, one problem a baby must solve is figuring out what the individual words in a language actually are, since we do not pause after each word when we speak naturally! Recently, psychologists have shown that babies can use statistical regularities in a speech stream, in terms of which syllables are likely to follow one another (and therefore constitute a word), to help learn the "words" of a language. (For example, the syllable "bee" is statistically more likely to follow the syllable "ba" and form the word "baby" than it is to follow the syllable "ga", which would form the non-word "gaby".)