Monday, July 6, 2009

how can genes possibly do all they do?

There are a little over 3 billion base pairs of DNA in the human genome. The four possibilities for each one (A C G T) can be encoded in two bits, so that's 750 Megabytes of information, and the genome has lots of common sequences and repeating sections so you can compress it down a lot. It amounts to as much data as there is on a music CD, far less than on a DVD. That is completely ridiculous!
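To check that arithmetic, here is a minimal sketch in Python. It assumes a round 3 billion base pairs and counts a megabyte as a million bytes. The bit codes assigned to A, C, G and T are an arbitrary choice, and the names `BASE_PAIRS`, `CODES` and `pack` are made up for illustration; this is not any real genome file format.

```python
# Back-of-the-envelope size of the genome at 2 bits per base.
BASE_PAIRS = 3_000_000_000   # assumption: "a little over 3 billion", rounded down
BITS_PER_BASE = 2            # four possibilities (A, C, G, T) fit in two bits

genome_bytes = BASE_PAIRS * BITS_PER_BASE // 8
genome_megabytes = genome_bytes / 1_000_000   # assumption: 1 MB = 1,000,000 bytes

# Arbitrary illustrative mapping of bases to 2-bit codes.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(sequence: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        chunk = sequence[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a final partial chunk so every byte holds four slots.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(f"{genome_megabytes:.0f} MB uncompressed")
    print(len(pack("CCTAACCCTAAC")), "bytes for 12 bases")
```

Twelve bases fit in three bytes, and scaling that ratio up to the whole genome gives the 750 megabyte figure, before any compression of the repeats.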

Those genes not only lay out the biochemistry of individual cells, they also are blueprints for the physical arrangement of our bodies. When you go to a doctor's office and see the pink medical illustrations of something like your eye or intestine or ear, and notice that every single nook and layer has a name, somehow your genes result in that structure, and thousands upon thousands of other structures in your body. Just the CAD program for the shape of the bones in your inner ear would take up many megabytes. The design and operating manual for the eye would be thousands of pages. You could accept this if each cell had specialized blueprints, but every one of 10 trillion cells in your body carries the identical DNA. How can one compact set of instructions make tens of thousands of complicated parts!!

But wait, there's more. The same genome also codes for innate behaviors. Bees dancing in a certain way. My dog biting the ankle to herd cattle. The staple of popular magazine thought: males who fool around perpetuate their DNA better so their cheatin' genes win. And the mother of all behaviors, human language. Now this would be plausible if humans had a monster additional section appended to our genome, an extra hundred million DNA pairs for altruism and brain development and tool-building and grammar and so on, but the chimpanzee genome is 95% identical to the human genome. No way!!!

This overloading is insane. It must mean that the genetic tweak to have, say, a tendency to Asperger's must also affect lower-level things, and that not all variations are possible, and that the command & control explanation of the genome ("the DNA says build this way, operate this way, develop this way, behave this way") must be insufficient. I tried to put this absurdity to gene fan Matt Ridley at a talk, but he brushed it off; he maintains it's all in the genes. But in our DNA there are only "20,000–25,000 protein-coding genes, far fewer than had been expected before its sequencing. In fact, only about 1.5% of the genome codes for proteins." Huh??!

A long time ago I went to an Ask a Scientist talk by Dr. Terrence Deacon. It was dizzyingly complex and hard to follow, but he acknowledged the essential craziness. Instead of throwing up his hands and expressing disbelief as I'm doing, he (and presumably other scientists) reasons out what the mismatch between a CD's worth of information and the outcome must imply.

Very roughly speaking, our DNA can't possibly code for this complexity, and yet the tiny differences in our DNA resulted in it. Therefore the complexity has to be emergent somehow. There has to be some interplay with the environment that emerges over time, at every level: with the chemistry in the cell, with the differentiating cells nearby, with the other synapses in the brain, with the other animals in the group, and now in humans, with language. As we evolve, these external factors co-evolve with us, and the system as a whole reliably produces the complexity. Some of those ideas are presented in this interview. Heady stuff, but it seems undeniable. Meanwhile, you can go to Ensembl.org and walk the genome of various animals. Here are the first million base pairs of human chromosome 1; export them as text to get the CCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTA goodness.