This week I have a longer-than-usual piece, so I’m going to skip links.
Reminder for DC area readers: AI Bloomers 4 takes place today, 6pm, at Union Pub on Capitol Hill. I hope to see some of you there.
A review of How Life Works: A User’s Guide to the New Biology by Philip Ball (University of Chicago Press, 2023)
Introduction
In high school, like many, I learned a fairy tale. I learned that mean, greedy businesses caused the Great Depression, only to be saved by a benevolent government. I learned that atoms are like little Poké Balls, and electrons are like even littler Poké Balls. And, educated as I was in the shadow of the Human Genome Project, I learned that our genes are our “blueprint,” providing the “code” to life.
In college and throughout my 20s, all of this was complexified. I added epicycles upon epicycles, but eventually it became too much. I decided to start over, basically, from scratch. My academic focus after high school was history, political theory, and public policy, so I naturally started there. Easy enough. Then, I re-educated myself in physics, starting with quantum mechanics and working my way up the ladder of abstraction. But for some reason, I always assumed my conception of the genome was safe. After all, we sequenced it, right? Is that not—pardon the pun—DNA evidence? Genes encode proteins, and proteins are the machines that constitute life. All that’s left to do is figure out which proteins each gene encodes, and what functions those proteins have. A daunting task, but surely achievable with modern tools.
But then, I started writing about AI, and sought to probe into the issue of AI biorisk. If DNA really is like software code, then proteins must be a bit like apps. And that would mean that biology is within spitting distance of being “solved.” Mastery of DNA means that the entire possibility space of lifeforms is open to us, rather than just the ones nature granted us. And arbitrary changes to existing lifeforms must be within reach. And with AI to orchestrate it all, “prompt engineered babies,” as Sam Hammond joked recently, must be around the corner. So, too, would supercharged, custom-made bioweapons. After all, biology is now an information science, I believed. Almost anything is possible.
One question lingered: where were all the new drugs? AlphaFold is five years old; the Human Genome Project was completed 23 years ago. Was it the FDA’s fault? Bad market structure? Or was there something more fundamental I was missing?
So off I went to investigate. “What is the state of modern biology?” I sought to learn. And as I explored, a familiar pattern started to emerge. Maybe genes aren’t so much like a blueprint. Maybe proteins are more complicated than I thought, and knowing (or rather, being able to predict with increasing accuracy) their structure is just one part of a very large puzzle. I noticed myself starting to add epicycles to the picture of biology I learned in high school. “Uh oh,” I thought. Had the sins of mid-century modernism infected my understanding of biology, as they had so much else? Was it time to throw out my old ideas, and start fresh?
In his important new book How Life Works: A User’s Guide to the New Biology, Philip Ball (no relation, as far as I know) makes a resounding case that the answer is “yes.” Ball worked as an editor at Nature for two decades, and has written books on a dizzying range of science topics, including chemistry, science history, quantum physics, and neuroscience. He brings a much-needed wide-angle lens to a field that, one senses from the outside, is mired in specialization-induced myopia.
Genes, Proteins, and Fuzziness
The first, and perhaps most important, takeaway from How Life Works is that the genome is not, in any meaningful sense, the “blueprint” for life, any more than a dictionary is a “blueprint” for David Copperfield. First, there’s the fact that only 2% of our genome encodes proteins. What does the rest of it do? A lot, as you might expect—though for years, many scientists figured it was just leftover junk. Much of it appears to encode RNA that performs a host of functions, perhaps the most important of which is the regulation of gene expression—a subject to which we will return.
Let’s focus for a moment, though, on the question of what biomolecules do. Surely, proteins and all the other molecules in our cells must carry out specific functions, right? As Ball shows us, it’s often a little harder to pin down than that.
Sometimes, the classical model of biology we were all taught in high school really does work: there really is one gene (INS) that encodes one protein (insulin) that does one thing (facilitates the transfer of glucose—sugar—from the bloodstream into cells, thereby lowering blood sugar after meals). Actually, it’s a little more complicated than that—INS encodes a precursor protein called preproinsulin, which is later converted into insulin—but basically, the classical model holds.
Examples like insulin, though, are the exception, not the rule. Most of the time, nature is far fuzzier. Many genes encode far more than one protein, and many proteins are assemblages of smaller proteins (and RNA of various kinds) made from more than one gene. And just what a gene “does” is similarly fuzzy. A mutation of one gene in fruit flies was found to cause the flies to fail to grow wings, so it was dubbed “wingless” (wg for short). Later, though, that same gene was found in mice (where, confusingly, it was dubbed Int1) to be an oncogene—a gene likely to cause cancer when mutated. It turns out that this gene and very similar ones are common across many species, so the family came to be named Wnt—a portmanteau of “wingless” and “int.”
What do the genes in the Wnt family do? As Ball writes, “the best one can say is that they are involved in passing chemical signals arriving at the outside of a cell into the inside of a cell. This is a bit like asking what the meaning of the word in is… it depends on the context.”
And as it happens, many biomolecules have context-dependent functions of this kind. They don’t “do” a particular thing. Instead, they are involved in complex information-processing pipelines: some chemical signal arrives at the cell, either from another cell or from the outside environment, and an intricate array of biomolecules reacts in real time. Proteins, RNA molecules of spectacular diversity, and other biomolecules are enmeshed in networks that process incoming information in probabilistic, rather than deterministic, ways.
It gets even more complex. Some proteins involved in this probabilistic information processing do not even possess a fixed structure. These so-called “intrinsically disordered proteins” constitute a quarter of proteins in human cells, and an even larger share of proteins are believed to have significant disordered regions. Frustratingly, just as we have learned to reliably predict the structure of proteins with AI models such as AlphaFold, scientists are discovering the importance and diversity of these structureless proteins.
“Disorder” is not some accident, though. These proteins’ disorder allows them to be flexible, binding in many different ways with many different potential biomolecular partners. Thus, these blobs with varying biochemical properties play a crucial role in signal processing within cells and in the regulation of genes.
There’s that concept again—gene regulation. As Ball demonstrates, it is often wrong to think of genes as “doing” a particular thing. A big part of the reason for this is that genes are dynamically regulated by proteins and RNA molecules. DNA, as most people know, is transcribed into RNA. Some of that RNA is then translated into a protein by the ribosome (the cellular structure known as the “protein factory”); the rest acts directly, as RNA molecules that perform many different functions.
As Ball notes, however, this transcription process can be mysterious. Transcription factors are the proteins that bind to DNA near particular genes to increase (or decrease) the odds of those genes being transcribed. Their precise effect on gene transcription is not, generally, a result of their structure; again, the transcription factors do not necessarily “do” anything per se.
Rather, their effect is determined by what other biomolecules bind to them. This may include other proteins—even other transcription factors—RNA molecules of various kinds, hormones, and much else (including some drugs). The other proteins that can bind to the transcription factors can themselves be bound to yet more biomolecules, creating cascades that ultimately influence whether a gene is transcribed (and remember, depending on their binding partners, transcription factors can increase or decrease the odds that genes might be transcribed—and the various bound molecules can contradict one another!). And guess what? Transcription factors very commonly have significant “disordered regions” as described above, allowing lots of flexibility (for them) and lots of frustration (for us, in trying to figure out how any of this really works).
Of course, it doesn’t stop there. Once a particular gene is transcribed, an entirely new set of complex processes kicks in. The freshly transcribed RNA is not usually “read out” straight through: its raw sequence does not simply encode an ordered list of amino acids that makes a protein. Instead, the RNA is cut apart and stitched back together by a dense thicket of proteins and RNA molecules called the “spliceosome.” The segments that are cut out are called introns, and the segments that remain and are joined together are called exons. The introns were once thought to be discarded waste, but scientists have since learned that some of them go on to become bits of RNA that play an important role in all the processes described above. And as you might by now be expecting, the spliceosome itself is not deterministic: the behavior of its proteins is influenced by what molecules bind to those proteins.
The Cell as an Information Processing Network
Stepping back, then, I think Ball is suggesting a new mental model for the cell. Rather than thinking of it as a series of discrete “machines” that do deterministic things in a Newtonian manner, consider instead thinking of it as a densely packed city. Looking at a literal city from a 30,000-foot view, would you say that a particular resident “does” any one thing in particular? Of course not. Instead, a city is composed of millions of people going about their days, each responding to the information they receive from the outside world (a police siren, news of an alien invasion, a friend shouting “hello!” from across the street) in different ways. They may come together in ensembles to respond to different things, and quite often they may do so purely by chance. They may exchange information, and each may be influenced by that information in different ways.
They also share a common, centralized repository of information. In the old world, this was something like a library; today, it is obviously the internet. They may consult the internet in different ways, each copying and pasting, editing, and transforming different bits of the information it contains for their own ends. In a cell, this is the role played by genes. They are not necessarily dispositive of anything; the genome, in Ball’s view, is more like a database than a list of instructions.
It can be daunting to zoom out and realize that you are composed of roughly 40 trillion of these cities, that this complexity is taking place in parallel throughout your body every second of the day, that this complexity is taking place inside of your brain all the time. “Model this!” as Tyler Cowen might say.
And many are trying. They have made good progress, and I think we will take enormous leaps over the coming decade. Yet if this article does anything, I hope that it puts into context just how small the steps we have thus far taken are relative to the grand challenge of “solving” biology. There is much work to be done.
How I Learned to Stop Worrying and Love Inscrutable Complexity
Perhaps the group that has gotten most ahead of itself in this regard is the AI doomers. AI, they contend, will enable a newfound mastery of biology that will in turn lead to terrifying biorisk: biological weapons custom-tailored to specific populations or specific people, or novel viruses.
Perhaps one day, but I think the doomers are mistaken in their implicit belief about how far along we are on this path. Nature does not give up her secrets easily. And even as we do discover more of them, as I have written before, regulating AI models themselves is probably not the best way to mitigate the resultant risks (some of which, to be clear, already existed without AI).
There is another connection to AI here that I find even more intriguing. Ball masterfully depicts how biology is composed of a series of probabilistic information processing networks—a signal (incoming nutrients, pathogens, hormones—really anything you can think of) reaches a cell, and the signal is processed and transmitted throughout the cell by many linked molecules that influence one another in highly multi-dimensional ways.
Does this sound a little like a neural network to you? It does to me. And it does to Ball, too. As he writes:
“What a protein means for its host cell is mutable with the state of the whole cell, and is thus literally absent from the sequence of the gene encoding it… There is a loose analogy with the way the computational architectures known as neural networks operate. These are systems used for machine learning algorithms such as AlphaFold, and they consist of interconnected ‘nodes’ at which several input signals from other nodes are integrated into a single output. None of these nodes’ functions are specified in advance by the programmer… it is often very hard to say what the ‘function’ of any given node in the network is after the system has been suitably trained. But it is certainly meaningless to ask what its function is before training; its role is only established through ‘experience.’”
But then, of course it works this way. The more I study the world, the more I realize that every complex system works this way. Language works this way. Cities work this way. Physics works this way. Markets work this way. Of course the first flexibly intelligent software works this way. And so, of course, do we. I suspect that this relates to the nature of time itself: the future is unpredictable, and the world is decentralized. Deterministic systems are far too brittle to contend with the uncertainty of the future. The fuzziness of probabilistic systems can be frustrating for those of us trying to understand the world in “rational” terms, but fuzziness, it turns out, is useful. It is flexible. It is robust.
In that sense, we have not “solved” the famous protein folding problem. We do not have a list of rules or equations, or even a theoretical framework, for understanding how proteins fold. We have statistical models—the most notable of which is AlphaFold—which allow us to, as Ball phrases it, “sidestep” the problem. We can make increasingly good guesses about an ever-broadening set of scientific questions that have eluded us for generations. That will almost certainly lead to more medical treatments and potentially even a new industrial revolution based on bioengineering.
In one sense, that’s great news, but I suspect for some it feels like a bit of a letdown. We wanted answers. What are the rules that govern living things? Somehow predictions made by billions of matrix multiplications don’t quite feel like an answer. Yet this itself might just be a bias on our part. We want the world to be rational—even Einstein did. “God does not play dice,” he famously said. But perhaps God does play dice. Perhaps that is the only way to make something like the universe work. Perhaps that is the only way to make it interesting.
The baggage of our deterministic, mechanistic world view has become heavy indeed. Sometimes I think it comes from the high modernism of the 20th century. Other times from Newtonian mechanics. Other times from Plato. It’s probably a bit of each, and much more. The origin doesn’t much matter beyond intellectual curiosity: the point is that this view of the world clouds our understanding, reduces the dimensionality of our thought, and blinds us to the majesty that is all around us—not only the majesty of nature, but also the majesty we humans have created.
Regardless of one’s goals, I think it is best to meet this reality head on. Causality is overrated; Platonism is threadbare. Emergent orders are all around us—they are us. Whether you are making policy or proteins, it seems wise to me to accept this simple, though non-obvious, fact about our world. Perhaps it should even be embraced.
“Forget the years, forget distinctions,” said the ancient Taoist philosopher Zhuangzi. “Leap into the boundless and make it your home!”