Have you ever found yourself pondering the future? Lately I have been pondering quite a bit. I submitted my PhD thesis in June and will defend it this month (September). That made August an ideal month to think at a high level: where are things headed, and which big trends are unfolding? This blog post collects some of my thoughts so far, focused on technology and biology, the fields I most like to ponder. To think about the future, it often helps to understand the past first. In this case, that means understanding evolution.
Where we’re at in bio
Billions of years of evolution have taken us from simple single-celled organisms to the incredibly diverse and complex array of life forms that populate our planet today. This remarkable outcome is the result of the elegant processes of natural selection and genetic adaptation. In the past seven decades, we went from Watson and Crick deciphering the structure of the DNA double helix in 1953 to manipulating and editing DNA with remarkable precision using technologies like CRISPR-Cas9. These advances have opened up new frontiers in genetics and biotechnology, potentially enabling us to cure genetic diseases, enhance crop yields, and even explore the boundaries of human longevity and aging. But the truth is that biology is tremendously complex. This is why biologists are not out of a job yet! Quite the contrary: they keep working to improve how well we understand biology and to push what we can do with our collection of biotechnological tools.
And today, scientists are increasingly getting help from technology. In particular, advances in artificial intelligence (AI) have carried over to computational biology, enabling new ways to analyze and model the ever-growing amount of biological data. As this blended field of AI+bio moves forward, more and more scientists are realizing that AI can take our biological knowledge to new levels and let us engineer biology in ways that were not possible before. For example, Hie et al. used a protein language model to efficiently evolve human antibodies, while Yeh et al. used protein structure models for de novo enzyme design.
Discovery and design
I’ve been looking at this blended field through two complementary pillars of biological research and development: discovery and design. On the one hand, discovery is about sampling and characterizing biological entities in their natural environments. This pillar is extremely important because it shows us what biology looks like in nature, so that we can learn from it. But many gaps remain: many biological entities have yet to be discovered. Viruses are an extreme example. There are an estimated 10^31 viral particles in nature, yet our collection of publicly available viral sequences numbers only in the tens of thousands. These days, advances in metagenomics are steadily improving our rate of discovery, and AI is poised to accelerate it further.
On the other hand, design is the pillar that revolves around adjusting or engineering biology to improve it and to better suit human needs. One could argue, as Rob Toews does in a recent article, that natural evolution is just the tip of the iceberg, because it is only a stochastic process that stumbles on combinations of nucleotides that happen to work. There might therefore be other combinations that work even better, or work just a little differently, and that could be very valuable for a variety of purposes.
So, making progress in biological research and development comes down to two goals: (1) to discover biology as broadly as possible, which enhances our knowledge of how biology works in nature; and (2) to design biology as broadly as possible, searching for combinations with improved or altered attributes that are useful to humankind. Historically, pursuing both goals has largely been a process of trial and error, and the results were often not shared properly in academia. That leads to even more trial and error, which is why progress in biology is slow.
Today, we’re on the cusp of changing that, thanks to the increasing availability of data, compute and powerful AI models. We now want to leverage these three ingredients to improve discovery and design in biology. More specifically, we need powerful AI-driven models for both discovery and design that allow us to sample biology broadly → learn from what we discover → design and validate improved biology → and iterate over all these steps quickly.
How best to make progress?
If we assume that progress in AI will continue and that our quest to understand and engineer biology will continue (e.g., ending all diseases, reversing aging), then it is not far-fetched to expect AI-driven biology to become an important part of biology R&D in general, and it makes sense to invest time and energy in it. But then the question becomes: how do we go about that? How do we best invest our time and energy to make progress in biology R&D? I don’t think this question has an easy answer, or at least not a single one.
Academic institutions have historically played an important role in driving progress, and they continue to do so today. But we’ve also seen industrial R&D labs make significant progress in various fields (e.g., Google DeepMind, OpenAI, Meta AI). And of course, companies with commercial interests also contribute to this progress (remember the rapid development of vaccines against SARS-CoV-2).
But there is also a relatively new concept that I came across recently: focused research organizations (FROs). These are non-academic, non-profit startups that can fill the gap left by research projects that other players, such as academia or big companies, won’t pursue. I really think that FROs focused on the intersection of AI and bio could do a lot of magnificent things. In the longer run, such an FRO could even grow into a well-established ‘digital biology lab’, much like Google DeepMind and OpenAI. It could, for example, operate under a capped for-profit structure like OpenAI while being open-source first like Stability AI. By focusing on building tools and platforms for digital biology applications, a lab like that could become highly impactful and useful to the world.
In the end, if we want to make progress at the intersection of AI and bio, we’ll need all of these different players (academia, FROs, startups, larger companies, etc.) working on interesting problems and questions, and sharing resources, data and findings openly and often. During the COVID-19 pandemic, scientists made tremendous progress in a fraction of the usual time thanks to global cooperation and sufficient funding. Why don’t we aim for that all the time?