Writer’s Note: I’m deeply indebted to the people who have helped me throughout this learning journey: first and foremost, Jacob Kimmel from NewLimit for the time and so many of the frameworks and ideas here. Additionally, I’m thankful for Stacy Li from Berkeley, Nick Phillips from Stanford, and Eryney Marrogi from Caltech, as well as many others who granted me time over the last few months.
The opinions of this blog post are mine and mine alone and do not necessarily reflect the investment thesis of my colleagues at Susa, Humba, or Kivu who may be interested and investing in different types of bio companies.
First, a warning:
“Evolution of a generalist fund in bio:
Fund 1: software for bio
Fund 2: diagnostics
Fund 3: therapeutics
Fund 4: unable to raise”
-Anonymous investor friend
Over the last decade, there have been two sectors in venture capital where few generalists have dared to tread outside of bubble times: crypto and biotech.
Crypto is extremely hard to understand for the outsider, very easy to lose money in, and extremely volatile.
Biotech is extremely hard to understand for the outsider, very easy to lose money in, and extremely volatile.1
I tend to fancy myself as a generalist, but bio was the one sector I didn’t even bother thinking about over the past three years as a VC. It was best left to the experts, the Boston biotech funds or the few deep tech funds with life sciences teams.
That changed when I was fortunate enough to attend a retreat with some of the best biotech founders and scientists in the world a few months ago. The conversations were inspiring and the potential for progress so tangible that despite my trepidations about the space, I decided I wanted to dive down the rabbit hole.
Here’s a synthesis of what I’ve learned so far.
Section 1: Why Therapeutics
As my anonymous investor friend jokingly put it, the initial temptation for any generalist investor wanting to dabble in biotech is to look for software companies serving biotechs and pharma companies. We don’t have to lose our precious, possibly fake software margins, we know the playbooks of scaling SaaS companies, and hey, there are even two success cases: Veeva and Benchling!
Let’s further break down software companies into two types, with the latter being one that people are even more excited about particularly in the age of “Generative AI”:
Type 1: Workflow tools to complement the office suite
“Biotechs are difficult customers because they have heterogeneous needs, big margins that tolerate inefficiencies, and conservative decision making -- important to remember that Merck is older than the United States. The few universal workflows (lab notebooks, clinical trial project management) have decent incumbents (Benchling, Veeva) and the verticals get 10-100X smaller from there.”-Jacob Kimmel
There is a lot of current buzz about companies to manage the modern bioinformatics stack, but it’s safe to say, the opportunities that might actually scale to $200m+ in ARR are fairly picked over.
One subtle nuance around software solutions is that adoption and retention do not work the same way in biotech SaaS as in regular SaaS. Quoting Stacy Li, a Ph.D student at UC Berkeley:
“In my experience, senior academics (i.e. professors) are not exactly early or eager adopters, especially if their research grants won't cover the cost of per-seat software. Industry research groups (inc. clinical research) tend to pick one platform through word of mouth and avoid switching because of how onerous the process can be.
CAC is significantly higher than other enterprise sectors, but LTV scales commensurately. There are certainly some exceptions and verticals where good businesses can be built, but require heavier capitalization than typical SaaS given barriers to entry, low willingness to pay, etc.
Type 2: Drug design tools that help build better drugs
The proverbial AI-for-drug-discovery company often sits in this camp (especially if founded by ex-SWE/ML folks rather than people from the hardscrabble field of biotech). The idea is something like, “Hey, we’ll use Gen AI to discover better drugs, you will pay us for access to our software prediction platform.” The problem is that it’s extremely hard to convince anyone your predictions are worth anything without the laboratory data to back it up. As a friend from a public biopharma company said, “Idea generation is the easy part, idea validation is what matters.”2 There are currently no successful “predictions” companies.
It’s possible this changes with improvements in AI/ML, but if you’re in the business of selling drug candidates without laboratory data, the bar is crazy high: you need to be able to say something like “We have 90% confidence that at least one of these ten drug designs we’re selling you will work.” That seems pretty unlikely without the lab data to back it up and if you’re doing the work and fronting the cost of getting lab data, you might as well start building your own therapeutics pipeline.3
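To make that bar concrete, here is a back-of-the-envelope sketch of the per-design hit rate a predictions company would need in order to make the "90% confidence that at least one of ten works" claim. It is my own illustration, not anyone's actual pricing model, and it assumes the ten designs succeed or fail independently with the same probability, which real drug programs do not. The function names are hypothetical.

```python
def required_per_design_success(confidence: float, n_designs: int) -> float:
    """Per-design success probability p such that
    P(at least one of n designs works) == confidence.

    Assumption: designs are independent and equally likely to work.
    Solves 1 - (1 - p)**n = confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_designs)


def prob_at_least_one(p: float, n_designs: int) -> float:
    """Probability that at least one of n independent designs works."""
    return 1.0 - (1.0 - p) ** n_designs


p = required_per_design_success(0.90, 10)
print(f"Required per-design hit rate: {p:.1%}")  # prints 20.6%
```

Under those simplifying assumptions, each individual design would need to work about one time in five, and the seller would have to be able to demonstrate that without lab data.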
As for other non-therapeutics companies (i.e. tools, automation, and diagnostics), the platform case looks interesting at first glance, but usually fails to materialize. Schrodinger, the prototypical drug design tools company, now has its own pipeline. Vial started with a focus on clinical trials automation for expediting drug development, but has since layered on its own therapeutics pipeline. There are few examples of successful enduring service product companies, and the modal expectation should be the following:
“The first law of biotechnology is that every tools or diagnostics company eventually becomes a therapeutics company.”-Patrick Malone, KdT
A good precursor example of this was the wave of genomics companies in the early 2000s. Many tried to sell target information or predictions (Incyte and Millennium even built $100m/yr revenue businesses doing this), but eventually all pivoted to developing drugs internally. A well-differentiated therapeutic is the highest-margin entity in the business and the “currency of the industry,” as Recursion puts it in public filings.
My view on biotech is pretty simple: the money is in the drugs.
If you want to build a big biotech company that goes public and endures for decades, you probably will need to start making therapeutics at some point.
On a less financial-focused note, but a more important one: therapeutics are where modern biotech has made its biggest impact. Like a lot of Americans, I’ve long been skeptical of “Big Pharma” – and rightfully so! There are many indications that I think are downstream of our broken Western diet or environment, rather than ones we can solve only with drugs. However, through reading stories like Invisible Frontiers, Billion Dollar Molecule, and Breath from Salt over the past few months, I’ve realized 1) how much progress has been made in medicine in treating and managing historically lethal diseases (eg: cystic fibrosis), 2) how earnest and ambitious the people working in the field are, and 3) how much more progress there still is to be done.
Section 2: “Techbio” vs Biotech
Before we start talking about why I’m interested in techbio companies, it’ll be helpful to build a definition of what I mean by “techbio.” As you may surmise from the previous section, when I say techbio, I don’t mean exclusively AI-for-drug-discovery companies selling predictions/drug design tooling to biotechs, nor do I mean other companies building “foundation models of biology” unless they’re planning to use that to make therapeutics.4
Jacob Kimmel’s article on techbio provides a good overview of the concept, but the key synthesis is the following, paraphrased from his article:
A techbio firm builds an in silico model of a biological process and can successfully predict the effect of changes to key parameters, collects and curates a proprietary dataset describing a biological process/system, and then generates value from the model by predicting and validating useful modifications to the biological process that make it faster, cheaper, and more effective.
The value from techbio firms is realized when the proprietary model and dataset can be used to find solutions that solve otherwise intractable problems (i.e. treat diseases with high commercial value.)
A successful modern techbio firm has loose hypotheses around the indication, modality, or target they’re going after. For a company like Recursion, the hypothesis is something like “small molecules exist that rescue genetic diseases and we can find these small molecules by seeing what chemicals make small cells ‘look’ healthy under a microscope.” Unlike most modern biotechs, they aren’t tied to a specific disease or molecule from the outset. From there, they aim to find a globally maximum solution within their specific search space through the combination of wet-lab and dry-lab work they do. Rather than trying to be a horizontal platform for drug discovery, they invest in the process of finding optimal solutions to complex problems.
If you’re newer to bio, you may be wondering: how is this different from traditional biotechs?
“Traditional biotechs often focus on areas of biology that are relatively well-characterized as a means of efficiently searching through the intractably large number of hypotheses that face them. In therapeutics, there are typically many firms competing to make medicines against the same known targets.” - Jacob Kimmel
These are the types of companies that traditional biotech investors often back: the science is well-understood from academia, you install a professional management team to spin out and run the company, and often there are multiple companies going after the same target. Unsurprisingly, the attrition rate of these companies is high. On average, it takes about $1B for a drug to move from being a promising research candidate to an approved therapeutic. Someone told me there were 200 companies going after CAR-T cell therapy for cancer in 2020, with only about 30-40 left now.
Techbio firms are often derided as “spray-and-pray” by traditional drug developers because they don’t use as many existing heuristics or prior knowledge to limit their search space, but perhaps it would be more apt to think of this as a mistake of omission: biotechs are good at finding the local maximum, but techbios may be able to find a global maximum and be an asymmetric bet on finding outlier drugs.
Section 3: The Anti-Techbio Case
The promise of techbio is that advances in ML and high-throughput lab automation enable companies like Recursion to build better “maps” of biology, creating higher success in clinical trials for drugs,5 while also enabling companies to scale from one asset to the next, realizing the previously much ballyhooed platform potential.
Despite the optimism around techbio, there actually aren’t any AI-driven drugs that have made it to market. Recall that the average successful drug development timeline is about eight years and Recursion’s clinical trial results so far aren’t great. Bio data sets are WAYYYY noisier than what ML people usually expect or are used to working with. Consequently, feedback loops on these data sets are super long. On top of that, not all biological data are amenable to the ML techniques used successfully in other domains. Genomic foundation models (GFMs) have long been touted as the holy grail for dissecting genetic disease, but GFMs don’t seem to benefit from the same training paradigms as LLMs. Data and methods go hand in hand: as long as one falls short, techbio isn’t likely to hit its stride.
Or as Ron Boger put it memorably in his Anti-TechBio screed:
“Reality has crushed this thesis. For each disease, new models are required, and little of our understanding scales from model to model. The activity of elements in biology is always context dependent, and it is currently impossible to recreate the entire context of human biology let alone between preclinical models of even related indications…Biology is not a playing field conducive to systematization.”
He also critiques the “platform” potential of Techbios:
“While private investors see TechBio businesses as platforms and thus insulated from risk associated with single asset therapeutics businesses, public markets disagree. Single asset or biology focused therapeutics are risky from the angle that if the hypothesis is wrong, the project gets shut down. Fortunately, capital requirements for these businesses are back loaded and trials are only greenlighted if the risk reward profile is favorable. In contrast, platform TechBio business have both front loaded and back loaded capital requirements. The costs of building, maintaining, and growing platforms can be enormous given that entirely new datasets need to be generated, labs built, and specialized talent hired. Recursion raised nearly $1 billion USD before ever conducting a clinical trial. Single asset businesses require investment for trials but are able to skip spending on discovery costs.”
There’s also one final sobering point (albeit dated to 2022):
“Since 2010, TechBio businesses represent 5 of the 10 worst performing IPOs while accounting for just 5% of market entrants.”
So the question is: is this all a sham and the product of misguided investor expectations about what AI/ML can do in biology? Or will the next generation of techbio companies change the narrative?
Section 4: Refutation of the Anti-Techbio Case
We’ve now started to see evidence disproving the belief that biology is “resistant to systematization.”
Examples include:
BigHat has models for designing binders, specific to the type of binder, but not the indication.
Dyno has models for designing AAV capsids, not indication-specific.
Exscientia has models for designing small molecules to any target, not indication-specific.
Recursion now uses a general embedding model trained across all their data, not specific to any indication.
Absci is showing success in antibody design.
We agree that biology is complex and thus hard to understand, but as Blake Byers put it, that makes it more likely to be machine-legible than human-legible. Why?
Nick Phillips, a Ph.D. candidate from Stanford, explains:
“Biology has been historically studied descriptively, but on the spectrum of few atoms <-> more atoms, it is sandwiched by rule-/law-based disciplines (physics and chemistry <-> physical sciences). In the last couple decades, biology has become a quantitative science, but the orders-of-magnitudes it occupies are too large to model from physical laws and too small to just gloss over things like individual molecular interactions…
So most bio measurements (sequencing, microscopy, -omics) are unavoidably dealing with noise, and it’s really hard to tell where noise is just technical noise or actually some kind of masked signal due to an imperfect model or not looking at the data in the right way. AI/ML offers a toolset that is possibly more flexible and provides useful context compared to conventional statistical methods and is a good tool for interpreting this sort of data.”
The anti-techbio case also argues that searching through large hypothesis spaces isn’t a good use for ML, but we already have examples contradicting that: Dyno has designed AAV capsids that far outpace prior state-of-the-art and Sanofi's mRNA therapeutics team has recently reported LNP designs that achieve the same. When it comes to novel drugs for known targets, we don’t have any drugs on market yet, but our belief is that will change in the next decade.
One interesting valuation nuance is that if AI + bio expands the search space and increases the likelihood of yielding novel targets, assuming all else equal in preclinical / clinical validation, these assets should also be more valuable in acquisitions or other exits than a comparably positioned drug in a more crowded space. First-in-class drugs tend to make more money than second-in-class.
A key part of searching the hypothesis space effectively is that while the hypotheses are loose, good techbio companies are focused on highly specialized problems.
“Good techbio companies look for problems that are amenable to the search strategies enabled by their platforms: techbio hinges upon ML, and ML hinges upon good data. Thus, not all diseases are good targets: for example, assuming that the valuation of the indication isn't a concern, rare genetic diseases are going to be much harder to target than common lifestyle & environmentally driven diseases like non-alcoholic fatty liver disease.” - Stacy Li
Good techbio companies are usually generating first-of-a-kind datasets that are then being fed into models that will help them improve the next generation of datasets, which then feeds back into the model to improve it, which then improves the next set of dataset generation, and so on. The techbio bet is that this large scale experimentation and widening of the search space will help us find novel and ideally global maximum solutions.
The proof will be in the pudding in ten years whether it actually does so in a scalable way. It’s worth noting that despite all the hoopla around “AI-for-drug-discovery” companies, many venture funds, even ones initially interested in the thesis, have shifted away from investing in techbio companies due to the very public failures in the space. They’ve hired traditional life sciences investors and many people want to know the more traditional questions: what’s your target, what’s the data package you can deliver for your next fundraise, etc. This is an existential risk for techbio companies and investors: even if you’re willing to step out on the risk curve early, the question will be whether there’s going to be downstream investors who will back you.
“So if you are the one firm that strikes out and funds asymmetric companies, no one will show up to fund the company downstream and your whole portfolio will fail. For this reason, I think the biggest plays in bio over the coming decade will be mostly backed by generalist funds rather than bio funds. *Note this will perpetuate the trend of sector specific funds underperforming generalist funds (by industry and geo!)” - Blake Byers
Our bet and hope is that Blake’s right.
Section 5: The Techbio Bull Case and Platform Power
The bull case for techbio companies is that unlike a lot of AI companies, they are generating and owning a differentiated and proprietary dataset. That dataset feeds into a proprietary model that will hopefully lead to a drug that’s worth billions in revenue. As the dataset improves, the model should improve and it should create a data flywheel that leads to a compounding advantage for the company.
Biotechs have traditionally struggled to develop moats that expand beyond a single asset (i.e. a single drug, engineered strain, or diagnostic test), a key reason for the acquisitional dynamic of the market. Going back to Ron’s critique of techbio, it’s usually the case that each new asset requires starting from scratch. Sure, you may have some talented people in-house, but there’s no real advantage they have versus other talented teams. There’s usually nothing that compounds from asset #1 to asset #2.6
Conversely, for a techbio firm, the datasets and model should be cornered resources with scale economies that create a compounding advantage for the company. As Jacob puts it, “As the data corpus grows, the in silico model makes better predictions that help a firm expand their data corpus more effectively (e.g. by only running experiments that provide non-redundant information).” This is a classic marriage of two powers: there’s a benefit in product creation and improvement, but it’s also a barrier for would-be competitors.
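To illustrate what "only running experiments that provide non-redundant information" could look like, here is a toy sketch. It is not Jacob's method or any company's pipeline: it assumes each experiment can be summarized as a point in some embedding space (here, made-up 2-D coordinates) and swaps in simple greedy farthest-point selection as a stand-in for a real model-driven acquisition strategy. All names and data are hypothetical.

```python
import math


def novelty(candidate, measured):
    """Distance from a candidate to its nearest already-measured point.

    Assumes `measured` is non-empty. A small value means the experiment
    would largely repeat something already in the data corpus.
    """
    return min(math.dist(candidate, m) for m in measured)


def pick_experiments(candidates, measured, budget):
    """Greedily pick up to `budget` candidates, most novel first.

    After each pick, the chosen point counts as measured, so near-duplicates
    of it stop looking novel on the next round.
    """
    measured = list(measured)
    remaining = list(candidates)
    chosen = []
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda c: novelty(c, measured))
        chosen.append(best)
        measured.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical embeddings: two experiments already run, five proposed.
measured = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.1, 0.0), (0.9, 1.0), (5.0, 5.0), (5.1, 5.0), (0.0, 4.0)]
picks = pick_experiments(candidates, measured, budget=2)
print(picks)  # [(5.1, 5.0), (0.0, 4.0)]
```

Note that (5.0, 5.0) is skipped even though it is far from the original data: once its near-twin (5.1, 5.0) is chosen, running it would add little. In the flywheel framing, a better model means a better notion of "novelty," which means each experimental dollar buys more information.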
To put it more concretely, the business power that a successful techbio firm has makes it significantly more likely to become a multi-asset platform compared to a traditional biotech.
In a conversation around Platforms and Power, Hamilton Helmer and Chenyi Shi lay out the framework around how companies build platforms.
The podcast is worth listening to in full, but in essence, they create a 2x2 matrix based on a company’s level of knowledge along two axes: skills and need.
Diversification: When a company has low skills on the existing team and low knowledge of the customer need, usually a disaster (eg: Apple building a car)
Reinvention: When a company has a high degree of knowledge about the problem, but low skills around how to solve it and must go acquire these skills. Most of these fail and it requires founder-level sponsorship to pursue the transformation. The example Helmer gives is AWS (worked) and Alexa (didn’t work).
Co-Action: When you have a good amount of skill on the existing team, but low knowledge of the customer need. Helmer’s example is Apple making the iPhone after the iPod. The skills of the team were transferrable, but they would have to go acquire the knowledge of how to design, manufacture, and market a new type of smartphone.
Power Extension: When you have both high skills on the team and high knowledge of the customer need, the next product is simply an extension of your core business power. Netflix going international was an example of this: the fixed cost of content was a core business power, as international customers largely watch the same shared content. Porsche’s power is its brand and car design which sells the same when it goes into a new geography. Conversely, Uber thought going international was a power extension of its technology, but its power actually lay in dense network effects that needed to be built anew in each geography. Sometimes they won and sometimes they lost, but it was a knife fight in each new market with their technology advantage playing less of a factor.
I would argue that would-be platform biotechs fall squarely in the camp of either reinvention or co-action. In Billion-Dollar Molecule, Vertex started by going after a small molecule that would bind to an immunosuppressant target and lead to better organ transplantation success. The actual successful first drug they launched was an HIV drug. Today, their most successful drug is related to cystic fibrosis. In each case, it was either a co-action (same team or skills, totally new problem) or a complete reinvention through acquisition.7 While pivoting sometimes works, the more common history for most biotechs is an inability to continually come up with the next asset, followed by eventual acquisition, declining relevance as the drug patent expires, and/or failure. This is exemplified by the fact that most approved drugs originate in smaller, newer therapeutics companies, rather than the largest and most well capitalized firms. If making one asset made you more likely to succeed at another, we would see the opposite dynamic.
A successful techbio’s business power is the cornered resources of its model and dataset, but both of these have returns on scale. As the datasets and models improve, each new asset is easier and cheaper to produce, creating a platform of products with compounding business value. This is power extension.
It’s a hypothesis but we believe that:
Techbios have platform power, biotechs do not.
Section 6: Genetics of Techbio Startups
Perhaps the most fun part about reading the history of early successful biotech pioneers like Genentech and Vertex was seeing how much of the early days mirrored the true go-go startup ethos of Silicon Valley before the mid-2010s. These were teams of young, unproven scientists and postdocs eating Szechuan food at 9 pm, sleeping at their offices, and going after impossibly difficult problems.
I don’t think anyone was looking to maximize their Eight Sleep score.
The early teams were confident, opinionated, and convinced they had figured out something no one else had. Unlike most tech startups which have professionalized and homogenized, many biotech startups still resemble those early days, borrowing largely from the academic settings they often grow out of: people are used to waiting 13 hours to see the results of an experiment and then perhaps not show up the next day for work even though it’s a Thursday if they need to sleep in. Perhaps that free-wheeling grind culture is why one of the few places we’ve seen advances in the world of atoms over the last fifty years is in the world of medicine.
My belief is that the best techbio companies will have some of the following characteristics:
An element of a great story and narrative goal they’re going after
Somewhat unproven, non-famous, but rising star scientists, insanely motivated to prove themselves, perhaps paired with some more traditional drug hunters
Computational bio people who have sat in labs and know how long bio data takes to generate and how messy and noisy it is8
At least one experienced business leader on the early team who can help navigate the treacherous fundraising and GTM path in a realistic way
Tight feedback loops between the dry lab and wet lab
Maniacally focused on a highly specialized problem with a set of loose hypotheses about how to solve it
Build like a software company with strong iterative cycles and faster predict-validate feedback loops
It’s also worth noting that the founders of techbio startups will look significantly different than traditional biotechs. In some ways, biotech startups look like the traditional startups of the 90s: “professional management” at the top, while the scientists play more of a secondary role. The goal is to derisk as much as possible and run a tried-and-true playbook for liquidity/acquisition.
In contrast, we see an opportunity for biotech to undergo the same founder-led revolution that took place post-2000s in tech. We see founders who are looking to lead their companies for multiple generations. They will have some of the best traits of tech founders: technical with commercial instincts, first-principles thinkers, adaptable learning machines, chips on their shoulder, likely young and “unproven” in the biotech world, but with a new way of envisioning how the world may look if their hypotheses are right.
Section 7: Recap and Areas of Interest
A quick recap of our opinions so far:
We are mostly interested in therapeutics companies. Wholly-owned clinical assets are how you build a generational company in the space. One exciting element of being a generalist fund is we are not limited by sector, so we only have to pick on an absolute basis amongst all seed-stage companies, rather than a relative basis.
The specific flavor of therapeutics companies we’re interested in are techbio companies. Techbio companies are companies focused on a highly specialized problem (eg: using transcription factors to reprogram cells to solve aging like NewLimit, finding better antibodies for new biologic drug design like BigHat, etc.). They generate first-of-a-kind proprietary datasets and build models to find novel and differentiated solutions to problems.
Despite agreeing with some of the critiques of previous “techbio” companies and perhaps things being too early a few years ago, we do see the next decade as one where the convergence of bio and ML will lead to actual drugs in market that work.
Unlike single-asset biotechs, techbios have potential platform power due to their proprietary datasets and model, a compounding business power of cornered resources and scale economies that will lead to feasible and economically valuable future products.
Successful techbio companies will have the same chutzpah, drive, speed, and crazy ambition of the previous generation of successful startups, both in biotech and software.
Their founders will resemble the best founders of tech more than their traditional biotech counterparts.
Like with all early-stage investing, the number-one thing will be focusing on people and finding the spiky people that you only encounter a few times a year at most.
However, here are some potential areas of interest with interesting tailwinds that I’ve gleaned from conversations with smart bio folks:
Combinatorial drug discovery
Cell-based delivery systems for therapeutics
Generative design of novel enzymes & virus-like particle delivery
ML-guided design of “corrector” small molecules
Binders (seems like models are particularly good at designing these)
Etc…
If you’re a founder building a techbio company, I’d love to chat. Email me pratyush [at] susaventures [dot] com
1. Usually due to the binary risk of success/failure in clinical trials. Even after raising hundreds of millions of dollars, a company’s ultimate destiny is often completely unclear until they have trial read-outs.
2. One of the most important parts of a techbio platform should not only be shortening the process to go from A to B, but also using data to narrow the focus and cull likely invalid ideas.
3. A good data point is that Schrodinger, the legacy player in the space of drug design tooling, spent 30 years below $100m ARR and recently pivoted into starting a therapeutics pipeline. By contrast, there are over 170 individual drugs doing $1B in annual revenue. Order-of-magnitude difference in value capture.
4. The data to build foundation models with therapeutic applications are often not publicly available. A few sources like the Protein Data Bank that enabled AlphaFold structure prediction or GenBank sequences for DNA and protein language modeling are more an exception than the rule. We lack data to build models that answer questions like "What target should I go after to treat patients with disease X?" or "Will drug Y work in patients Z?" These datasets must be constructed in-house by techbio firms. (Recursion, Noetik, Insitro, etc. are examples of companies doing this)
5. Even going from 90% failure to 80% failure means the success rate goes from 10% to 20%, i.e. DOUBLE the number of successful drugs on market, a huge improvement over the status quo.
6. Difficulty and cost of drug development is also nonuniform across disease areas, so investing in new assets can incur significant risks. For example, pain & ophthalmology are extraordinarily expensive areas (~$200M+ cost of development), and respiratory drugs are only half as likely to pass phase 3 (efficacy) trials compared to hematology drugs.
7. A nuance here is that scaled pharma companies (eg: Eli Lilly, Novo Nordisk, etc.) are successful platforms because their core business powers are process power and economies of scale. Acquiring companies/assets or even developing new assets in-house is an example of power extension.
8. Often a bugaboo for more traditional ML folks.