ChatGPT and its cousin Sydney may be the biggest breakthrough yet in opening people’s minds to how far artificial intelligence research has progressed in recent years. At least one unicorn has been minted on top of GPT-3 so far and the promise of “artificial general intelligence” is seemingly closer than ever.
However, as the humorous tweet from Szegedy illustrates, a giant LLM that’s being built to scale to an AGI is probably not the right form factor for the 8.5 billion mostly non-economically-valuable searches Google processes per day, never mind many enterprise use cases. Meanwhile, the problem of hallucinations and errors creates endless debates over whether these models are ‘stochastic parrots’ or actual ‘reasoning machines.’
I’ll be completely honest: my initial reaction to the AI hype wave in startups was highly skeptical. Cutting my investment teeth across two crypto cycles bred an ingrained disdain for VC narratives. I’ve long been dubious that VCs can predict the future.
However, I recently adjusted my view on how to approach new trends after listening to Miles Grimshaw on a podcast, where he compared the job of the VC to Darwin coming ashore on the Galapagos. In his mind, the job of an early-stage investor is to be curious, with a real sense of exploration and adventure.
Completely ignoring trends is a good way to avoid the narrative mirages that pop up yearly in VC, but blindly avoiding something because it’s “hot” can also cause you to miss paradigm shifts that reshape entire industries.1 And regardless of macro conviction, you still have to get the micro decision of investing in the best companies right. You could have been right about the importance of social networking in the early 2000s, but if you didn’t invest in Facebook or LinkedIn, you didn’t participate meaningfully in that trend. As I see it, the job breaks down into two parts: 1) assessing trends from first principles and 2) if you decide a trend is important after doing the work,2 finding and investing in the absolute best companies taking advantage of, and inventing on top of, those tailwinds. You have to get both the macro and the micro right to deliver returns.
I’ve broken my foray into understanding this current moment in AI into four parts, each guided by questions, which will provide the ingredients for the next few articles I write. Here’s a quick layout of where we’re going, with the caveat that things may change as I continue to learn and get feedback. I’m quite early in my research, and I’m happy to take advantage of Cunningham’s Law if there are people who’d like to correct or sharpen my views.
Part I
What is the goal of AI: a mediocre human or a better machine?
What applications or workflows do you want a mediocre human replacement versus an intelligent machine assistant?
Part II
If they don’t develop an AGI, do current LLM providers have the right approach for solving enterprise use cases?
Will cheap open-source models that are systematically applied win over “best-in-breed” LLMs?
Part III
Does value accrue to startups or incumbents applying AI?
What startups and verticalized approaches make sense given our views so far?
Part IV
If OpenAI or another company with a similar approach develops an AGI in the near future, what is the impact on society?
What does the future hold if they do not?
Let’s dive right into Part I.
Part I
What is the goal of AI: a mediocre human or a better machine?
When ChatGPT burst onto the scene late last year, it was a moment that some compared to the launch of the iPhone. It was the fastest-ever application to hit 100m users. A thousand generative AI market maps from VCs were born. Microsoft moved to inject a huge sum of capital into OpenAI and roll out an integration into Bing. For the first time since Larry and Sergey were tinkering in a garage, Google’s monopoly on search seemed on the precipice of breaking.
However, as people played with tools like ChatGPT and DALL-E, they quickly realized some of the flaws: the hallucinations, the lack of up-to-date information, and the imprecision when it came to unusual requests.3 Sydney, Bing’s alter ego powered by ChatGPT, might’ve accelerated public consciousness of AI X-risk research by several decades with her manic, otherworldly conversations and disturbing demands.
What’s perhaps more interesting than rehashing the obvious flaws of current LLMs is studying the reaction to ChatGPT and Sydney. For ChatGPT, more and more topics became taboo for it to answer as users tried to break it in various ways by asking it to write poems about Donald Trump or say slurs. Meanwhile, Sydney was effectively killed by Microsoft a few days after rollout as they tried to limit her to the most basic of use cases. In our deeply divided political climate, there’s more and more noise about how we’ll need to build AGIs for each of our political systems where users can “choose their own adventure.”
This brings us back to my original question of what our purpose is when we build artificial intelligence: a mediocre human or a hyper-intelligent machine assistant? If you think it’s the latter, it’s kind of wild that we’re using billions of dollars of compute to build a model that compresses a training set of mostly irrelevant info into its weights and then guesses its way to an answer, one token at a time. Oh, and then we need to layer more compute, filters, and hours of cheap, potentially scarring human feedback on top of that to make sure it’s coherent and doesn’t say anything offensive or illegal.
There’s a very real possibility that even if LLMs are the right approach to building an artificial general intelligence,4 all of the interventions will devitalize the tools so much that they become neutered human intelligences, perhaps capable of mimicking or commoditizing basic work, but hardly fulfilling the promise of wealth creation and massive abundance.
The question that follows then is: what is a better approach for utilizing AI for human flourishing, a giant monolith that is lobotomized with a thousand cuts or a narrow agent that could become more generalized and powerful over time?
AlphaGo, an AI program combining neural networks with a tree-search decision-making algorithm, beat one of the best human players in the world at the ancient game of Go in 2016. AlphaGo works by training on a vast library of human games and on games it plays against itself, then narrowly focusing its search on the move with the highest probability of leading to a win. But what’s perhaps far more interesting is what has happened in Go since then.
This paper highlights the details, but the key takeaway is that after AlphaGo became widely available, human gameplay significantly improved, with fewer errors and better decisions. Humans have limited processing power and could not possibly search through the entire history of Go, evaluating the probability of each move across a search space of millions of decision trees. In this case, the AI was an instructor to humans, finding new knowledge and then teaching it to them to help them make better decisions. The tool doesn’t generalize as well as GPT-3, but it solves one particular game better and more efficiently.
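For readers who want a concrete picture of the “highest-probability move” search described above, here is a deliberately simplified, hypothetical Python sketch: a learned policy proposes a few candidate moves, a learned value function scores the resulting positions, and the agent plays whichever move it estimates is most likely to win. The `policy_net`, `value_net`, and game helpers are dummy placeholders so the sketch runs end to end; they are not DeepMind’s actual code, and real AlphaGo expands this one-ply lookahead into a full Monte Carlo tree search.

```python
import random

# Placeholder stand-ins for AlphaGo's learned networks and game logic.
# In the real system these are deep neural networks trained on human games
# and self-play; here they are dummies so the sketch runs end to end.

def legal_moves(position):
    return list(range(9))           # pretend there are always 9 legal moves

def play(position, move):
    return position + (move,)       # a "position" is just the move history here

def policy_net(position):
    moves = legal_moves(position)
    return {m: 1.0 / len(moves) for m in moves}   # uniform prior over moves

def value_net(position):
    return random.random()          # estimated probability that we win from here

def select_move(position, top_k=3):
    """Pick the move with the highest estimated win probability.

    The policy network narrows the search to a few promising candidates,
    then the value network scores each resulting position.
    """
    priors = policy_net(position)
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]
    scored = {m: value_net(play(position, m)) for m in candidates}
    return max(scored, key=scored.get)

if __name__ == "__main__":
    print("chosen move:", select_move(()))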
While Go is just a game, this has interesting implications for other areas like drug discovery, education, and defense. Wouldn’t you rather have the world’s best threat-detection machine, one that has wargamed millions of scenarios and then teaches humans its best strategic findings, than something more generalizable but much more bloated and expensive, making its best educated guess after taking in streams of irrelevant input? Wouldn’t you rather have AI agents making unique discoveries in math or biology than ones that also know random facts about the Peloponnesian War that might or might not be true?
In this vein, there’s interesting novel research being done at DeepMind and other early-stage startups to combine some of the huge advances in deep learning with more traditional forms of computer science to build AI agents in a more efficient way than the brute scale approach. Even within current language model approaches, there are startups training language models on smaller samples and then building workflow tools on top of them to solve more narrow problems.
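As one concrete illustration of the “smaller model plus workflow” pattern, here is a hypothetical Python sketch that wraps a compact open-source summarization model in a narrow support-ticket triage workflow. The model checkpoint and the triage rules are illustrative assumptions, not a reference to any particular startup’s product.

```python
from transformers import pipeline

# A compact, task-specific model rather than a giant general-purpose LLM.
# "sshleifer/distilbart-cnn-12-6" is just one example of a small open-source
# summarization checkpoint; any similar model would do.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

URGENT_KEYWORDS = {"outage", "down", "data loss", "security"}

def triage_ticket(ticket_text: str) -> dict:
    """Narrow workflow: summarize a support ticket and flag likely urgency.

    The model only has to be good at one thing (summarizing short English
    text); everything else is ordinary software wrapped around it.
    """
    summary = summarizer(ticket_text, max_length=60, min_length=15)[0]["summary_text"]
    urgent = any(word in ticket_text.lower() for word in URGENT_KEYWORDS)
    return {"summary": summary, "urgent": urgent}

if __name__ == "__main__":
    ticket = (
        "Our dashboard has been down since 3am and customers are reporting "
        "errors at checkout. We suspect the latest deploy caused an outage."
    )
    print(triage_ticket(ticket))
```

The model only has to be good at one narrow task, and the rest is ordinary software, which is a very different cost profile from routing every request through a frontier LLM.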
Even if GPT-12 or whatever could reach the same conclusions over time, it’s likely not the most efficient way to solve many real-world problems. And we still have to worry that all the interventions required to make such an agent “safe” may turn it into an overwrought Frankenstein machine lumbering around haphazardly rather than a useful, hyper-efficient, and intelligent assistant to humans. We might also find that an intelligent assistant can be trained to add new skills over time, in a compounding layer cake of learning, rather than trying to learn everything at once, mediocrely.
The counterpoint to my view is that at a certain scale or emergent takeoff point, a giant model becomes so much more intelligent than humans that for most conceivable economic problems, it will be superior to human labor.5 The thinking goes that if this is true, it’s worth the resources and input because the AGI solves nearly every problem for us instantly rather than each AI agent solving just one or two individually. It’s a machine God that we hope doesn’t kill us all. This is what we sometimes see in science fiction, and it’s the origin of a million Less Wrong blog posts.
Even if you believe that this is the future6, the timeline of this AGI takeoff is completely unpredictable and impossible to build an investing thesis around, especially for VCs whose job is to “see the present clearly.” It also calls into question how much it makes sense to invest in companies built on top of a company that’s attempting to achieve a different goal than “purpose-built infrastructure for enterprise or consumer use cases.”7
Finally, as alluded to before, the types of questions you need to ask if an AGI happens and OpenAI succeeds are far more existential, philosophical, and political in nature than what type of analytics dashboard or devtools you need for LLMs.
In the meantime, let’s go to our second question: What applications or workflows do you want a mediocre human replacement versus an intelligent machine assistant?
The two ideas of a mediocre human replacement and an intelligent machine assistant are often conflated because we lump all of artificial intelligence together, but a machine that plays every game of Go and pulls strategy forward by hundreds of years is quite different from a machine replying to customer service requests while being fine-tuned not to gaslight users or say anything offensive along the way.
Commoditizing human labor is economically valuable: generalized LLMs could slot into some workflows either as full replacements or as early-draft generators that automate hours of work before a human makes the final edit (a minimal sketch of this pattern follows the list below).
Some examples include:
Customer service
Design and sketch tools for creators
Copywriting
Rough drafts for PowerPoints, financial models, etc.
Better search, particularly in enterprise, e-commerce, etc.
Summarization of information or data
Most work on Upwork and Fiverr
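As promised above, here is a minimal, hypothetical sketch of the early-draft pattern: ask an OpenAI-style chat completions API for a rough slide outline and hand the result to a human for editing. The prompt, model name, and output handling are illustrative assumptions rather than an endorsement of any particular vendor.

```python
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]  # assumes a key is set in the environment

def draft_deck_outline(topic: str) -> str:
    """Ask a general-purpose LLM for a rough ten-slide outline.

    The output is a first draft only; a human still reviews and edits it,
    which is exactly the high-error-tolerance pattern described here.
    """
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "gpt-3.5-turbo",  # illustrative; any chat model would do
            "messages": [
                {"role": "system", "content": "You draft concise slide outlines."},
                {"role": "user", "content": f"Draft a ten-slide outline about: {topic}"},
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(draft_deck_outline("quarterly sales review for a mid-size SaaS company"))
```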
These are all use cases with high error tolerance, where any mistakes cost the user little relative to the time saved. Starting from a blank page and instantly having a ten-page deck that I can quickly edit for errors is a huge unlock and time-saver compared to doing it from scratch. We can also expect these models to keep improving over the next few years, although there will likely be diminishing returns.
By comparison, there are other workflows where high accuracy is mandatory for user happiness, and others where you might simply look to push the boundary of what’s possible in solving a problem. These are applications where purpose-built, verticalized models could be much more useful.
Some examples include:
Threat detection in defense
Research assistants
Strategy games
Drug discovery
Other use cases founders will invent, perhaps unlocked by new research
I’m personally much more interested in companies building in the latter category than the former. It’s not that the former categories won’t exist, but the question of moats is uncertain, with most value likely accruing to incumbents. For example, is there a new startup building “Generative AI for design” or does Figma simply integrate with the latest models and win the category?8
I’ve currently made one investment in the latter category and am eager to make more.
I’ll end with a provocative question a friend posed to me: are current implementations like ChatGPT and Sydney just a faster horse instead of a brand-new car?
These are giant monoliths continually being hacked at to prevent danger rather than something sleek and efficient built to serve humans. With tons of guardrails in place, will the best outcome they can achieve be a perfect simulacrum of a mediocre, inoffensive human, or can they instead push the boundaries of what’s possible in various fields?
For some use cases, a faster horse is certainly valuable, but it’s hardly the complete economic game-changer that some imagine.
If we want to use these machines to unlock humans in a totally new way, maybe we should think about building a car.
In Part II, we’ll look at how large language models might deploy into enterprise. Thanks to Jungwon Byun, Ben Van Roo, John Dulin, and Blake Eastman for reading earlier versions of this article and for their immensely valuable feedback. Would love to hear thoughts and feedback and if you’re a founder building in this area, please reach out at pratyush [at] susaventures [dot] com.
1. To quote Peter Thiel, the most contrarian thing to do is not to oppose the crowd, but to think for yourself.
2. Most VC narratives should be discarded at this point. Most are narrative mirages.
3. A good example a founder gave me: if you ask diffusion models like DALL-E for anything precise with unusual associations, like “basketball player holding two tennis balls,” they will completely fail, while producing amazing output for anything dream-like and highly associative, like “ancient ruins in a magical forest.”
4. Hardly a foregone conclusion. Well-known deep learning researchers like Yann LeCun say LLMs will only ever be stochastic parrots and that other approaches will be needed for true autonomous machine intelligence.
5. This is the definition OpenAI uses.
6. I don’t believe this, but a conversation here ventures more into philosophy, religion, man’s search for control over his domain, and perhaps all the way back to the Garden of Eden. If you want to talk about this, reach out for a coffee, but I’ll skip it here.
7. More in Part II.
8. I’ll discuss the incumbent vs. startup opportunity more in Part III.