Connor Leahy thought this might happen. “I’ve been basically waiting for this moment for at least two years now,” says the co-founder of the AI start-up Conjecture, referring to the current buzz surrounding ChatGPT. Deeply enmeshed in the development of large language models (LLMs) since his time at the helm of EleutherAI, an open-source cooperative of machine learning engineers and enthusiasts, Leahy got his general amazement at the capabilities of these programs out of the way as early as 2019, when he first glimpsed GPT-2. “Now, I feel like a lot of people are having the reaction that I had,” he says.
That reaction can generally be described as one of delight, awe and not a little foreboding. Built using GPT-3, an LLM boasting some 175 billion parameters, ChatGPT has knocked the world for six with its effortless ability to write reams of eloquent, incisive prose (and some poetry, too). Joined by other services like DALL-E 2 and Midjourney, the market for generative AI has suddenly become very hot indeed, with Microsoft poised to dramatically increase its initial $1bn investment in ChatGPT’s creator, OpenAI, with a cash injection of up to $10bn. Little wonder, then, that increasing attention is being paid to the development of alternative LLMs to GPT-3 – foundation models that other tech giants would likely snap up in a heartbeat.
One of these alternatives is Jurassic-1. Numbering some 178 billion parameters in its largest form, the LLM is the brainchild of AI21 Labs, a research start-up turned product and platform company founded in 2017 with the goal of creating models specialising in language generation and comprehension. “We didn’t want to play a pure research game like DeepMind,” says Ori Goshen, AI21 Labs’ chief executive (though AI21 Labs, too, is flirting with the idea of its own superpowered chatbot). “We admire DeepMind, we think they’re great. But we also wanted to bring real commercial value from the get-go.”
That same instinct is present at Cohere, another AI start-up based out of Toronto. Its latest model, explains its chief operating officer Martin Kon, powers classification, semantic search and content moderation across 160 languages. “That might not be exciting to consumers who enjoy generating poems about their cats or images of dogs in sushi houses, but we feel it’s certainly incredibly exciting to CEOs and executives of companies, organisations, and governments of all sizes everywhere in the world,” he says.
The increasing buzz surrounding generative AI seems to be helping both start-ups hit their stride, with Cohere raising $125m in Series B funding in February last year and AI21 Labs’ Jurassic-1 released on AWS’s machine learning platform SageMaker in November. This doesn’t mean that AI21 Labs is imitating OpenAI’s close relationship with Microsoft, says Goshen. While he doesn’t rule out the possibility of a much closer partnership with a large platform – “I mean, never say never,” says Goshen, highlighting AI21 Labs’ fruitful partnerships with Amazon and Google – he maintains that the company remaining neutral when it comes to cloud providers has long-term benefits, not least in allowing it to explore using the latest computing hardware whatever the provider.
A similar mindset seems to prevail elsewhere in the LLM space, including the BigScience research collaboration and its BLOOM model, and Anthropic, which just released ChatGPT rival Claude to mixed reviews. As the market for new foundation models continues to heat up, Goshen predicts that demand will inevitably grow for LLMs to move away from being trained on pages scraped en masse from the internet to narrower, proprietary datasets. “That can create these more specialised models, maybe [for] specific domains, or even…models that have an understanding of a specific company,” he says.
An LLM alignment problem
It’s impossible to tell, however, whether we’re seeing the emergence of a dynamic marketplace for LLMs or the beginning of a longer process of consolidation, as Big Tech companies fight to acquire as much talent and capability for themselves as possible. What we may see, argues Leahy, is something akin to Goshen’s prediction, with the creation of smaller models designed to accomplish narrower goals. The creation of larger and more complex LLMs, however, will continue to depend on the goodwill and infrastructure of hyperscalers. “There really are only a few actors in the world who are capable of mustering the resources to train something or build something like GPT-4,” he says, referring to OpenAI’s next LLM, due to be released sometime this year. Details of the system have yet to be revealed, but many in the field predict it will surpass most other LLMs in size and complexity.
Even so, OpenAI’s CEO Sam Altman recently told Reuters that the firm wouldn’t release the model until it met strict safeguarding benchmarks. It’s a challenge Leahy sympathises with. Known in AI research as the ‘alignment problem’, it is what Leahy spends much of his time at Conjecture working on: figuring out how to bind new models to quintessentially human ethics and motivations. While he believes that generative AI has the potential to become “unimaginably positive for the world,” he worries that not enough people realise machine intelligences think very differently to the people they’re serving – and that those who do realise, like OpenAI and other start-ups, don’t yet know how to make them hold to our common set of values. It’s certainly something to consider as more LLMs hit the market in the coming year. As psychiatrist and part-time AI guru Scott Alexander recently put it in his own post about ChatGPT, ‘[t]his thing is an alien that has been beaten into a shape that makes it look vaguely human. But scratch it the slightest bit and the alien comes out.’