ChatGPT will transform search, public and private

Earlier this week, Google published a clap-back. Nine pages long, the essay explained in clear and crisp language why the search engine had invested so heavily in AI research, and how it was using its insights for good – including mapping ‘nearly all known proteins,’ developing a speech model capable of translating 400 languages, and making key breakthroughs in quantum computing. Its search engine was mentioned only twice, as if to tell the critics who had mocked the firm for its reluctance to build ChatGPT-like features into its core product that no, all this other stuff is a little worthier than a chatbot that can write bad Nick Cave songs.

Worse yet was the news that Microsoft’s Bing – the search engine most people encounter in the fleeting moments before they switch their default browser from Edge to Chrome – was about to incorporate its own ChatGPT integration. At an all-hands meeting in December, engineers at Google privately criticised the company for not doing the same thing. According to a recording obtained by the New York Times, the real reason the firm hadn’t followed suit was a mixture of fear – chatbots have a long and infamous history of spewing racist garbage – and a lack of interest. A service like ChatGPT, executives reasoned in public and in private, was not as transformative or lucrative as the many other lines of research Google was pursuing.

Sridhar Ramaswamy begs to differ. A former chief of Google’s advertising business, Ramaswamy co-founded the search engine Neeva in 2019 with the intention of creating a privacy-friendly, ad-free, subscription-based alternative to his former employer. That premise only carried the business so far, Ramaswamy explains. “It’s not quite been the mass user movement that we would have liked,” he says of the pro-privacy crowd, partly because, for the majority of users, there’s only a superficial difference between Google and Neeva in how search results are actually displayed.

You can’t say that about Neeva AI. An integration of ChatGPT, the service responds to user inquiries with a brief answer summarising, with citations, the most relevant search results supplied by the ranking algorithm. In letting the AI loose on the results in this way, Ramaswamy believes that Neeva breaks the ad model for search engines by providing a straight answer to, say, where to go out for a coffee in your local area without bombarding the user with links to buy cafetieres or Costa Rican ground beans.

“We think of AI as X-Ray vision for the internet,” he says. “We’ll be able to peer into a page and get a gestalt of what it’s about. All of this, we think, will combine to create a very different search experience powered by AI compared to where it is today.”

Proprietary knowledge

Neeva isn’t the only search engine experimenting with using generative AI in this way. Rival You.com, for example, has launched a similar ChatGPT integration with a UI resembling the original conversational agent. It’s joined by Vectara, a self-described ‘neural search platform’ that uses natural language processing algorithms to provide what it claims are more accurate and salient responses to user queries.

A less sexy, but no less transformative, revolution is poised to take place in the world of private search APIs. Algolia, for example, has raised $334m to use its AI algorithms to sift through reams of unstructured data sitting across multiple corporate silos. A similar undertaking is underway at Barcelona-based Nuclia. Its team, led by Eudald Camprubi, were inspired by the many users they’d encountered who regularly despaired at the inability of keyword-based searches to find relevant documents across different data sources and languages.

“So, we built something that is able to gather data, in any format and in any language,” he explains. “We extract all the text – if it’s a video, we run an automatic speech-to-text process – and we automatically vectorise information. And, once we have everything, we store it in Nuclia DB, our open-source database. And the result is that, when a user inputs a search query, the results that user gets are based on semantic and keyword searches.”
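
To make the idea of combining semantic and keyword search concrete, here is a minimal sketch in Python. It is not Nuclia's API or Nuclia DB: every function name below is hypothetical, and the embed function is only a stand-in (character-trigram counts) for the trained embedding model a real system would use. The sketch scores each paragraph twice, once by vector similarity and once by keyword overlap, then blends the two scores with a weight, alpha, which is also an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keyword tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in for a real embedding model: character-trigram counts.

    A production system would call a trained model here; this placeholder
    only keeps the example self-contained.
    """
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, paragraph):
    """Fraction of distinct query keywords that appear in the paragraph."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(paragraph))) / len(query_terms)


def hybrid_search(query, paragraphs, alpha=0.5, top_k=3):
    """Rank paragraphs by a weighted blend of vector and keyword scores."""
    query_vector = embed(query)
    scored = []
    for paragraph in paragraphs:
        vector_part = cosine(query_vector, embed(paragraph))
        keyword_part = keyword_score(query, paragraph)
        scored.append((alpha * vector_part + (1 - alpha) * keyword_part, paragraph))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Calling hybrid_search("podcast transcript", paragraphs) returns (score, paragraph) pairs, best match first. Returning paragraphs rather than whole documents mirrors the snippet-style results the article describes, although how any real product weights the two signals is not something the source specifies.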

The results obtained using a search engine based on Nuclia’s API are formatted as paragraph snippets extracted from relevant documents, in a similar manner to Google’s ‘featured snippets’ service. Camprubi accepts the comparison, but reiterates the adaptability of Nuclia’s API to firms that hold onto data in a variety of languages and data formats – or, in other words, most large companies on the planet. “We empower companies to build AI search on top of any data, without any effort,” he argues, “and they can have 100% ownership of that.”

Specific use cases abound, explains Camprubi, from making audio and video textually searchable to anonymising documents en masse. The latter has led to new partnerships with pharma companies, but Nuclia also finds itself working with universities, water companies and, although “we don’t aim to build B2C solutions,” says Camprubi, podcast providers eager to rationalise the knotty search functionality of the garden-variety listening platform.

Nuclia isn’t a flawless solution to these problems: there’s the challenge, for example, of indexing vast amounts of corporate data, which can prove both slow and expensive depending on the size of the company. One of Nuclia’s clients was previously reported as requiring the start-up to index 100,000 PDF documents per month, when processing just 1,000 takes around a week. It’s why the start-up charges its clients by the amount of data they choose to index. Sometimes, explains Camprubi, it’s difficult to get across to prospective customers that, if they want to run a sophisticated AI-powered search engine, they need to pay for the compute power to run it. “People are quite used to pricing models such as those the traditional search engines are offering today,” he says.
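
The scale of that mismatch is easier to see as arithmetic. The short sketch below takes the article's two figures at face value and adds assumptions of its own: that throughput scales linearly, that indexing jobs can run fully in parallel, and that a month is roughly four weeks. The function names are hypothetical.

```python
import math


def weeks_to_index(documents, docs_per_week):
    """Weeks needed on a single pipeline, assuming linear throughput."""
    return documents / docs_per_week


def pipelines_needed(documents, docs_per_week, deadline_weeks):
    """Parallel pipelines needed to finish within the deadline.

    Assumes the work splits evenly with no coordination overhead.
    """
    return math.ceil(documents / (docs_per_week * deadline_weeks))


# Figures from the article: 100,000 PDFs a month, about 1,000 per week.
single_pipeline_weeks = weeks_to_index(100_000, 1_000)
monthly_pipelines = pipelines_needed(100_000, 1_000, deadline_weeks=4)
```

On those assumptions, one pipeline would need 100 weeks to clear a single month's documents, and keeping pace would take around 25 pipelines running side by side, which goes some way to explaining why the price tracks the volume of data indexed.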

Proceeding with moderation

Towards the end of our interview, a loud klaxon briefly interrupts Camprubi’s train of thought. The alarm sounds like it’s signalling the end of days, but the entrepreneur assures Tech Monitor that it’s just a doorbell being rung by an Amazon delivery driver. This happens minutes after Camprubi rejects the comparison that could be made between submitting queries to a ChatGPT-like search interface and the kind of voice assistants produced by the e-commerce giant and its rivals.

“As far as I understand it, nobody’s using Alexa to search for complicated information,” says Nuclia’s founder. “It’s an assistant, at the end of the day. It’s not something that you rely on to get informed about the world.”

It’s an answer that betrays Camprubi’s wariness about weaving generative AI into publicly available search engines. It’s one thing to offer the opportunity to a corporation to rationalise where and how they find their own data. It’s quite another to place a chatbot in a search engine and make it the effective arbiter of truth for billions of queries. “Solutions like ChatGPT, they offer you an answer,” says Camprubi. “Google is offering you results. So, if you get results, you can choose – you can read more than one opinion.”

Ramaswamy rejects this characterisation of generative AI in search engines, at least in Neeva’s case. For one thing, its model cites its sources: at a basic level, you can scrutinise which articles, documents or images it’s used to build its argument. Neeva’s ranking algorithms also help the AI to prioritise high-quality content, so that if a user does ask it a question about, say, notorious conspiracy theorist Alex Jones, it shouldn’t freely draw on material from Infowars to inform its answers.

“A search engine is a reflection of the internet,” says Ramaswamy, adding that he believes a search engine should never aspire to pass judgement on the truth (with very obvious exceptions for illegal or harmful content). “Neeva AI answers are not like us saying, ‘This is what we believe.’ It is us distilling the information from sites and really saying, ‘This is what these sites are saying.’ So, that voice is very important.”

It also needs to be reliable, a problem that seems to flow outwards from ChatGPT into the search engines experimenting with the service. You.com’s chatbot carries a disclaimer warning users that ‘I sometimes may get some answers wrong,’ while Ramaswamy concedes that Neeva AI has occasionally confused people with the same name (“The fix is about a week away,” he says).

While these kinds of kinks and quirks might seem a little ominous to the uninitiated, Ramaswamy is convinced from Neeva’s recent growth rates that generative AI offers the kind of search experience users never knew they wanted. Ultimately, the demand for a quick, practical response to a query will continue to shape the future of search. “It’s clear that Bing and Google are going to be racing to offer these things,” he says. “But to me, the bigger question is whether they will distance themselves sufficiently from the ad model to produce a truly better experience.”
