
Why ‘primitive’ enterprise search must change in a remote-first world

"IT teams are drowning in trivial troubleshooting."

Internal search tools across most enterprises are woefully inadequate, reliant as they are on primitive keyword tagging. The issue can undermine productivity and efficiency, as teams struggle to surface the material or data they need. In a world of sophisticated consumer-grade search tools, it’s a nagging anomaly that seemingly just won’t go away.

Tech Monitor spoke with the CTO of Moveworks, Vaibhav Nivargi, who is dedicated to tackling a problem that has grown increasingly prominent in a remote-first world.

‘If you do a quick search in an internal knowledge base at your company, you likely won’t find what you’re looking for.’ (Photo courtesy of Ahmed Zayan/Unsplash)

The tech industry has cracked the code on consumer search, but not so much on the enterprise side. Can you talk to us about the state of things in this arena?

Certainly. Before I dive in, it’s important to remember that consumer search hasn’t always been great. Google and other consumer search engines have made significant progress over the last few years, but it took decades for machine learning models to deeply understand language. Today, you can type almost any question into your search engine and get an answer in seconds, which is a product of several key breakthroughs in natural language understanding (NLU).

Conversely, the enterprise search systems we see at most companies are still broken. They’re still relying on basic NLU techniques like keyword matching, not taking advantage of the breakthroughs we’ve seen in the consumer search space. Case in point: if you do a quick search in an internal knowledge base at your company, you likely won’t find what you’re looking for. In fact, the answers you get probably won’t even come close to answering your question.

Why is enterprise search such a tough nut to crack?

One major challenge is training data. In the world of consumer search, Google benefits from billions of search queries each month, precisely the “big data” you always hear about in the tech world. When you combine extremely powerful machine learning models with a huge number of examples to learn from, the relevance of Google’s search results keeps improving.

By contrast, a typical company wiki or knowledge base is much smaller in scope, containing anywhere from 50 to 5,000 pages. At the same time, few companies possess the necessary machine learning infrastructure or expertise to build better search systems themselves. That’s why consumer search is, by and large, so vastly superior to enterprise search. While Google’s algorithms learn from every click and every query, enterprise search systems don’t improve over time, even when those systems manage to produce articles worth clicking on.

What’s truly not working here, from a technology standpoint?

It all comes back to natural language understanding, which has seen a major renaissance over the last few years. Most companies don’t have the time, the infrastructure, the resources, or the in-house machine learning experts to capitalise on these NLU breakthroughs, like BERT in 2018 and GPT-3 in 2020. And most vendors in this space haven’t figured out how to apply the latest NLU techniques in the enterprise, given the lack of available training data.

Ultimately, that’s why the majority of companies still rely on primitive models like keyword tagging. You know, adding good old-fashioned meta-tags to an article. Needless to say, true reading comprehension requires more than matching keywords. For example, imagine you typed the following query: “Can someone help issue me a new monitor?” A conventional enterprise search system would surface any articles with the tags “help,” “issue,” and “monitor”: for instance, “Help with common monitor issues.” It’s the kind of brainless interaction we’ve unfortunately come to expect.
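
To make that failure concrete, here is a minimal Python sketch of tag-based matching. The article titles, their tags and the `keyword_search` function are illustrative assumptions, not a description of any particular vendor's system.

```python
import re

# Hypothetical knowledge base: article title -> hand-assigned meta-tags.
ARTICLES = {
    "Help with common monitor issues": {"help", "issue", "monitor"},
    "How to request new hardware": {"request", "hardware", "order"},
}


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def keyword_search(query):
    """Rank articles by how many of their tags appear verbatim in the query."""
    words = tokenize(query)
    scored = [(len(tags & words), title) for title, tags in ARTICLES.items()]
    # Highest tag overlap first; drop articles with no matching tag at all.
    return [title for score, title in sorted(scored, reverse=True) if score > 0]


results = keyword_search("Can someone help issue me a new monitor?")
print(results)
```

In this toy setup the troubleshooting article is the only hit, because all three of its tags occur in the query, while the article about requesting hardware shares no tag with it and is never returned.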

What’s left out of the equation here is syntax, which is no small oversight. Considering each word in isolation — while ignoring its role in the sentence — makes very different sentences appear identical:

● What is the process to order a monitor?

● I need to monitor the order process.

● I can’t process a monitor order.

Understanding syntax is the only way to make sense of natural language.
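
The three sentences above can be run through a rough sketch of a tag extractor to show the collapse. The tag vocabulary and the `tag_set` helper are assumptions made for illustration only.

```python
import re

# Hypothetical tag vocabulary that an administrator might maintain.
TAGS = {"process", "order", "monitor"}


def tag_set(sentence):
    """Reduce a sentence to the unordered set of known tags it contains."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return frozenset(word for word in words if word in TAGS)


sentences = [
    "What is the process to order a monitor?",
    "I need to monitor the order process.",
    "I can't process a monitor order.",
]

# Collect the distinct tag sets; word order and grammatical role are discarded.
signatures = {tag_set(sentence) for sentence in sentences}
print(len(signatures))
```

Under these assumptions all three sentences reduce to the same set of tags, so a system that only sees tags has no way to tell a purchase request from a monitoring task or an error report.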

Another problem is that keyword tagging ignores semantics, that is, the meaning of the actual word or phrase itself. Take these two sentences:

1. The new intranet is a huge improvement! Use it. You won’t be disappointed.

2. The new intranet is a huge disappointment. I won’t use it!

The two sentences are direct opposites, from a semantic perspective. Yet they’re indistinguishable for the legacy search systems found in the enterprise, which is why we need to use models that account for syntax, semantics, context, and ambiguity in general when building a better solution.
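
As a rough illustration, the sketch below scores the two sentences with a Jaccard overlap of their words. The crude suffix stripping and the `jaccard` helper are assumptions chosen for the example; real keyword systems differ in detail, but they share the blind spot.

```python
import re


def crude_stem(word):
    """Strip a couple of common suffixes (a deliberately naive stemmer)."""
    for suffix in ("ment", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def word_set(sentence):
    """Return the set of stemmed, lowercased words in the sentence."""
    return {crude_stem(w) for w in re.findall(r"[a-z']+", sentence.lower())}


def jaccard(a, b):
    """Share of distinct words that two sentences have in common (0 to 1)."""
    set_a, set_b = word_set(a), word_set(b)
    return len(set_a & set_b) / len(set_a | set_b)


positive = "The new intranet is a huge improvement! Use it. You won't be disappointed."
negative = "The new intranet is a huge disappointment. I won't use it!"

similarity = jaccard(positive, negative)
print(round(similarity, 2))
```

The two sentences share most of their vocabulary, so a word-overlap score rates them as close matches even though their meanings are opposite.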

Can’t we throw ML at the problem, in the same way we’ve done with consumer search?

The short answer is that we can, with difficulty. The slightly longer answer is that applying machine learning in an enterprise environment, specifically the advanced NLU models that have transformed consumer search, requires new approaches.

On the one hand, a robust enterprise search system must be rooted in a robust understanding of language in general — in the same way that no one masters calculus before learning multiplication. This foundation of general-purpose NLU is where Transformers like BERT have changed the game.

On the other hand, tackling enterprise-specific language remains quite a nontrivial challenge, even with that solid foundation in place. One of the techniques that has proven critical here is transfer learning, which is just what it sounds like: transferring knowledge gained from solving one problem, like understanding generic language, to a related problem, like understanding how employees communicate at a particular company. But there are many techniques necessary to get enterprise search on par with consumer search, given the lack of training data at most companies.

Why the urgency to fix it now?

This new normal of remote work has made fixing enterprise search a top priority, if it wasn’t already. Without the ability to get help in the office at the walk-up tech bar, employees have no choice but to either find the answers they need on their own or send an email to IT. And because enterprise search is so unhelpful at most companies, IT teams are drowning in troubleshooting questions, which slows the entire business down.

The other piece of the puzzle — which tends to get overlooked — is that enterprise search is a misleading term. It implies that there’s a single, Google-like interface for enterprise knowledge, when in truth, most companies have a plethora of siloed knowledge repositories, each containing different kinds of articles. So fixing enterprise search is about more than NLU; it’s also about making all knowledge searchable via a single UI. Certainly, this problem is more urgent in a work-from-home economy, in which remote employees are left to help themselves.

What’s going to be the next big breakthrough in this space?

Personalised search. When I ask Google for “restaurants near me,” the results are relevant because its algorithms know my location.

Most enterprise search systems lack this kind of basic contextual awareness today. But in theory, you can push the concept of personalisation even further in a corporate environment — factoring in everything from my office and department, to my seniority and role. Different employees often need different answers, even when they ask the same question. Of course, people already personalise their answers in conversation; a toddler asking his dad why the sky is blue would get a simpler response than a student asking her science professor. The advantage of machine learning is the ability to emulate our knack for personalisation, but with an exponentially larger knowledge base.
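
One simple way to picture this is a re-ranking step that boosts results matching the employee's profile. The sketch below is only an assumption about how such a step might look: the `Article` fields, the `personalise` function and the boost value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    relevance: float  # text-match score from an upstream search step (assumed)
    audience: dict    # attributes the article is written for, e.g. {"office": "London"}


def personalise(articles, employee, boost=0.5):
    """Re-rank articles, adding a fixed boost per matching employee attribute."""
    def score(article):
        matches = sum(
            1 for key, value in article.audience.items()
            if employee.get(key) == value
        )
        return article.relevance + boost * matches

    return sorted(articles, key=score, reverse=True)


articles = [
    Article("VPN setup (London office)", 1.0, {"office": "London"}),
    Article("VPN setup (New York office)", 1.0, {"office": "New York"}),
]
employee = {"office": "New York", "department": "Finance"}

ranked = personalise(articles, employee)
print(ranked[0].title)
```

Both articles answer the same question equally well on text alone; the employee's office is what breaks the tie, so two people asking the same question can receive different top results.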
