Meet BERT: An NLP Technique that can Distinguish Paris from Paris Hilton

When Google introduced its language model BERT (Bidirectional Encoder Representations from Transformers) in October 2018, it set a precedent for a new wave of Natural Language Processing (NLP) tools – ones that don’t just look at sequences of words, but intelligently contextualise them, writes Anders Arpteg, Head of Research, Peltarion.

NLP itself is one of the oldest branches of AI, with researchers experimenting with it as far back as the 1950s. Just as AlexNet’s arrival in 2012 triggered an explosion of models for understanding images, BERT is doing the same for analysing text. Yet questions remain about how to adopt BERT and which applications and use cases it might have for businesses. Here’s what businesses need to know before getting to grips with it.

It’s All in the Meaning

Firstly, the reason that BERT is so much more advanced than earlier NLP models is its ability to contextualise: to work out the meaning, semantics and intention behind words. For instance, say the word ‘Paris’ is used in a sentence being analysed. Many earlier NLP models give a word the same representation wherever it appears, or read the text in one direction only. BERT looks at the word bi-directionally, using the words both before and after it in the sentence, and so can recognise, for example, whether it refers to Paris the city or Paris Hilton the socialite.
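To make the idea of looking in both directions concrete, here is a deliberately tiny sketch. It is not how BERT works internally: the hand-written cue lists stand in for what BERT learns from data, and every name in it (CITY_CUES, PERSON_CUES, context_window, disambiguate_paris) is hypothetical and chosen only for illustration.

```python
# Toy illustration only: hand-written cue words stand in for what a model
# like BERT would learn from data. All names here are hypothetical.
CITY_CUES = {"france", "flight", "capital", "visited", "city"}
PERSON_CUES = {"hilton", "socialite", "she", "her", "heiress"}


def context_window(tokens, index, size=3):
    """Return up to `size` words on BOTH sides of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def disambiguate_paris(sentence):
    """Guess whether 'Paris' is a city or a person from surrounding words."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    index = tokens.index("paris")
    context = set(context_window(tokens, index))
    city_score = len(context & CITY_CUES)
    person_score = len(context & PERSON_CUES)
    if city_score == person_score:
        return "unknown"
    return "city" if city_score > person_score else "person"
```

In ‘We took a flight to Paris last spring’ the deciding cue sits to the left of the word, while in ‘Paris Hilton launched a new perfume’ it sits to the right, so a reader that only looked leftwards would have nothing to go on in the second case.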

Because BERT is built using deep learning, it has a far more natural interpretation of text than traditional approaches, which largely only look at the format of the text and have no understanding of meaning. The reason BERT is so well-placed to contextualise text is in part due to how it’s trained. First the model is pre-trained on vast amounts of unlabelled text (as Google has done). It is then fine-tuned for a specific task or business process using a comparatively small set of examples. This not only produces more accurate results and more naturalistic textual analysis, but because the second, fine-tuning stage doesn’t need large volumes of data annotated by humans, BERT is also far easier to adapt to a new task than NLP approaches that have to be trained from scratch.
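The two-stage idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not BERT itself: pretrained_features is a hypothetical stand-in for a pre-trained encoder (here just word counts over a fixed vocabulary), and only a small classification layer is trained on four labelled examples. Real fine-tuning usually also updates the pre-trained weights rather than keeping them frozen.

```python
import math

# Hypothetical stand-in for a frozen, pre-trained encoder. A real system
# would produce these features with BERT rather than with word counts.
VOCAB = ["great", "love", "excellent", "terrible", "hate", "poor"]


def pretrained_features(text):
    """Map text to a fixed-length feature vector (never updated below)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(examples, epochs=200, lr=0.5):
    """Train a small logistic-regression layer on a few labelled examples."""
    weights = [0.0] * len(VOCAB)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = pretrained_features(text)
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(text, weights, bias):
    """Return 1 for positive sentiment, 0 for negative."""
    x = pretrained_features(text)
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0


# A handful of labelled examples is the only task-specific data used here.
LABELLED = [
    ("great service love it", 1),
    ("excellent product", 1),
    ("terrible support hate it", 0),
    ("poor quality", 0),
]
```

Calling fine_tune_head(LABELLED) returns the trained weights and bias, which predict can then apply to unseen text. The point of the sketch is the division of labour: the expensive, general-purpose part is reused as-is, and only a small task-specific part is learned from the labelled data.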

A New Era of Business

As with any new technology, though, how you use it can often be the challenge. While Google (at least initially) is using BERT to improve user searches, there are in fact lots of practical uses:

It can help market researchers to quickly and accurately build questionnaires by analysing textual similarities in previous questions.
It can find hard-to-locate answers to questions in large bodies of text – a must in academic research where researchers have to ‘read’ a large volume of papers for a single nugget of insight.
It supports more accurate text classification, where bodies of text are assigned categories based on their content, such as detecting spam by identifying features of an email’s text and flagging the messages that seem suspicious.
It can also support sentiment analysis, which has great potential in areas like social media, marketing and brand management (though it does require a few thousand extra annotated examples of text).
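As a rough sketch of the first use above, finding the previous survey question most similar to a new one, the code below ranks questions by cosine similarity. It rests on an assumption worth stating loudly: embed here is a hypothetical stand-in that simply counts words, whereas a BERT-based system would replace it with embeddings from the model so that paraphrases with no words in common can still match. The function names and sample questions are invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a BERT sentence embedding: a bag-of-words count vector."""
    words = text.lower().replace("?", " ").replace(",", " ").split()
    return Counter(words)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_question, previous_questions):
    """Return the previous question closest to the new one."""
    new_vec = embed(new_question)
    return max(
        previous_questions,
        key=lambda q: cosine_similarity(new_vec, embed(q)),
    )


# Hypothetical sample data.
PREVIOUS = [
    "How satisfied are you with our delivery times?",
    "Which social media platforms do you use daily?",
    "How likely are you to recommend us to a friend?",
]
```

Calling most_similar("How satisfied are you with delivery speed?", PREVIOUS) picks the delivery-times question. Only the embed function would need to change to move from word overlap to BERT-style contextual similarity; the ranking logic stays the same.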

We’re also seeing apps and platforms using BERT to analyse text in ways never thought of before. An app has recently been developed using BERT to identify message patterns in Slack and alert people when a message is posted in the wrong channel. Other examples include a website that can work out whether a song is pop or rock from its lyrics alone, and a platform that analyses a sample of your writing to identify the author whose style it most resembles. Innovative practical uses like these are only set to flourish.

Challenges to Adoption

That said, for all of BERT’s advantages, implementing it is by no means plain sailing. For one, NLP generally has a long way to go before it’s on par with humans at understanding nuances in text. For instance, if you say, ‘a trophy could not be stored in the suitcase because it was too small’, humans are much better at working out whether ‘it’ refers to the trophy or the suitcase, because we have that all-important background knowledge: things fit inside containers that are bigger than they are, so it must be the suitcase that is too small. In addition, the complexity of BERT’s code and architecture can mean many developers and domain experts aren’t equipped to deal with it, and, despite it being open-sourced, it’s hard for many companies to make use of it. BERT was ultimately built by Google, for the likes of Google, and with tech giants having access not only to superior skills but also to resources and money, BERT remains inaccessible for the majority of companies.

These challenges shouldn’t deter businesses from deploying BERT, though. In fact, the only way for BERT models to improve and become more accessible is by operationalising them – in other words, ensuring more people start experimenting with the model and integrating it into their systems. To get started, businesses should first identify which specific processes AI could make more efficient, and then prioritise the processes with as much historical text data as possible. Finding a use case that has this textual data available will make it far easier to get BERT up and running with success. The bottom line is experiment, experiment, experiment, as much as you can, even supplementing your use cases by participating in research groups where possible for extra experience.

The BERT of the Future

As more organisations begin realising NLP’s potential, the power BERT boasts to transform business operations is only set to grow. Yet, as a fairly recent development that remains almost exclusively in the hands of a few tech giants, it’s understandable that mainstream companies aren’t jumping at the chance to deploy it in the real world. This is why it’s vital that as many organisations as possible start putting it to use, using tools that help operationalise AI, so barriers to adoption can be lowered and its potential to transform the world can be realised. For example, an operational AI platform using the BERT model allows users without extensive data science experience to start reaping the benefits of accurate NLP. After all, it may be complex in its design, but BERT is surprisingly generic and something all businesses can – and should – experiment with.