Google’s AI chatbot Bard is getting a major upgrade, with a new large language model, extensions and the ability to connect to other Google products including Search and Maps. The update was announced at the company’s Google I/O developer conference alongside a raft of other generative AI news, including the release of the PaLM 2 large language model.
Bard was launched in March in response to the popularity of OpenAI’s ChatGPT and Microsoft adding AI chat to its Bing search engine. Initially powered by the 137-billion-parameter LaMDA large language model, it has now been moved to the new PaLM 2. Google has not disclosed PaLM 2’s parameter count, though its predecessor PaLM weighed in at 540 billion parameters.
Following moves by OpenAI and Microsoft, Google is adding extensions to the chatbot, giving it abilities a large language model cannot offer on its own. These include integrations with Google services and third-party tools such as Spotify, Adobe, Uber Eats and Khan Academy.
A user could, for example, ask Bard to plan a trip to the beach along with a suitable playlist, create an illustrated presentation to convince family members to come along, and put together a lesson on the insects they might encounter – all using natural language prompts.
As well as having a wider range of content to draw on, Bard is becoming more visual, with support for images as input. This would allow it to analyse an image such as an organisational chart, or to convert a hand-drawn graph into an interactive one in Sheets.
Google I/O news: loading images into Bard
Adobe’s image generation tool Firefly is also being integrated into Bard, with the option to export generated images to Adobe Express. Export options in general are getting a major upgrade: users will be able to have Bard write code and then export it to services such as Google Colab or Replit.
The entire Google I/O event was heavily influenced by artificial intelligence. Google confirmed that foundation-model AI would be coming to its full range of products, including Cloud, Colab, Maps and Workspace apps such as Docs and Meet.
Maps will become more immersive, using AI to create a new Immersive View for Routes that produces a digitally rendered preview of what a journey will look like, including landmarks. The company’s core Google Search product is also getting an AI refresh, gaining the ability to respond directly to a query in a conversational way and to handle multi-part queries without follow-up searches.
There will be a “help me write” button in Gmail and Docs. Google gave the example of using it to write a reply to an airline asking for a full refund, pulling in flight details from previous emails and using phrasing that gives the best chance of success.
Image generation is being incorporated into Slides: the user can type what they want to see and have options generated in the sidebar, ready to be pulled into the slide. Sheets will be able to turn ideas into action points and create text insights from datasets. It will also improve data labelling, reducing the burden of manual data entry.
Google’s new large language models: PaLM 2 and Gemini
Beyond Bard, one of the most important announcements from I/O was the release of PaLM 2 itself, along with confirmation that Google is working on a next-generation model codenamed Gemini. Gemini will be fully multimodal, meaning it can take a range of media as input and produce a similarly diverse selection of output formats.
PaLM 2 is being made available as an API to Google Cloud users, along with other enterprise-friendly updates including a dedicated coding model, an image generation model trained on traceable data, and a speech-to-text model.
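For Cloud customers, access is through Google’s Vertex AI SDK. The snippet below is a minimal sketch of what a call to the PaLM 2 text model might look like; the model name (“text-bison@001”), project ID and sampling parameters are illustrative assumptions based on Google’s public Vertex AI documentation, not details from the I/O announcements.

```python
# Minimal sketch: querying the PaLM 2 text model via the Vertex AI SDK.
# Model name, project ID and parameter values are illustrative assumptions.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project ID

model = TextGenerationModel.from_pretrained("text-bison@001")  # PaLM 2-based text model
response = model.predict(
    "Summarise the key AI announcements from Google I/O in three bullet points.",
    temperature=0.2,        # low temperature for more repeatable output
    max_output_tokens=256,  # cap the length of the response
)
print(response.text)
```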
Google has also fine-tuned PaLM 2 to work better across multiple languages and with code, giving the example of a multinational project in which developers need comments in different languages. One coder can write a function, send it to a colleague in another country and have PaLM 2 add comments in the colleague’s native language explaining how it works.
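As a rough illustration of that workflow, the sketch below asks Google’s code-tuned model to annotate a function with comments in another language. Again, the model name (“codechat-bison@001”) and prompt pattern are assumptions drawn from the public Vertex AI SDK rather than from the article itself.

```python
# Sketch: asking the code-tuned PaLM 2 model to comment a colleague's code
# in their native language. Model name and prompt are illustrative assumptions.
import vertexai
from vertexai.language_models import CodeChatModel

vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project ID

chat = CodeChatModel.from_pretrained("codechat-bison@001").start_chat()
response = chat.send_message(
    "Add Korean comments explaining what this function does:\n\n"
    "def normalise(xs):\n"
    "    total = sum(xs)\n"
    "    return [x / total for x in xs]\n"
)
print(response.text)
```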
It has been trained to pass language proficiency exams at a “mastery” level across 100 languages, and Google claims it can engage in those languages with nuance, including idioms, poems and riddles. It has also been given stronger reasoning capabilities, having been trained on scientific papers containing mathematical expressions. This, says Google, gives it the ability to demonstrate logic, common-sense reasoning and mathematical ability.
Beyond coding and language, PaLM 2 has been further tuned for medical knowledge, and even for cybersecurity best practices to help secure cloud environments.
“This fine-tuning achieved a 9X reduction in inaccurate reasoning when compared to the base model, approaching the performance of clinician experts who answered the same set of questions,” declared CEO Sundar Pichai. “In fact, Med-PaLM 2 was the first language model to perform at ‘expert’ level on medical licensing exam-style questions, and is currently the state of the art.”
The company also unveiled Gemini, the next-generation large language model. “We’re already at work on Gemini,” declared Zoubin Ghahramani, vice president, Google DeepMind, the British AI research unit recently fully integrated with Google’s AI team.
He did not offer any timescales on when the system might emerge, but said it would be “created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations, like memory and planning.”
The model is still at the training stage but already shows capabilities not seen in any previous Google model, according to Ghahramani. He says it will need to be fully tested for safety before being deployed, but once ready it will replace PaLM 2 as the engine inside Google’s products.