Mistral AI has teamed up with NVIDIA to release a new language model, Mistral NeMo 12B, aimed at enterprise applications such as chatbots, multilingual tasks, coding, and summarisation.
According to Mistral AI, the model features a large context window of up to 128k tokens and state-of-the-art reasoning, world knowledge, and coding accuracy for its size category.
Mistral NeMo is built on a standard architecture, making it easy to use and a drop-in replacement for systems that currently use Mistral 7B, said the French AI company.
The model was trained on the NVIDIA DGX Cloud AI platform, which provides scalable access to the latest NVIDIA architecture.
Mistral NeMo, a 12-billion-parameter model, has been released under the Apache 2.0 licence. It uses the FP8 data format for model inference, which reduces memory requirements and speeds up deployment without compromising accuracy.
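To see why the FP8 format matters at this scale, a back-of-the-envelope calculation is useful: storing each weight in one byte (FP8) rather than two (FP16/BF16) roughly halves the memory needed for the model's weights. The sketch below assumes 12 billion parameters and counts weight storage only, ignoring activations and KV-cache overhead.

```python
# Rough memory estimate for storing model weights at different precisions.
# Assumes 12B parameters; FP8 = 1 byte per weight, FP16/BF16 = 2 bytes.
PARAMS = 12e9

def weight_memory_gb(bytes_per_param: float, params: float = PARAMS) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(2)  # 24.0 GB in FP16/BF16
fp8_gb = weight_memory_gb(1)   # 12.0 GB in FP8
print(f"FP16/BF16 weights: {fp16_gb:.0f} GB, FP8 weights: {fp8_gb:.0f} GB")
```

On this estimate, FP8 brings the weights of a 12B model from about 24 GB down to about 12 GB, which is what allows the model to fit on smaller accelerators and deploy faster.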
NVIDIA stated that these optimisations enhance the model’s ability to learn tasks and handle diverse scenarios effectively, making it well suited to enterprise use cases.
Mistral AI cofounder and chief scientist Guillaume Lample said: “We are fortunate to collaborate with the NVIDIA team, leveraging their top-tier hardware and software.
“Together, we have developed a model with unprecedented accuracy, flexibility, high-efficiency and enterprise-grade support and security thanks to NVIDIA AI Enterprise deployment.”
Mistral NeMo also employs NVIDIA TensorRT-LLM for accelerated inference performance on large language models and the NVIDIA NeMo development platform for building custom generative AI models.
The model features a new tokeniser called Tekken. Based on Tiktoken, it has been trained on more than 100 languages.
Mistral AI claimed that the tokeniser compresses natural language text and source code more efficiently than the previous SentencePiece tokeniser: it is approximately 30% more efficient for source code and for Chinese, French, Italian, Spanish, German, and Russian, and two and three times more efficient for Korean and Arabic, respectively.
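In concrete terms, "more efficient" here means the same text is encoded in fewer tokens. The sketch below translates the claimed gains into token counts; the baseline figure of 1,000 tokens is hypothetical, used only to illustrate the ratios, not a measured value.

```python
# Illustration of what the claimed compression gains mean in token counts.
# The baseline count below is hypothetical, not a measured value.
def compressed_tokens(baseline_tokens: int, efficiency_gain: float) -> int:
    """Tokens needed after a given efficiency gain.

    efficiency_gain = 1.3 means ~30% more efficient (fewer tokens);
    2.0 means twice as efficient, i.e. half the tokens.
    """
    return round(baseline_tokens / efficiency_gain)

baseline = 1000  # hypothetical count under the previous SentencePiece tokeniser
print(compressed_tokens(baseline, 1.3))  # source code / listed European languages: ~769
print(compressed_tokens(baseline, 2.0))  # Korean: 500
print(compressed_tokens(baseline, 3.0))  # Arabic: ~333
```

Fewer tokens per document means more text fits into the 128k-token context window and each request costs less compute, which is why tokeniser efficiency matters for multilingual workloads.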