Microsoft launches Phi-3.5 series: new open, high-performance AI models | Turtles AI
Microsoft has unveiled three new models in the Phi-3.5 series, focused on linguistic and multimodal AI. Released under an MIT license, these models offer advanced reasoning and visual processing capabilities and are positioned as flexible, scalable tools for developers and businesses. Let’s look at the technical features and training of each.
Key points:
1. Three new AI models: Phi-3.5 Mini Instruct, MoE and Vision Instruct.
2. MIT License: Open and modifiable use, ideal for developers.
3. Intensive training: Use of high-performance GPUs.
4. High performance: Competitive in benchmarks against models from other industry leaders.
Microsoft recently announced the expansion of its offerings in the AI field with the launch of three new models in the Phi-3.5 series: Phi-3.5 Mini Instruct, Phi-3.5 MoE and Phi-3.5 Vision Instruct. These models are designed to meet different needs in language reasoning and multimodal processing, offering developers versatile tools for building advanced applications. What sets this release apart is not only the models’ performance, but also Microsoft’s choice to release them under the open source MIT license, which allows broad use and customization with no restrictions on commercial use.
The “Phi-3.5 Mini Instruct” model has been optimized to operate in environments with limited computational resources, while maintaining a robust ability to reason and perform complex tasks such as code generation or logical-mathematical problem solving. This model was trained on 3.4 trillion tokens using 512 H100-80G GPUs in a 10-day process.
The “Phi-3.5 MoE” (Mixture of Experts) model is a first in Microsoft’s lineup, leveraging an architecture that activates only a subset of its overall parameters for each input. This keeps compute costs low while preserving capacity for tasks requiring complex reasoning. Trained on 4.9 trillion tokens with extensive use of H100-80G GPUs, the training process took 23 days.
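The routing idea behind a mixture-of-experts layer can be sketched in a few lines: a gating network scores every expert for the current input, and only the top-k experts are actually run, with their outputs blended by softmax weights. The toy below is an illustrative sketch in plain Python, not Microsoft’s implementation; all names, sizes and the choice of linear "experts" are assumptions made for the example.

```python
import math
import random

random.seed(0)

# Illustrative mixture-of-experts layer -- NOT Phi-3.5's actual code.
N_EXPERTS, TOP_K, DIM = 8, 2, 4

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

# Each "expert" here is just a small linear map; the gate scores the experts.
experts = [rand_matrix(DIM, DIM) for _ in range(N_EXPERTS)]
gate = rand_matrix(N_EXPERTS, DIM)

def moe_forward(x):
    """Route x through only the TOP_K experts picked by the gating network."""
    scores = matvec(gate, x)                                  # one score per expert
    top = sorted(range(N_EXPERTS), key=scores.__getitem__)[-TOP_K:]
    exp_scores = [math.exp(scores[i]) for i in top]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]                 # softmax over chosen experts
    # Only TOP_K of N_EXPERTS experts actually run, so per-input compute
    # stays low even though the total parameter count is large.
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for j, v in enumerate(matvec(experts[i], x)):
            out[j] += w * v
    return out

y = moe_forward([1.0, 0.5, -0.5, 0.25])
print(len(y))  # 4
```

In a real MoE transformer the experts are feed-forward blocks and routing happens per token, but the principle is the same: total parameters can grow far beyond what any single forward pass pays for.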
Finally, the “Phi-3.5 Vision Instruct” model is distinguished by its multimodal capability: it can process and understand images, text and video. It was trained on 500 billion tokens using A100-80G GPUs, completing training in 6 days. It is particularly suitable for applications such as optical character recognition (OCR), chart and table analysis, and video summarization.
All models are available for download on Hugging Face, with the ability for developers to customize and integrate them into their own solutions. The open source approach chosen by Microsoft underscores its intention to promote innovation and research in AI by offering advanced models that compete with those of major players such as Google and Meta.
The Phi-3.5 series aims to be a valuable resource for those seeking advanced and flexible AI solutions, in an open licensing context that facilitates their adoption and integration into a wide range of applications.