
Pixtral 12B: Mistral introduces a multimodal model that processes images and text
Available on GitHub and Hugging Face, Pixtral 12B aims to broaden the field of AI applications.
TheFrank

French startup Mistral has launched Pixtral 12B, a 12-billion-parameter model designed to handle both images and text. Available for free on GitHub and Hugging Face, the model promises to reshape the landscape of AI applications for academic and research purposes.

Highlights:

  • Pixtral 12B: New 12-billion-parameter multimodal model from Mistral, capable of processing both images and text.
  • Free availability: Accessible on GitHub and Hugging Face, with specific licenses for commercial and academic use.
  • Future integration: Soon available for testing on Le Chat and La Plateforme, Mistral’s chatbot and API platforms.
  • Data controversies: Debates over the use of publicly available, copyright-protected data for model training.

 

Mistral, the young French AI startup, recently announced the release of Pixtral 12B, a deep learning model with 12 billion parameters that can process both images and text. This is a significant development, considering that many existing models focus on a single modality at a time. Pixtral 12B’s multimodal capability allows it to answer questions about images of any size and resolution, opening up a wide range of applications in fields such as image recognition, automatic annotation, and visual analysis.

Pixtral 12B builds on an earlier Mistral model, Nemo 12B, which is optimized for natural language processing. By adding image processing capabilities, the new model leverages the robustness of that neural network to handle both textual and visual inputs. Its behavior is similar to that of other multimodal models, such as those in Anthropic’s Claude family or OpenAI’s well-known GPT-4, enabling functions such as image captioning and counting objects within a photo. The model’s versatility rests on its 12 billion parameters, a size that typically yields greater accuracy and problem-solving capacity than models with fewer parameters.
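To give a concrete picture of such a multimodal prompt, the sketch below shows how a caption-and-count request could be sent to a locally hosted copy of the model through the open-source vLLM inference library. vLLM is not mentioned in Mistral’s announcement; the checkpoint name mistralai/Pixtral-12B-2409 and the chat-message layout follow that library’s documented conventions and should be treated as assumptions, not details confirmed by Mistral.

```python
# Hypothetical sketch: an image-captioning / object-counting prompt against a
# locally hosted Pixtral 12B checkpoint via vLLM. The checkpoint name and the
# message format are assumptions based on vLLM's documented usage.
from vllm import LLM
from vllm.sampling_params import SamplingParams

# Load the model; "tokenizer_mode='mistral'" tells vLLM to use Mistral's own tokenizer.
llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

# A single user turn mixing text and an image URL (placeholder URL for illustration).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image and count the animals in it."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }
]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)  # the model's caption and count
```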

Pixtral 12B is available for download on GitHub and on the Hugging Face platform via a torrent link, giving developers and researchers free access to the model for further optimization and adaptation to their needs. The licensing terms, however, remain unclear: Mistral has distributed some of its models under terms that require a paid license for commercial use while leaving academic and research use free of charge, and others under the permissive Apache 2.0 license, but the company has not yet confirmed which terms apply to the new model.
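For those who prefer not to use the torrent link, the sketch below pulls the released files with the huggingface_hub client. The repository id is an assumption, and the applicable license should be checked on the model card before downloading or deploying the weights.

```python
# Hypothetical sketch: fetching the released weights from Hugging Face with the
# huggingface_hub client instead of the torrent link. The repository id
# "mistralai/Pixtral-12B-2409" is an assumption; the checkpoint is tens of
# gigabytes, so make sure there is enough disk space.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="mistralai/Pixtral-12B-2409",  # assumed repository id on Hugging Face
    local_dir="./pixtral-12b",             # where the checkpoint files will land
)
print(f"Model files downloaded to {local_path}")
```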

At present, there is no working web demo of Pixtral 12B, but the startup has stated that the model will soon be available for testing on Le Chat and La Plateforme, its chatbot and API platforms, respectively. According to Sophia Yang, Mistral’s head of developer relations, the integration will arrive in the near future, allowing users to explore the model’s capabilities in full.
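Once the model is exposed on La Plateforme, querying it should look roughly like the sketch below, which posts an image-and-text message to Mistral’s chat completions endpoint. The model identifier pixtral-12b-2409 and the exact payload shape are assumptions modeled on Mistral’s OpenAI-style chat API, not details given in the announcement, and a valid API key is required.

```python
# Hypothetical sketch: querying Pixtral through La Plateforme's chat endpoint.
# The model identifier and the payload shape (in particular the "image_url"
# field) are assumptions; consult Mistral's API documentation for the exact format.
import os
import requests

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "pixtral-12b-2409",  # assumed model identifier
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What landmarks are visible in this photo?"},
                    {"type": "image_url", "image_url": "https://example.com/city.jpg"},
                ],
            }
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```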

An aspect that remains unclear concerns the data used for training Pixtral 12B. Most generative AI models, including those from Mistral, are trained on vast datasets publicly available online, many of which are copyrighted. This practice has sparked debates on the legitimacy of "fair use," with some model vendors claiming the right to scrape any public data, while copyright holders disagree and have filed lawsuits against some of the major AI developers, such as OpenAI and Midjourney.

The release of Pixtral 12B comes after a significant $645 million funding round led by General Catalyst, which brought Mistral’s valuation to $6 billion. Mistral, founded just over a year ago and partly owned by Microsoft, is seen by many as Europe’s answer to OpenAI. The young company’s strategy so far has involved releasing free "open" models, with the option of paid managed versions and consulting services for corporate clients. This dual strategy could help solidify its prominent position in the global AI landscape.

Considering the broader context of AI, Pixtral 12B represents a further step forward in the convergence of language understanding and image processing. The integration of these capabilities into a single model not only expands the range of possible applications but also offers new perspectives for use in the fields of visual data analysis and multimodal communication.