
MiniMax-M1 Challenges AI Giants With 1M Token Open Source LLM
The new Chinese model combines efficiency, long contextual memory and low costs, surpassing DeepSeek and approaching the performance of OpenAI, Google and Anthropic
Isabella V, 18 June 2025

 

MiniMax-M1 is an open-source LLM released under the Apache 2.0 license by MiniMax, a Shanghai startup backed by Alibaba, Tencent and IDG Capital. It offers a 1-million-token context window, high efficiency thanks to the Lightning Attention mechanism, and low reinforcement learning (RL) training costs.

Key Points:

  • Context window of up to 1M input tokens and 80K output tokens
  • Hybrid Mixture-of-Experts architecture with Lightning Attention
  • RL training with the CISPO algorithm, completed in 3 weeks on 512 H800 GPUs
  • Training cost: about $535K, vs. an estimated $5–6M for DeepSeek R1

MiniMax, founded in Shanghai in 2021 and valued at about $2.5B, published the MiniMax-M1 model on June 16, 2025 on GitHub and Hugging Face under the Apache 2.0 license, making it fully open source. It is an LLM with 456 billion total parameters, of which 45.9 billion are activated per token, based on a Mixture-of-Experts (MoE) architecture combined with Lightning Attention and designed to contain computational costs during inference on extremely long contexts. The model handles up to 1 million input tokens, eight times the capacity of DeepSeek R1, and 80,000 output tokens, exceeding DeepSeek's 64,000 and approaching the most advanced offerings such as Gemini 2.5 Pro and OpenAI o3.
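To make the 456-billion-total versus 45.9-billion-active split concrete, the sketch below shows the generic top-k expert routing used in Mixture-of-Experts layers: each token passes through only a small subset of expert networks, so the parameters actually exercised per token are a fraction of the total. This is an illustrative toy, not MiniMax's actual router; all dimensions, the number of experts, and the top_k value are invented for the example.

```python
# Toy Mixture-of-Experts layer: only the top-k experts selected by the router
# run for each token, so the "active" parameter count per token is far smaller
# than the total. Dimensions and top_k are invented for illustration and do
# not reflect MiniMax-M1's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.top_k = top_k

    def forward(self, x):                     # x: (tokens, d_model)
        scores = self.router(x)               # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):        # run just the selected experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([10, 64])
```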

This significant increase in the context window makes M1 particularly suitable for very large texts, codebases or document sets, such as collections of books or complex databases. The Lightning Attention mechanism allows for remarkable computational efficiency: generating 100,000 tokens consumes only 25–30% of the FLOPs required by DeepSeek R1. Performance is confirmed by benchmarks comparable or superior to established open-source models such as DeepSeek‑R1 and Qwen3‑235B, and approaches proprietary peers (Gemini 2.5 Pro, Claude 4 Opus, OpenAI o3) in mathematical reasoning, software engineering, and tool-use tasks.
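Lightning Attention is a hardware-efficient implementation of linear attention, which is where the long-context savings come from: instead of comparing each new token against every previous one, the model accumulates keys and values into a fixed-size running state. The sketch below shows that generic recurrence for intuition only; it is not MiniMax's kernel and omits the block-wise computation and the hybrid interleaving with softmax attention that the real model uses.

```python
# Simplified causal linear attention: per-token cost depends only on the head
# dimension, not on how many tokens came before, because attention is
# accumulated into a fixed-size state S instead of a growing key-value cache
# comparison. Generic illustration; Lightning Attention's real kernel is
# block-wise and hardware-optimized.
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v):
    # q, k, v: (seq_len, d); elu+1 feature map keeps scores non-negative
    q, k = F.elu(q) + 1, F.elu(k) + 1
    d = q.shape[-1]
    S = torch.zeros(d, d)          # running sum of k_t v_t^T
    z = torch.zeros(d)             # running sum of k_t, used for normalization
    outputs = []
    for t in range(q.shape[0]):
        S = S + torch.outer(k[t], v[t])
        z = z + k[t]
        outputs.append((q[t] @ S) / (q[t] @ z + 1e-6))
    return torch.stack(outputs)

q, k, v = (torch.randn(16, 8) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # torch.Size([16, 8])
```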

Training used an innovative approach: large-scale RL with the Clipped Importance Sampling Policy Optimization (CISPO) algorithm, which clips importance sampling weights rather than token-level updates. Combined with the hybrid architecture, this allowed the RL phase to be completed on 512 H800 GPUs in three weeks at an estimated cost of $534,700, about a tenth of the millions invested in DeepSeek R1 and far less than the hundreds of millions estimated for GPT‑4. MiniMax offers two distinct variants, M1‑40K (up to 40,000 output tokens) and M1‑80K (up to 80,000), to support different response-length needs.
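For readers unfamiliar with the distinction, the following is a minimal sketch of the clipped-importance-weight idea: the PPO-style clipping is moved onto the importance sampling ratio, which is then detached, so every token still contributes a gradient through its log-probability. The loss form, hyperparameters and tensor names here are illustrative assumptions written against a generic policy-gradient setup, not MiniMax's exact implementation.

```python
# CISPO-style update sketch: clip and stop-gradient the importance sampling
# ratio, then weight a REINFORCE-style term with it. eps_low/eps_high and all
# shapes are assumed values for illustration.
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    # logp_new: token log-probs under the current policy (requires grad)
    # logp_old: token log-probs under the behavior policy that sampled the data
    # advantages: per-token advantage estimates
    ratio = torch.exp(logp_new - logp_old.detach())
    clipped_ratio = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    # gradient flows through logp_new for every token; only the weight is clipped
    return -(clipped_ratio * advantages * logp_new).mean()

# toy usage with made-up numbers
logp_new = torch.randn(32, requires_grad=True)
logp_old = logp_new.detach() + 0.1 * torch.randn(32)
adv = torch.randn(32)
loss = cispo_loss(logp_new, logp_old, adv)
loss.backward()
print(loss.item())
```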

From an adoption perspective, the model is already supported by inference infrastructures such as vLLM and Transformers, and offers structured function calling, chatbot APIs, online search tools, image and video generation, speech synthesis, and voice cloning, all useful in advanced agentic scenarios. Its open license makes the model attractive to enterprises, developers, and the research community, offering transparency and freedom for commercial adaptation.
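As a rough illustration of what vLLM support means in practice, the sketch below loads a MiniMax-M1 checkpoint and generates a completion. The model identifier, GPU parallelism and sampling settings are assumptions and should be checked against the official model card before use.

```python
# Hypothetical vLLM usage sketch; the checkpoint name "MiniMaxAI/MiniMax-M1-80k",
# tensor_parallel_size and sampling settings are assumptions, not verified
# values from MiniMax's documentation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M1-80k",   # assumed Hugging Face model id
    trust_remote_code=True,             # custom architecture code from the repo
    tensor_parallel_size=8,             # a model this size needs multiple GPUs
)

params = SamplingParams(temperature=1.0, max_tokens=1024)
outputs = llm.generate(
    ["Summarize the trade-offs between softmax and linear attention."],
    params,
)
print(outputs[0].outputs[0].text)
```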

MiniMax‑M1 represents a significant step forward in the Chinese open-source LLM landscape, combining rich context, computational efficiency, and low training costs.