Quick Start
1. Install OpenAI SDK
2. Configure Environment Variables
3. Call API
Python
4. Important Note
In multi-turn function call conversations, the complete model response (i.e., the assistant message) must be append to the conversation history to maintain the continuity of the reasoning chain.- Append the full
response_messageobject (including thetool_callsfield) to the message history- For native OpenAI API with
MiniMax-M3MiniMax-M2.7MiniMax-M2.7-highspeedMiniMax-M2.5MiniMax-M2.5-highspeedMiniMax-M2.1MiniMax-M2.1-highspeedMiniMax-M2models, thecontentfield will contain<think>tag content, which must be preserved completely - In the Interleaved Thinking compatible format, by enabling the additional parameter (
reasoning_split=True), the model’s thinking content is provided separately via thereasoning_detailsfield, which must also be preserved completely
- For native OpenAI API with
Supported Models
When using the OpenAI SDK, the following MiniMax models are supported:| Model Name | Context Window | Description |
|---|---|---|
| MiniMax-M3 | 1,000,000 | Latest M-series language model for agentic reasoning, tool use, coding, and long-context tasks |
| MiniMax-M2.7 | 204,800 | Beginning the journey of recursive self-improvement (output speed approximately 60 tps) |
| MiniMax-M2.7-highspeed | 204,800 | M2.7 Highspeed: Same performance, faster and more agile (output speed approximately 100 tps) |
| MiniMax-M2.5 | 204,800 | Peak Performance. Ultimate Value. Master the Complex (output speed approximately 60 tps) |
| MiniMax-M2.5-highspeed | 204,800 | M2.5 highspeed: Same performance, faster and more agile (output speed approximately 100 tps) |
| MiniMax-M2.1 | 204,800 | Powerful Multi-Language Programming Capabilities with Comprehensively Enhanced Programming Experience (output speed approximately 60 tps) |
| MiniMax-M2.1-highspeed | 204,800 | Faster and More Agile (output speed approximately 100 tps) |
| MiniMax-M2 | 204,800 | Agentic capabilities, Advanced reasoning |
For details on how tps (Tokens Per Second) is calculated, please refer to FAQ > About APIs.
For more model information, please refer to the standard MiniMax API
documentation.
Multimodal Input
OpenAI-compatible Chat Completions support text, image, and video input forMiniMax-M3.
Use image_url content parts for images and video_url content parts for videos. The detail field accepts low, default, or high and defaults to default; max_long_side_pixel can be used to control the longest side. Images support JPEG, PNG, GIF, and WEBP. Videos support MP4, AVI, MOV, and MKV; fps defaults to 1 and accepts values from 0.2 to 5. URL or base64 videos can be up to 50 MB, images can be up to 10 MB, and the request body can be up to 64 MB. For larger videos, upload through the Files API and pass mm_file://{file_id}; Files API videos can be up to 512 MB.
Image token usage depends on image size and content. Use this as a rough single-image heuristic; check response usage or token counting where available for exact usage:
detail | Rough single-image token usage |
|---|---|
low | Usually a few hundred tokens, up to ~600 |
default | Often ~1k-3k tokens, up to ~5k |
high | Often several thousand tokens, up to ~15k+ |
Python
MiniMax-M3 Request Parameters
MiniMax-M3 supports these additional Chat Completions parameters through the OpenAI-compatible API:
| Parameter | Description |
|---|---|
thinking | Controls MiniMax-M3 thinking. type can be disabled or adaptive; when omitted, thinking is on by default. For M2.x models, thinking cannot be disabled. |
stream_options.include_usage | When streaming, set to true to include token usage in the stream. |
max_tokens | Legacy generation length limit. |
max_completion_tokens | Generation length limit; use this field for new integrations. |
temperature | Sampling temperature. Range [0, 2], default 1. |
top_p | Nucleus sampling. Range [0, 1]. Default 0.95 for MiniMax-M3 and 0.9 for M2.x models. |
tools | Function tool definitions. |
reasoning_split | Output-format switch. When enabled, separates thinking content into reasoning_content and reasoning_details. |
Thinking Control
ForMiniMax-M3, the thinking parameter controls whether the model can emit thinking content.
- If
thinkingis omitted, thinking is on by default and the response includes thinking content. - Set
thinking: {"type": "adaptive"}to explicitly keep thinking on. For MiniMax-M3,adaptiveis equivalent to thinking on. - Set
thinking: {"type": "disabled"}to skip thinking and answer directly. - For M2.x models, thinking cannot be disabled;
thinking: {"type": "disabled"}is accepted but thinking remains on.
reasoning_split does not enable or disable thinking. It only controls how thinking content is returned: when true, thinking is exposed through reasoning_content and reasoning_details; when false, native Chat Completions responses keep thinking inside the content field with <think>...</think> tags.
Python
Examples
Streaming Response
Python
Tool Use & Interleaved Thinking
Learn how to use M3 Tool Use and Interleaved Thinking capabilities with OpenAI SDK, please refer to the following documentation.Tool Use & Interleaved Thinking
Learn how to leverage MiniMax-M3 tool calling and interleaved thinking capabilities to enhance performance in complex tasks.