Get API Key
- Coding Plan: Visit API Keys > Create Coding Plan Key to get your API key.
Coding Plan only supports MiniMax text models. See Coding Plan Overview for details.
- Pay-as-you-go: Visit API Keys > Create new secret key to get your API key.
Pay-as-you-go supports models of all modalities, including Text, Video, Speech, and Image.
Text Generation
The text generation API uses MiniMax M2.1, MiniMax M2.1 lightning, and MiniMax M2 to generate conversational content and trigger tool calls based on the provided context. It can be accessed via HTTP requests, the Anthropic SDK (Recommended), or the OpenAI SDK.
Supported Models
| Model Name | Context Window | Description |
|---|---|---|
| MiniMax-M2.1 | 204,800 | Powerful Multi-Language Programming Capabilities with Comprehensively Enhanced Programming Experience (output speed approximately 60 tps) |
| MiniMax-M2.1-lightning | 204,800 | Faster and More Agile (output speed approximately 100 tps) |
| MiniMax-M2 | 204,800 | Agentic capabilities, Advanced reasoning |
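As a rough illustration of how the context window figures in the table might be used, the sketch below checks whether a request fits before it is sent. It assumes the figures are token counts and that input and output tokens share one window; neither point is stated in the table. The helper name `fits_context` is hypothetical.

```python
# Context window sizes copied from the Supported Models table.
# Assumption: the unit is tokens.
CONTEXT_WINDOWS = {
    "MiniMax-M2.1": 204_800,
    "MiniMax-M2.1-lightning": 204_800,
    "MiniMax-M2": 204_800,
}


def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if prompt plus requested output fits the model's window.

    Assumption: prompt and output tokens count against the same window.
    """
    if model not in CONTEXT_WINDOWS:
        raise ValueError(f"unknown model: {model}")
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]
```

Token counts have to come from a tokenizer or from usage data returned by the API; this sketch only compares numbers it is given.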
Anthropic API Compatible (Recommended)
Use Anthropic SDK with MiniMax models
OpenAI API Compatible
Use OpenAI SDK with MiniMax models
Text to Speech (T2A)
This API provides synchronous text-to-speech (T2A) generation, supporting up to 10,000 characters per request. The interface is stateless: each call only processes the provided input without involving business logic, and the model does not store any user data.
Key Features
- Access to 300+ system voices and custom cloned voices.
- Adjustable volume, pitch, speed, and output formats.
- Support for proportional audio mixing.
- Configurable fixed time intervals.
- Multiple audio formats and specifications supported: mp3, pcm, flac, wav (wav is supported only in non-streaming mode).
- Support for streaming output.
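Because each synchronous request is limited to 10,000 characters, longer input has to be split by the caller (or sent to the asynchronous API instead). The following is a minimal sketch of one way to split text, preferring sentence boundaries. It assumes the limit is counted the same way as Python's `len()`, which the documentation does not confirm; `split_for_t2a` is a hypothetical helper, not part of any MiniMax SDK.

```python
import re

MAX_T2A_CHARS = 10_000  # per-request limit stated above


def split_for_t2a(text: str, limit: int = MAX_T2A_CHARS) -> list:
    """Split text into chunks of at most `limit` characters.

    Cuts after the last sentence-ending punctuation inside each window when
    one exists, otherwise cuts at the hard limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        # End offsets of sentence-ending punctuation (plus trailing spaces).
        ends = [m.end() for m in re.finditer(r"[.!?。！？]\s*", window)]
        cut = ends[-1] if ends else limit
        chunks.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        chunks.append(rest)
    return chunks
```

Joining the returned chunks reproduces the original text, so nothing is dropped; each chunk can then be sent as its own request.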
Supported Models
| Model | Description |
|---|---|
| speech-2.6-hd | Latest HD model with outstanding prosody and excellent cloning similarity. |
| speech-2.6-turbo | Latest Turbo model with support for 40 languages. |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |
Available Interfaces
Synchronous speech synthesis provides two interfaces. Choose based on your needs:
- HTTP T2A API
- WebSocket T2A API
Supported Languages
MiniMax speech synthesis models offer robust multilingual capability, supporting 40 widely used languages worldwide.
| Supported Languages | | |
|---|---|---|
| 1. Chinese | 15. Turkish | 28. Malay |
| 2. Cantonese | 16. Dutch | 29. Persian |
| 3. English | 17. Ukrainian | 30. Slovak |
| 4. Spanish | 18. Thai | 31. Swedish |
| 5. French | 19. Polish | 32. Croatian |
| 6. Russian | 20. Romanian | 33. Filipino |
| 7. German | 21. Greek | 34. Hungarian |
| 8. Portuguese | 22. Czech | 35. Norwegian |
| 9. Arabic | 23. Finnish | 36. Slovenian |
| 10. Italian | 24. Hindi | 37. Catalan |
| 11. Japanese | 25. Bulgarian | 38. Nynorsk |
| 12. Korean | 26. Danish | 39. Tamil |
| 13. Indonesian | 27. Hebrew | 40. Afrikaans |
| 14. Vietnamese | | |
HTTP T2A API
Synchronous speech synthesis via HTTP
WebSocket T2A API
Streaming speech synthesis via WebSocket
Asynchronous Long-Text Speech Generation (T2A Async)
This API supports asynchronous text-to-speech generation. Each request can handle up to 1 million characters, and the resulting audio can be retrieved asynchronously.
Features supported:
- Choose from 100+ system voices and cloned voices.
- Customize pitch, speed, volume, bitrate, sample rate, and output format.
- Retrieve audio metadata, such as duration and file size.
- Retrieve precise sentence-level timestamps (subtitles).
- Input text directly as a string or via file_id after uploading a text file.
- Detect illegal characters:
  - If illegal characters are ≤10%, audio is generated normally, with the ratio returned.
  - If illegal characters are >10%, no audio will be generated (an error code will be returned).
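The 10% rule can be estimated on the client before a long text is submitted. The sketch below is only an approximation: the documentation does not define which characters count as illegal, so the predicate is passed in by the caller, and the example predicate (control characters other than newline, carriage return, and tab) is purely an assumption.

```python
import unicodedata
from typing import Callable


def illegal_count(text: str, is_illegal: Callable[[str], bool]) -> int:
    """Count characters that the supplied predicate flags as illegal."""
    return sum(1 for ch in text if is_illegal(ch))


def within_illegal_limit(text: str, is_illegal: Callable[[str], bool]) -> bool:
    """True if flagged characters make up at most 10% of the text.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return illegal_count(text, is_illegal) * 10 <= len(text)


def is_control_char(ch: str) -> bool:
    """Example predicate only (assumption): treat control characters as illegal."""
    return unicodedata.category(ch) == "Cc" and ch not in "\n\r\t"
```

The server's own check is authoritative; a local estimate like this only helps catch obviously damaged input early.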
Supported Models
| Model | Description |
|---|---|
| speech-2.6-hd | Latest HD model with outstanding prosody and excellent cloning similarity. |
| speech-2.6-turbo | Latest Turbo model with support for 40 languages. |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |
API Overview
This feature includes two APIs:
- Create a speech generation task (returns task_id).
- Query the speech generation task status using task_id.
- If the task succeeds, use the returned file_id with the File API to view and download the result.
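The create-then-query flow above can be sketched as a small polling loop. The HTTP call itself is left to a caller-supplied function, since endpoint URLs and response schemas are not given here. The response keys `status` and `file_id`, the terminal status names, and the helper name `wait_for_file_id` are all assumptions made for illustration.

```python
import time
from typing import Callable, Mapping


def wait_for_file_id(
    task_id: str,
    query_status: Callable[[str], Mapping[str, object]],
    *,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a task until it succeeds and return its file_id.

    `query_status` is a caller-supplied function that performs the actual
    Query Task Status request and returns the parsed response.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = query_status(task_id)
        status = str(result.get("status", "")).lower()
        if status == "success":
            return str(result["file_id"])
        # Assumed names for terminal failure states.
        if status in {"failed", "expired"}:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

Passing the query function and `sleep` as parameters keeps the loop testable without network access; the returned file_id would then be handed to the File API.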
Create Async Task
Create a long-text speech generation task
Query Task Status
Query speech generation task status
Voice Cloning
This API clones a voice from a user-uploaded audio file, with optional sample audio to improve cloning quality. It is intended for cases where a specific timbre must be replicated quickly, such as IP voice recreation. The API accepts mono or stereo audio and can rapidly reproduce speech that matches the timbre of the provided reference file.
Supported Models
| Model | Description |
|---|---|
| speech-2.6-hd | Latest HD model with real-time response, intelligent parsing, fluent LoRA voice |
| speech-2.6-turbo | Latest Turbo model. Ultimate Value, 40 Languages |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |
Notes
- Using this API to clone a voice does not immediately incur a cloning fee. The fee is charged the first time you synthesize speech with the cloned voice in a T2A synthesis API.
- Voices produced via this rapid cloning API are temporary. To keep a cloned voice permanently, call any T2A speech synthesis API with that voice within 168 hours (7 days).
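The 168-hour retention window is easy to track on the client side. The sketch below computes the deadline by which a cloned voice must be used in a T2A call. It assumes the window starts at the moment the clone is created and that the deadline itself is exclusive; the notes above do not spell out either detail.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(hours=168)  # 7 days, as stated in the notes


def expires_at(created_at: datetime) -> datetime:
    """Deadline by which the temporary voice must be used to be kept."""
    return created_at + RETENTION


def is_still_usable(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while the temporary voice is still inside its retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < expires_at(created_at)
```

For example, a voice cloned at 2025-01-01 00:00 UTC would need to be used before 2025-01-08 00:00 UTC under these assumptions.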
Voice Design
This API supports generating personalized custom voices based on user-provided voice description prompts. The generated voices (voice_id) can then be used in the T2A API and the T2A Async API for speech generation.
Supported Models
It is recommended to use speech-02-hd for the best results.
| Model | Description |
|---|---|
| speech-2.6-hd | Latest HD model with real-time response, intelligent parsing, fluent LoRA voice |
| speech-2.6-turbo | Latest Turbo model. Ultimate Value, 40 Languages |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |
Notes
- Using this API to generate a voice does not immediately incur a fee. The generation fee will be charged upon the first use of the generated voice in speech synthesis.
- Voices generated through this API are temporary. If you wish to keep a voice permanently, you must use it in any speech synthesis API within 168 hours (7 days).
Voice Design API
Generate personalized voices from descriptions
Video Generation
This API supports generating videos from user-provided text and images (including first-frame, last-frame, or reference images).
Supported Models
| Model | Description |
|---|---|
| MiniMax-Hailuo-2.3 | New video generation model, breakthroughs in body movement, facial expressions, physical realism, and prompt adherence. |
| MiniMax-Hailuo-2.3-Fast | New Image-to-video model, for value and efficiency. |
| MiniMax-Hailuo-02 | Video generation model supporting higher resolution (1080P), longer duration (10s), and stronger adherence to prompts. |
API Usage Guide
Video generation is asynchronous and consists of three APIs: Create Video Generation Task, Query Video Generation Task Status, and File Management. Steps are as follows:
- Use the Create Video Generation Task API to start a task. On success, it will return a task_id.
- Use the Query Video Generation Task Status API with the task_id to check progress. When the status is success, a file ID (file_id) will be returned.
- Use the Download the Video File API with the file_id to view and download the generated video.
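The three steps can be tied together in one function, sketched here against a hypothetical client interface. `VideoClient` and its method names are invented for illustration and do not correspond to a published SDK; the failure status names and the backoff schedule are likewise assumptions.

```python
import time
from typing import Callable, Protocol


class VideoClient(Protocol):
    """Hypothetical wrapper around the three APIs described above."""

    def create_task(self, prompt: str) -> str: ...    # returns task_id
    def query_task(self, task_id: str) -> dict: ...   # returns status info
    def download(self, file_id: str) -> bytes: ...    # returns video bytes


def generate_video(
    client: VideoClient,
    prompt: str,
    *,
    max_polls: int = 60,
    first_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Create a task, poll with capped exponential backoff, then download."""
    task_id = client.create_task(prompt)                 # step 1
    delay = first_delay_s
    for _ in range(max_polls):                           # step 2
        result = client.query_task(task_id)
        status = str(result.get("status", "")).lower()
        if status == "success":
            return client.download(result["file_id"])    # step 3
        if status in ("fail", "failed"):                 # assumed names
            raise RuntimeError(f"task {task_id} failed: {result}")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

Backoff is used here because video tasks may take a while; a fixed polling interval would work just as well if the expected duration is known.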
Video Generation Agent
This API supports video generation tasks based on user-selected video agent templates and inputs.
Overview
The Video Agent API works asynchronously and includes two endpoints: Create Video Agent Task and Query Video Agent Task Status. Usage steps:
- Use the Create Video Agent Task API to create a task and obtain a task_id.
- Use the Query Video Agent Task Status API with the task_id to check the task status. Once the status is Success, you can retrieve the corresponding file download URL.
Template List
For details and examples, refer to the Video Agent Template List.
| Template ID | Template Name | Description | media_inputs | text_inputs |
|---|---|---|---|---|
| 392747428568649728 | Diving | Upload a picture to generate a video of the subject in the picture completing a perfect dive | Required | / |
| 393769180141805569 | Run for Life | Upload a photo of your pet and enter a type of wild beast to generate a survival video of your pet in the wilderness. | Required | Required |
| 397087679467597833 | Transformers | Upload a photo of a car to generate a transforming car mecha video. | Required | / |
| 393881433990066176 | Still rings routine | Upload your photo to generate a video of the subject performing a perfect still rings routine. | Required | / |
| 393498001241890824 | Weightlifting | Upload a photo of your pet to generate a video where the subject performs a perfect weightlifting move. | Required | / |
| 393488336655310850 | Climbing | Upload a picture to generate a video of the subject in the picture completing a perfect sport climb. | Required | / |
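The media_inputs and text_inputs columns can be turned into a pre-flight check before a task is created. This is a sketch only: the IDs and requirements are copied from the table, but the representation of inputs as Python lists and the helper name `validate_inputs` are assumptions, and "/" is read simply as "not required".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    needs_media: bool
    needs_text: bool


# Requirements copied from the Template List table.
TEMPLATES = {
    "392747428568649728": Template("Diving", True, False),
    "393769180141805569": Template("Run for Life", True, True),
    "397087679467597833": Template("Transformers", True, False),
    "393881433990066176": Template("Still rings routine", True, False),
    "393498001241890824": Template("Weightlifting", True, False),
    "393488336655310850": Template("Climbing", True, False),
}


def validate_inputs(template_id: str, media_inputs: list, text_inputs: list) -> list:
    """Return a list of problems; an empty list means the inputs look complete."""
    template = TEMPLATES.get(template_id)
    if template is None:
        return [f"unknown template_id: {template_id}"]
    problems = []
    if template.needs_media and not media_inputs:
        problems.append("media_inputs is required")
    if template.needs_text and not text_inputs:
        problems.append("text_inputs is required")
    return problems
```

For instance, the Run for Life template needs both a pet photo and a text entry, so omitting the text would be reported before any request is made.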
Image Generation
This API supports image generation from text or reference images, allowing custom aspect ratios and resolutions for diverse needs.
API Description
You can generate images by creating an image generation task using text prompts and/or reference images.
Model List
| Model | Description |
|---|---|
| image-01 | A high-quality image generation model that produces fine-grained details. Supports both text-to-image and image-to-image generation (with subject reference for people). |
Music Generation
This API generates a vocal song based on a music description (prompt) and lyrics.
Models
| Model | Usage |
|---|---|
| music-2.0 | The latest music generation model. Supports user-provided musical inspiration and lyrics to create AI-generated music. |
Music Generation API
Generate music from description and lyrics
File Management
This API is for file management and is used with other MiniMax APIs.
API Description
This API includes 5 endpoints: Upload, List, Retrieve, Retrieve Content, Delete.
Supported File Formats
| Type | Format |
|---|---|
| Document | pdf, docx, txt, jsonl |
| Audio | mp3, m4a, wav |
Capacity and Limits
| Item | Limit |
|---|---|
| Total Capacity | 100GB |
| Single Document Size | 512MB |
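The format and size tables can be checked locally before an upload is attempted. The sketch below makes two assumptions that the tables do not settle: that 512MB means 512 × 1024 × 1024 bytes, and that the single-file size limit applies to every upload rather than to documents only. The helper names are hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Formats copied from the Supported File Formats table.
SUPPORTED_FORMATS = {
    "Document": {"pdf", "docx", "txt", "jsonl"},
    "Audio": {"mp3", "m4a", "wav"},
}
# Assumption: "512MB" is interpreted as 512 MiB.
MAX_FILE_BYTES = 512 * 1024 * 1024


def file_type(filename: str) -> Optional[str]:
    """Return 'Document' or 'Audio' based on the extension, else None."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    for kind, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return kind
    return None


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if file_type(filename) is None:
        problems.append(f"unsupported format: {filename}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file exceeds {MAX_FILE_BYTES} bytes")
    return problems
```

The 100GB total capacity is account-wide and cannot be verified from a single file, so it is left out of this check.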





