Text Generation

The text generation API uses the MiniMax-M2.1, MiniMax-M2.1-lightning, and MiniMax-M2 models to generate conversational content and trigger tool calls based on the provided context. It can be accessed via HTTP requests, the Anthropic SDK (recommended), or the OpenAI SDK.

Supported Models

Model Name | Context Window | Description
MiniMax-M2.1 | 204,800 | Powerful Multi-Language Programming Capabilities with Comprehensively Enhanced Programming Experience (output speed approximately 60 tps)
MiniMax-M2.1-lightning | 204,800 | Faster and More Agile (output speed approximately 100 tps)
MiniMax-M2 | 204,800 | Agentic capabilities, Advanced reasoning
Note: The context window is the maximum total number of input and output tokens combined.
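
Because input and output tokens share one limit, the room left for a response shrinks as the prompt grows. The following is a minimal sketch of that arithmetic, assuming you already know the prompt's token count. The helper name `max_output_tokens` is illustrative and not part of any MiniMax SDK.

```python
# Context window listed for all three models in the table above.
CONTEXT_WINDOW = 204_800


def max_output_tokens(input_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many output tokens still fit after `input_tokens` of input."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens > context_window:
        raise ValueError("input alone exceeds the context window")
    return context_window - input_tokens


print(max_output_tokens(200_000))  # 4800
```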

Text to Speech (T2A)

This API provides synchronous text-to-speech (T2A) generation, supporting up to 10,000 characters per request. The interface is stateless: each call only processes the provided input without involving business logic, and the model does not store any user data.
Key Features:
  1. Access to 300+ system voices and custom cloned voices.
  2. Adjustable volume, pitch, speed, and output formats.
  3. Support for proportional audio mixing.
  4. Configurable fixed time intervals.
  5. Multiple audio formats and specifications supported: mp3, pcm, flac, wav (wav is supported only in non-streaming mode).
  6. Support for streaming output.
Typical Use Cases: short text generation, voice chat, online social interactions.
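
Text longer than the 10,000-character limit has to be split before it is sent to the synchronous API. The sketch below shows one possible client-side approach. It assumes the limit can be approximated by Python's `len()` on the string (how the service actually counts characters is not stated here), and the function name `split_for_t2a` is hypothetical.

```python
MAX_CHARS = 10_000  # per-request limit stated above


def split_for_t2a(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split `text` into pieces of at most `limit` characters.

    Prefers to break at the last space or newline inside each window and
    falls back to a hard cut when the window contains neither.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        # Look one character past the limit so a break exactly at the
        # boundary still yields a full-length chunk.
        window = rest[: limit + 1]
        cut = max(window.rfind(" "), window.rfind("\n"))
        if cut <= 0:
            cut = limit  # no usable break point: hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk can then be sent as its own request. For very long inputs, the asynchronous API is likely the better fit.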

Supported Models

Model | Description
speech-2.6-hd | Latest HD model with outstanding prosody and excellent cloning similarity.
speech-2.6-turbo | Latest Turbo model with support for 40 languages.
speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality.
speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance.

Available Interfaces

Synchronous speech synthesis provides two interfaces. Choose based on your needs:
  • HTTP T2A API
  • WebSocket T2A API

Supported Languages

MiniMax speech synthesis models offer robust multilingual capability, supporting 40 widely used languages worldwide.
1. Chinese | 15. Turkish | 28. Malay
2. Cantonese | 16. Dutch | 29. Persian
3. English | 17. Ukrainian | 30. Slovak
4. Spanish | 18. Thai | 31. Swedish
5. French | 19. Polish | 32. Croatian
6. Russian | 20. Romanian | 33. Filipino
7. German | 21. Greek | 34. Hungarian
8. Portuguese | 22. Czech | 35. Norwegian
9. Arabic | 23. Finnish | 36. Slovenian
10. Italian | 24. Hindi | 37. Catalan
11. Japanese | 25. Bulgarian | 38. Nynorsk
12. Korean | 26. Danish | 39. Tamil
13. Indonesian | 27. Hebrew | 40. Afrikaans
14. Vietnamese

Asynchronous Long-Text Speech Generation (T2A Async)

This API supports asynchronous text-to-speech generation. Each request can handle up to 1 million characters, and the resulting audio can be retrieved asynchronously. Features supported:
  1. Choose from 100+ system voices and cloned voices.
  2. Customize pitch, speed, volume, bitrate, sample rate, and output format.
  3. Retrieve audio metadata, such as duration and file size.
  4. Retrieve precise sentence-level timestamps (subtitles).
  5. Input text directly as a string or via file_id after uploading a text file.
  6. Detect illegal characters:
    • If illegal characters are ≤10%, audio is generated normally, with the ratio returned.
    • If illegal characters are >10%, no audio will be generated (an error code will be returned).
Note: The returned audio URL is valid for 9 hours (32,400 seconds) from the time it is issued. After expiration, the URL becomes invalid and the generated data will be lost.
Use Case: Converting entire books or other long texts into audio.
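
The 10% illegal-character threshold and the 9-hour URL lifetime described above can both be checked on the client. The sketch below is illustrative only. It assumes the ratio is reported as a fraction between 0 and 1 and that the caller records the time the URL was issued. Neither the function names nor that representation come from the API reference.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

URL_TTL = timedelta(seconds=32_400)  # 9 hours, as stated in the note above
MAX_ILLEGAL_RATIO = 0.10             # at or below this, audio is still generated


def audio_will_be_generated(illegal_ratio: float) -> bool:
    """True when the illegal-character ratio is within the 10% threshold."""
    return illegal_ratio <= MAX_ILLEGAL_RATIO


def audio_url_expired(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once 9 hours have passed since `issued_at`.

    The exact boundary is treated as expired, which is the conservative choice.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= issued_at + URL_TTL
```

Downloading the audio soon after the task completes avoids depending on the expiry check at all.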

Supported Models

Model | Description
speech-2.6-hd | Latest HD model with outstanding prosody and excellent cloning similarity.
speech-2.6-turbo | Latest Turbo model with support for 40 languages.
speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality.
speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance.

API Overview

This feature includes two APIs, used together with the File API to retrieve the result:
  1. Create a speech generation task (returns task_id).
  2. Query the speech generation task status using task_id.
  3. If the task succeeds, use the returned file_id with the File API to view and download the result.
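
The create, query, and download steps above amount to a polling loop. The sketch below takes the status query as an injected callable so it does not assume any endpoint URL. The response fields `status` and `file_id` and the status values `success`, `failed`, and `expired` are assumptions made for illustration. Check the API reference for the actual names.

```python
import time
from typing import Callable


def wait_for_file_id(
    query_status: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll `query_status(task_id)` until the task succeeds, then return its file_id."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = query_status(task_id)
        status = str(result.get("status", "")).lower()
        if status == "success":
            return result["file_id"]
        if status in {"failed", "expired"}:  # hypothetical terminal states
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s}s")
        time.sleep(interval_s)
```

The returned `file_id` is what you would then pass to the File API to download the audio.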

Voice Cloning

This API clones a voice from a user-uploaded audio file, with optional sample audio to improve cloning quality. It accepts mono or stereo audio and quickly reproduces speech that matches the timbre of the reference file. Use cases: rapid replication of a specific target voice, such as IP voice recreation.

Supported Models

Model | Description
speech-2.6-hd | Latest HD model with real-time response, intelligent parsing, fluent LoRA voice
speech-2.6-turbo | Latest Turbo model. Ultimate Value, 40 Languages
speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality.
speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance.

Notes

  • Using this API to clone a voice does not immediately incur a cloning fee. The fee is charged the first time you synthesize speech with the cloned voice in a T2A synthesis API.
  • Voices produced via this rapid cloning API are temporary. To keep a cloned voice permanently, call any T2A speech synthesis API with that voice within 168 hours (7 days).
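
The two notes above describe a small lifecycle: a cloned voice starts out temporary, the fee is charged at its first synthesis, and the voice is kept only if that first use happens within 168 hours. The sketch below models this locally. The class and its fields are hypothetical bookkeeping and do not represent an API object.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(hours=168)  # 7 days


@dataclass
class ClonedVoice:
    voice_id: str
    created_at: datetime
    first_used_at: Optional[datetime] = None  # first T2A synthesis (fee charged here)

    def mark_used(self, when: datetime) -> None:
        """Record the first synthesis; later uses change nothing."""
        if self.first_used_at is None:
            self.first_used_at = when

    def is_permanent(self) -> bool:
        """True if the voice was first used within the retention window."""
        return (
            self.first_used_at is not None
            and self.first_used_at - self.created_at <= RETENTION
        )

    def is_lost(self, now: datetime) -> bool:
        """True if the window has passed without the voice becoming permanent."""
        return not self.is_permanent() and now - self.created_at > RETENTION
```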

Voice Design

This API supports generating personalized custom voices based on user-provided voice description prompts. The generated voices (voice_id) can then be used in the T2A API and the T2A Async API for speech generation.

Supported Models

It is recommended to use speech-02-hd for the best results.
Model | Description
speech-2.6-hd | Latest HD model with real-time response, intelligent parsing, fluent LoRA voice
speech-2.6-turbo | Latest Turbo model. Ultimate Value, 40 Languages
speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality.
speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance.

Notes

  • Using this API to generate a voice does not immediately incur a fee. The generation fee will be charged upon the first use of the generated voice in speech synthesis.
  • Voices generated through this API are temporary. If you wish to keep a voice permanently, you must use it in any speech synthesis API within 168 hours (7 days).

Video Generation

This API generates videos from user-provided text or images (first frame, last frame, or reference images).

Supported Models

Model | Description
MiniMax-Hailuo-2.3 | New video generation model, breakthroughs in body movement, facial expressions, physical realism, and prompt adherence.
MiniMax-Hailuo-2.3-Fast | New Image-to-video model, for value and efficiency.
MiniMax-Hailuo-02 | Video generation model supporting higher resolution (1080P), longer duration (10s), and stronger adherence to prompts.

API Usage Guide

Video generation is asynchronous and consists of three APIs: Create Video Generation Task, Query Video Generation Task Status, and File Management. Steps are as follows:
  1. Use the Create Video Generation Task API to start a task. On success, it will return a task_id.
  2. Use the Query Video Generation Task Status API with the task_id to check progress. When the status is success, a file ID (file_id) will be returned.
  3. Use the Download the Video File API with the file_id to view and download the generated video.
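
The three steps above chain naturally into one function. The sketch below expresses them against a small interface so that any HTTP wrapper can be plugged in. The `VideoClient` protocol and its method names are hypothetical, as is the `file_id` response field. Only the `success` status and the overall flow come from the steps above.

```python
import time
from typing import Protocol


class VideoClient(Protocol):
    """Hypothetical wrapper around the three APIs; method names are illustrative."""

    def create_task(self, model: str, prompt: str) -> str: ...
    def query_status(self, task_id: str) -> dict: ...
    def download(self, file_id: str) -> bytes: ...


def generate_video(
    client: VideoClient,
    model: str,
    prompt: str,
    interval_s: float = 10.0,
    max_polls: int = 60,
) -> bytes:
    """Create a task, poll until its status is success, then download the file."""
    task_id = client.create_task(model, prompt)        # step 1: returns task_id
    for _ in range(max_polls):
        info = client.query_status(task_id)            # step 2: check progress
        if str(info.get("status", "")).lower() == "success":
            return client.download(info["file_id"])    # step 3: fetch the video
        time.sleep(interval_s)
    # A real integration should also stop early on a failure status.
    raise TimeoutError(f"task {task_id} did not succeed after {max_polls} polls")
```

Keeping the client behind an interface also makes the flow easy to test with an in-memory fake before wiring in real requests.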

Video Generation Agent

This API supports video generation tasks based on user-selected video agent templates and inputs.

Overview

The Video Agent API works asynchronously and includes two endpoints: Create Video Agent Task and Query Video Agent Task Status. Usage steps:
  1. Use the Create Video Agent Task API to create a task and obtain a task_id.
  2. Use the Query Video Agent Task Status API with the task_id to check the task status. Once the status is Success, you can retrieve the corresponding file download URL.

Template List

For details and examples, refer to the Video Agent Template List.
Template ID | Template Name | Description | media_inputs | text_inputs
392747428568649728 | Diving | Upload a picture to generate a video of the subject in the picture completing a perfect dive. | Required | /
393769180141805569 | Run for Life | Upload a photo of your pet and enter a type of wild beast to generate a survival video of your pet in the wilderness. | Required | Required
397087679467597833 | Transformers | Upload a photo of a car to generate a transforming car mecha video. | Required | /
393881433990066176 | Still rings routine | Upload your photo to generate a video of the subject performing a perfect still rings routine. | Required | /
393498001241890824 | Weightlifting | Upload a photo of your pet to generate a video where the subject performs a perfect weightlifting move. | Required | /
393488336655310850 | Climbing | Upload a picture to generate a video of the subject in the picture completing a perfect sport climb. | Required | /
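
The template table can be encoded as data so that missing inputs are caught before a task is created. The sketch below is a client-side check only. It reads "/" in the table as "not required", which is an assumption, and the `check_inputs` helper is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    needs_media: bool
    needs_text: bool


# Transcribed from the table above; "/" is read as "not required" (assumption).
TEMPLATES = {
    "392747428568649728": Template("Diving", True, False),
    "393769180141805569": Template("Run for Life", True, True),
    "397087679467597833": Template("Transformers", True, False),
    "393881433990066176": Template("Still rings routine", True, False),
    "393498001241890824": Template("Weightlifting", True, False),
    "393488336655310850": Template("Climbing", True, False),
}


def check_inputs(template_id: str, media_inputs: list, text_inputs: list) -> list[str]:
    """Return a list of problems; an empty list means the inputs match the table."""
    tpl = TEMPLATES.get(template_id)
    if tpl is None:
        return [f"unknown template_id {template_id}"]
    problems = []
    if tpl.needs_media and not media_inputs:
        problems.append(f"{tpl.name} requires media_inputs")
    if tpl.needs_text and not text_inputs:
        problems.append(f"{tpl.name} requires text_inputs")
    return problems


print(check_inputs("393769180141805569", ["pet.jpg"], []))
# ['Run for Life requires text_inputs']
```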

Image Generation

This API supports image generation from text prompts or reference images, with custom aspect ratios and resolutions for diverse needs.

API Description

You can generate images by creating an image generation task using text prompts and/or reference images.

Model List

Model | Description
image-01 | A high-quality image generation model that produces fine-grained details. Supports both text-to-image and image-to-image generation (with subject reference for people).

Music Generation

This API generates a vocal song based on a music description (prompt) and lyrics.

Models

Model | Usage
music-2.0 | The latest music generation model. Supports user-provided musical inspiration and lyrics to create AI-generated music.

File Management

This API is for file management and is used with other MiniMax APIs.

API Description

This API includes 5 endpoints: Upload, List, Retrieve, Retrieve Content, Delete.

Supported File Formats

Type | Format
Document | pdf, docx, txt, jsonl
Audio | mp3, m4a, wav

Capacity and Limits

Item | Limit
Total Capacity | 100GB
Single Document Size | 512MB
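
The format and size limits above are easy to verify locally before an upload. The sketch below shows one way to do so. It assumes that 512MB means 512 × 1024 × 1024 bytes and that the single-document limit applies only to document types. The `classify_upload` helper is hypothetical and is not one of the five endpoints.

```python
from pathlib import Path

# Extensions from the Supported File Formats table above.
SUPPORTED = {
    "document": {"pdf", "docx", "txt", "jsonl"},
    "audio": {"mp3", "m4a", "wav"},
}
MAX_DOCUMENT_BYTES = 512 * 1024 * 1024  # assumes 512MB is measured in binary megabytes


def classify_upload(filename: str, size_bytes: int) -> str:
    """Return "document" or "audio", or raise ValueError if the file cannot be uploaded."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for kind, extensions in SUPPORTED.items():
        if ext in extensions:
            break
    else:
        raise ValueError(f"unsupported file format: {ext or '(none)'}")
    if kind == "document" and size_bytes > MAX_DOCUMENT_BYTES:
        raise ValueError("document exceeds the 512MB single-document limit")
    return kind
```

The 100GB total capacity cannot be checked from a single file and would need the sizes reported by the List endpoint.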

Official MCP

MiniMax provides two official Model Context Protocol (MCP) server implementations. Both support speech synthesis, voice cloning, video generation, and music generation. For details, refer to the MiniMax MCP User Guide.