> ## Documentation Index
>
> Fetch the complete documentation index at: https://platform.minimax.io/docs/llms.txt
>
> Use this file to discover all available pages before exploring further.

# API Overview

> Overview of MiniMax API capabilities including language, speech, video, image, music, and file management.

## Get API Key

* **Pay-as-you-go**: Visit [API Keys > Create new secret key](https://platform.minimax.io/user-center/basic-information/interface-key) to get your **API Key**. Pay-as-you-go supports models of all modalities, including language, video, speech, and image.
* **Token Plan**: Visit [Billing > Token Plan](https://platform.minimax.io/user-center/payment/token-plan) to view your **Subscription Key**. The Subscription Key is used for Token Plan subscriptions and purchased Credits. It is separate from pay-as-you-go API Keys. See [Token Plan Overview](https://platform.minimax.io/docs/token-plan/intro) for details.

***

## LLM

The LLM API uses **MiniMax M3**, **MiniMax M2.7**, **MiniMax M2.7 highspeed**, **MiniMax M2.5**, **MiniMax M2.5 highspeed**, **MiniMax M2.1**, **MiniMax M2.1 highspeed**, and **MiniMax M2** to generate conversational content and trigger tool calls based on the provided context. It can be accessed via **HTTP requests**, the **Anthropic SDK** (recommended), or the **OpenAI SDK**.

### Supported Models

| Model Name | Context Window | Description |
| :--------------------- | :------------- | :---------- |
| MiniMax-M3 | 1,000,000 | **Latest M-series language model for agentic reasoning, tool use, coding, and long-context tasks** |
| MiniMax-M2.7 | 204,800 | **Beginning the journey of recursive self-improvement. (output speed approximately 60 tps)** |
| MiniMax-M2.7-highspeed | 204,800 | **M2.7 highspeed: Same performance, faster and more agile (output speed approximately 100 tps)** |
| MiniMax-M2.5 | 204,800 | **Peak Performance. Ultimate Value. Master the Complex (output speed approximately 60 tps)** |
| MiniMax-M2.5-highspeed | 204,800 | **M2.5 highspeed: Same performance, faster and more agile (output speed approximately 100 tps)** |
| MiniMax-M2.1 | 204,800 | **Powerful Multi-Language Programming Capabilities with Comprehensively Enhanced Programming Experience (output speed approximately 60 tps)** |
| MiniMax-M2.1-highspeed | 204,800 | **Faster and More Agile (output speed approximately 100 tps)** |
| MiniMax-M2 | 204,800 | **Agentic capabilities, Advanced reasoning** |

Please note: The context window is the maximum token count, which covers the total number of input and output tokens.

* Use Anthropic SDK with MiniMax models
* Use OpenAI SDK with MiniMax models

***

## Text to Speech (T2A)

This API provides synchronous text-to-speech (T2A) generation, supporting up to **10,000** characters per request. The interface is stateless: each call only processes the provided input without involving business logic, and the model does not store any user data.

**Key Features**

1. Access to 300+ system voices and custom cloned voices.
2. Adjustable volume, pitch, speed, and output formats.
3. Support for proportional audio mixing.
4. Configurable fixed time intervals.
5. Multiple audio formats and specifications supported: `mp3`, `pcm`, `flac`, `wav`.
6. Support for streaming output.

**Typical Use Cases:** short text generation, voice chat, online social interactions.

### Supported Models

| Model | Description |
| :--------------- | :---------- |
| speech-2.8-hd | Latest HD model. Ultra-realistic quality featuring sound tags. |
| speech-2.8-turbo | Latest Turbo model. Seamless speed meets natural flow. |
| speech-2.6-hd | HD model with outstanding prosody and excellent cloning similarity. |
| speech-2.6-turbo | Turbo model with support for 40 languages. |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |

### Available Interfaces

Synchronous speech synthesis provides two interfaces. Choose based on your needs:

* HTTP T2A API
* WebSocket T2A API

### Supported Languages

MiniMax speech synthesis models offer robust multilingual capability, supporting **40 widely used languages** worldwide.

| Supported Languages | | |
| ----------------- | ------------- | ------------- |
| 1. Chinese | 15. Turkish | 28. Malay |
| 2. Cantonese | 16. Dutch | 29. Persian |
| 3. English | 17. Ukrainian | 30. Slovak |
| 4. Spanish | 18. Thai | 31. Swedish |
| 5. French | 19. Polish | 32. Croatian |
| 6. Russian | 20. Romanian | 33. Filipino |
| 7. German | 21. Greek | 34. Hungarian |
| 8. Portuguese | 22. Czech | 35. Norwegian |
| 9. Arabic | 23. Finnish | 36. Slovenian |
| 10. Italian | 24. Hindi | 37. Catalan |
| 11. Japanese | 25. Bulgarian | 38. Nynorsk |
| 12. Korean | 26. Danish | 39. Tamil |
| 13. Indonesian | 27. Hebrew | 40. Afrikaans |
| 14. Vietnamese | | |

* Synchronous speech synthesis via HTTP
* Streaming speech synthesis via WebSocket

***

## Asynchronous Long-Text Speech Generation (T2A Async)

This API supports asynchronous text-to-speech generation. Each request can handle up to **1 million characters**, and the resulting audio can be retrieved asynchronously.

Features supported:

1. Choose from 100+ system voices and cloned voices.
2. Customize pitch, speed, volume, bitrate, sample rate, and output format.
3. Retrieve audio metadata, such as duration and file size.
4. Retrieve precise sentence-level timestamps (subtitles).
5. Input text directly as a string or via `file_id` after uploading a text file.
6. Detect illegal characters:
   * If illegal characters are **≤10%**, audio is generated normally, with the ratio returned.
   * If illegal characters are **>10%**, no audio will be generated (an error code will be returned).

**Note:** The returned audio URL is valid for **9 hours** (32,400 seconds) from the time it is issued. After expiration, the URL becomes invalid and the generated data will be lost.

**Use Case:** Converting entire books or other long texts into audio.

### Supported Models

| Model | Description |
| :--------------- | :---------- |
| speech-2.8-hd | Latest HD model. Ultra-realistic quality featuring sound tags. |
| speech-2.8-turbo | Latest Turbo model. Seamless speed meets natural flow. |
| speech-2.6-hd | HD model with outstanding prosody and excellent cloning similarity. |
| speech-2.6-turbo | Turbo model with support for 40 languages. |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |

### API Overview

This feature includes **two APIs** (steps 1 and 2), used together with the **File API** (step 3):

1. Create a speech generation task (returns `task_id`).
2. Query the speech generation task status using `task_id`.
3. If the task succeeds, use the returned `file_id` with the **File API** to view and download the result.

* Create a long-text speech generation task
* Query speech generation task status

***

## Voice Cloning

This API supports cloning voices from user-uploaded audio files, along with optional sample audio to enhance cloning quality.

**Use cases:** rapid replication of a target timbre (for example, IP voice recreation) when you need to clone a specific voice quickly.

The API supports cloning from mono or stereo audio and can rapidly reproduce speech that matches the timbre of a provided reference file.

### Supported Models

| Model | Description |
| :--------------- | :---------- |
| speech-2.8-hd | Latest HD model. Ultra-realistic quality featuring sound tags. |
| speech-2.8-turbo | Latest Turbo model. Seamless speed meets natural flow. |
| speech-2.6-hd | HD model with real-time response, intelligent parsing, fluent LoRA voice |
| speech-2.6-turbo | Turbo model. Ultimate Value, 40 Languages |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |

### Notes

* Using this API to clone a voice **does not** immediately incur a cloning fee. The fee is charged the **first time** you synthesize speech with the cloned voice in a T2A synthesis API.
* Voices produced via this rapid cloning API are **temporary**. To keep a cloned voice permanently, call **any** T2A speech synthesis API with that voice **within 168 hours (7 days)**.

* Upload audio file to clone
* Execute voice cloning

***

## Voice Design

This API generates personalized custom voices from user-provided voice description prompts. The generated voices (`voice_id`) can then be used in the T2A API and the T2A Async API for speech generation.

### Supported Models

> It is recommended to use **speech-02-hd** for the best results.

| Model | Description |
| :--------------- | :---------- |
| speech-2.8-hd | Latest HD model. Ultra-realistic quality featuring sound tags. |
| speech-2.8-turbo | Latest Turbo model. Seamless speed meets natural flow. |
| speech-2.6-hd | HD model with real-time response, intelligent parsing, fluent LoRA voice |
| speech-2.6-turbo | Turbo model. Ultimate Value, 40 Languages |
| speech-02-hd | Superior rhythm and stability, with outstanding performance in replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |

### Notes

> * Using this API to generate a voice does not immediately incur a fee. The generation fee will be charged upon the first use of the generated voice in speech synthesis.
> * Voices generated through this API are temporary. If you wish to keep a voice permanently, you must use it in any speech synthesis API within 168 hours (7 days).

* Generate personalized voices from descriptions

***

## Video Generation

This API generates videos from user-provided text or images (first frame, last frame, or reference images).

### Supported Models

| Model | Description |
| :---------------------- | :---------- |
| MiniMax-Hailuo-2.3 | New video generation model, breakthroughs in body movement, facial expressions, physical realism, and prompt adherence. |
| MiniMax-Hailuo-2.3-Fast | New Image-to-video model, for value and efficiency. |
| MiniMax-Hailuo-02 | Video generation model supporting higher resolution (1080P), longer duration (10s), and stronger adherence to prompts. |

### API Usage Guide

Video generation is asynchronous and consists of three APIs: **Create Video Generation Task**, **Query Video Generation Task Status**, and **File Management**. Steps are as follows:

1. Use the **Create Video Generation Task API** to start a task. On success, it will return a `task_id`.
2. Use the **Query Video Generation Task Status API** with the `task_id` to check progress. When the status is `success`, a file ID (`file_id`) will be returned.
3. Use the **Download the Video File API** with the `file_id` to view and download the generated video.

* Generate video from text description
* Generate video from image

***

## Video Generation Agent

This API supports video generation tasks based on user-selected video agent templates and inputs.

### Overview

The Video Agent API works asynchronously and includes two endpoints: **Create Video Agent Task** and **Query Video Agent Task Status**.

**Usage steps:**

1. Use the **Create Video Agent Task** API to create a task and obtain a `task_id`.
2. Use the **Query Video Agent Task Status** API with the `task_id` to check the task status. Once the status is `Success`, you can retrieve the corresponding file download URL.

### Template List

For details and examples, refer to the [Video Agent Template List](/faq/video-agent-templates).

| Template ID | Template Name | Description | media\_inputs | text\_inputs |
| :----------------- | :------------------ | :---------- | :------------ | :----------- |
| 392747428568649728 | Diving | Upload a picture to generate a video of the subject in the picture completing a perfect dive. | Required | / |
| 393769180141805569 | Run for Life | Upload a photo of your pet and enter a type of wild beast to generate a survival video of your pet in the wilderness. | Required | Required |
| 397087679467597833 | Transformers | Upload a photo of a car to generate a transforming car mecha video. | Required | / |
| 393881433990066176 | Still rings routine | Upload your photo to generate a video of the subject performing a perfect still rings routine. | Required | / |
| 393498001241890824 | Weightlifting | Upload a photo of your pet to generate a video where the subject performs a perfect weightlifting move. | Required | / |
| 393488336655310850 | Climbing | Upload a picture to generate a video of the subject in the picture completing a perfect sport climb. | Required | / |

* Create a video agent task
* Query video agent task status

***

## Image Generation

This API supports image generation from text or reference images, allowing custom aspect ratios and resolutions for diverse needs.

### API Description

You can generate images by creating an image generation task using text prompts and/or reference images.

### Model List

| Model | Description |
| :------- | :---------- |
| image-01 | A high-quality image generation model that produces fine-grained details. Supports both text-to-image and image-to-image generation (with subject reference for people). |

* Generate image from text description
* Generate image from reference image

***

## Music Generation

This API generates a song with vocals based on a music description (prompt) and lyrics.

### Models

| Model | Usage |
| :-------- | :---- |
| music-2.6 | The latest music generation model. Supports user-provided musical inspiration and lyrics to create AI-generated music. |

* Generate music from description and lyrics

***

## File Management

This API is for file management and is used with other MiniMax APIs.

### API Description

This API includes 5 endpoints: **Upload**, **List**, **Retrieve**, **Retrieve Content**, **Delete**.

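File Management is the last step of the asynchronous flows described earlier (long-text speech and video generation): create a task, poll its status with the `task_id`, then retrieve the result with the returned `file_id`. The following is a minimal sketch of the polling step only. It does not call any MiniMax endpoint: `query_status` is a hypothetical callable you supply that wraps the relevant status API, and the failure status names, polling interval, and poll limit are assumptions rather than documented values.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def wait_for_file_id(
    query_status: Callable[[str], Tuple[str, Optional[str]]],
    task_id: str,
    interval_s: float = 5.0,
    max_polls: int = 120,
    failure_statuses: Sequence[str] = ("fail", "failed"),  # assumed names
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a task until it succeeds and return its file_id.

    query_status is a hypothetical wrapper around a status API; it must
    return a (status, file_id) pair, where file_id may be None until done.
    """
    for _ in range(max_polls):
        status, file_id = query_status(task_id)
        # The docs show both `success` and `Success`, so compare case-insensitively.
        normalized = status.lower()
        if normalized == "success":
            if file_id is None:
                raise RuntimeError(f"task {task_id} succeeded without a file_id")
            return file_id
        if normalized in failure_statuses:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

The returned `file_id` would then be passed to the Retrieve or Retrieve Content endpoint. Injecting `sleep` keeps the helper easy to test without real delays.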
### Supported File Formats

| Type | Format |
| :------- | :---------------------------- |
| Document | `pdf`, `docx`, `txt`, `jsonl` |
| Audio | `mp3`, `m4a`, `wav` |

### Capacity and Limits

| Item | Limit |
| :------------------- | :---- |
| Total Capacity | 100GB |
| Single Document Size | 512MB |

* Upload files to the platform
* Get list of uploaded files

***

## Official MCP

MiniMax provides official Model Context Protocol (MCP) server implementations:

* [Python version](https://github.com/MiniMax-AI/MiniMax-MCP)
* [JavaScript version](https://github.com/MiniMax-AI/MiniMax-MCP-JS)

Both support speech synthesis, voice cloning, video generation, and music generation. For details, refer to the [MiniMax MCP User Guide](/guides/mcp-guide).
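
As a footnote to the File Management formats and limits listed above, a client can check a file locally before uploading it. This is a rough sketch, not part of the MiniMax API: `check_upload` is a hypothetical helper, the units are assumed to be binary (1MB = 1024 × 1024 bytes), and applying the 512MB "Single Document Size" limit to audio files as well is an assumption.

```python
from pathlib import Path
from typing import List

# Formats taken from the Supported File Formats table.
ALLOWED_EXTENSIONS = {
    "document": {"pdf", "docx", "txt", "jsonl"},
    "audio": {"mp3", "m4a", "wav"},
}

MB = 1024 * 1024                   # assumption: binary megabytes
MAX_FILE_BYTES = 512 * MB          # "Single Document Size" limit
MAX_TOTAL_BYTES = 100 * 1024 * MB  # "Total Capacity" limit (100GB)


def check_upload(filename: str, size_bytes: int, used_bytes: int = 0) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if not any(ext in exts for exts in ALLOWED_EXTENSIONS.values()):
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 512MB single-file limit")
    if used_bytes + size_bytes > MAX_TOTAL_BYTES:
        problems.append("upload would exceed the 100GB total capacity")
    return problems
```

For example, `check_upload("book.txt", 10)` returns an empty list, while a file with an `.ogg` extension is reported as an unsupported format. The server remains the source of truth; a local check only avoids obviously doomed uploads.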