Asynchronous Long-Text Speech Generation (T2A Async)

This API supports asynchronous text-to-speech generation. Each request accepts up to 1 million characters of text, and the generated audio is retrieved asynchronously once the task completes. Supported features:

Choose from 100+ system voices and cloned voices.
Customize pitch, speed, volume, bitrate, sample rate, and output format.
Retrieve audio metadata, such as duration and file size.
Retrieve precise sentence-level timestamps (subtitles).
Input text directly as a string or via file_id after uploading a text file.
Detect illegal characters, defined as ASCII control characters (excluding tab and newline):
- If illegal characters make up ≤10% of the text, audio is generated normally and the ratio is returned.
- If illegal characters make up >10% of the text, no audio is generated and an error code is returned.
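The input limits above can be checked locally before a request is submitted. The following is a minimal sketch, not the service's own validation logic: the function names are hypothetical, and it assumes that the ratio is computed over the total character count, that DEL (0x7F) counts as a control character, and that the 1 million character limit is counted in Unicode code points. Carriage return is treated as illegal here because the text above exempts only tab and newline; the service may behave differently.

```python
MAX_CHARS = 1_000_000          # per-request limit stated in this document
ILLEGAL_RATIO_LIMIT = 0.10     # above this ratio, no audio is generated
ALLOWED_CONTROLS = {"\t", "\n"}  # the only control characters the document exempts


def is_illegal_char(ch: str) -> bool:
    """True for ASCII control characters other than tab and newline.

    Assumption: DEL (0x7F) is counted as a control character.
    """
    code = ord(ch)
    return (code < 0x20 or code == 0x7F) and ch not in ALLOWED_CONTROLS


def illegal_char_ratio(text: str) -> float:
    """Fraction of characters in `text` that are illegal (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(1 for ch in text if is_illegal_char(ch)) / len(text)


def preflight_check(text: str) -> dict:
    """Local estimate of whether the text is likely to be accepted."""
    ratio = illegal_char_ratio(text)
    return {
        "length_ok": len(text) <= MAX_CHARS,
        "illegal_ratio": ratio,
        "ratio_ok": ratio <= ILLEGAL_RATIO_LIMIT,
    }
```

Running `preflight_check` on a manuscript before upload gives an early warning when stray control characters (for example, from a PDF extraction) are likely to push the request over the 10% threshold.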

After submitting a long-text TTS request, a file_id will be generated. Once the task is complete, the result can be downloaded using the File Retrieve API.

Note: The returned audio URL is valid for 9 hours (32,400 seconds) from the time it is issued. After expiration, the URL becomes invalid and the generated data will be lost. Please ensure timely download.

Use Case: Converting entire books or other long texts into audio.
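Because the audio URL is only valid for 9 hours (32,400 seconds) after it is issued, a client may want to track the deadline explicitly. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the caller records the time at which the URL was issued, since this document does not say whether the API returns an expiry timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Validity window stated in this document: 9 hours = 32,400 seconds.
URL_TTL = timedelta(seconds=32_400)


def url_expires_at(issued_at: datetime) -> datetime:
    """Time at which a URL issued at `issued_at` stops being valid."""
    return issued_at + URL_TTL


def seconds_remaining(issued_at: datetime, now: Optional[datetime] = None) -> float:
    """Seconds left before expiry (never negative)."""
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = (url_expires_at(issued_at) - now).total_seconds()
    return max(0.0, remaining)
```

A download job could call `seconds_remaining` before each retry and abandon the URL once it returns 0, at which point the task would need to be resubmitted.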

Supported Models

Below are the speech models provided by MiniMax and their characteristics:

Model	Description
speech-2.8-hd	Latest HD model. Perfecting Tonal Nuances. Maximizing Timbre Similarity.
speech-2.8-turbo	Latest Turbo model. Perfecting Tonal Nuances. Maximizing Timbre Similarity.
speech-2.6-hd	HD model with outstanding prosody and excellent cloning similarity.
speech-2.6-turbo	Turbo model with support for 40 languages.
speech-02-hd	Superior rhythm and stability, with outstanding performance in replication similarity and sound quality.
speech-02-turbo	Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance.

API Overview

This feature includes two APIs:

Create a speech generation task (returns task_id).
- If using a file_id as input, you must first upload the text file via the File Upload API.
Query the speech generation task status using task_id.
If the task succeeds, use the returned file_id with the File API to view and download the result.
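The create, query, and download flow described above amounts to a polling loop. The sketch below shows only the loop and deliberately leaves out the HTTP layer: `query_status` is a hypothetical caller-supplied function that wraps the query API, and the status strings (`success`, `failed`, `expired`) and the `status` / `file_id` response keys are assumptions rather than values confirmed by this document. Check the API reference for the actual response schema.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    query_status: Callable[[str], Dict],  # hypothetical wrapper around the query API
    task_id: str,
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the task finishes and return the result file_id.

    Assumed response shape: {"status": ..., "file_id": ...}.
    """
    deadline = clock() + timeout
    while True:
        result = query_status(task_id)
        status = str(result.get("status", "")).lower()
        if status == "success":          # assumed terminal success value
            return result["file_id"]
        if status in ("failed", "expired"):  # assumed terminal failure values
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. The returned file_id would then be passed to the File API to download the audio.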

Supported Languages

MiniMax TTS models support 40 major global languages, delivering strong cross-lingual capabilities.

Supported Languages
1. Chinese	15. Turkish	28. Malay
2. Cantonese	16. Dutch	29. Persian
3. English	17. Ukrainian	30. Slovak
4. Spanish	18. Thai	31. Swedish
5. French	19. Polish	32. Croatian
6. Russian	20. Romanian	33. Filipino
7. German	21. Greek	34. Hungarian
8. Portuguese	22. Czech	35. Norwegian
9. Arabic	23. Finnish	36. Slovenian
10. Italian	24. Hindi	37. Catalan
11. Japanese	25. Bulgarian	38. Nynorsk
12. Korean	26. Danish	39. Tamil
13. Indonesian	27. Hebrew	40. Afrikaans
14. Vietnamese

Official MCP

MiniMax provides official Model Context Protocol (MCP) server implementations:

Both support speech synthesis. For details, refer to the MiniMax MCP User Guide.
