This API supports asynchronous text-to-speech generation. Each request can handle up to 1 million characters, and the resulting audio can be retrieved asynchronously. Features supported:
  1. Choose from 100+ system voices and cloned voices.
  2. Customize pitch, speed, volume, bitrate, sample rate, and output format.
  3. Retrieve audio metadata, such as duration and file size.
  4. Retrieve precise sentence-level timestamps (subtitles).
  5. Input text directly as a string or via file_id after uploading a text file.
  6. Detect illegal characters:
    • If illegal characters are ≤10%, audio is generated normally, with the ratio returned.
    • If illegal characters are >10%, no audio will be generated (an error code will be returned).
      Illegal characters are defined as ASCII control characters (excluding tab and newline).
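The illegal-character rule above can be checked locally before submitting text. The following is a minimal sketch, not the service's actual implementation. It assumes the ratio is computed over the total character count, that DEL (0x7F) counts as a control character, and that carriage return is illegal because only tab and newline are excluded. The function names are hypothetical.

```python
def illegal_char_ratio(text: str) -> float:
    """Return the fraction of characters that are ASCII control characters
    other than tab and newline.

    Assumption: the ratio is taken over the total number of characters,
    and DEL (0x7F) is treated as a control character.
    """
    if not text:
        return 0.0
    allowed = {"\t", "\n"}
    illegal = sum(
        1
        for ch in text
        if (ord(ch) < 0x20 or ord(ch) == 0x7F) and ch not in allowed
    )
    return illegal / len(text)


def would_generate_audio(text: str, threshold: float = 0.10) -> bool:
    """Mirror the documented rule: audio is generated when the ratio is <= 10%."""
    return illegal_char_ratio(text) <= threshold
```

Running a check like this before a request avoids submitting text that would exceed the 10% threshold and return an error code.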
After submitting a long-text TTS request, a file_id will be generated. Once the task is complete, the result can be downloaded using the File Retrieve API.

Note: The returned audio URL is valid for 9 hours (32,400 seconds) from the time it is issued. After expiration, the URL becomes invalid and the generated data will be lost. Please ensure timely download.

Use Case: Converting entire books or other long texts into audio.
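To make the 9-hour validity window concrete, the sketch below computes when a URL stops being usable. It assumes the client records the time at which the URL was issued. The helper names are hypothetical and not part of the API, and treating the exact 9-hour mark as already expired is a conservative assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

URL_TTL_SECONDS = 32_400  # 9 hours, as stated in the note above


def url_expires_at(issued_at: datetime) -> datetime:
    """Return the moment the audio URL stops being valid."""
    return issued_at + timedelta(seconds=URL_TTL_SECONDS)


def is_url_expired(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the validity window has elapsed.

    Assumption: the exact expiry instant is treated as expired.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= url_expires_at(issued_at)
```

For example, a URL issued at 00:00 UTC should be downloaded before 09:00 UTC the same day.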

Supported Models

Below are the speech models provided by MiniMax and their characteristics:
Model                       Description
speech-2.5-hd-preview       Latest HD model with outstanding prosody and excellent cloning similarity.
speech-2.5-turbo-preview    Latest Turbo model with support for 40 languages.
speech-02-hd                Superior rhythm and stability, with outstanding performance in replication similarity and sound quality.
speech-02-turbo             Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance.
speech-01-hd                Rich Voices, Expressive Emotions, Authentic Languages.
speech-01-turbo             Excellent performance and low latency.

API Overview

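This feature includes two APIs: one to create a speech generation task and one to query its status. A typical workflow has three steps: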
  1. Create a speech generation task (returns task_id).
    • If using a file_id as input, you must first upload the file via File(Upload).
  2. Query the speech generation task status using task_id.
  3. If the task succeeds, use the returned file_id with the File API to view and download the result.
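The create, query, and download steps above can be sketched as a polling loop. This is an illustrative outline only. The three callables stand in for the real HTTP calls and are hypothetical, as are the "status" and "file_id" response keys and the "success" and "failed" status values. Consult the API reference for the actual endpoints and response fields.

```python
import time
from typing import Callable


def run_tts_task(
    create_task: Callable[[str], str],      # hypothetical: submits text, returns task_id
    query_status: Callable[[str], dict],    # hypothetical: returns the task status payload
    download_file: Callable[[str], bytes],  # hypothetical: fetches audio bytes by file_id
    text: str,
    poll_interval: float = 5.0,
    max_polls: int = 120,
) -> bytes:
    """Create a task, poll until it finishes, then download the audio."""
    task_id = create_task(text)
    for _ in range(max_polls):
        result = query_status(task_id)
        status = result.get("status")  # assumed key and values, not confirmed
        if status == "success":
            return download_file(result["file_id"])
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Passing the network calls in as functions keeps the loop independent of any particular HTTP client and makes it easy to exercise with stubs.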

Supported Languages

MiniMax TTS models support 40 major global languages, delivering strong cross-lingual capabilities.
Supported Languages
1. Chinese        15. Turkish       28. Malay
2. Cantonese      16. Dutch         29. Persian
3. English        17. Ukrainian     30. Slovak
4. Spanish        18. Thai          31. Swedish
5. French         19. Polish        32. Croatian
6. Russian        20. Romanian      33. Filipino
7. German         21. Greek         34. Hungarian
8. Portuguese     22. Czech         35. Norwegian
9. Arabic         23. Finnish       36. Slovenian
10. Italian       24. Hindi         37. Catalan
11. Japanese      25. Bulgarian     38. Nynorsk
12. Korean        26. Danish        39. Tamil
13. Indonesian    27. Hebrew        40. Afrikaans
14. Vietnamese

Official MCP

MiniMax provides official Model Context Protocol (MCP) server implementations, both of which support speech synthesis. For details, refer to the MiniMax MCP User Guide.