- Supports 100+ system voices and custom cloned voices.
- Supports adjustment of pitch, speech rate, volume, bitrate, sample rate, and output format.
- Supports returning parameters such as audio duration and audio size.
- Supports returning timestamps (subtitles) at sentence-level granularity.
- Supports two input methods for the text to be synthesized: direct string input, or uploading a text file and referencing it by file_id.
- Supports detection of invalid characters: if invalid characters make up no more than 10% of the text (10% inclusive), audio is generated normally and the proportion of invalid characters is returned; if they exceed 10%, the interface returns an error code instead of a result, so check the text and submit the request again. [Invalid characters are defined as ASCII control characters, excluding tabs (\t) and newlines (\n).]
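As a rough client-side pre-check, the invalid-character share can be estimated before a request is sent. This is a sketch, not the service's actual check: it assumes the share is computed over the total character count and that DEL (0x7F) counts as an ASCII control character alongside 0x00-0x1F. The document does not state the exact counting rule, so treat the result as an estimate.

```python
# Tabs and newlines are explicitly excluded from the invalid set.
ALLOWED_CONTROL_CHARS = {"\t", "\n"}

def is_invalid_char(ch: str) -> bool:
    """True for ASCII control characters (0x00-0x1F, 0x7F) other than tab and newline."""
    code = ord(ch)
    return (code < 0x20 or code == 0x7F) and ch not in ALLOWED_CONTROL_CHARS

def invalid_ratio(text: str) -> float:
    """Share of invalid characters (assumed denominator: total character count)."""
    if not text:
        return 0.0
    return sum(1 for ch in text if is_invalid_char(ch)) / len(text)

def likely_accepted(text: str, threshold: float = 0.10) -> bool:
    """True if the invalid share is at most 10% (inclusive), per the documented rule."""
    return invalid_ratio(text) <= threshold
```

Running this on text read from a local file can flag input that is likely to be rejected before any API call is made.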
Models
| Model | Description |
|---|---|
| speech-2.5-hd-preview | The brand-new HD model: ultimate similarity and ultra-high quality. |
| speech-2.5-turbo-preview | The brand-new Turbo model: ultimate value, with support for 40 languages. |
| speech-02-hd | Superior rhythm and stability, with outstanding replication similarity and sound quality. |
| speech-02-turbo | Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance. |
| speech-01-hd | Rich voices, expressive emotions, and authentic-sounding languages. |
| speech-01-turbo | Excellent performance and low latency. |
Supported Languages
MiniMax’s speech synthesis models offer outstanding multilingual capabilities, with full support for 40 widely used languages worldwide. Our goal is to break down language barriers and build a truly universal AI model. Currently supported languages include:

| Supported Languages | | |
|---|---|---|
| 1. Chinese | 15. Turkish | 28. Malay |
| 2. Cantonese | 16. Dutch | 29. Persian |
| 3. English | 17. Ukrainian | 30. Slovak |
| 4. Spanish | 18. Thai | 31. Swedish |
| 5. French | 19. Polish | 32. Croatian |
| 6. Russian | 20. Romanian | 33. Filipino |
| 7. German | 21. Greek | 34. Hungarian |
| 8. Portuguese | 22. Czech | 35. Norwegian |
| 9. Arabic | 23. Finnish | 36. Slovenian |
| 10. Italian | 24. Hindi | 37. Catalan |
| 11. Japanese | 25. Bulgarian | 38. Nynorsk |
| 12. Korean | 26. Danish | 39. Tamil |
| 13. Indonesian | 27. Hebrew | 40. Afrikaans |
| 14. Vietnamese | | |
Usage Workflow
- File Input (Optional): If you are using a file as input, first call the File Upload API to upload the text and obtain a `file_id`. If you are passing raw text as input, you can skip this step.
- Create a Speech Generation Task: Call the Create Speech Generation Task API to create a task and retrieve a `task_id`.
- Check Task Status: Call the Query Speech Generation Task Status API with the `task_id` to check the task's progress.
- Retrieve the Audio File: Once the task is complete, use the returned `file_id` with the File Retrieve API to get a download URL for the audio result.
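The four steps can be chained as in the following sketch. Each step is passed in as a callable so the control flow is visible without committing to a particular HTTP client. The function and parameter names, and the `text_file_id` field used to reference an uploaded file, are hypothetical and should be checked against the API reference.

```python
from typing import Callable, Optional

def run_async_tts(
    create_task: Callable[[dict], str],
    poll_until_done: Callable[[str], str],
    retrieve_url: Callable[[str], str],
    text: Optional[str] = None,
    text_path: Optional[str] = None,
    upload_text_file: Optional[Callable[[str], str]] = None,
) -> str:
    """Chain the four workflow steps and return the audio download URL.

    All callables and the `text_file_id` field name are hypothetical stand-ins
    for the real API calls.
    """
    if (text is None) == (text_path is None):
        raise ValueError("pass exactly one of text or text_path")
    # Step 1 (optional): upload a text file and reference it by file_id.
    if text_path is not None:
        if upload_text_file is None:
            raise ValueError("upload_text_file is required for file input")
        source = {"text_file_id": upload_text_file(text_path)}
    else:
        source = {"text": text}  # raw text: no upload needed
    # Step 2: create the task; returns a task_id.
    task_id = create_task(source)
    # Step 3: wait until the task completes; returns the audio file_id.
    audio_file_id = poll_until_done(task_id)
    # Step 4: exchange the audio file_id for a download URL.
    return retrieve_url(audio_file_id)
```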
Note: The download URL is valid for 9 hours (32,400 seconds) from the time it is generated. After expiration, the file becomes unavailable and the generated audio will be lost. Please make sure to download the file in time.
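Because the URL expires 9 hours (9 × 3,600 = 32,400 seconds) after it is generated, a client that caches URLs can check their age before reusing them. This is a minimal sketch; it assumes `generated_at` is a timezone-aware timestamp recorded when the client received the URL, which is slightly later than the server-side generation time, so the optional `safety_margin` parameter (a hypothetical helper argument) lets you expire URLs a little early.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Documented validity window: 9 hours = 32,400 seconds.
URL_VALIDITY = timedelta(hours=9)

def url_expires_at(generated_at: datetime) -> datetime:
    """Return the moment a download URL generated at `generated_at` expires."""
    return generated_at + URL_VALIDITY

def url_still_valid(generated_at: datetime,
                    now: Optional[datetime] = None,
                    safety_margin: timedelta = timedelta(0)) -> bool:
    """True if the URL should still work at `now`, after subtracting the margin."""
    now = now if now is not None else datetime.now(timezone.utc)
    return now + safety_margin < url_expires_at(generated_at)
```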
Use Case
Get file_id
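A minimal upload sketch using only the Python standard library. The host `https://api.minimax.io`, the `/v1/files/upload` path, the `purpose` value `t2a_async_input`, and the `file.file_id` response field are assumptions based on MiniMax's public API and should be verified against the File Upload API reference. The request is built separately from the network call so it can be inspected without sending it.

```python
import json
import os
import urllib.request
import uuid

# Assumption: international API host; verify for your account and region.
BASE_URL = "https://api.minimax.io"

def build_upload_request(api_key: str, filename: str, content: bytes,
                         purpose: str = "t2a_async_input") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying `purpose` and the text file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="purpose"\r\n\r\n'
        f"{purpose}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/files/upload",  # assumed endpoint path
        data=head + content + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

def upload_text_file(api_key: str, path: str) -> str:
    """Upload a local text file and return its file_id (assumed response shape)."""
    with open(path, "rb") as f:
        req = build_upload_request(api_key, os.path.basename(path), f.read())
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return str(body["file"]["file_id"])
```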
Create Speech Generation Task
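A sketch of building and sending the task-creation request. The `/v1/t2a_async_v2` path, the `voice_setting` and `audio_setting` field names and their defaults, the `text_file_id` field, and the `task_id` response field are assumptions drawn from MiniMax's public API and should be confirmed against the Create Speech Generation Task reference; `voice_id` must be a system or cloned voice ID available to your account. Exactly one of `text` or `text_file_id` is sent, matching the two input methods described above.

```python
import json
import urllib.request
from typing import Optional, Union

BASE_URL = "https://api.minimax.io"  # assumption: verify for your region

def build_create_task_payload(model: str, voice_id: str,
                              text: Optional[str] = None,
                              text_file_id: Optional[Union[int, str]] = None,
                              speed: float = 1.0, vol: float = 1.0, pitch: int = 0,
                              audio_format: str = "mp3",
                              sample_rate: int = 32000,
                              bitrate: int = 128000) -> dict:
    """Assemble the JSON body; all field names here are assumptions."""
    if (text is None) == (text_file_id is None):
        raise ValueError("provide exactly one of text or text_file_id")
    payload = {
        "model": model,
        "voice_setting": {"voice_id": voice_id, "speed": speed, "vol": vol, "pitch": pitch},
        "audio_setting": {"format": audio_format, "audio_sample_rate": sample_rate,
                          "bitrate": bitrate},
    }
    if text is not None:
        payload["text"] = text  # direct string input
    else:
        payload["text_file_id"] = text_file_id  # file uploaded earlier
    return payload

def build_create_task_request(api_key: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}/v1/t2a_async_v2",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

def create_task(api_key: str, payload: dict) -> str:
    """Send the request and return the task_id (assumed response field)."""
    with urllib.request.urlopen(build_create_task_request(api_key, payload)) as resp:
        body = json.load(resp)
    return str(body["task_id"])
```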
Query Generation Status
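A polling sketch. The `/v1/query/t2a_async_query_v2` path, the `status` values (`Processing`, `Success`, `Failed`, `Expired`), and the `file_id` response field are assumptions to confirm against the Query Speech Generation Task Status reference. The wait loop takes a zero-argument `get_status` callable and an injectable `sleep`, so the loop logic can be exercised without network access; the 5-second interval and 10-minute timeout are arbitrary defaults.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "https://api.minimax.io"  # assumption: verify for your region

def build_query_request(api_key: str, task_id: str) -> urllib.request.Request:
    query = urllib.parse.urlencode({"task_id": task_id})
    return urllib.request.Request(
        f"{BASE_URL}/v1/query/t2a_async_query_v2?{query}",  # assumed path
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )

def fetch_status(api_key: str, task_id: str) -> dict:
    """One status query; returns the parsed JSON body."""
    with urllib.request.urlopen(build_query_request(api_key, task_id)) as resp:
        return json.load(resp)

def wait_for_audio_file_id(get_status: Callable[[], dict],
                           interval_s: float = 5.0,
                           timeout_s: float = 600.0,
                           sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the task succeeds and return the audio file_id.

    timeout_s bounds the total sleep time, not wall-clock time.
    """
    waited = 0.0
    while True:
        body = get_status()
        status = body.get("status")  # assumed values: Processing/Success/Failed/Expired
        if status == "Success":
            return str(body["file_id"])
        if status in ("Failed", "Expired"):
            raise RuntimeError(f"task ended with status {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"task still {status!r} after {waited:.0f}s of waiting")
        sleep(interval_s)
        waited += interval_s
```

In use, the two halves combine as `wait_for_audio_file_id(lambda: fetch_status(api_key, task_id))`.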
Retrieve the Download URL of the Audio File
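A sketch of exchanging the audio `file_id` for a download URL and saving the result. The `/v1/files/retrieve` path and the `file.download_url` response field are assumptions to verify against the File Retrieve API reference. Because the URL expires 9 hours after it is generated, download promptly after retrieving it.

```python
import json
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://api.minimax.io"  # assumption: verify for your region

def build_retrieve_request(api_key: str, file_id) -> urllib.request.Request:
    query = urllib.parse.urlencode({"file_id": file_id})
    return urllib.request.Request(
        f"{BASE_URL}/v1/files/retrieve?{query}",  # assumed endpoint path
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )

def extract_download_url(body: dict) -> str:
    """Pull the URL out of the assumed response shape {"file": {"download_url": ...}}."""
    try:
        return body["file"]["download_url"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"no download_url in response: {body!r}") from exc

def get_download_url(api_key: str, file_id) -> str:
    with urllib.request.urlopen(build_retrieve_request(api_key, file_id)) as resp:
        return extract_download_url(json.load(resp))

def download_to(url: str, dest_path: str) -> None:
    """Stream the response body to disk, writing the bytes as-is."""
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```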