Voice Cloning

Use this API to quickly replicate a target timbre, for example to recreate an IP character's voice or to clone a specific speaker. It accepts mono or stereo reference audio and produces a voice that matches the timbre of the provided file.

Notes

Cloning a voice with this API does not incur a cloning fee right away. The fee is charged the first time you synthesize speech with the cloned voice through a T2A speech synthesis API; preview audio generated within this API does not trigger it.
Voices produced by this rapid cloning API are temporary. To keep a cloned voice permanently, call any T2A speech synthesis API with that voice within 168 hours (7 days); previews within this API do not count. If no such call is made within that window, the voice is deleted.
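
The retention rule above comes down to a simple deadline calculation. The following is a minimal sketch, not part of the API: the names `RETENTION_WINDOW`, `retention_deadline`, and `is_expired` are hypothetical, and it assumes you record the clone time yourself as a timezone-aware timestamp. Whether a call made exactly at the 168-hour mark still counts is not stated in the notes, so the boundary handling here is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in the notes: 168 hours (7 days).
RETENTION_WINDOW = timedelta(hours=168)


def retention_deadline(cloned_at: datetime) -> datetime:
    """Return the latest time by which a T2A synthesis call must use the voice."""
    return cloned_at + RETENTION_WINDOW


def is_expired(cloned_at: datetime, now: datetime) -> bool:
    """Return True once the window has passed (assumes no T2A call was made).

    Assumption: a call made exactly at the deadline is treated as in time.
    """
    return now > retention_deadline(cloned_at)


if __name__ == "__main__":
    cloned_at = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(retention_deadline(cloned_at).isoformat())  # 2025-01-08T12:00:00+00:00
```

In practice you would likely schedule the first T2A synthesis call well before this deadline rather than rely on the exact boundary.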

Supported Models

Model	Description
speech-2.6-hd	Latest HD model with real-time response, intelligent parsing, fluent LoRA voice
speech-2.6-turbo	Latest Turbo model, positioned as the best-value option, with support for 40 languages.
speech-02-hd	Superior rhythm and stability, with outstanding performance in replication similarity and sound quality.
speech-02-turbo	Superior rhythm and stability, with enhanced multilingual capabilities and excellent performance.

Official MCP

MiniMax provides official Model Context Protocol (MCP) server implementations in both Python and JavaScript, with support for voice cloning. For details, see the MiniMax MCP User Guide.
