Voice Clone

curl --request POST \
  --url https://api.minimax.io/v1/voice_clone \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "file_id": 123456789,
  "voice_id": "<voice_id>",
  "clone_prompt": {
    "prompt_audio": 987654321,
    "prompt_text": "This voice sounds natural and pleasant."
  },
  "text": "A gentle breeze sweeps across the soft grass(breath), carrying the fresh scent along with the songs of birds.",
  "model": "speech-2.8-hd",
  "need_noise_reduction": false,
  "need_volume_normalization": false
}
'

{
  "input_sensitive": false,
  "input_sensitive_type": 0,
  "demo_audio": "",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}

POST /v1/voice_clone


Authorizations

Authorization

string

header

required

HTTP: Bearer Auth

Security Scheme Type: http
HTTP Authorization Scheme: Bearer. Pass your API key as the token; it can be found under Account Management > API Keys.

Headers

Content-Type

enum<string>

default:application/json

required

The media type of the request body. Must be set to application/json to ensure the data is sent in JSON format.

Available options:

application/json
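
The two required headers can be illustrated with a minimal sketch that builds the request using only Python's standard library. The helper name `build_request` and the `API_URL` constant are illustrative and not part of the API; the URL is the one shown in the curl example.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.minimax.io/v1/voice_clone"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with the required headers.

    `build_request` is a hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer auth with the API key
            "Content-Type": "application/json",    # the only accepted media type
        },
        method="POST",
    )
```

The request could then be sent with `urllib.request.urlopen(build_request(api_key, payload))`.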

Body

application/json

Voice clone request parameters

file_id

integer<int64>

required

The file_id of the audio to be cloned, obtained through the File Upload API.

Uploaded files must comply with the following rules:

Accepted audio formats: mp3, m4a, wav
Audio duration: at least 10 seconds, no longer than 5 minutes
File size: no larger than 20 MB
If the clone_prompt parameter is used, both of its child attributes (prompt_audio, prompt_text) are required
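
The upload rules above can be checked locally before calling the File Upload API. The following is a minimal sketch, not an official validator: the function name is hypothetical, the caller is assumed to already know the clip's duration, and "20 MB" is assumed to mean 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

ACCEPTED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_SIZE_BYTES = 20 * 1024 * 1024      # assumption: "20 MB" read as 20 MiB
MIN_SECONDS, MAX_SECONDS = 10, 5 * 60  # 10 seconds to 5 minutes


def source_audio_problems(filename: str, size_bytes: int,
                          duration_seconds: float) -> list[str]:
    """Return a list of rule violations; an empty list means the file looks OK."""
    problems = []
    if Path(filename).suffix.lower() not in ACCEPTED_EXTENSIONS:
        problems.append("format must be mp3, m4a, or wav")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration must be between 10 seconds and 5 minutes")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file must be no larger than 20 MB")
    return problems
```

This checks only the file extension, not the actual encoding, so the server may still reject a file that passes.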

voice_id

string

required

The voice_id of the cloned voice. Example: "MiniMax001". When defining a custom voice_id, note the following rules:

Length: 8 to 256 characters
Must start with an English letter
Can contain letters, digits, -, and _
Cannot end with - or _
Must not duplicate an existing voice_id, otherwise an error will occur
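
The format rules above can be expressed as a single regular expression. This is a sketch under two assumptions: "English letter" is read as ASCII A-Z or a-z, and the function name is hypothetical. Uniqueness cannot be checked locally, so a voice_id that passes may still be rejected as a duplicate.

```python
import re

# One leading letter, 6 to 254 middle characters, and one final character
# that is not '-' or '_', giving a total length of 8 to 256.
VOICE_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{6,254}[A-Za-z0-9]")


def is_valid_voice_id(voice_id: str) -> bool:
    """Check the documented format rules (not uniqueness) for a custom voice_id."""
    return VOICE_ID_PATTERN.fullmatch(voice_id) is not None
```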

clone_prompt

object

Voice cloning parameters. Providing this field helps improve the similarity and stability of the synthesized voice. When using it, upload a short sample audio clip (shorter than 8 seconds; supported formats: mp3, m4a, wav) and provide its corresponding transcript.

Child attributes: prompt_audio, prompt_text

text

string

Optional preview text, up to 1000 characters. The cloned voice will be used to read the text, and an audio preview link will be returned. Note: Preview requests are charged based on character count, consistent with T2A pricing.

Interjection tags: Only supported when using speech-2.8-hd or speech-2.8-turbo models. Supported interjections: (laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (whistles), (sneezes), (crying), (applause).
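
The preview-text constraints can be checked client-side before a request is made. The sketch below is illustrative: the function names are hypothetical, and it assumes the 1000-character limit counts Unicode code points the way Python's `len` does, which the API may not.

```python
import re

INTERJECTION_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
SUPPORTED_INTERJECTIONS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "whistles",
    "sneezes", "crying", "applause",
}
MAX_TEXT_CHARS = 1000


def find_interjections(text: str) -> list[str]:
    """Return the supported interjection tags, such as (breath), found in text."""
    candidates = re.findall(r"\(([a-z-]+)\)", text)
    return [tag for tag in candidates if tag in SUPPORTED_INTERJECTIONS]


def preview_text_problems(text: str, model: str) -> list[str]:
    """Return a list of rule violations for the preview text and model."""
    problems = []
    if len(text) > MAX_TEXT_CHARS:
        problems.append("text exceeds 1000 characters")
    if find_interjections(text) and model not in INTERJECTION_MODELS:
        problems.append("interjection tags require a speech-2.8 model")
    return problems
```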

model

enum<string>

Specifies which voice synthesis model to use for generating the preview audio. Required when the text field is provided.

Available options:

speech-2.8-hd
speech-2.8-turbo
speech-2.6-hd
speech-2.6-turbo
speech-02-hd
speech-02-turbo
speech-01-hd
speech-01-turbo
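
The dependencies between body fields (model is required whenever text is given, and clone_prompt needs both child attributes) can be enforced while assembling the request body. This is a minimal sketch; `build_voice_clone_payload` is a hypothetical helper, not part of any SDK.

```python
MODELS = {
    "speech-2.8-hd", "speech-2.8-turbo", "speech-2.6-hd", "speech-2.6-turbo",
    "speech-02-hd", "speech-02-turbo", "speech-01-hd", "speech-01-turbo",
}


def build_voice_clone_payload(file_id, voice_id, text=None, model=None,
                              clone_prompt=None):
    """Assemble the JSON body, raising ValueError on inconsistent fields."""
    payload = {"file_id": file_id, "voice_id": voice_id}
    if clone_prompt is not None:
        # Both child attributes are required when clone_prompt is used.
        missing = {"prompt_audio", "prompt_text"} - clone_prompt.keys()
        if missing:
            raise ValueError(f"clone_prompt is missing: {sorted(missing)}")
        payload["clone_prompt"] = dict(clone_prompt)
    if model is not None:
        if model not in MODELS:
            raise ValueError(f"unknown model: {model}")
        payload["model"] = model
    if text is not None:
        # A preview is generated only when text and model are both present.
        if model is None:
            raise ValueError("model is required when text is provided")
        payload["text"] = text
    return payload
```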

language_boost

enum<string>

Controls whether to enhance recognition of specific minority languages and dialects. Defaults to null. If the language is unknown, set this to "auto" and the model detects it automatically.

Available options:

Chinese
Chinese,Yue
English
Arabic
Russian
Spanish
French
Portuguese
German
Turkish
Dutch
Ukrainian
Vietnamese
Indonesian
Japanese
Italian
Korean
Thai
Polish
Romanian
Greek
Czech
Finnish
Hindi
Bulgarian
Danish
Hebrew
Malay
Persian
Slovak
Swedish
Croatian
Filipino
Hungarian
Norwegian
Slovenian
Catalan
Nynorsk
Tamil
Afrikaans
auto

need_noise_reduction

boolean

default:false

Indicates whether to enable noise reduction.

need_volume_normalization

boolean

default:false

Indicates whether to enable volume normalization.

Response

200 - application/json

Successful response

input_sensitive

object

Content safety check result

demo_audio

string

If both text and model are provided, this field returns a URL to the preview audio. Otherwise, it will be empty.
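
A response body can be handled along these lines. The sketch assumes that a non-zero `base_resp.status_code` signals a failure; this page only shows the success case (status_code 0, status_msg "success"), so treat that as an assumption. The function name is hypothetical.

```python
import json


def extract_demo_audio(response_body: str):
    """Return the preview audio URL, or None when no preview was generated.

    Assumption: any non-zero base_resp.status_code is treated as an error.
    """
    data = json.loads(response_body)
    base_resp = data.get("base_resp", {})
    if base_resp.get("status_code") != 0:
        raise RuntimeError(f"voice clone failed: {base_resp.get('status_msg')}")
    # demo_audio is an empty string unless both text and model were provided.
    return data.get("demo_audio") or None
```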

base_resp

object
