Voice Clone

POST /v1/voice_clone
curl --request POST \
  --url https://api.minimax.io/v1/voice_clone \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "file_id": "<file_id of the audio to be cloned>",
  "voice_id": "<voice_id>",
  "clone_prompt": {
    "prompt_audio": "<file_id of the prompt audio>",
    "prompt_text": "This voice sounds natural and pleasant."
  },
  "text": "A gentle breeze sweeps across the soft grass, carrying the fresh scent along with the songs of birds.",
  "model": "speech-2.5-hd-preview",
  "need_noise_reduction": false,
  "need_volume_normalization": false
}'
{
  "input_sensitive": false,
  "input_sensitive_type": 0,
  "demo_audio": "",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}

Authorizations

Authorization
string
header
required

HTTP: Bearer Auth

Headers

Content-Type
enum<string>
default:application/json
required

The media type of the request body. Must be set to application/json to ensure the data is sent in JSON format.

Available options:
application/json
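
As an illustration of the two headers above, the following Python sketch builds the request with only the standard library. The helper name build_voice_clone_request is hypothetical, and the function only constructs the request; it does not send it.

```python
import json
import urllib.request

VOICE_CLONE_URL = "https://api.minimax.io/v1/voice_clone"


def build_voice_clone_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the voice_clone endpoint.

    Hypothetical helper: the name and signature are illustrative only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        VOICE_CLONE_URL,
        data=body,
        headers={
            # Bearer authentication, as listed under Authorizations.
            "Authorization": f"Bearer {token}",
            # The only documented media type for the request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, the returned object can be passed to urllib.request.urlopen and the response body decoded as JSON.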

Body

application/json

Voice clone request parameters

file_id
integer
required

The file_id of the audio to be cloned, obtained through the File Upload API.

Uploaded files must comply with the following rules:

  • Accepted audio formats: mp3, m4a, wav
  • Audio duration: at least 10 seconds, no longer than 5 minutes
  • File size: no larger than 20 MB
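
The rules above can be checked locally before uploading. The sketch below is a minimal pre-flight check, not part of the API: check_source_audio is a hypothetical helper, the caller is assumed to measure the duration separately, and "20 MB" is assumed to mean 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB
MIN_DURATION_S = 10
MAX_DURATION_S = 5 * 60


def check_source_audio(filename: str, size_bytes: int, duration_s: float) -> list:
    """Return a list of rule violations; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be mp3, m4a, or wav")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file must be no larger than 20 MB")
    if not (MIN_DURATION_S <= duration_s <= MAX_DURATION_S):
        problems.append("duration must be between 10 seconds and 5 minutes")
    return problems
```

The check looks only at the file extension, so it cannot detect a file whose contents do not match its name.
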
voice_id
string
required

The voice_id of the cloned voice. Example: "MiniMax001". When defining a custom voice_id, note the following rules:

  • Length: 8 to 256 characters, inclusive
  • Must start with an English letter
  • Can contain letters, digits, -, and _
  • Cannot end with - or _
  • Must not duplicate an existing voice_id, otherwise an error will occur
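
The format rules above fit in a single regular expression. The sketch below assumes "letters" means ASCII letters only; is_valid_voice_id is a hypothetical helper. The uniqueness rule cannot be checked locally, so a locally valid ID may still be rejected by the API.

```python
import re

# First character: a letter. Last character: a letter or digit (not - or _).
# The middle allows letters, digits, - and _; 1 + {6,254} + 1 gives a total length of 8 to 256.
_VOICE_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{6,254}[A-Za-z0-9]")


def is_valid_voice_id(voice_id: str) -> bool:
    """Check the documented format rules for a custom voice_id (ASCII letters assumed)."""
    return _VOICE_ID_RE.fullmatch(voice_id) is not None
```
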
clone_prompt
object

Optional prompt that helps improve the similarity and stability of the synthesized voice. If you provide it, you must also upload a short sample audio clip (under 8 seconds; supported formats: mp3, m4a, wav) and supply its file_id as prompt_audio together with the matching transcript as prompt_text.
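
A minimal sketch of assembling this object, using the prompt_audio and prompt_text field names from the request example. The helper make_clone_prompt is hypothetical, and the clip duration is assumed to be measured by the caller.

```python
MAX_PROMPT_SECONDS = 8.0  # the clip must be shorter than this


def make_clone_prompt(prompt_audio, prompt_text: str, duration_s: float) -> dict:
    """Build the clone_prompt object after basic local checks.

    prompt_audio is the file_id of the uploaded sample clip.
    """
    if duration_s >= MAX_PROMPT_SECONDS:
        raise ValueError("prompt audio must be shorter than 8 seconds")
    if not prompt_text.strip():
        raise ValueError("prompt_text must contain the transcript of the clip")
    return {"prompt_audio": prompt_audio, "prompt_text": prompt_text}
```
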

text
string

Optional preview text, up to 2000 characters. The cloned voice will be used to read the text, and an audio preview link will be returned. Note: Preview requests are charged based on character count, consistent with T2A pricing.

model
enum<string>

Specifies which voice synthesis model to use for generating the preview audio. Required when the text field is provided.

Available options:
speech-2.5-hd-preview
speech-2.5-turbo-preview
speech-02-hd
speech-02-turbo
speech-01-hd
speech-01-turbo
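
The interaction between text and model can be enforced when the request body is assembled. The sketch below is one possible approach: build_voice_clone_payload is a hypothetical helper, and the 2000-character limit is counted with Python's len, which may differ from how the API counts characters.

```python
PREVIEW_MODELS = {
    "speech-2.5-hd-preview",
    "speech-2.5-turbo-preview",
    "speech-02-hd",
    "speech-02-turbo",
    "speech-01-hd",
    "speech-01-turbo",
}
MAX_PREVIEW_CHARS = 2000


def build_voice_clone_payload(file_id: int, voice_id: str, text=None, model=None) -> dict:
    """Assemble a request body, adding text and model only as a valid pair."""
    payload = {"file_id": file_id, "voice_id": voice_id}
    if text is not None:
        if len(text) > MAX_PREVIEW_CHARS:
            raise ValueError("preview text is limited to 2000 characters")
        if model is None:
            raise ValueError("model is required when text is provided")
        if model not in PREVIEW_MODELS:
            raise ValueError(f"unsupported model: {model}")
        payload["text"] = text
        payload["model"] = model
    # Assumption: without text there is no preview, so model is left out.
    return payload
```
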
language_boost
string

Controls whether recognition for specific minority languages and dialects is enhanced. Default is null. If the language type is unknown, set to "auto" and the model will automatically detect it. Supported values: ['Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto']

need_noise_reduction
boolean
default:false

Indicates whether to enable noise reduction.

need_volume_normalization
boolean
default:false

Indicates whether to enable volume normalization.

Response

200 - application/json

Successful response

input_sensitive
object

Content safety check result

demo_audio
string

If both text and model are provided, this field returns a URL to the preview audio. Otherwise, it will be empty.

base_resp
object
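
A response like the example shown earlier can be handled roughly as follows. This is a sketch under two assumptions: a status_code of 0 in base_resp indicates success (as in the example, where status_msg is "success"), and any other value is treated as a failure. The helper name extract_demo_audio is hypothetical.

```python
def extract_demo_audio(response: dict):
    """Return the preview audio URL, or None if no preview was generated.

    Raises RuntimeError when base_resp reports a non-zero status_code
    (assumed here to mean the request failed).
    """
    base = response.get("base_resp") or {}
    status = base.get("status_code")
    if status != 0:
        raise RuntimeError(f"voice_clone failed: {status} {base.get('status_msg')}")
    # demo_audio is empty unless both text and model were sent.
    return response.get("demo_audio") or None
```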