POST /v1/t2a_v2
curl --request POST \
  --url https://api.minimax.io/v1/t2a_v2 \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "speech-2.5-hd-preview",
    "text": "Omg, the real danger is not that computers start thinking like people, but that people start thinking like computers. Computers can only help us with simple tasks.",
    "stream": false,
    "language_boost": "auto",
    "output_format": "hex",
    "voice_setting": {
      "voice_id": "English_expressive_narrator",
      "speed": 1,
      "vol": 1,
      "pitch": 0
    },
    "pronunciation_dict": {
      "tone": [
        "Omg/Oh my god"
      ]
    },
    "audio_setting": {
      "sample_rate": 32000,
      "bitrate": 128000,
      "format": "mp3",
      "channel": 1
    },
    "voice_modify": {
      "pitch": 0,
      "intensity": 0,
      "timbre": 0,
      "sound_effects": "spacious_echo"
    }
  }'
{
  "data": {
    "audio": "<hex encoded audio>",
    "status": 2
  },
  "extra_info": {
    "audio_length": 11124,
    "audio_sample_rate": 32000,
    "audio_size": 179926,
    "bitrate": 128000,
    "word_count": 163,
    "invisible_character_ratio": 0,
    "usage_characters": 163,
    "audio_format": "mp3",
    "audio_channel": 1
  },
  "trace_id": "01b8bf9bb7433cc75c18eee6cfa8fe21",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}

Authorizations

Authorization
string
header
required

HTTP: Bearer Auth

Headers

Content-Type
enum<string>
default:application/json
required

The media type of the request body. Must be set to application/json to ensure the data is sent in JSON format.

Available options:
application/json
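
As a rough Python counterpart to the two required headers, the sketch below builds (but does not send) a request with the standard library. The helper name build_t2a_request and the YOUR_API_KEY placeholder are illustrative only and not part of the API; the payload carries just the two body fields marked as required.

```python
import json
import urllib.request

API_URL = "https://api.minimax.io/v1/t2a_v2"


def build_t2a_request(token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with both required headers (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON body as bytes
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_t2a_request(
    "YOUR_API_KEY",  # placeholder, replace with a real token
    {"model": "speech-2.5-hd-preview", "text": "Hello."},
)
# Sending is left out here; urllib.request.urlopen(request) would perform the call.
```

Keeping construction separate from sending makes the headers and body easy to inspect before any network traffic happens.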

Body

application/json
model
enum<string>
required

The speech synthesis model version to use.

Available options:
speech-2.5-hd-preview,
speech-2.5-turbo-preview,
speech-02-hd,
speech-02-turbo,
speech-01-hd,
speech-01-turbo
text
string
required

The text to be converted into speech. Must be less than 10,000 characters.

  • For texts over 3,000 characters, streaming output is recommended.

  • Paragraph breaks should be marked with newline characters.

  • Pause control: You can customize speech pauses by adding markers in the form <#x#>, where x is the pause duration in seconds. Valid range: [0.01, 99.99], up to two decimal places. Pause markers must be placed between speakable text segments and cannot be used consecutively.
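
The text rules above can be checked on the client before sending a request. The following is a minimal sketch, not an official validator: the function names pause and find_text_problems are hypothetical, Python's len is assumed to match how the service counts characters, and whitespace-only text between markers is assumed not to count as speakable.

```python
import re

# A pause marker: <#x#> with x as digits and up to two decimal places.
MARKER = r"<#(\d+(?:\.\d{1,2})?)#>"


def pause(seconds: float) -> str:
    """Return a pause marker such as <#1.50#> (hypothetical helper)."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be within [0.01, 99.99] seconds")
    return f"<#{seconds:.2f}#>"


def find_text_problems(text: str) -> list:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len(text) >= 10_000:
        problems.append("text must be shorter than 10,000 characters")
    for match in re.finditer(MARKER, text):
        if not 0.01 <= float(match.group(1)) <= 99.99:
            problems.append(f"pause out of range: {match.group(0)}")
    # Splitting on markers leaves the text segments around them; an empty
    # segment means a marker at the start or end, or two markers in a row.
    segments = re.split(r"<#\d+(?:\.\d{1,2})?#>", text)
    if len(segments) > 1 and any(not s.strip() for s in segments):
        problems.append("pause markers must sit between speakable text segments")
    return problems


text = "Hello" + pause(1.5) + "world"
problems = find_text_problems(text)
```

For the sample text, problems is an empty list; a string such as "a<#1#><#2#>b" is reported because its markers are consecutive.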

stream
boolean

Whether to enable streaming output. Defaults to false.

stream_options
object
voice_setting
object
audio_setting
object
pronunciation_dict
object
timber_weights
object[]

Timbre weights (legacy field)

language_boost
enum<string>

Enhances recognition of specific minority languages and dialects. Defaults to null. If the language is unknown, set this to "auto" and the model detects it automatically.

Available options:
Chinese,
Chinese,Yue,
English,
Arabic,
Russian,
Spanish,
French,
Portuguese,
German,
Turkish,
Dutch,
Ukrainian,
Vietnamese,
Indonesian,
Japanese,
Italian,
Korean,
Thai,
Polish,
Romanian,
Greek,
Czech,
Finnish,
Hindi,
Bulgarian,
Danish,
Hebrew,
Malay,
Persian,
Slovak,
Swedish,
Croatian,
Filipino,
Hungarian,
Norwegian,
Slovenian,
Catalan,
Nynorsk,
Tamil,
Afrikaans,
auto
voice_modify
object

Voice effects configuration.

Supported audio formats:

  • Non-streaming: mp3, wav, flac
  • Streaming: mp3
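
The format lists above translate into a simple client-side check. This is a sketch under the assumption that the format in question is the one passed as audio_setting.format, as in the sample request; is_supported_format is a hypothetical helper name.

```python
NON_STREAMING_FORMATS = {"mp3", "wav", "flac"}
STREAMING_FORMATS = {"mp3"}


def is_supported_format(audio_format: str, stream: bool) -> bool:
    """Check an audio format against the lists for the chosen mode."""
    allowed = STREAMING_FORMATS if stream else NON_STREAMING_FORMATS
    return audio_format in allowed


wav_when_streaming = is_supported_format("wav", stream=True)       # False
wav_when_not_streaming = is_supported_format("wav", stream=False)  # True
```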
subtitle_enable
boolean
default:false

Controls whether subtitles are enabled. Default is false. This parameter only takes effect in non-streaming scenarios. Available for models: speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.

output_format
enum<string>
default:hex

The format of the returned audio: url or hex. Defaults to hex. This parameter only takes effect in non-streaming requests; streaming requests always return hex. A returned URL is valid for 24 hours.

Available options:
url,
hex
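
With output_format set to hex, the audio arrives as a hexadecimal string that has to be converted back to bytes before it can be played or saved. A minimal sketch follows; decode_hex_audio is a hypothetical helper, and the three-byte sample stands in for a real response value.

```python
def decode_hex_audio(hex_audio: str) -> bytes:
    """Convert a hex-encoded audio string into raw bytes."""
    return bytes.fromhex(hex_audio)


sample_hex = b"ID3".hex()  # stand-in for the "audio" value of a response
audio_bytes = decode_hex_audio(sample_hex)

# Saving is then a plain binary write, for example:
# with open("output.mp3", "wb") as f:
#     f.write(audio_bytes)
```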

Response

data
object

The synthesized audio data object. The returned data object may be null, so a null check is required.
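
One way to apply the null check, sketched with a hypothetical extract_audio helper that works on the parsed JSON response. The "audio" key is taken from the sample response; returning None for a missing value is a design choice of this sketch, not documented behavior.

```python
from typing import Optional


def extract_audio(response: dict) -> Optional[str]:
    """Return the hex audio string, or None when data is null or absent."""
    data = response.get("data")
    if data is None:
        return None
    return data.get("audio")


with_audio = extract_audio({"data": {"audio": "494433", "status": 2}})
without_audio = extract_audio({"data": None})
```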

trace_id
string

The session ID, used for troubleshooting and support.

extra_info
object

Additional audio information.

base_resp
object

Status code and details of this request.
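
The status fields can be checked before the audio is used. The sketch below assumes that a status_code of 0 signals success, as in the sample response, and treats every other value as a failure; that interpretation, the T2AError class, and the ensure_success helper are assumptions of this example rather than documented behavior.

```python
class T2AError(RuntimeError):
    """Raised when base_resp does not report success (hypothetical class)."""


def ensure_success(response: dict) -> dict:
    """Return the response unchanged, or raise if status_code is not 0."""
    base = response.get("base_resp") or {}
    code = base.get("status_code")
    if code != 0:
        # Include trace_id so the failure can be reported to support.
        raise T2AError(
            f"status_code={code}, status_msg={base.get('status_msg')}, "
            f"trace_id={response.get('trace_id')}"
        )
    return response


ok = ensure_success({
    "trace_id": "01b8bf9bb7433cc75c18eee6cfa8fe21",
    "base_resp": {"status_code": 0, "status_msg": "success"},
})
```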