
# Voice Clone

> Use this API for rapid voice cloning.
If a cloned voice is not used within 7 days, the system will delete it.


## OpenAPI

````yaml api-reference/speech/voice-cloning/api/openapi.json POST /v1/voice_clone
openapi: 3.1.0
info:
  title: MiniMax Voice Cloning API
  description: MiniMax Voice Cloning API with support for voice cloning and file upload
  license:
    name: MIT
  version: 1.0.0
servers:
  - url: https://api.minimax.io
security:
  - bearerAuth: []
paths:
  /v1/voice_clone:
    post:
      tags:
        - Voice
      summary: Voice Clone
      operationId: voiceClone
      parameters:
        - name: Content-Type
          in: header
          required: true
          description: >-
            The media type of the request body. Must be set to
            `application/json` to ensure the data is sent in JSON format.
          schema:
            type: string
            enum:
              - application/json
            default: application/json
      requestBody:
        description: Voice clone request parameters
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/VoiceCloneReq'
        required: true
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/VoiceCloneResp'
components:
  schemas:
    VoiceCloneReq:
      type: object
      required:
        - file_id
        - voice_id
      properties:
        file_id:
          type: integer
          format: int64
          description: >-
            The `file_id` of the audio to be cloned, obtained through the [File
            Upload API](/api-reference/file-management-upload).


            Uploaded files must comply with the following rules:

            - Accepted audio formats: mp3, m4a, wav

            - Audio duration: at least 10 seconds, no longer than 5 minutes

            - File size: no larger than 20 MB

        voice_id:
          type: string
          description: >-
            A custom ID for the cloned voice, for example `"MiniMax001"`. The
            `voice_id` must follow these rules:

            - Length: 8 to 256 characters

            - Must start with an English letter

            - May contain only letters, digits, `-`, and `_`

            - Must not end with `-` or `_`

            - Must be unique; reusing an existing `voice_id` returns an error
        clone_prompt:
          $ref: '#/components/schemas/ClonePrompt'
          description: >-
            Optional voice cloning prompt. Providing it can improve the
            similarity and stability of the synthesized voice. If used, both
            child fields are required: `prompt_audio`, the `file_id` of a short
            uploaded sample clip (less than 8 s; formats: mp3, m4a, wav), and
            `prompt_text`, the transcript of that clip.
        text:
          type: string
          description: >-
            Optional preview text, up to 1000 characters. If `model` is also
            provided, the cloned voice reads this text and the response returns
            a preview audio URL in `demo_audio`.

            Note: Preview requests are charged based on character count,
            consistent with T2A pricing.

            - **Interjection tags**: Only supported when using `speech-2.8-hd`
            or `speech-2.8-turbo` models. Supported interjections: `(laughs)`,
            `(chuckle)`, `(coughs)`, `(clear-throat)`, `(groans)`, `(breath)`,
            `(pant)`, `(inhale)`, `(exhale)`, `(gasps)`, `(sniffs)`, `(sighs)`,
            `(snorts)`, `(burps)`, `(lip-smacking)`, `(humming)`, `(hissing)`,
            `(emm)`, `(whistles)`, `(sneezes)`, `(crying)`, `(applause)`.
        model:
          type: string
          description: >-
            Specifies which voice synthesis model to use for generating the
            preview audio. Required when the `text` field is provided.
          enum:
            - speech-2.8-hd
            - speech-2.8-turbo
            - speech-2.6-hd
            - speech-2.6-turbo
            - speech-02-hd
            - speech-02-turbo
            - speech-01-hd
            - speech-01-turbo
        language_boost:
          type: string
          description: >-
            Enhances recognition of the specified language, including minority
            languages and dialects. Defaults to `null` (no enhancement). If the
            language is unknown, set it to `"auto"` to let the model detect it
            automatically.
          enum:
            - Chinese
            - Chinese,Yue
            - English
            - Arabic
            - Russian
            - Spanish
            - French
            - Portuguese
            - German
            - Turkish
            - Dutch
            - Ukrainian
            - Vietnamese
            - Indonesian
            - Japanese
            - Italian
            - Korean
            - Thai
            - Polish
            - Romanian
            - Greek
            - Czech
            - Finnish
            - Hindi
            - Bulgarian
            - Danish
            - Hebrew
            - Malay
            - Persian
            - Slovak
            - Swedish
            - Croatian
            - Filipino
            - Hungarian
            - Norwegian
            - Slovenian
            - Catalan
            - Nynorsk
            - Tamil
            - Afrikaans
            - auto
        text_validation:
          type: string
          description: >-
            Optional. The expected transcript of the cloning sample audio
            (matching the content of `file_id` or `clone_prompt.prompt_audio`).
            When provided, the audio is sent to ASR and the recognized text is
            compared against `text_validation`. If the similarity is below
            `accuracy`, the request is rejected with status code `1043` (`The
            asr similarity check failed`). Maximum length: 200 characters.
          maxLength: 200
        accuracy:
          type: number
          format: double
          description: >-
            Optional. Similarity threshold used by the ASR validation triggered
            by `text_validation`. Valid range: `[0, 1]`. When omitted or set to
            `0`, defaults to `0.7`.
          minimum: 0
          maximum: 1
          default: 0.7
        need_noise_reduction:
          type: boolean
          description: Indicates whether to enable noise reduction.
          default: false
        need_volume_normalization:
          type: boolean
          description: Indicates whether to enable volume normalization.
          default: false
        aigc_watermark:
          type: boolean
          description: >-
            Indicates whether to append an AIGC watermark tone to the end of the
            synthesized preview audio.
          default: false
      example:
        file_id: 123456789
        voice_id: <voice_id>
        clone_prompt:
          prompt_audio: 987654321
          prompt_text: This voice sounds natural and pleasant.
        text: >-
          A gentle breeze sweeps across the soft grass(breath), carrying the
          fresh scent along with the songs of birds.
        model: speech-2.8-hd
        text_validation: This voice sounds natural and pleasant.
        accuracy: 0.7
        need_noise_reduction: false
        need_volume_normalization: false
        aigc_watermark: false
    VoiceCloneResp:
      type: object
      properties:
        input_sensitive:
          type: object
          description: Content safety check result
          properties:
            type:
              type: integer
              description: |-
                The category of the content safety trigger, one of:
                - `0`: Normal
                - `1`: Severe violation
                - `2`: Pornographic
                - `3`: Advertisement
                - `4`: Prohibited content
                - `5`: Abusive language
                - `6`: Terror/violence
                - `7`: Other
        demo_audio:
          type: string
          description: >-
            If both `text` and `model` are provided, this field contains a URL
            to the preview audio. Otherwise, it is an empty string.
        extra_info:
          type: object
          description: >-
            Preview audio metadata and billing info. Returned only when `text`
            is provided (i.e. preview synthesis happened and was billed). Field
            shape is aligned with `/v1/t2a_v2`.
          properties:
            audio_length:
              type: integer
              format: int64
              description: Preview audio duration in milliseconds.
            audio_sample_rate:
              type: integer
              format: int64
              description: Preview audio sample rate in Hz.
            audio_size:
              type: integer
              format: int64
              description: Preview audio file size in bytes.
            bitrate:
              type: integer
              format: int64
              description: Preview audio bitrate in bits per second.
            word_count:
              type: integer
              format: int64
              description: >-
                Word count of spoken content (includes Chinese characters,
                digits, letters; excludes punctuation).
            usage_characters:
              type: integer
              format: int64
              description: >-
                Number of billable characters consumed by the preview synthesis.
                Use this for reconciliation against your account billing.
        base_resp:
          $ref: '#/components/schemas/VoiceCloneBaseResponse'
      example:
        input_sensitive: false
        input_sensitive_type: 0
        demo_audio: ''
        extra_info:
          audio_length: 11124
          audio_sample_rate: 32000
          audio_size: 179926
          bitrate: 128000
          word_count: 18
          usage_characters: 18
        base_resp:
          status_code: 0
          status_msg: success
    ClonePrompt:
      type: object
      required:
        - prompt_audio
        - prompt_text
      properties:
        prompt_audio:
          type: integer
          format: int64
          description: >-
            The `file_id` of the sample audio, obtained through the [File Upload
            API](/api-reference/file-management-upload). The sample audio must
            be less than 8 seconds.
        prompt_text:
          type: string
          description: >-
            The transcript corresponding to the sample audio. It must match the
            audio content and end with punctuation.
    VoiceCloneBaseResponse:
      type: object
      required:
        - status_code
      properties:
        status_code:
          type: integer
          format: int64
          description: >-
            Status code.


            - 0: Success

            - 1000: Unknown error

            - 1001: Timeout

            - 1002: Rate limit triggered

            - 1004: Authentication failed

            - 1013: Internal service error

            - 1043: ASR similarity check failed (see `text_validation`)

            - 2013: Invalid input format

            - 2038: No cloning permission, please check account verification
            status


            For more information, please refer to the [Error Code
            Reference](/api-reference/errorcode).
        status_msg:
          type: string
          description: The status message
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
      description: >-
        `HTTP: Bearer Auth`

        - Security Scheme Type: http

        - HTTP Authorization Scheme: send the header
        `Authorization: Bearer <API_key>`. Your API key is available under
        [Account Management > API
        Keys](https://platform.minimax.io/user-center/basic-information/interface-key).

````
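The upload rules listed under `file_id` in the spec above (mp3, m4a, or wav; 10 seconds to 5 minutes; at most 20 MB) can be partly checked before uploading. The Python sketch below is illustrative, not part of any MiniMax SDK: `precheck_clone_audio` is a hypothetical helper, "20 MB" is assumed to mean 20 × 1024 × 1024 bytes, and duration is measured only for PCM WAV files because the standard-library `wave` module cannot decode mp3 or m4a. The same duration logic could be applied to a `clone_prompt.prompt_audio` sample with a strict upper bound of 8 seconds.

```python
import os
import wave

# Limits documented for the cloning audio referenced by `file_id`.
ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" means 20 MiB
MIN_SECONDS, MAX_SECONDS = 10.0, 300.0  # at least 10 s, at most 5 min


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def precheck_clone_audio(path: str) -> list[str]:
    """Return a list of problems with a local file before it is uploaded.

    Extension and size are checked for every file; duration is checked only
    for .wav files because the standard library cannot decode mp3 or m4a.
    """
    problems: list[str] = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 20 MB")
    if ext == ".wav":
        seconds = wav_duration_seconds(path)
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"duration {seconds:.1f} s is outside 10 s to 5 min")
    return problems
```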
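The `voice_id` rules in the spec above translate into a single regular expression. This is a minimal sketch under the assumption that "English letter" means an ASCII letter; `is_valid_voice_id` is a hypothetical helper name, and uniqueness can only be confirmed by the server. With it, `"MiniMax001"` passes, while `"1MiniMax01"` (starts with a digit) and `"MyVoice_"` (ends with `_`) are rejected.

```python
import re

# Documented voice_id rules: 8-256 characters, first character an English
# letter, only letters/digits/'-'/'_', last character not '-' or '_'.
# Assumption: "letter" means an ASCII letter (A-Z, a-z).
_VOICE_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{6,254}[A-Za-z0-9]")


def is_valid_voice_id(voice_id: str) -> bool:
    """Return True if voice_id matches the documented format rules.

    The pattern is: one leading letter, 6-254 allowed middle characters,
    and one trailing letter or digit, for a total length of 8-256.
    Uniqueness against existing voices can only be checked by the server.
    """
    return _VOICE_ID_RE.fullmatch(voice_id) is not None
```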
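Several `VoiceCloneReq` constraints in the spec above depend on each other: `model` is required once `text` is set, `clone_prompt` needs both child fields, and an `accuracy` of `0` falls back to `0.7`. The sketch below mirrors those documented rules client-side. `validate_clone_request` and `effective_accuracy` are hypothetical helpers; lengths are counted with Python's `len`, which is assumed (not confirmed) to match the server's character count, and the server's own validation remains authoritative.

```python
from typing import Any

# Models listed in the `model` enum of VoiceCloneReq.
SUPPORTED_MODELS = {
    "speech-2.8-hd", "speech-2.8-turbo",
    "speech-2.6-hd", "speech-2.6-turbo",
    "speech-02-hd", "speech-02-turbo",
    "speech-01-hd", "speech-01-turbo",
}
DEFAULT_ACCURACY = 0.7


def validate_clone_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a /v1/voice_clone request body."""
    problems: list[str] = []

    # file_id and voice_id are the only top-level required fields.
    for field in ("file_id", "voice_id"):
        if field not in body:
            problems.append(f"missing required field: {field}")

    # If clone_prompt is present, both of its child fields are required.
    prompt = body.get("clone_prompt")
    if prompt is not None:
        for field in ("prompt_audio", "prompt_text"):
            if field not in prompt:
                problems.append(f"clone_prompt requires {field}")

    # Preview text: at most 1000 characters, and it needs a model.
    text = body.get("text")
    if text is not None:
        if len(text) > 1000:
            problems.append("text must be at most 1000 characters")
        if "model" not in body:
            problems.append("model is required when text is provided")

    model = body.get("model")
    if model is not None and model not in SUPPORTED_MODELS:
        problems.append(f"unsupported model: {model}")

    validation = body.get("text_validation")
    if validation is not None and len(validation) > 200:
        problems.append("text_validation must be at most 200 characters")

    accuracy = body.get("accuracy")
    if accuracy is not None and not 0 <= accuracy <= 1:
        problems.append("accuracy must be within [0, 1]")

    return problems


def effective_accuracy(body: dict[str, Any]) -> float:
    """Threshold the server applies: omitted or 0 falls back to 0.7."""
    accuracy = body.get("accuracy")
    return DEFAULT_ACCURACY if not accuracy else float(accuracy)
```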
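A call to `POST /v1/voice_clone` as specified above combines the server URL, the Bearer token, the `application/json` content type, and a check of `base_resp.status_code`. The standard-library sketch below assumes `api_key` holds a key from the API Keys page and that `body` is an already-assembled request, such as the spec's example with real `file_id` values; `clone_voice`, `VoiceCloneError`, and the other names are illustrative, not an official client. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for HTTP-level failures, separately from the `base_resp` codes handled here.

```python
import json
import urllib.request

API_URL = "https://api.minimax.io/v1/voice_clone"

# Status codes documented for base_resp, plus 1043 from `text_validation`.
STATUS_MESSAGES = {
    1000: "Unknown error",
    1001: "Timeout",
    1002: "Rate limit triggered",
    1004: "Authentication failed",
    1013: "Internal service error",
    1043: "ASR similarity check failed",
    2013: "Invalid input format",
    2038: "No cloning permission; check account verification status",
}


class VoiceCloneError(RuntimeError):
    """Raised when base_resp.status_code is non-zero."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"voice_clone failed with {code}: {message}")
        self.code = code


def build_clone_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/voice_clone request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_clone_response(payload: bytes) -> dict:
    """Decode the JSON body and raise VoiceCloneError on a non-zero status."""
    resp = json.loads(payload)
    base = resp["base_resp"]
    code = base["status_code"]  # status_code is required by the schema
    if code != 0:
        message = base.get("status_msg") or STATUS_MESSAGES.get(
            code, "see the error code reference"
        )
        raise VoiceCloneError(code, message)
    return resp


def clone_voice(api_key: str, body: dict, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed response (demo_audio, extra_info, ...)."""
    request = build_clone_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout) as http_resp:
        return parse_clone_response(http_resp.read())
```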