Audio Generating
This API clones a real human voice and uses the cloned voice to read specified text aloud.
POST https://api.vidu.com/ent/v2/audio-clone
Field | Value | Description |
|---|---|---|
Content-Type | application/json | Data Exchange Format |
Authorization | Token {your api key} | Replace {your api key} with your API key |
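As an illustration of the headers above, here is a minimal sketch that builds (but does not send) the POST request with Python's standard library. The function name `build_clone_request` and the placeholder key are assumptions for this example, not part of the API.

```python
import urllib.request

ENDPOINT = "https://api.vidu.com/ent/v2/audio-clone"


def build_clone_request(api_key: str, body: bytes) -> urllib.request.Request:
    """Build the POST request with the two documented headers.

    `body` is expected to be the JSON-encoded request body as bytes.
    Nothing is sent until the request is passed to urllib.request.urlopen.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Documented scheme: the literal word "Token", a space, then the key.
            "Authorization": f"Token {api_key}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; that step is left out here so the sketch has no network side effects.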
Parameter Name | Type | Required | Description |
|---|---|---|---|
audio_url | String | Required | The original audio URL (must be accessible). The model uses the audio at this URL for cloning. Notes: 1. Supported formats: mp3, m4a, wav. 2. Duration: minimum 10 seconds, maximum 5 minutes. 3. File size must not exceed 20 MB. 4. The audio must be free of copyright restrictions; otherwise it may be removed or destroyed. |
voice_id | String | Required | Custom voice ID, e.g., "vidu01". Rules for a custom voice_id: 1. Length: 8 to 256 characters. 2. Must start with an English letter. 3. Allowed characters: letters, digits, -, and _. 4. Must not end with - or _. 5. Must be unique; a duplicate voice_id results in an error. |
prompt_audio_url | String | Optional | Example audio for cloning. Providing this parameter helps improve timbre similarity and stability. If used, an additional short sample audio must be uploaded. Notes: 1. Supported formats: mp3, m4a, wav. 2. Duration: less than 8 seconds. 3. File size must not exceed 20 MB. |
prompt_text | String | Optional | Text content corresponding to the sample audio. Must match the sample audio and end with punctuation. |
text | String | Required | Preview text for cloned-voice synthesis. Maximum length: 1,000 characters. The model uses the cloned voice to read this text and returns an audio preview URL. Note: The preview incurs standard TTS synthesis charges based on character count. |
payload | String | Optional | Pass-through parameter. No processing, used for data transfer only. Max length: 1,048,576 characters. |
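The constraints in the table above can be checked on the client before sending a request. The following is a minimal sketch under one reading of the voice_id rules; the helper names `is_valid_voice_id` and `build_clone_body` are hypothetical, and uniqueness can only be verified by the server.

```python
import json
import re

# Assumed reading of the voice_id rules: 8 to 256 characters in total,
# first character an English letter, only letters, digits, "-" and "_",
# and the last character is neither "-" nor "_".
VOICE_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{6,254}[A-Za-z0-9]")

MAX_TEXT_CHARS = 1_000
MAX_PAYLOAD_CHARS = 1_048_576


def is_valid_voice_id(voice_id: str) -> bool:
    return VOICE_ID_PATTERN.fullmatch(voice_id) is not None


def build_clone_body(audio_url, voice_id, text,
                     prompt_audio_url=None, prompt_text=None, payload=None) -> str:
    """Return the JSON request body, raising ValueError on a local rule violation."""
    if not is_valid_voice_id(voice_id):
        raise ValueError(f"voice_id does not satisfy the documented rules: {voice_id!r}")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError("text exceeds 1,000 characters")
    if payload is not None and len(payload) > MAX_PAYLOAD_CHARS:
        raise ValueError("payload exceeds 1,048,576 characters")

    body = {"audio_url": audio_url, "voice_id": voice_id, "text": text}
    # Optional parameters are included only when the caller supplies them.
    optional = {
        "prompt_audio_url": prompt_audio_url,
        "prompt_text": prompt_text,
        "payload": payload,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(body)
```

File-level limits (format, duration, size) are not checked here because they require access to the audio itself.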
Parameter Name | Type | Description |
|---|---|---|
task_id | String | Task ID generated by Vidu |
state | String | Processing status. Possible values: queueing (in queue), success (completed), failed (failed). |
voice_id | String | The custom voice ID provided by the user (if the task fails, this field is not returned) |
demo_audio | String | If the text parameter was provided in the request, this field returns the audio preview URL; otherwise it will be empty |
payload | String | Pass-through parameter value from the request |
created_at | String | Task creation timestamp |
{ "task_id": "your_task_id_here", "state": "success", "voice_id": "your_voice_id_here", "demo_audio": "your_demo_audio_here", "payload": "", "created_at": "2025-01-01T15:41:31.968916Z" }
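A response of this shape could be handled as in the sketch below, which branches on the three documented `state` values. The function name `describe_clone_result` and the message wording are assumptions for illustration only.

```python
import json


def describe_clone_result(raw_response: str) -> str:
    """Summarize a response body according to its documented `state` value."""
    result = json.loads(raw_response)
    state = result.get("state")
    task_id = result.get("task_id")

    if state == "success":
        # demo_audio may be empty, so fall back to a readable marker.
        preview = result.get("demo_audio") or "no preview"
        return f"task {task_id}: voice {result.get('voice_id')} ready, preview: {preview}"
    if state == "queueing":
        return f"task {task_id}: still in queue"
    if state == "failed":
        # voice_id is not returned for failed tasks, so it is not read here.
        return f"task {task_id}: cloning failed"
    return f"task {task_id}: unexpected state {state!r}"
```

Reading fields with `.get` avoids a `KeyError` when `voice_id` is absent from a failed task.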
The generated cloned voice is temporary.
To retain a cloned voice permanently, you must use it in a call to any Text to Speech API within 168 hours (7 days). The preview generated by this API does not count.
If the voice is not used within this period, it is deleted automatically and the credits spent on cloning are not refunded.
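To keep track of the retention window, a client could compute a deadline from the `created_at` timestamp in the response. This sketch assumes the 168 hours are counted from `created_at`, which the text above does not state explicitly; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_WINDOW = timedelta(hours=168)  # 7 days


def retention_deadline(created_at: str) -> datetime:
    """Return the assumed deadline for first using the cloned voice.

    Assumption: the window starts at the task's created_at timestamp.
    """
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return created + RETENTION_WINDOW


def is_past_deadline(created_at: str, now=None) -> bool:
    """Check whether the assumed retention deadline has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= retention_deadline(created_at)
```

For the sample timestamp `2025-01-01T15:41:31.968916Z`, the computed deadline is `2025-01-08T15:41:31.968916` UTC.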