Audio Generating

This API clones a real human voice and uses the cloned voice to read specified text aloud.

POST https://api.vidu.com/ent/v2/audio-clone

Request headers

| Field | Value | Description |
| --- | --- | --- |
| Content-Type | application/json | Data exchange format |
| Authorization | Token {your api key} | Replace {your api key} with your API key |

Request body

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| audio_url | String | Required | The original audio URL (must be accessible). The model clones the voice from the audio at this URL. Notes: (1) Supported formats: mp3, m4a, wav. (2) Duration: minimum 10 seconds, maximum 5 minutes. (3) File size must not exceed 20 MB. (4) Audio content must be copyright-free; otherwise it may be removed or destroyed. |
| voice_id | String | Required | Custom voice ID, e.g., "vidu01". When creating a custom voice_id, note the following: (1) Length range: [8, 256]. (2) Must start with an English letter. (3) May contain only numbers, letters, -, and _. (4) Must not end with - or _. (5) Must be unique; a duplicate voice_id results in an error. |
| prompt_audio_url | String | Optional | Example audio for cloning. Providing this parameter helps improve timbre similarity and stability. If used, an additional short sample audio must be uploaded. Notes: (1) Supported formats: mp3, m4a, wav. (2) Duration: less than 8 seconds. (3) File size must not exceed 20 MB. |
| prompt_text | String | Optional | Text content corresponding to the sample audio. Must match the sample audio and end with punctuation. |
| text | String | Required | Preview text for cloned-voice synthesis, limited to 1,000 characters. The model reads this text in the cloned voice and returns an audio preview URL. Note: the preview incurs standard TTS synthesis charges based on character count. |
| payload | String | Optional | Pass-through parameter. Not processed; used for data transfer only. Max length: 1,048,576 characters. |
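
The request parameters above can be checked on the client before any credits are spent. The following is a minimal sketch using only the Python standard library; the helper names `validate_voice_id` and `build_clone_request` and the example voice ID `vidu_voice_01` are illustrative assumptions, not part of the Vidu API. It only builds the request; nothing is sent.

```python
import json
import re
import urllib.request

API_URL = "https://api.vidu.com/ent/v2/audio-clone"

# Starts with an English letter, contains only letters, digits, '-' or '_',
# and ends with a letter or digit (so never with '-' or '_').
VOICE_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*[A-Za-z0-9]")


def validate_voice_id(voice_id: str) -> None:
    """Raise ValueError if voice_id breaks the documented naming rules.

    Uniqueness cannot be checked locally; the server reports duplicates.
    """
    if not 8 <= len(voice_id) <= 256:
        raise ValueError("voice_id length must be in [8, 256]")
    if not VOICE_ID_RE.fullmatch(voice_id):
        raise ValueError(
            "voice_id must start with a letter, contain only letters, "
            "digits, '-' or '_', and must not end with '-' or '_'"
        )


def build_clone_request(api_key, audio_url, voice_id, text,
                        prompt_audio_url=None, prompt_text=None, payload=None):
    """Build (but do not send) the POST request for the audio-clone endpoint."""
    validate_voice_id(voice_id)
    if len(text) > 1000:
        raise ValueError("text must be within 1,000 characters")
    if payload is not None and len(payload) > 1_048_576:
        raise ValueError("payload must not exceed 1,048,576 characters")

    body = {"audio_url": audio_url, "voice_id": voice_id, "text": text}
    # Optional fields are included only when the caller supplies them.
    optional = {
        "prompt_audio_url": prompt_audio_url,
        "prompt_text": prompt_text,
        "payload": payload,
    }
    body.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )
```

A request built this way could then be sent with `urllib.request.urlopen(req)`. Audio format, duration, and file-size limits are not checked here because they require inspecting the audio file itself.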

Response body

| Parameter Name | Type | Description |
| --- | --- | --- |
| task_id | String | Task ID generated by Vidu. |
| state | String | Processing status. Possible values: queueing (in queue), success (completed), failed (failed). |
| voice_id | String | The custom voice ID provided by the user. Not returned if the task fails. |
| demo_audio | String | If the text parameter was provided in the request, the audio preview URL; otherwise empty. |
| payload | String | Pass-through parameter value from the request. |
| created_at | String | Task creation timestamp. |

Response example

{
  "task_id": "your_task_id_here",
  "state": "success",
  "voice_id": "your_voice_id_here",
  "demo_audio": "your_demo_audio_here",
  "payload": "",
  "created_at": "2025-01-01T15:41:31.968916Z"
}
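
A response shaped like the sample above can be read into a typed record. This is a sketch only: the `CloneResult` class and `parse_clone_response` function are hypothetical names, and rejecting any state outside the three documented values is a design assumption that may be too strict if the API reports other states.

```python
import json
from dataclasses import dataclass
from typing import Optional

KNOWN_STATES = {"queueing", "success", "failed"}  # values listed in the response table


@dataclass
class CloneResult:
    task_id: str
    state: str
    voice_id: Optional[str]  # absent when the task fails
    demo_audio: str
    payload: str
    created_at: str


def parse_clone_response(raw: str) -> CloneResult:
    """Parse the JSON response body into a CloneResult."""
    data = json.loads(raw)
    state = data["state"]
    if state not in KNOWN_STATES:
        raise ValueError(f"unexpected state: {state!r}")
    return CloneResult(
        task_id=data["task_id"],
        state=state,
        voice_id=data.get("voice_id"),          # None if the field is missing
        demo_audio=data.get("demo_audio", ""),  # may be empty
        payload=data.get("payload", ""),
        created_at=data["created_at"],
    )
```

Callers would typically use `voice_id` and `demo_audio` only when `state` is `success`.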

The generated cloned voice is temporary.
To retain a cloned voice permanently, you must use it in a call to any Text to Speech API within 168 hours (7 days). Previews generated by this API do not count.
If the voice is not used within this period, it is automatically deleted and the credits spent on cloning are not refunded.
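
To keep track of the retention window, the deadline can be computed from the task's `created_at` value. The sketch below assumes the 168-hour window starts at `created_at`; the documentation does not state the exact starting point, so treat the result as an estimate. The function names are illustrative.

```python
from datetime import datetime, timedelta

RETENTION_WINDOW = timedelta(hours=168)  # 7 days


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2025-01-01T15:41:31.968916Z' as UTC."""
    # A trailing 'Z' is rewritten so fromisoformat accepts it before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def retention_deadline(created_at: str) -> datetime:
    """Assumed deadline: created_at plus 168 hours."""
    return parse_timestamp(created_at) + RETENTION_WINDOW


def hours_remaining(created_at: str, now: datetime) -> float:
    """Hours left before the assumed deadline, never below zero."""
    remaining = retention_deadline(created_at) - now
    return max(remaining.total_seconds() / 3600, 0.0)
```

For the sample timestamp `2025-01-01T15:41:31.968916Z`, the assumed deadline is `2025-01-08T15:41:31.968916+00:00`.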