Other Generating
POST https://api.vidu.com/ent/v2/lip-sync

Request Header

Field | Value | Description |
|---|---|---|
Content-Type | application/json | Data Exchange Format |
Authorization | Token {your api key} | Replace {your api key}, including the braces, with your API key |
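
As a minimal sketch, the two headers above can be assembled as shown here. The helper name `build_headers` and the placeholder key are hypothetical; only the header names and the `Token` prefix come from the table.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the two request headers listed in the header table."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Content-Type": "application/json",
        # The key follows the literal prefix "Token " with no braces around it.
        "Authorization": f"Token {api_key}",
    }


headers = build_headers("your_api_key")  # placeholder, not a real credential
```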

Request Body

Field | Type | Required | Description |
|---|---|---|---|
video_url | String | Required | The URL of the original video (must be accessible). The model performs lip sync on this video. Note: 1. Supported video formats: mp4, mov, avi. 2. Duration must be between 1 and 600 seconds; 10 to 120 seconds is recommended. 3. File size must not exceed 5 GB, and each side must be between 360 and 4096 pixels. 4. The video must be encoded as H.264. If it is not, convert it first; refer to Encoding Format Conversion. 5. The video content must be free of portrait-rights issues; otherwise it will be taken down or destroyed. 6. The video content must meet the following criteria: - The face must be human (for a cartoon, the facial features should resemble a human's). - The face should be facing the camera, with a horizontal rotation of no more than 45 degrees and a vertical rotation of no more than 15 degrees. Avoid covering the face, and ensure stable lighting on the face. - There are no restrictions on the audio. |
audio_url | String | Optional | The URL of the audio file. The text and voice tone used in the lip sync video will be based on the content of the audio file. Note: 1. Supported formats: wav, mp3, wma, m4a, aac, ogg. 2. Duration should be greater than 1 second and less than 600 seconds. 3. File size should not exceed 100MB. |
text | String | Optional | The text content used to generate the lip sync video. Note: 1. Text content must be at least 4 characters and no more than 2000 characters (2-1000 Chinese characters or 4-2000 English characters). 2. If both audio_url and text are provided, the content from audio_url is used to generate the video. 3. Paragraph breaks are marked by newline characters. 4. Pause control: custom time intervals can be inserted between text segments to produce speech pauses. - Usage: insert <#x#> in the text, where x is the pause duration in seconds, range [0.01, 99.99], with up to two decimal places. Pause markers must be placed between two pronounceable text segments and cannot be used consecutively. - Example: Hello<#2#>I am vidu<#2#>Nice to meet you |
speed | Float | Optional | The speech rate. - The default is 1.0 (normal speed) and the range is [0.5, 1.5]: 0.5 is the slowest and 1.5 is the fastest. - Only effective for text generation. |
voice_id | String | Optional | The Voice ID. The system provides a variety of voice types. For detailed voice effects, voice IDs, and corresponding languages, refer to the Voice List. - Only effective for text generation |
ref_photo_url | String | Optional | User-uploaded face reference image URL - When the input video contains multiple faces, the lip-sync API can only select one face as the target for lip synchronization. This parameter specifies which person should be used as the target. - If no face reference image is provided, the system will automatically select the person with the largest face area in the first video frame that contains a detectable face. Note: 1. Supported formats: jpg, jpeg, png, bmp, webp. 2. Image resolution per side must be between 192 – 4096 px. 3. File size must not exceed 10 MB. 4. The image must contain one clear frontal face of a person appearing in the video. |
volume | Int | Optional | Volume level. - The range is 0 - 10, default is 0, representing normal volume. The higher the value, the higher the volume. - Only effective for text generation. |
callback_url | String | Optional | The callback URL. To receive callbacks, set callback_url in the POST request that creates the task. Whenever the video generation task changes status, Vidu sends a callback request to this URL containing the latest status of the task. The body of the callback request has the same structure as the response body of the GET Generation API. The "status" in the callback is one of the following: - processing: The task is being processed. - success: The task is completed (if sending fails, the callback is retried three times). - failed: The task failed (if sending fails, the callback is retried three times). Vidu uses a callback signature algorithm for verification; for details, see Callback Signature. |
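
The limits in the table above can be checked on the client before a request is sent. The following is a rough sketch, not the service's own validation: the function name `validate_body` is hypothetical, the rule that a non-ASCII character counts as two length units is an assumption drawn from the "2-1000 Chinese characters or 4-2000 English characters" note, and "pronounceable" is approximated as "non-blank".

```python
import re

# A pause marker such as <#2#> or <#0.75#>; the group captures the seconds value.
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d{1,2})?)#>")


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems: list[str] = []

    if not body.get("video_url"):
        problems.append("video_url is required")

    text = body.get("text")
    if text is not None:
        # Assumption: ASCII characters count as 1 unit, all others as 2,
        # and the whole string (markers included) is counted.
        units = sum(1 if ch.isascii() else 2 for ch in text)
        if not 4 <= units <= 2000:
            problems.append("text length is outside the 4-2000 range")

        # Splitting on a pattern with one group alternates segments and values.
        parts = PAUSE_MARKER.split(text)
        segments, pauses = parts[::2], parts[1::2]
        if any(not 0.01 <= float(p) <= 99.99 for p in pauses):
            problems.append("pause duration must be within [0.01, 99.99] seconds")
        if pauses and any(not s.strip() for s in segments):
            problems.append("pause markers must sit between two text segments")

    speed = body.get("speed")
    if speed is not None and not 0.5 <= speed <= 1.5:
        problems.append("speed must be within [0.5, 1.5]")

    volume = body.get("volume")
    if volume is not None and (not isinstance(volume, int) or not 0 <= volume <= 10):
        problems.append("volume must be an integer from 0 to 10")

    return problems
```

An empty list means the body passed these local checks; the server may still reject it for reasons that cannot be checked locally, such as video encoding or file size.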
Audio-driven call example
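
A minimal sketch of an audio-driven call using only the Python standard library. The endpoint, header names, and field names come from this page; the function names, the example.com URLs, and the API key are placeholders, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.vidu.com/ent/v2/lip-sync"


def build_audio_request(api_key: str, video_url: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a lip-sync request driven by an audio file."""
    body = {"video_url": video_url, "audio_url": audio_url}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


request = build_audio_request(
    "your_api_key",                       # placeholder key
    "https://example.com/speaker.mp4",    # placeholder video URL
    "https://example.com/narration.mp3",  # placeholder audio URL
)
# Calling send(request) would perform the call; it is not invoked here so the
# sketch runs without network access or a real key.
```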
Text-driven call example
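
A minimal sketch of a text-driven request body, including the `<#x#>` pause markers described for the text field. The helper `join_with_pauses` and the video URL are hypothetical; voice_id is left out because valid IDs come from the Voice List. The body would be sent with the same POST request and headers as any other call to this endpoint.

```python
import json


def join_with_pauses(segments: list[str], pause_seconds: float) -> str:
    """Join text segments with a <#x#> pause marker between each pair."""
    # Format with at most two decimals and drop trailing zeros: 2 -> "2", 0.5 -> "0.5".
    value = f"{pause_seconds:.2f}".rstrip("0").rstrip(".")
    return f"<#{value}#>".join(segments)


body = {
    "video_url": "https://example.com/speaker.mp4",  # placeholder URL
    "text": join_with_pauses(["Hello", "I am vidu", "Nice to meet you"], 2),
    "speed": 1.0,  # default speech rate
    "volume": 0,   # default (normal) volume
}
payload = json.dumps(body)  # JSON string to use as the request body
```

Because the markers are only placed between segments, the resulting text never starts or ends with a pause and never has two markers in a row.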

Response Body

Field | Type | Description |
|---|---|---|
task_id | String | Task ID |
state | String | The current processing state of the task: - created: the task was created successfully - queueing: the task is in the queue - processing: the task is being processed - success: generation succeeded - failed: the task failed |
payload | String | The payload parameter used for this call |
created_at | String | Task creation time |
{ "task_id": "your_task_id_here", "state": "created", "payload":"", "created_at": "2025-01-01T15:41:31.968916Z" }
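
A short sketch of how a client might read this response. The function name `parse_response` is hypothetical, and treating success and failed as the only final states is an assumption based on the state list above, not something the page states explicitly.

```python
import json
from datetime import datetime

# States listed in the response table.
STATES = ("created", "queueing", "processing", "success", "failed")
# Assumption: only these two states mean the task will not change again.
FINAL_STATES = {"success", "failed"}


def parse_response(raw: str) -> dict:
    """Decode a response body and normalize the fields a client usually needs."""
    data = json.loads(raw)
    state = data["state"]
    if state not in STATES:
        raise ValueError(f"unexpected state: {state!r}")
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    created_at = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return {
        "task_id": data["task_id"],
        "state": state,
        "is_final": state in FINAL_STATES,
        "created_at": created_at,
    }


sample = (
    '{ "task_id": "your_task_id_here", "state": "created", '
    '"payload":"", "created_at": "2025-01-01T15:41:31.968916Z" }'
)
result = parse_response(sample)
```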