Lip Sync
POST https://api.vidu.com/ent/v2/lip-sync
Request Header
Field | Value | Description |
---|---|---|
Content-Type | application/json | Data Exchange Format |
Authorization | Token {your_api_key} | Replace {your_api_key} with your API key |
Request Body
Field | Type | Required | Description |
---|---|---|---|
video_url | String | Required | The URL of the original video (must be accessible). The model uses this video as the basis for the lip sync. Note: 1. Supported video formats: mp4, mov, avi. 2. Duration must be between 1 and 600 seconds; 10 to 120 seconds is recommended. 3. File size must not exceed 5GB. 4. The video must be encoded as H.264. If it is not, convert it first; see Encoding Format Conversion. 5. The video content must not infringe portrait rights; otherwise it will be taken down or destroyed. 6. The video content must meet the following criteria: - The face must be human (a cartoon face should have human-like facial features). The face should be facing the camera, with a horizontal rotation of no more than 45 degrees and a vertical rotation of no more than 15 degrees. Avoid covering the face, and ensure the lighting on the face is stable. - There are no restrictions on the audio. |
audio_url | String | Optional | The URL of the audio file. The text and voice tone used in the lip sync video will be based on the content of the audio file. Note: 1. Supported formats: wav, mp3, wma, m4a, aac, ogg. 2. Duration should be greater than 1 second and less than 600 seconds. 3. File size should not exceed 100MB. |
text | String | Optional | The text content used to generate the lip sync video. Note: Text content must be at least 4 characters and no more than 2000 characters (2-1000 Chinese characters or 4-2000 English characters). - If both audio_url and text are provided, the content from audio_url will be used to generate the video. |
speed | Float | Optional | The speech rate, default is 1.0. - 1.0 is the normal speed, the range is [0.5-1.5]. When set to 0.5, the speech is slowest; when set to 1.5, the speech is fastest. - Only effective for text generation. |
character_id | String | Optional | The Character ID. - The system provides a variety of voice types. For detailed voice effects, voice IDs, and corresponding languages, refer to the Character List. - Only effective for text generation |
volume | Int | Optional | Volume level. - The range is 0 to 10; the default is 0, which represents normal volume. The higher the value, the louder the speech. - Volume adjustment is not supported for male voices 1-20 and female voices 1-23; refer to the Character List. - Only effective for text generation. |
language | String | Optional | Language. - Multi-language voices must specify the corresponding language during generation. - Available languages are listed in the Character List. - Only effective for text generation. |
callback_url | String | Optional | Callback URL. Set callback_url when creating the task; callbacks are sent with the POST method. When the video generation task changes status, Vidu sends a callback request to this URL containing the latest status of the task. The body of the callback request has the same structure as the response body of the GET Generation API. The "status" in the callback includes the following states: - processing: the task is being processed. - success: the task is completed (if delivery fails, the callback is retried three times). - failed: the task failed (if delivery fails, the callback is retried three times). |
Audio-driven call example
curl -X POST -H "Authorization: Token {your_api_key}" -H "Content-Type: application/json" -d '{"video_url":"your_video_url","audio_url":"your_audio_url"}' https://api.vidu.com/ent/v2/lip-sync
Text-driven call example
curl -X POST -H "Authorization: Token {your_api_key}" -H "Content-Type: application/json" -d '{"video_url":"your_video_url","text":"hello,welcome to use Vidu platform","character_id":"tts_micro_person_xiaoshuai","language":"en-US"}' https://api.vidu.com/ent/v2/lip-sync
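The same calls can be made from a script. Below is a rough Python equivalent of the curl examples using only the standard library. The function names `make_lip_sync_request` and `create_lip_sync_task` and the 30-second timeout are illustrative choices, not part of the Vidu API; the endpoint and headers are the ones documented above.

```python
import json
import urllib.request

API_URL = "https://api.vidu.com/ent/v2/lip-sync"


def make_lip_sync_request(api_key, body):
    """Build (but do not send) the POST request for a lip sync task."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def create_lip_sync_task(api_key, body, timeout=30):
    """Send the request and return the decoded JSON response body."""
    req = make_lip_sync_request(api_key, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs a real network call, so it is left commented out):
# task = create_lip_sync_task(
#     "your_api_key",
#     {"video_url": "your_video_url", "audio_url": "your_audio_url"},
# )
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without touching the network.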
Response Body
When creating a lip sync task, the credit usage will not be returned. Please check the credit usage in the task query interface.
Field | Type | Description |
---|---|---|
task_id | String | Task ID |
state | String | The current processing state of the task. Possible values: - created: task created successfully - queueing: task is in the queue - processing: task is being processed - success: generation succeeded - failed: task failed |
payload | String | The payload parameter used for this call |
created_at | String | Task creation time |
{"task_id": "your_task_id_here","state": "created","payload":"","created_at": "2025-01-01T15:41:31.968916Z"}
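A client will usually want to pull the task ID out of this response and decide whether the task has already reached a final state. The sketch below shows one way to do that; `parse_create_response` is a hypothetical helper. It assumes that `created_at` always uses the format shown in the example (UTC with fractional seconds and a trailing `Z`), and it treats `success` and `failed` as the only terminal states.

```python
import json
from datetime import datetime, timezone

TERMINAL_STATES = {"success", "failed"}
KNOWN_STATES = {"created", "queueing", "processing"} | TERMINAL_STATES


def parse_create_response(raw):
    """Parse the JSON response body of a lip sync task creation call."""
    body = json.loads(raw)
    state = body["state"]
    if state not in KNOWN_STATES:
        raise ValueError(f"unexpected state: {state!r}")
    # Assumed timestamp format, taken from the example response.
    created_at = datetime.strptime(
        body["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "task_id": body["task_id"],
        "state": state,
        "is_terminal": state in TERMINAL_STATES,
        "created_at": created_at,
    }
```

Because credit usage is not included in this response, a task that is not yet terminal still needs to be followed up through the task query interface or the callback.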