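TTS (Speech Synthesis) Integration Guide #
This document describes how to call TTS through the control system's full-duplex WebSocket interface. The connection uses the WebSocket protocol, and every message is UTF-8 encoded JSON text.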
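Call Flow #
- Establish a WebSocket connection (no authentication is needed at this step). When the control system runs locally, the address is usually ws://localhost:8070/v1;
- Send the Starter packet, which carries the common configuration for subsequent TTS requests. The server closes the WebSocket connection if the packet is malformed or is not sent within 10 seconds;
- Receive the response, which indicates whether authentication succeeded or failed;
- Send a Task packet containing the text to synthesize and its format information;
- Receive the data packets for that Task;
- When there are no more synthesis tasks, simply close the connection (the protocol defines no disconnect message).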
sequenceDiagram
participant Client
participant CS as Control System
Client-->>CS: 1. Establish Connection
activate CS
Client->>CS: 2. Request: Starter
CS-->>Client: 3. Response: Authentication
loop
Note right of Client: Repeat steps 4 and 5 until all requests are sent
Client->>CS: 4. Request: Task
loop
Note right of Client: Multiple synthesis results are returned
CS-->>Client: 5. Response
end
end
Client-->>CS: 6. Close Connection
deactivate CS
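
The flow above could be driven by a loop like the following minimal sketch. It assumes non-stream mode (one EOF packet per Task) and an already-open connection exposed through two hypothetical callables, `send` and `recv`, which would wrap whatever WebSocket client is in use. The function name `run_tts` is illustrative and not part of the interface.

```python
import json
from typing import Callable, Dict, List


def run_tts(send: Callable[[str], None], recv: Callable[[], str],
            starter: Dict, queries: List[str]) -> List[Dict]:
    """Drive steps 2-5 of the flow over an already-open connection.

    `send` and `recv` are hypothetical wrappers around a WebSocket
    client; they are assumptions, not part of the documented interface.
    """
    send(json.dumps(starter, ensure_ascii=False))       # step 2: Starter
    auth = json.loads(recv())                           # step 3: auth result
    if auth.get("status") != "ok":
        raise RuntimeError(auth.get("error", "authentication failed"))

    packets = []
    for query in queries:                               # step 4: one Task per text
        send(json.dumps({"query": query}, ensure_ascii=False))
        while True:                                     # step 5: read until EOF
            msg = json.loads(recv())
            if msg.get("status") != "ok":
                raise RuntimeError(msg.get("error", "task failed"))
            packets.append(msg)
            if msg["tts"]["type"] == "eof":
                break
    return packets
```

Passing the transport in as callables keeps the sketch independent of any particular WebSocket library and makes it easy to exercise with canned responses.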
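Request Message Format #
Starter #
The first packet sent after each connection is established. It states the purpose of the connection and how subsequent packets are parsed. The packet is JSON text with the following fields: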
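| Field | Name | Type | Default | Description |
|---|---|---|---|---|
| auth | AuthN Token | string | Empty string | Device authentication token; required if authentication is enabled on the server |
| type | Workflow Type | string | Required | Service engine ID for the capability, for example "TTS3"; see the service engine list in the quick reference for the full list |
| device | Device ID | string | Empty string | Device ID; recommended, to help trace and locate problems |
| session | Session ID | string | Random UUIDv4 | Callers are advised to generate and supply their own Session ID, to help trace and locate problems |
| tts | TTS Config | object | Required | TTS-specific configuration; see below |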
The TTS Config fields are as follows:
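| Field | Name | Type | Default | Description |
|---|---|---|---|---|
| qid | Quick ID | string | Empty | Optional. A preconfigured QuickID. Setting this field overrides voice, language, style, conversion_id, conversion_transform and similar fields, and also overrides the engine ID given in the Starter packet's type field. The value is fairly long; storing it as VARCHAR(512) is recommended |
| language | Language Code | string | zh-CN | Optional. Language to synthesize; must be supported by the voice |
| voice | Voice ID | string | Depends on the service engine | Optional. See the voice list in the quick reference for the available voices |
| pitch_offset | Pitch Offset | float | 0.0 | Optional. Pitch; higher values sound sharper, lower values sound deeper. Range [-10, 10] |
| style | Style | string | Empty | Optional. Emotion of the voice |
| speed_ratio | Speed Ratio | float | 1.0 | Optional. Speaking rate; larger values mean slower speech. Range [0.5, 2] |
| sample_rate | Sample Rate | int | 16000 | Optional. Sample rate in Hz. Supported values: 8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000 |
| volume | Volume | int | 100 | Optional. Volume; larger values are louder. Range [1, 400] |
| format | File Format | string | pcm | Optional. Audio file format. pcm, wav, mp3 and silk may be supported, but only pcm and silk support streamed return |
| omit_error | Omit Error Message in Response | bool | false | Optional. Whether to omit error messages from responses; by default they are returned |
| audio | Return Audio Data | bool | true | Optional. Whether to return audio; returned by default |
| phone | Return Phonetic Symbols | bool | false | Optional. Whether to return phonemes; not returned by default |
| polyphone | Return Polyphone | bool | false | Optional. Whether to return the polyphonic characters found in the query; not returned by default |
| facefeature | Face Feature ID | string | Empty string | Optional. ID of the model used to produce the returned Face Feature data; empty means none is returned |
| conversion_id | Voice Conversion ID | string | Empty string | Optional. ID of the voice conversion model; empty means voice conversion is not used |
| conversion_transform | Voice Conversion Transform | int | 0 | Optional. Pitch shift for voice conversion; higher values sound sharper, lower values sound deeper. Range [-12, +12]. Use +12 for male-to-female and -12 for female-to-male. Defaults to 0 |
| subtitle | Subtitle Format | string | Empty string | Optional. Format of the returned subtitles; empty means none are returned. Supported: srt |
| subtitle_max_length | Subtitle Max Length | int | 0 | Optional. Maximum number of characters per subtitle line or sentence-level timestamp; 0 means no limit. Only effective when subtitles or sentence-level timestamps are returned |
| subtitle_cut_by_punc | Subtitle Cut by Punctuation | bool | false | Optional. Whether to break subtitles or sentence-level timestamps at punctuation marks and remove the punctuation. Only effective when subtitles or sentence-level timestamps are returned. For the punctuation set, see Subtitle Line-Break Punctuation |
| subtitle_custom_punc | Custom Subtitle Punctuation | string list | Subtitle Line-Break Punctuation | Optional. Custom punctuation marks used for line breaks instead of the default set. Only effective when subtitles or sentence-level timestamps are returned and subtitle_cut_by_punc is true |
| subtitle_punc_keep | Keep Subtitle Punctuation | bool | false | Optional. Whether to keep the punctuation marks at line breaks. Only effective when subtitles or sentence-level timestamps are returned and subtitle_cut_by_punc is true |
| sentence_time | Return Sentence-Level Timestamp | bool | false | Optional. Whether to return sentence-level timestamps |
| word_time | Return Word-Level Timestamp | bool | false | Optional. Whether to return word-level timestamps |
| cache_url | Return Cache URL for Data | bool | false | Optional. Whether to upload the audio, phoneme and subtitle files to the Object Store and return cache URLs |
| stream_mode | Use Stream Mode | bool | false | Optional. Whether to use stream mode. When true, Task text is accumulated until one of the specified separators appears, and only then is a request sent to the TTS engine. TTS result data is returned in order; no EOF packet is returned for each separated request, and an EOF packet is returned only after signal eof is received. Defaults to false, in which case a request is sent as soon as a Task is received. Note that to use SSML in stream mode, each separated request must be a complete SSML document. Currently only requests with format pcm support stream mode |
| stream_separator | Separator for Stream Mode | string list | ["。", "!", "?", ":", ". ", "!", "?", ": "] | Optional. When stream mode is used, the symbols that delimit the accumulated text; defaults to the Chinese and English full stop, exclamation mark, question mark and colon |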
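
As an illustration of the two tables above, the sketch below assembles a Starter packet and checks a few of the documented value ranges on the client side before sending. The names `build_starter`, `RANGES` and `SAMPLE_RATES` are hypothetical helpers, and how the server reacts to out-of-range values is not stated here, so the checks are only a convenience.

```python
import json
import uuid

# Value constraints copied from the TTS Config table.
SAMPLE_RATES = {8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000}
RANGES = {
    "pitch_offset": (-10, 10),
    "speed_ratio": (0.5, 2),
    "volume": (1, 400),
    "conversion_transform": (-12, 12),
}


def build_starter(engine, tts=None, auth="", device=""):
    """Return a Starter packet as JSON text, checking documented ranges."""
    tts = dict(tts or {})
    for field, (low, high) in RANGES.items():
        if field in tts and not low <= tts[field] <= high:
            raise ValueError(f"{field} must be within [{low}, {high}]")
    if "sample_rate" in tts and tts["sample_rate"] not in SAMPLE_RATES:
        raise ValueError("unsupported sample_rate")
    if tts.get("stream_mode") and tts.get("format", "pcm") != "pcm":
        raise ValueError("stream_mode currently requires format pcm")

    # A caller-generated Session ID is recommended for tracing.
    starter = {"type": engine, "session": str(uuid.uuid4()), "tts": tts}
    if auth:
        starter["auth"] = auth
    if device:
        starter["device"] = device
    return json.dumps(starter, ensure_ascii=False)


print(build_starter("TTS3", {"voice": "xiaoling", "volume": 200}, device="device-wei"))
```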
Task #
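After the Starter packet has been sent and the connection is successfully established, any number of Task packets can be sent to submit synthesis tasks. A Task packet is JSON text with the following fields: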
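| Field | Name | Type | Default | Description |
|---|---|---|---|---|
| id | Task ID | string | Random UUIDv4 | Optional. Callers are advised to generate and supply it, so that responses to concurrent requests can be told apart |
| query | Query | string | Required | Text to synthesize |
| signal | Signal | string | query | Optional; takes effect in stream mode. eof means all text has been sent: the accumulated unsynthesized text is synthesized and an EOF packet is returned. The default, query, marks this Task as a text request. In stream mode eof must be sent; in non-stream mode an EOF packet is returned by default and no eof needs to be sent |
| ssml | Use SSML | bool | false | Optional. Whether the text is marked up with SSML; see the ONES usage documentation for the syntax |
| no_cache | Disable Cache | bool | false | Optional. Whether to disable result caching for this request; when enabled, the request neither uses cached results nor stores its result in the cache |
| override | TTS Config | object | Empty | Optional. Per-request TTS configuration that completely replaces the TTS configuration from the Starter packet for this task only (note: it replaces the configuration outright rather than merging the two) |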
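
The replace-not-merge behaviour of override is easy to get wrong, so here is a small sketch of the configuration that would apply to a single Task. `effective_config` is a hypothetical helper written for illustration, and it assumes that an absent or empty override means "use the Starter configuration".

```python
def effective_config(starter_tts, task):
    """Config applied to one Task: `override` replaces the Starter config whole."""
    override = task.get("override")
    # Assumption: an absent or empty override leaves the Starter config in force.
    return dict(override) if override else dict(starter_tts)


starter_tts = {"voice": "xiaoling", "volume": 200, "phone": True}
task = {"query": "你好。", "override": {"volume": 50}}
print(effective_config(starter_tts, task))  # {'volume': 50}
```

Because the two objects are not merged, voice and phone from the Starter packet no longer apply to this Task and fall back to their defaults. To change a single setting, copy the full Starter configuration into override and edit that copy.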
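Response Message Format #
Authentication Result #
After a Starter request is sent, a message containing the authentication result is returned. It is JSON text with the following fields: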
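| Field | Name | Type | Always present | Description |
|---|---|---|---|---|
| service | Service Name | string | Yes | Service module for the current request, i.e. auth |
| session | Session ID | string | Yes | Session ID of the current connection |
| status | Status Name | enum | Yes | Status of the current session: ok on success, fail on failure |
| error | Error Message | string | No | Error message returned on failure |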
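
A client would typically validate this packet before sending any Task. The following is a minimal sketch; `check_auth` is a hypothetical name, and raising `PermissionError` on failure is just one possible design.

```python
import json


def check_auth(raw):
    """Return the Session ID from an authentication result, or raise on failure."""
    msg = json.loads(raw)
    if msg.get("service") != "auth":
        raise ValueError("expected an authentication result packet")
    if msg.get("status") != "ok":
        # `error` is only present on failure, so fall back to a generic text.
        raise PermissionError(msg.get("error", "authentication failed"))
    return msg["session"]


reply = '{"service": "auth", "status": "ok", "session": "49d3af81-f344-4ccf-8231-574ceac1a260"}'
print(check_auth(reply))  # 49d3af81-f344-4ccf-8231-574ceac1a260
```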
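TTS Result Data #
Each successful Task returns a series of data packets: audio, audio file URL, phoneme, phoneme file URL, subtitle and subtitle file URL packets. Packets of the same type are returned in logical order; the relative order of packets of different types is not guaranteed. If the Starter request does not ask for phonemes, subtitles or cache URLs, only audio is returned.
The response is JSON text with the following fields: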
| Field | Name | Type | Always present | Description |
|---|---|---|---|---|
| service | Service Name | string | Yes | Service module for the current request, i.e. tts |
| session | Session ID | string | Yes | Session ID of the current connection |
| trace | Trace ID | string | Yes | Trace ID for the current Task |
| status | Status Name | enum | Yes | Status of the current Task: ok on success, fail on failure |
| error | Error Message | string | No | Error message returned on failure |
| tts | TTS Content | object | No | On success, the synthesis result; see below for the fields |
The synthesis result itself is carried in TTS Content:
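| Field | Name | Type | Always present | Description |
|---|---|---|---|---|
| id | Task ID | string | Yes | ID of the current Task |
| index | Index No. | int | Yes | Sequence number of the returned audio and phoneme packets |
| type | Package Type | enum | Yes | audio for an audio packet, audio_url for an audio URL packet, phone for a phoneme packet, phone_url for a phoneme URL packet, subtitle for a subtitle packet, subtitle_url for a subtitle URL packet, polyphone for a polyphone packet, facefeature for a Face Feature packet, timestamp for a timestamp packet, and eof when everything has been sent |
| audio_data | Base64-encoded Audio Data | string | No | Audio data; present only in audio packets |
| phone_data | Base64-encoded Phonetic Symbols | string | No | Phoneme data; present only in phoneme packets |
| polyphones | Polyphone Data | object list | No | Polyphone data; present only in polyphone packets |
| subtitle_data | Base64-encoded Subtitles | string | No | Subtitle data; present only in subtitle packets |
| sentence_time | Sentence-Level Timestamp | object | No | Sentence-level timestamp; present only in timestamp packets |
| word_times | Word-Level Timestamp | object list | No | Word-level timestamps; present only in timestamp packets |
| facefeature_data | Base64-encoded Face Feature | string | No | Face Feature data; present only in Face Feature packets |
| audio_url | URL of Audio File | string | No | Audio file URL; present only in audio packets (always returned when the TTS2 single-response interface is used) |
| phone_url | URL of JSON for Phonetic Symbols | string | No | Phoneme file URL; present only in phoneme packets (always returned when the TTS2 single-response interface is used) |
| subtitle_url | URL of Subtitle File | string | No | Subtitle file URL; present only in subtitle packets |
| resource | Phonetic Symbols Info | object | No | Phoneme-related information; present only in phoneme packets (always returned when the TTS2 single-response interface is used) |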
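
Since the payload field depends on type, a receiver usually dispatches on it. The sketch below is one hedged way to do that: `extract_payload` and the two mapping tables are hypothetical helpers derived from the field table above, and Base64 payloads are returned as raw bytes without further interpretation.

```python
import base64
import json

# Payload field carried by each packet type, following the table above.
BASE64_FIELDS = {
    "audio": "audio_data",
    "phone": "phone_data",
    "subtitle": "subtitle_data",
    "facefeature": "facefeature_data",
}
URL_FIELDS = {
    "audio_url": "audio_url",
    "phone_url": "phone_url",
    "subtitle_url": "subtitle_url",
}


def extract_payload(raw):
    """Return (task_id, packet_type, payload) for one TTS result packet."""
    msg = json.loads(raw)
    if msg["status"] != "ok":
        raise RuntimeError(msg.get("error", "task failed"))
    tts = msg["tts"]
    kind = tts["type"]
    if kind in BASE64_FIELDS and BASE64_FIELDS[kind] in tts:
        payload = base64.b64decode(tts[BASE64_FIELDS[kind]])
    elif kind in URL_FIELDS:
        payload = tts[URL_FIELDS[kind]]
    elif kind == "polyphone":
        payload = tts["polyphones"]
    elif kind == "timestamp":
        payload = {k: tts[k] for k in ("sentence_time", "word_times") if k in tts}
    else:
        # "eof", or a packet without inline data (for example a TTS2
        # packet that carries only a URL field).
        payload = None
    return tts["id"], kind, payload
```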
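Audio Packet #
Contains the Base64-encoded synthesized audio data.
When the requested audio format is pcm, the audio is streamed back in multiple packets; other formats are returned as a single packet once synthesis completes.
Audio packet sample: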
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "851ab562-ec51-4ad7-bd21-4f4af19875cb",
"tts": {
"id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
"index": 1,
"type": "audio",
"audio_data": "AAAAAAA...DYBfQHDAeMBEwI8AlQCRAIaAu0BpAFCAckARQCv...AAAAAAAAAAAAAAAAAAAAAAAAAAA=="
}
}
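
To play or store streamed pcm audio, the decoded chunks can be concatenated in index order and wrapped in a WAV container. The sketch below assumes 16-bit mono samples, which is not stated in this document, so the sample width and channel count should be checked against the engine in use. `pcm_packets_to_wav` is a hypothetical helper that takes the tts objects of the audio packets.

```python
import base64
import io
import wave


def pcm_packets_to_wav(audio_packets, sample_rate=16000):
    """Join decoded audio packets by `index` and wrap the result as WAV bytes.

    Assumes 16-bit mono PCM; adjust the width and channel count if the
    engine produces something else.
    """
    ordered = sorted(audio_packets, key=lambda tts: tts["index"])
    pcm = b"".join(base64.b64decode(tts["audio_data"]) for tts in ordered)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono (assumption)
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The `sample_rate` argument should match the sample_rate sent in the TTS Config, which defaults to 16000.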
Audio URL #
Contains the cache URL of the audio file uploaded to the Object Store.
Audio URL sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "851ab562-ec51-4ad7-bd21-4f4af19875cb",
"tts": {
"id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
"index": 6,
"type": "audio_url",
"audio_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-audio/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_851ab562-ec51-4ad7-bd21-4f4af19875cb.pcm"
}
}
Phoneme Packet #
Contains the Base64-encoded synthesized phoneme data.
Phoneme packet sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "08a5785a-a6a2-4140-b587-a6cead592531",
"tts": {
"id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
"index": 2,
"type": "phone",
"phone_data": "biBuIGkgaSBpIGkgaSBpIDIgMiAjMSAjMSAjMSAjMSAjMSAjMSBoIGggaCBoIGggaCBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBlbmQgZW5kIGVuZCBlbmQgZW5kIGVuZCBlbmQgZW5kIGVuZCBlbmQgZW5kIGVuZCBlbmQgZW5k"
}
}
Phoneme URL #
Contains the cache URL of the phoneme file uploaded to the Object Store.
Phoneme URL sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "08a5785a-a6a2-4140-b587-a6cead592531",
"tts": {
"id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
"index": 7,
"type": "phone_url",
"phone_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-phone/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_08a5785a-a6a2-4140-b587-a6cead592531.phone"
}
}
Subtitle #
Contains the Base64-encoded synthesized subtitle data.
Subtitle packet sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "d923181d-9d9b-4be1-9370-40f456be3771",
"tts": {
"id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
"index": 4,
"type": "subtitle",
"subtitle_data": "MQowMDowMDowMCwwMDAgLS0+IDAwOjAwOjAwLDUyOArkvaDlpb3jgIIKCg=="
}
}
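
The subtitle_data value in the sample above can be decoded with a few lines of code. This sketch assumes the decoded bytes are UTF-8 text, which holds for this sample; `decode_subtitle` is a hypothetical helper name.

```python
import base64


def decode_subtitle(subtitle_data):
    """Decode the Base64 `subtitle_data` field into subtitle text (UTF-8 assumed)."""
    return base64.b64decode(subtitle_data).decode("utf-8")


sample = "MQowMDowMDowMCwwMDAgLS0+IDAwOjAwOjAwLDUyOArkvaDlpb3jgIIKCg=="
print(decode_subtitle(sample))
```

For this sample the result is a single SRT cue numbered 1, running from 00:00:00,000 to 00:00:00,528, with the text 你好。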
Subtitle URL #
Contains the cache URL of the subtitle file uploaded to the Object Store.
Subtitle URL sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "d923181d-9d9b-4be1-9370-40f456be3771",
"tts": {
"id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
"index": 5,
"type": "subtitle_url",
"subtitle_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-srt/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_d923181d-9d9b-4be1-9370-40f456be3771.srt"
}
}
Timestamp Packet #
Contains sentence-level and word-level timestamp information.
Timestamp sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "d923181d-9d9b-4be1-9370-40f456be3771",
"tts": {
"id": "4d5ed90f-2188-42a0-b4d7-db00f1a2a944",
"index": 7,
"type": "timestamp",
"sentence_time": {
"begin_ms": 7770,
"end_ms": 9140,
"text": "新人起步很不容易"
},
"word_times": [
{"begin_ms": 7770, "end_ms": 7960, "text": "新"},
{"begin_ms": 7960, "end_ms": 8120, "text": "人"},
{"begin_ms": 8120, "end_ms": 8310, "text": "起"},
{"begin_ms": 8310, "end_ms": 8430, "text": "步"},
{"begin_ms": 8430, "end_ms": 8630, "text": "很"},
{"begin_ms": 8630, "end_ms": 8720, "text": "不"},
{"begin_ms": 8720, "end_ms": 8920, "text": "容"},
{"begin_ms": 8920, "end_ms": 9140, "text": "易"}
]
}
}
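
A common use of the timestamp packet is deriving per-word durations, for example to drive highlighting. The sketch below works on the tts object of a timestamp packet; `word_durations` and `is_contiguous` are hypothetical helpers, and the field names are taken from the sample above.

```python
def word_durations(tts):
    """Return (text, duration_ms) pairs from a timestamp packet's `word_times`."""
    return [(w["text"], w["end_ms"] - w["begin_ms"]) for w in tts.get("word_times", [])]


def is_contiguous(tts):
    """True when each word starts exactly where the previous one ends."""
    words = tts.get("word_times", [])
    return all(a["end_ms"] == b["begin_ms"] for a, b in zip(words, words[1:]))


tts = {
    "type": "timestamp",
    "word_times": [
        {"begin_ms": 7770, "end_ms": 7960, "text": "新"},
        {"begin_ms": 7960, "end_ms": 8120, "text": "人"},
    ],
}
print(word_durations(tts))  # [('新', 190), ('人', 160)]
```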
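Polyphone Packet #
Contains polyphonic character information; the recommended reading comes first, followed by the other readings.
Polyphone sample: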
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "b2b2b2b2-b2b2-b2b2-b2b2-b2b2b2b2b2b2",
"tts": {
"id": "a5e8b592-f0b1-46ad-bc97-836cbb010310",
"index": 4,
"type": "polyphone",
"polyphones": [
{
"word": "好",
"phones": ["hao3", "hao4"]
}
]
}
}
Face Feature #
Contains the Base64-encoded Face Feature data.
Face Feature sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "b8557f0e-5cb1-44d9-b658-0960adacf906",
"tts": {
"id": "f41fb486-0055-473e-be49-e5de729aecc4",
"index": 5,
"type": "facefeature",
"facefeature_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAA...651AxPPQIKYzzcsng8J7IBPfdKwT24LrI9"
}
}
EOF #
The EOF result packet indicates that all results have been sent.
EOF sample:
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "d923181d-9d9b-4be1-9370-40f456be3771",
"tts": {
"id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
"index": 8,
"type": "eof"
}
}
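
When several Tasks are in flight on one connection, the id field lets a client group packets per Task and treat the EOF packet as the completion signal. The following is a minimal sketch of such bookkeeping; `TaskCollector` is a hypothetical class, and it expects already-parsed response messages.

```python
from collections import defaultdict


class TaskCollector:
    """Group result packets by Task ID and report when a Task reaches EOF."""

    def __init__(self):
        self._pending = defaultdict(list)

    def feed(self, msg):
        """Add one parsed result packet.

        Returns the finished Task's packets (sorted by index) when the
        packet is an EOF, otherwise None.
        """
        tts = msg["tts"]
        if tts["type"] == "eof":
            done = self._pending.pop(tts["id"], [])
            return sorted(done, key=lambda p: p["index"])
        self._pending[tts["id"]].append(tts)
        return None
```

Sorting by index restores the sequence within a Task even though the order of different packet types is not guaranteed on the wire.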
Walkthrough of Real Flow Examples #
Case 1: Minimal Configuration Flow #
Request: Starter
{
"type": "TTS3",
"tts": {}
}
Response: 1
{
"service": "auth",
"status": "ok",
"session": "49d3af81-f344-4ccf-8231-574ceac1a260"
}
Request: Task
{
"query": "大家好!"
}
Response: 2
{
"service": "tts",
"status": "ok",
"session": "49d3af81-f344-4ccf-8231-574ceac1a260",
"trace": "f2e13c02-c629-4db8-a942-4393583a5182",
"tts": {
"id": "4b69geebj4septyxh72qy885f",
"index": 1,
"type": "audio",
"audio_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...//wkA/P8CAP//AAD4/wMA0v8oAL3/...AAAAAAAAAAAAA=="
}
}
Response: 3 EOF
{
"service": "tts",
"status": "ok",
"session": "49d3af81-f344-4ccf-8231-574ceac1a260",
"trace": "f2e13c02-c629-4db8-a942-4393583a5182",
"tts": {
"id": "4b69geebj4septyxh72qy885f",
"index": 2,
"type": "eof"
}
}
Case 2: Full Configuration Flow #
Request: Starter
{
"auth": "XSMLTGKQVVCPJCQHJZ4VEDMGIY",
"type": "TTS3",
"device": "device-wei",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"tts": {
"language": "zh-CN",
"voice": "xiaoling",
"speed_ratio": 1.05,
"sample_rate": 16000,
"volume": 200,
"phone": true,
"polyphone": true,
"subtitle": "srt",
"sentence_time": true,
"word_time": true,
"cache_url": true,
"facefeature": "0404_jiaboyang_s1",
"conversion_id": "nina",
"conversion_transform": -2
}
}
Response: 1
{
"service": "auth",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a"
}
Request: Task
{
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"query": "你好。",
"ssml": false
}
Response: 2 Audio
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 1,
"type": "audio",
"audio_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...//wkA/P8CAP//AAD4/wMA0v8oAL3/...AAAAAAAAAAAAA=="
}
}
Response: 3 Phoneme
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 2,
"type": "phone",
"phone_data": "aiBqIGluIGluIGluIGluIGluIG...ZCBlbmQgZW5k"
}
}
Response: 4 Timestamp
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 3,
"type": "timestamp",
"sentence_time": {
"begin_ms": 500,
"end_ms": 1010,
"text": "你好。"
},
"word_times": [
{"begin_ms": 500, "end_ms": 590, "text": "你"},
{"begin_ms": 590, "end_ms": 1010, "text": "好"}
]
}
}
Response: 5 Polyphone
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 4,
"type": "polyphone",
"polyphones": [
{
"word": "好",
"phones": ["hao3", "hao4"]
}
]
}
}
Response: 6 Subtitle
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 5,
"type": "subtitle",
"subtitle_data": "MQowMDowMDowMCwwMDAgLS0+IDAwOjAwOjAwLDUyOArkvaDlpb3jgIIKCg=="
}
}
Response: 7 Subtitle URL
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 6,
"type": "subtitle_url",
"subtitle_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-srt/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_d923181d-9d9b-4be1-9370-40f456be3771.srt"
}
}
Response: 8 Audio URL
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 7,
"type": "audio_url",
"audio_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-audio/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_851ab562-ec51-4ad7-bd21-4f4af19875cb.pcm"
}
}
Response: 9 Phoneme URL
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "08a5785a-a6a2-4140-b587-a6cead592531",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 8,
"type": "phone_url",
"phone_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-phone/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_08a5785a-a6a2-4140-b587-a6cead592531.phone"
}
}
Response: 10 Face Feature
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "08a5785a-a6a2-4140-b587-a6cead592531",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 9,
"type": "facefeature",
"facefeature_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAA...651AxPPQIKYzzcsng8J7IBPfdKwT24LrI9"
}
}
Response: 11 EOF
{
"service": "tts",
"status": "ok",
"session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
"trace": "08a5785a-a6a2-4140-b587-a6cead592531",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 10,
"type": "eof"
}
}
Case 3: Using the TTS Single-Response Interface #
Request: Starter
{
"auth": "XSMLTGKQVVCPJCQHJZ4VEDMGIY",
"type": "TTS2",
"device": "device-wei",
"session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89",
"tts": {
"language": "zh-CN",
"voice": "aningfp",
"speed_ratio": 1.05,
"sample_rate": 16000,
"volume": 200,
"format": "mp3",
"phone": true
}
}
Response: 1
{
"service": "auth",
"status": "ok",
"session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89"
}
Request: Task
{
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"query": "大家好!",
"ssml": false
}
Response: 2
{
"service": "tts",
"status": "ok",
"session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 1,
"type": "audio",
"audio_url": "http://tts.dui.ai/runtime/v1/cache/91891217-e1f6-4a91-9a31-1f2ce402b243?productId=914011300"
}
}
Response: 3
{
"service": "tts",
"status": "ok",
"session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89",
"trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
"tts": {
"id": "bf3qmpuuk18ktv7cv4b6kzhs9",
"index": 2,
"type": "phone",
"phone_url": "http://tts.dui.ai/runtime/v1/cache/91891217-e1f6-4a91-9a31-1f2ce402b243.json?productId=914011300",
"resource": {
"phonetic_duration_msec": [125, 125, 130, 90, 350, 95, 395],
"phonetic_symbols": ["d", "a4", "j", "ia1", "h", "ao3", "sil"],
"total_duration_msec": 1310
}
}
}
Case 4: Using the Streaming-Text TTS Interface #
Request: Starter
{
"type": "TTS3",
"tts": {
"qid": "JQb7Qv:AEA_Z10Mqp9GYwDGdLzMvPzEzIqwo",
"stream_mode": true
}
}
Response: 1
{
"service": "auth",
"status": "ok",
"session": "49d3af81-f344-4ccf-8231-574ceac1a260"
}
Request: 1 Task
{
"query": "大"
}
Request: 2 Task
{
"query": "家"
}
Request: 3 Task
{
"query": "好"
}
Request: 4 EOF
{
"signal": "eof"
}
Response: 2
{
"service": "tts",
"status": "ok",
"session": "49d3af81-f344-4ccf-8231-574ceac1a260",
"trace": "f2e13c02-c629-4db8-a942-4393583a5182",
"tts": {
"id": "4b69geebj4septyxh72qy885f",
"index": 1,
"type": "audio",
"audio_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...//wkA/P8CAP//AAD4/wMA0v8oAL3/...AAAAAAAAAAAAA=="
}
}
Response: 3 EOF
{
"service": "tts",
"status": "ok",
"session": "49d3af81-f344-4ccf-8231-574ceac1a260",
"trace": "f2e13c02-c629-4db8-a942-4393583a5182",
"tts": {
"id": "4b69geebj4septyxh72qy885f",
"index": 2,
"type": "eof"
}
}
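
The request side of Case 4 can be generated from any incremental text source, such as the token stream of a language model. The sketch below mirrors the requests shown in this case, including an eof packet that carries no query; `stream_messages` is a hypothetical helper, and skipping empty chunks is a design choice of this sketch rather than a documented requirement.

```python
import json


def stream_messages(chunks):
    """Yield Task packets for text chunks, then the closing eof signal."""
    for chunk in chunks:
        if chunk:  # skip empty chunks (a choice made in this sketch)
            yield json.dumps({"query": chunk}, ensure_ascii=False)
    # In stream mode the eof signal must be sent to flush the remaining
    # text and receive the EOF packet.
    yield json.dumps({"signal": "eof"})


for message in stream_messages(["大", "家", "好"]):
    print(message)
```

With stream_mode enabled in the Starter packet, each yielded string would be sent as one WebSocket text message.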