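TTS Speech Synthesis

TTS (Speech Synthesis) Integration Guide

This guide describes how to call TTS through the full-duplex WebSocket interface of the Control System. Connections use the WebSocket protocol, and every message is JSON text encoded in UTF-8.

Call Flow

  1. Establish a WebSocket connection (the connection itself requires no authentication). If the Control System runs locally, the address is usually ws://localhost:8070/v1;
  2. Send the Starter packet, which carries the common configuration for all subsequent TTS requests. The server closes the WebSocket connection if the packet is malformed or is not sent within 10 seconds;
  3. Receive the response, which reports whether authentication succeeded or failed;
  4. Send a Task packet containing the text to synthesize and its format settings;
  5. Receive the data packets for that Task;
  6. When there are no more synthesis tasks, simply disconnect (the protocol defines no disconnect message).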
sequenceDiagram
    participant Client
    participant CS as Control System
    Client-->>CS: 1. Establish Connection
    activate CS
    Client->>CS: 2. Request: Starter
    CS-->>Client: 3. Response: Authentication
    loop
        Note right of Client: Repeat steps 4 and 5 until all requests have been sent
        Client->>CS: 4. Request: Task
        loop
          Note right of Client: Multiple synthesis results are returned
          CS-->>Client: 5. Response
        end
    end
    Client-->>CS: 6. Close Connection
    deactivate CS
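The steps above can be sketched in Python as follows. This is a minimal, non-authoritative sketch. run_tts_session and the send/recv callables are hypothetical names. The sketch sends one Task at a time in non-stream mode. It also assumes that a fail packet ends a task, because this document does not say whether an EOF follows a failure.

```python
import json
import uuid
from typing import Callable, Iterator, List


def run_tts_session(send: Callable[[str], None], recv: Callable[[], str],
                    starter: dict, tasks: List[dict]) -> Iterator[dict]:
    """Drive steps 2-5 of the call flow over an already-open WebSocket.

    `send` and `recv` are hypothetical adapters: each sends or receives one
    text frame using whatever WebSocket client library you choose.
    """
    # Step 2: the Starter must be the first frame, sent within 10 seconds.
    send(json.dumps(starter, ensure_ascii=False))

    # Step 3: the first reply is the authentication result.
    auth = json.loads(recv())
    if auth.get("service") != "auth" or auth.get("status") != "ok":
        raise RuntimeError(f"Starter rejected: {auth.get('error', auth)}")

    # Steps 4-5: one Task at a time (non-stream mode).
    for task in tasks:
        task = {"id": str(uuid.uuid4()), **task}  # a caller-supplied id wins
        send(json.dumps(task, ensure_ascii=False))
        while True:
            msg = json.loads(recv())
            yield msg
            # Assumption: a fail packet ends the task; otherwise wait for EOF.
            if msg.get("status") != "ok" or msg["tts"]["type"] == "eof":
                break
    # Step 6: the caller simply closes the socket; there is no close message.
```

To connect this to a real socket, one option is the third-party websockets package (version 11 or later). Its websockets.sync.client.connect returns a connection whose send and recv methods fit these parameters. That pairing is only a suggestion, and the interface does not require any particular client library.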

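Request Message Format

Starter

The first packet sent after each connection is established. It declares the purpose of the connection and how subsequent packets are to be interpreted. It is JSON text with the following fields:

| Field | Name | Type | Default | Description |
| --- | --- | --- | --- | --- |
| auth | AuthN Token | string | empty string | Device authentication token; required if authentication is enabled on the server |
| type | Workflow Type | string | required | Service engine ID for the capability, e.g. "TTS3"; for the full list, see the service engine list in the Quick Reference |
| device | Device ID | string | empty string | Device ID; recommended, to help trace and diagnose problems |
| session | Session ID | string | random UUIDv4 | Callers are encouraged to generate their own Session ID and fill it in, to help trace and diagnose problems |
| tts | TTS Config | object | required | TTS-specific configuration; see below |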
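A minimal sketch of building a Starter packet from the fields in the table. build_starter is a hypothetical helper. It generates the Session ID on the client, as the table recommends, and omits auth and device when they are empty, relying on their empty-string defaults.

```python
import json
import uuid
from typing import Optional


def build_starter(engine_type: str, tts: Optional[dict] = None, auth: str = "",
                  device: str = "", session: Optional[str] = None) -> str:
    """Serialize a Starter packet; `type` and `tts` are the required fields."""
    if not engine_type:
        raise ValueError("'type' is required, e.g. 'TTS3'")
    packet = {
        "type": engine_type,
        "tts": dict(tts or {}),  # an empty object is accepted (see Case 1)
        # Generate the Session ID client-side so it can be logged for tracing.
        "session": session or str(uuid.uuid4()),
    }
    if auth:    # only needed when the server enforces authentication
        packet["auth"] = auth
    if device:  # optional, but helps trace problems
        packet["device"] = device
    return json.dumps(packet, ensure_ascii=False)
```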

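The TTS Config fields are as follows:

| Field | Name | Type | Default | Description |
| --- | --- | --- | --- | --- |
| qid | Quick ID | string | | Optional. A preconfigured QuickID. Setting it overrides voice, language, style, conversion_id, conversion_transform, and related fields, and also overrides the engine ID given by type in the Starter packet. The value can be long; storing it as VARCHAR(512) is recommended |
| language | Language Code | string | zh-CN | Optional. Language to synthesize; must be supported by the chosen voice |
| voice | Voice ID | string | depends on the service engine | Optional. For available voices, see the voice list in the Quick Reference |
| pitch_offset | Pitch Offset | float | 0.0 | Optional. Pitch: higher values sound sharper, lower values deeper. Range [-10, 10] |
| style | Style | string | | Optional. Emotion (speaking style) of the voice |
| speed_ratio | Speed Ratio | float | 1.0 | Optional. Speaking rate: the larger the value, the slower the speech. Range [0.5, 2] |
| sample_rate | Sample Rate | int | 16000 | Optional. Sample rate in Hz; supported values: 8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000 |
| volume | Volume | int | 100 | Optional. Volume: larger values are louder. Range [1, 400] |
| format | File Format | string | pcm | Optional. Audio file format; pcm, wav, mp3, and silk may be supported, but only pcm and silk support streamed return |
| omit_error | Omit Error Message in Response | bool | false | Optional. Whether to omit error messages from responses; they are returned by default |
| audio | Return Audio Data | bool | true | Optional. Whether to return audio; returned by default |
| phone | Return Phonetic Symbols | bool | false | Optional. Whether to return phonemes; not returned by default |
| polyphone | Return Polyphone | bool | false | Optional. Whether to return the polyphonic characters found in the query; not returned by default |
| facefeature | Face Feature ID | string | empty string | Optional. ID of the model whose Face Feature data should be returned; empty means none is returned |
| conversion_id | Voice Conversion ID | string | empty string | Optional. ID of the voice conversion model; empty means no voice conversion |
| conversion_transform | Voice Conversion Transform | int | 0 | Optional. Pitch shift applied during voice conversion: higher values sound sharper, lower values deeper. Range [-12, +12]. Use +12 for male-to-female conversion and -12 for female-to-male |
| subtitle | Subtitle Format | string | empty string | Optional. Format of the returned subtitles; empty means none are returned. Supported: srt |
| subtitle_max_length | Subtitle Max Length | int | 0 | Optional. Maximum number of characters per subtitle line or sentence-level timestamp; 0 means no limit. Only applies when subtitles or sentence-level timestamps are returned |
| subtitle_cut_by_punc | Subtitle Cut by Punctuation | bool | false | Optional. Whether to break subtitle lines and sentence-level timestamps at punctuation and remove that punctuation. Only applies when subtitles or sentence-level timestamps are returned. For the punctuation set, see Subtitle Line-Break Punctuation |
| subtitle_custom_punc | Custom Subtitle Punctuation | string list | the Subtitle Line-Break Punctuation set | Optional. A custom set of punctuation marks to break lines on, used instead of the default set. Only applies when subtitles or sentence-level timestamps are returned and subtitle_cut_by_punc is true |
| subtitle_punc_keep | Keep Subtitle Punctuation | bool | false | Optional. Whether to keep the punctuation marks at which lines are broken. Only applies when subtitles or sentence-level timestamps are returned and subtitle_cut_by_punc is true |
| sentence_time | Return Sentence-Level Timestamp | bool | false | Optional. Whether to return sentence-level timestamps |
| word_time | Return Word-Level Timestamp | bool | false | Optional. Whether to return word-level (per-character) timestamps |
| cache_url | Return Cache URL for Data | bool | false | Optional. Whether to upload the audio, phoneme, and subtitle files to Object Store and return their cache URLs |
| stream_mode | Use Stream Mode | bool | false | Optional. Whether to use stream mode. When true, Task text is accumulated until one of the configured separators appears, and only then is a request sent to the TTS engine. Results are returned in order; no EOF packet is sent for each separated request, only a single EOF packet after signal eof is received. When false (the default), a request is sent as soon as each Task arrives. To use SSML in stream mode, every separated request must be a complete SSML document. Currently only requests with format pcm support stream mode |
| stream_separator | Separator for Stream Mode | string list | ["。", "！", "？", "：", ". ", "!", "?", ": "] | Optional. In stream mode, the symbols that delimit the accumulated text. The default covers the Chinese and English period, exclamation mark, question mark, and colon |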
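Before sending a config, a client can catch out-of-range values locally. This is a minimal sketch: validate_tts_config is a hypothetical helper. It enforces only the limits stated in the table above. The server remains the authority, and settings hidden behind a qid are not checked.

```python
from typing import List

SUPPORTED_SAMPLE_RATES = {8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000}
NUMERIC_RANGES = {
    "pitch_offset": (-10, 10),
    "speed_ratio": (0.5, 2),
    "volume": (1, 400),
    "conversion_transform": (-12, 12),
}


def validate_tts_config(cfg: dict) -> List[str]:
    """Return human-readable problems found in a TTS Config; empty means OK."""
    problems = []
    for key, (low, high) in NUMERIC_RANGES.items():
        if key in cfg and not low <= cfg[key] <= high:
            problems.append(f"{key}={cfg[key]} is outside [{low}, {high}]")
    if "sample_rate" in cfg and cfg["sample_rate"] not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"sample_rate={cfg['sample_rate']} is not supported")
    if cfg.get("subtitle", "") not in ("", "srt"):
        problems.append("subtitle must be empty or 'srt'")
    # Stream mode is documented to work only with pcm output.
    if cfg.get("stream_mode") and cfg.get("format", "pcm") != "pcm":
        problems.append("stream_mode requires format 'pcm'")
    return problems
```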

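Task

After the Starter packet has been sent and the connection established successfully, you can send any number of Task packets to submit synthesis jobs. A Task packet is JSON text with the following fields:

| Field | Name | Type | Default | Description |
| --- | --- | --- | --- | --- |
| id | Task ID | string | random UUIDv4 | Optional. Callers are encouraged to generate and fill it in themselves; it distinguishes the responses of concurrent requests |
| query | Query | string | required | Text to synthesize (may be omitted in a Task whose signal is eof, as in Case 4) |
| signal | Signal | string | query | Optional; only takes effect in stream mode. eof means all text has been sent: the accumulated, not-yet-synthesized text is synthesized and an EOF packet is returned. The default, query, marks the Task as a text request. In stream mode you must send eof; in non-stream mode an EOF packet is returned automatically and no eof is needed |
| ssml | Use SSML | bool | false | Optional. Whether the text is marked up with SSML; for the syntax, see the ONES documentation |
| no_cache | Disable Cache | bool | false | Optional. Whether to disable result caching for this request. When enabled, this request neither uses cached results nor stores its result in the cache |
| override | TTS Config | object | | Optional. Per-task TTS configuration that completely replaces the Starter packet's TTS configuration for this task only (note: it replaces the Starter configuration; the two are not merged) |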
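A minimal sketch of building Task packets and of the replace-not-merge rule for override. build_task and effective_tts_config are hypothetical helpers. effective_tts_config only models the documented rule on the client side; it does not call the server.

```python
import uuid
from typing import Optional


def build_task(query: str = "", task_id: Optional[str] = None, ssml: bool = False,
               no_cache: bool = False, override: Optional[dict] = None,
               signal: str = "query") -> dict:
    """Build a Task packet (serialize it with json.dumps before sending)."""
    if signal == "eof":
        # End-of-text marker for stream mode; Case 4 sends it without a query.
        return {"signal": "eof", **({"id": task_id} if task_id else {})}
    if signal != "query":
        raise ValueError("signal must be 'query' or 'eof'")
    if not query:
        raise ValueError("query is required for a text Task")
    task = {"id": task_id or str(uuid.uuid4()), "query": query}
    if ssml:
        task["ssml"] = True
    if no_cache:
        task["no_cache"] = True
    if override is not None:
        task["override"] = dict(override)
    return task


def effective_tts_config(starter_tts: dict, task: dict) -> dict:
    """override replaces the Starter's tts object wholesale; nothing is merged."""
    return task.get("override", starter_tts)
```

Because the two configs are not merged, an override must repeat every setting it still needs, such as voice or sample_rate.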

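Response Message Format

Authentication Result

After the Starter request is sent, the server returns a message containing the authentication result. It is JSON text with the following fields:

| Field | Name | Type | Always Present | Description |
| --- | --- | --- | --- | --- |
| service | Service Name | string | Yes | Service module for this response; always auth |
| session | Session ID | string | Yes | Session ID of the current connection |
| status | Status Name | enum | Yes | Session status: ok on success, fail on failure |
| error | Error Message | string | No | Error message, if authentication failed |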
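A minimal sketch of checking this response. check_auth is a hypothetical helper name. It returns the Session ID, which is worth logging when the caller let the server generate one.

```python
import json


def check_auth(raw: str) -> str:
    """Parse the first response frame and return its Session ID.

    Raises PermissionError when status is fail, carrying the server's error text.
    """
    msg = json.loads(raw)
    if msg.get("service") != "auth":
        raise ValueError(f"expected an auth packet, got service={msg.get('service')!r}")
    if msg.get("status") != "ok":
        raise PermissionError(msg.get("error", "authentication failed"))
    return msg["session"]
```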

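TTS Result Data

Each successful Task returns a series of data packets: audio, audio file URL, phoneme, phoneme file URL, subtitle, and subtitle file URL packets, plus polyphone, Face Feature, and timestamp packets when those are requested. Packets of the same type arrive in logical order, but the order across different types is not guaranteed. If the Starter request does not ask for phonemes, subtitles, or cache URLs, only audio is returned.

The response message is JSON text with the following fields:

| Field | Name | Type | Always Present | Description |
| --- | --- | --- | --- | --- |
| service | Service Name | string | Yes | Service module for this response; always tts |
| session | Session ID | string | Yes | Session ID of the current connection |
| trace | Trace ID | string | Yes | Trace ID of the current Task |
| status | Status Name | enum | Yes | Task status: ok on success, fail on failure |
| error | Error Message | string | No | Error message, if the Task failed |
| tts | TTS Content | object | No | Synthesis result, if the Task succeeded; fields described below |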

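The synthesis result itself is carried in TTS Content:

| Field | Name | Type | Always Present | Description |
| --- | --- | --- | --- | --- |
| id | Task ID | string | Yes | ID of the current Task |
| index | Index No. | int | Yes | Sequence number of the returned packet (audio, phoneme, and so on) |
| type | Package Type | enum | Yes | audio for audio packets, audio_url for audio URL packets, phone for phoneme packets, phone_url for phoneme URL packets, subtitle for subtitle packets, subtitle_url for subtitle URL packets, polyphone for polyphone packets, facefeature for Face Feature packets, timestamp for timestamp packets, and eof once everything has been sent |
| audio_data | Base64-encoded Audio Data | string | No | Audio data; only in audio packets |
| phone_data | Base64-encoded Phonetic Symbols | string | No | Phoneme data; only in phoneme packets |
| polyphones | Polyphone Data | object list | No | Polyphone data; only in polyphone packets |
| subtitle_data | Base64-encoded Subtitles | string | No | Subtitle data; only in subtitle packets |
| sentence_time | Sentence-Level Timestamp | object | No | Sentence-level timestamp; only in timestamp packets |
| word_times | Word-Level Timestamp | object list | No | Word-level (per-character) timestamps; only in timestamp packets |
| facefeature_data | Base64-encoded Face Feature | string | No | Face Feature data; only in Face Feature packets |
| audio_url | URL of Audio File | string | No | Audio file URL; in audio URL packets (always present in the audio packet when using the TTS2 single-response interface) |
| phone_url | URL of JSON for Phonetic Symbols | string | No | Phoneme file URL; in phoneme URL packets (always present in the phoneme packet when using the TTS2 single-response interface) |
| subtitle_url | URL of Subtitle File | string | No | Subtitle file URL; in subtitle URL packets |
| resource | Phonetic Symbols Info | object | No | Phoneme information; only in phoneme packets (always present when using the TTS2 single-response interface) |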
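Because each packet type carries different fields, a client can look up the payload by type. The sketch below is a minimal example. PAYLOAD_FIELDS and payload_of are hypothetical names, and the mapping is read off the table above and the samples below. Raising on unknown types is a design choice; a client that wants forward compatibility might skip them instead.

```python
PAYLOAD_FIELDS = {
    "audio": ("audio_data", "audio_url"),              # audio_url: TTS2 only
    "audio_url": ("audio_url",),
    "phone": ("phone_data", "phone_url", "resource"),  # phone_url/resource: TTS2
    "phone_url": ("phone_url",),
    "subtitle": ("subtitle_data",),
    "subtitle_url": ("subtitle_url",),
    "polyphone": ("polyphones",),
    "facefeature": ("facefeature_data",),
    "timestamp": ("sentence_time", "word_times"),
    "eof": (),
}


def payload_of(msg: dict) -> dict:
    """Return only the type-specific fields of one parsed TTS result packet."""
    if msg.get("status") != "ok":
        raise RuntimeError(msg.get("error", "task failed"))
    tts = msg["tts"]
    fields = PAYLOAD_FIELDS.get(tts["type"])
    if fields is None:
        raise ValueError(f"unknown packet type: {tts['type']!r}")
    return {name: tts[name] for name in fields if name in tts}
```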

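Audio Packet

Contains the Base64-encoded synthesized audio data.

When the requested audio format is pcm, the audio is streamed back over multiple packets; other formats are returned in a single packet once synthesis completes.

Audio packet example: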

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "851ab562-ec51-4ad7-bd21-4f4af19875cb",
  "tts": {
    "id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
    "index": 1,
    "type": "audio",
    "audio_data": "AAAAAAA...DYBfQHDAeMBEwI8AlQCRAIaAu0BpAFCAckARQCv...AAAAAAAAAAAAAAAAAAAAAAAAAAA=="
  }
}
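To play or save streamed pcm audio, decode each packet's audio_data and concatenate the chunks in index order. The minimal sketch below wraps the result in a WAV container using Python's standard wave module. pcm_packets_to_wav is a hypothetical helper. The sample layout (signed 16-bit little-endian, mono) is an assumption, because this document only specifies the sample rate.

```python
import base64
import io
import wave
from typing import List


def pcm_packets_to_wav(packets: List[dict], sample_rate: int = 16000,
                       sample_width: int = 2, channels: int = 1) -> bytes:
    """Join the pcm audio packets of ONE task (ordered by index) into a WAV file.

    Assumption: pcm means signed 16-bit little-endian samples, mono.
    """
    chunks = [
        base64.b64decode(p["tts"]["audio_data"])
        for p in sorted(packets, key=lambda p: p["tts"]["index"])
        # TTS2 audio packets carry audio_url instead of audio_data; skip them.
        if p["tts"]["type"] == "audio" and "audio_data" in p["tts"]
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(chunks))  # header sizes are patched on close
    return buf.getvalue()
```

Pass the same sample_rate that the TTS config requested, or playback speed will be wrong.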

Audio URL

Contains the cache URL of the audio file uploaded to Object Store.

Audio URL example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "851ab562-ec51-4ad7-bd21-4f4af19875cb",
  "tts": {
    "id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
    "index": 6,
    "type": "audio_url",
    "audio_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-audio/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_851ab562-ec51-4ad7-bd21-4f4af19875cb.pcm"
  }
}

Phoneme Packet

Contains the Base64-encoded synthesized phoneme data.

Phoneme packet example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "08a5785a-a6a2-4140-b587-a6cead592531",
  "tts": {
    "id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
    "index": 2,
    "type": "phone",
    "phone_data": "biBuIGkgaSBpIGkgaSBpIDIgMiAjMSAjMSAjMSAjMSAjMSAjMSBoIGggaCBoIGggaCBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBhbyBlbmQgZW5kIGVuZCBlbmQgZW5kIGVuZCBlbmQgZW5kIGVuZCBlbmQgZW5kIGVuZCBlbmQgZW5k"
  }
}
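The phone_data in the sample above decodes to whitespace-separated text beginning "n n i i i i i i 2 2 #1", so each symbol appears repeated. This document does not say what one repetition represents. The sketch below therefore only decodes the symbols and, as an illustration, collapses consecutive repeats. decode_phones and collapse_runs are hypothetical helper names.

```python
import base64
from itertools import groupby
from typing import List, Tuple


def decode_phones(phone_data: str) -> List[str]:
    """Base64-decode phone_data into a list of whitespace-separated symbols."""
    return base64.b64decode(phone_data).decode("utf-8").split()


def collapse_runs(symbols: List[str]) -> List[Tuple[str, int]]:
    """Merge consecutive repeats into (symbol, repeat_count) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]
```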

Phoneme URL

Contains the cache URL of the phoneme file uploaded to Object Store.

Phoneme URL example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "08a5785a-a6a2-4140-b587-a6cead592531",
  "tts": {
    "id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
    "index": 7,
    "type": "phone_url",
    "phone_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-phone/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_08a5785a-a6a2-4140-b587-a6cead592531.phone"
  }
}

Subtitles

Contains the Base64-encoded synthesized subtitle data.

Subtitle packet example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "d923181d-9d9b-4be1-9370-40f456be3771",
  "tts": {
    "id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
    "index": 4,
    "type": "subtitle",
    "subtitle_data": "MQowMDowMDowMCwwMDAgLS0+IDAwOjAwOjAwLDUyOArkvaDlpb3jgIIKCg=="
  }
}
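With subtitle set to srt, subtitle_data is Base64-encoded SRT text. The sample above decodes to a single cue, "你好。", shown from 00:00:00,000 to 00:00:00,528. Below is a minimal SRT reader. parse_srt is a hypothetical helper that handles only the basic index / time range / text layout.

```python
import base64
import re
from typing import List, Tuple

_RANGE = re.compile(r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)")


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(subtitle_data: str) -> List[Tuple[int, int, str]]:
    """Decode subtitle_data and return (start_ms, end_ms, text) per cue."""
    text = base64.b64decode(subtitle_data).decode("utf-8")
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):  # cues are blank-line separated
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = _RANGE.match(lines[1])  # line 1 is the cue number
        if not match:
            continue
        g = match.groups()
        cues.append((_to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[2:])))
    return cues
```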

Subtitle URL

Contains the cache URL of the subtitle file uploaded to Object Store.

Subtitle URL example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "d923181d-9d9b-4be1-9370-40f456be3771",
  "tts": {
    "id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
    "index": 5,
    "type": "subtitle_url",
    "subtitle_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-srt/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_d923181d-9d9b-4be1-9370-40f456be3771.srt"
  }
}

Timestamp Packet

Contains sentence-level and word-level timestamps.

Timestamp example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "d923181d-9d9b-4be1-9370-40f456be3771",
  "tts": {
    "id": "4d5ed90f-2188-42a0-b4d7-db00f1a2a944", 
    "index": 7, 
    "type": "timestamp", 
    "sentence_time": {
        "begin_ms": 7770,
        "end_ms": 9140, 
        "text": "新人起步很不容易"
    }, 
    "word_times": [
        {"begin_ms": 7770, "end_ms": 7960, "text": "新"}, 
        {"begin_ms": 7960, "end_ms": 8120, "text": "人"}, 
        {"begin_ms": 8120, "end_ms": 8310, "text": "起"}, 
        {"begin_ms": 8310, "end_ms": 8430, "text": "步"}, 
        {"begin_ms": 8430, "end_ms": 8630, "text": "很"}, 
        {"begin_ms": 8630, "end_ms": 8720, "text": "不"}, 
        {"begin_ms": 8720, "end_ms": 8920, "text": "容"}, 
        {"begin_ms": 8920, "end_ms": 9140, "text": "易"}
    ]
  }
}
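Word-level timestamps make it possible to highlight text in sync with playback. In the sample, times start at 7770 ms rather than 0. This suggests they are measured from the start of the task's audio, but this document does not state it, so treat it as an assumption. word_at is a hypothetical helper that finds the character being spoken at a given playback time.

```python
from typing import Optional


def word_at(timestamp_pkt: dict, t_ms: int) -> Optional[str]:
    """Return the character being spoken at playback time t_ms, if any.

    Intervals are treated as half-open [begin_ms, end_ms) so that adjacent
    characters never overlap at a shared boundary.
    """
    for word in timestamp_pkt["tts"].get("word_times", []):
        if word["begin_ms"] <= t_ms < word["end_ms"]:
            return word["text"]
    return None
```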

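Polyphone Packet

Contains information on polyphonic characters, with the recommended reading first and the other readings after it.

Polyphone example: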

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "b2b2b2b2-b2b2-b2b2-b2b2-b2b2b2b2b2b2",
  "tts": {
    "id": "a5e8b592-f0b1-46ad-bc97-836cbb010310",
    "index": 4,
    "type": "polyphone",
    "polyphones": [
        {
            "word": "好",
            "phones": ["hao3", "hao4"]
        }
    ]
  }
}
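Since the recommended reading comes first, picking it is a matter of taking the first entry of each phones list. recommended_readings is a hypothetical helper. It returns a list rather than a dict, so a character that appears more than once in the query keeps every occurrence.

```python
from typing import List, Tuple


def recommended_readings(polyphone_pkt: dict) -> List[Tuple[str, str]]:
    """Return (character, recommended reading) pairs in packet order."""
    return [(p["word"], p["phones"][0])
            for p in polyphone_pkt["tts"]["polyphones"]
            if p["phones"]]  # skip entries without any reading
```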

Face Feature

Contains the Base64-encoded Face Feature data.

Face Feature example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "b8557f0e-5cb1-44d9-b658-0960adacf906",
  "tts": {
    "id": "f41fb486-0055-473e-be49-e5de729aecc4",
    "index": 5,
    "type": "facefeature",
    "facefeature_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAA...651AxPPQIKYzzcsng8J7IBPfdKwT24LrI9"
  }
}

EOF

The EOF packet indicates that all results have been sent.

EOF example:

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "d923181d-9d9b-4be1-9370-40f456be3771",
  "tts": {
    "id": "02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb",
    "index": 8,
    "type": "eof"
  }
}
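Packets of different types may interleave, and several Tasks can be in flight at once. For these reasons, a client typically buffers packets per Task id until that task's EOF arrives. TaskCollector is a hypothetical sketch of this. It sorts each task's packets by index and groups them by type.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class TaskCollector:
    """Group result packets by Task id until each task's EOF arrives."""

    def __init__(self) -> None:
        self._pending: Dict[str, List[dict]] = defaultdict(list)
        self.finished: Dict[str, Dict[str, List[dict]]] = {}

    def feed(self, msg: dict) -> Optional[str]:
        """Add one parsed message; return the task id when that task is complete."""
        if msg.get("status") != "ok":
            raise RuntimeError(msg.get("error", "task failed"))
        tts = msg["tts"]
        if tts["type"] != "eof":
            self._pending[tts["id"]].append(tts)
            return None
        by_type: Dict[str, List[dict]] = defaultdict(list)
        # Sorting by index makes the grouping independent of arrival order.
        for pkt in sorted(self._pending.pop(tts["id"], []), key=lambda p: p["index"]):
            by_type[pkt["type"]].append(pkt)
        self.finished[tts["id"]] = dict(by_type)
        return tts["id"]
```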

Example Flows Explained

Case 1: Minimal Configuration

Request: Starter

{
  "type": "TTS3",
  "tts": {}
}

Response: 1

{
  "service": "auth",
  "status": "ok",
  "session": "49d3af81-f344-4ccf-8231-574ceac1a260"
}

Request: Task

{
  "query": "大家好!"
}

Response: 2

{
  "service": "tts",
  "status": "ok",
  "session": "49d3af81-f344-4ccf-8231-574ceac1a260",
  "trace": "f2e13c02-c629-4db8-a942-4393583a5182",
  "tts": {
    "id": "4b69geebj4septyxh72qy885f",
    "index": 1,
    "type": "audio",
    "audio_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...//wkA/P8CAP//AAD4/wMA0v8oAL3/...AAAAAAAAAAAAA=="
  }
}

Response: 3 EOF

{
  "service": "tts",
  "status": "ok",
  "session": "49d3af81-f344-4ccf-8231-574ceac1a260",
  "trace": "f2e13c02-c629-4db8-a942-4393583a5182",
  "tts": {
    "id": "4b69geebj4septyxh72qy885f",
    "index": 2,
    "type": "eof"
  }
}

Case 2: Full Configuration

Request: Starter

{
  "auth": "XSMLTGKQVVCPJCQHJZ4VEDMGIY",
  "type": "TTS3",
  "device": "device-wei",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "tts": {
    "language": "zh-CN",
    "voice": "xiaoling",
    "speed_ratio": 1.05,
    "sample_rate": 16000,
    "volume": 200,
    "phone": true,
    "polyphone": true,
    "subtitle": "srt",
    "sentence_time": true,
    "word_time": true,
    "cache_url": true,
    "facefeature": "0404_jiaboyang_s1",
    "conversion_id": "nina",
    "conversion_transform": -2
  }
}

Response: 1

{
  "service": "auth",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a"
}

Request: Task

{
  "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
  "query": "你好。",
  "ssml": false
}

Response: 2 Audio

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 1,
    "type": "audio",
    "audio_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...//wkA/P8CAP//AAD4/wMA0v8oAL3/...AAAAAAAAAAAAA=="
  }
}

Response: 3 Phonemes

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 2,
    "type": "phone",
    "phone_data": "aiBqIGluIGluIGluIGluIGluIG...ZCBlbmQgZW5k"
  }
}

Response: 4 Timestamps

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9", 
    "index": 3, 
    "type": "timestamp", 
    "sentence_time": {
      "begin_ms": 500, 
      "end_ms": 1010, 
      "text": "你好。"
    }, 
    "word_times": [
      {"begin_ms": 500, "end_ms": 590, "text": "你"}, 
      {"begin_ms": 590, "end_ms": 1010, "text": "好"}
    ]
  }
}

Response: 5 Polyphones

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 4,
    "type": "polyphone",
    "polyphones": [
      {
        "word": "好",
        "phones": ["hao3", "hao4"]
      }
    ]
  }
}

Response: 6 Subtitles

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 5,
    "type": "subtitle",
    "subtitle_data": "MQowMDowMDowMCwwMDAgLS0+IDAwOjAwOjAwLDUyOArkvaDlpb3jgIIKCg=="
  }
}

Response: 7 Subtitle URL

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 6,
    "type": "subtitle_url",
    "subtitle_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-srt/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_d923181d-9d9b-4be1-9370-40f456be3771.srt"
  }
}

Response: 8 Audio URL

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 7,
    "type": "audio_url",
    "audio_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-audio/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_851ab562-ec51-4ad7-bd21-4f4af19875cb.pcm"
  }
}

Response: 9 Phoneme URL

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "08a5785a-a6a2-4140-b587-a6cead592531",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 8,
    "type": "phone_url",
    "phone_url": "https://aigc.blob.core.chinacloudapi.cn/audio/tts-phone/02261a7e-7df8-4554-b2a1-ad2fa8bf2cbb_08a5785a-a6a2-4140-b587-a6cead592531.phone"
  }
}

Response: 10 Face Feature

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "08a5785a-a6a2-4140-b587-a6cead592531",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 9,
    "type": "facefeature",
    "facefeature_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAA...651AxPPQIKYzzcsng8J7IBPfdKwT24LrI9"
  }
}

Response: 11 EOF

{
  "service": "tts",
  "status": "ok",
  "session": "5ef8b534-3b54-47e2-94d9-ff165864ad4a",
  "trace": "08a5785a-a6a2-4140-b587-a6cead592531",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 10,
    "type": "eof"
  }
}

Case 3: Using the TTS Single-Response Interface

Request: Starter

{
  "auth": "XSMLTGKQVVCPJCQHJZ4VEDMGIY",
  "type": "TTS2",
  "device": "device-wei",
  "session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89",
  "tts": {
    "language": "zh-CN",
    "voice": "aningfp",
    "speed_ratio": 1.05,
    "sample_rate": 16000,
    "volume": 200,
    "format": "mp3",
    "phone": true
  }
}

Response: 1

{
  "service": "auth",
  "status": "ok",
  "session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89"
}

Request: Task

{
  "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
  "query": "大家好!",
  "ssml": false
}

Response: 2

{
  "service": "tts",
  "status": "ok",
  "session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 1,
    "type": "audio",
    "audio_url": "http://tts.dui.ai/runtime/v1/cache/91891217-e1f6-4a91-9a31-1f2ce402b243?productId=914011300"
  }
}

Response: 3

{
  "service": "tts",
  "status": "ok",
  "session": "d6bad1c4-eda9-4abd-8294-7ebb4e72cf89",
  "trace": "335d508e-688f-4fc1-b057-4a4aa78b9ee7",
  "tts": {
    "id": "bf3qmpuuk18ktv7cv4b6kzhs9",
    "index": 2,
    "type": "phone",
    "phone_url": "http://tts.dui.ai/runtime/v1/cache/91891217-e1f6-4a91-9a31-1f2ce402b243.json?productId=914011300",
    "resource": {
      "phonetic_duration_msec": [125, 125, 130, 90, 350, 95, 395],
      "phonetic_symbols": ["d", "a4", "j", "ia1", "h", "ao3", "sil"],
      "total_duration_msec": 1310
    }
  }
}
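In this response the resource object lists one duration per phoneme. The durations add up to the reported total: 125 + 125 + 130 + 90 + 350 + 95 + 395 = 1310 ms, which equals total_duration_msec. That suggests the phonemes are back to back, so start and end times can be derived by a running sum. phone_timeline is a hypothetical helper built on that inference.

```python
from typing import List, Tuple


def phone_timeline(resource: dict) -> List[Tuple[str, int, int]]:
    """Turn a TTS2 resource object into (symbol, start_ms, end_ms) triples."""
    symbols = resource["phonetic_symbols"]
    durations = resource["phonetic_duration_msec"]
    if len(symbols) != len(durations):
        raise ValueError("phonetic_symbols and phonetic_duration_msec differ in length")
    timeline, start = [], 0
    for symbol, duration in zip(symbols, durations):
        timeline.append((symbol, start, start + duration))
        start += duration
    # Consistency check: the running sum should land on the reported total.
    total = resource.get("total_duration_msec")
    if total is not None and start != total:
        raise ValueError(f"durations sum to {start} ms, total says {total} ms")
    return timeline
```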

Case 4: Using the Streaming-Text TTS Interface

Request: Starter

{
  "type": "TTS3",
  "tts": {
    "qid": "JQb7Qv:AEA_Z10Mqp9GYwDGdLzMvPzEzIqwo",
    "stream_mode": true
  }
}

Response: 1

{
  "service": "auth",
  "status": "ok",
  "session": "49d3af81-f344-4ccf-8231-574ceac1a260"
}

Request: 1 Task

{
  "query": "大"
}

Request: 2 Task

{
  "query": "家"
}

Request: 3 Task

{
  "query": "好"
}

Request: 4 EOF

{
  "signal": "eof"
}

Response: 2

{
  "service": "tts",
  "status": "ok",
  "session": "49d3af81-f344-4ccf-8231-574ceac1a260",
  "trace": "f2e13c02-c629-4db8-a942-4393583a5182",
  "tts": {
    "id": "4b69geebj4septyxh72qy885f",
    "index": 1,
    "type": "audio",
    "audio_data": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...//wkA/P8CAP//AAD4/wMA0v8oAL3/...AAAAAAAAAAAAA=="
  }
}

Response: 3 EOF

{
  "service": "tts",
  "status": "ok",
  "session": "49d3af81-f344-4ccf-8231-574ceac1a260",
  "trace": "f2e13c02-c629-4db8-a942-4393583a5182",
  "tts": {
    "id": "4b69geebj4septyxh72qy885f",
    "index": 2,
    "type": "eof"
  }
}
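In stream mode, the client only forwards text fragments and ends with an eof signal; the server decides where to cut. The sketch below shows both halves, and every name in it is hypothetical. stream_tasks wraps incremental text, such as tokens from a text generator, as Task packets. split_on_separators is a rough client-side approximation of the documented cutting rule, useful for predicting when audio will start. It assumes the default separators, taking the first four as full-width Chinese marks, and it keeps each separator attached to its segment for readability, which the server may not do. In Case 4, "大家好" contains no separator, so nothing is synthesized until the eof arrives.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed default stream_separator list (first four: full-width Chinese marks).
DEFAULT_SEPARATORS = ("。", "！", "？", "：", ". ", "!", "?", ": ")


def stream_tasks(fragments: Iterable[str]) -> Iterator[dict]:
    """Wrap incremental text as stream-mode Task packets, ending with eof."""
    for fragment in fragments:
        if fragment:  # skip empty fragments
            yield {"query": fragment}
    yield {"signal": "eof"}  # mandatory in stream mode; flushes the remainder


def split_on_separators(buffer: str, separators: Tuple[str, ...] = DEFAULT_SEPARATORS
                        ) -> Tuple[List[str], str]:
    """Approximate stream-mode cutting: return the completed segments
    (separator included) and the not-yet-terminated remainder."""
    segments, start, i = [], 0, 0
    while i < len(buffer):
        for sep in separators:
            if buffer.startswith(sep, i):
                i += len(sep)
                segments.append(buffer[start:i])
                start = i
                break
        else:
            i += 1  # no separator starts here; advance one character
    return segments, buffer[start:]
```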