Build next-generation voice applications with Tencent Cloud's Flow TTS API. It is powered by the flow_01_turbo model and offers latency as low as 300 ms, high-quality voices, streaming support, and voice cloning.
Learn the basics and make your first request with the Flow TTS API in minutes.
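import base64

import requests

url = "https://api.realtime-ai.chat/api/tts/synthesize"
headers = {
    "Authorization": "Bearer YOUR_JWT_TOKEN",
    "Content-Type": "application/json",
}
data = {
    # Sample text: "Hello, world! This is Flow TTS."
    "text": "你好,世界!这是 Flow TTS。",
    "ttsConfig": {
        "TTSType": "flow",
        "VoiceId": "v-female-R2s4N9qJ",
        "Model": "flow_01_turbo",
        "Language": "zh",
    },
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # stop on 4xx/5xx instead of saving an error body

# The endpoint reference documents a JSON body with base64 audio in data.audio.
audio = base64.b64decode(response.json()["data"]["audio"])
with open("output.wav", "wb") as f:
    f.write(audio)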
All API requests require authentication using a JWT (JSON Web Token). You can obtain a JWT token by logging into your account at app.realtime-ai.chat.
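To authenticate your API requests, log in at app.realtime-ai.chat, open the browser developer console, and run the following to print your access token:
console.log(window.SupabaseAuthInject.getSession()?.access_token)
Then pass the token in the Authorization header. Include the following headers in all authenticated requests: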
Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: application/json
JWT tokens expire after 1 hour. If you receive a 401 Unauthorized error, refresh your browser and obtain a new token.
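Because tokens expire after an hour, it can help to check a token locally before sending a request. The following is a minimal sketch, assuming the token is a standard three-part JWT whose payload carries an `exp` claim in Unix seconds (the source does not confirm the claim set). The helper names `jwt_seconds_remaining` and `auth_headers` are hypothetical, and the signature is not verified.

```python
import base64
import json
import time
from typing import Optional


def jwt_seconds_remaining(token: str, now: Optional[float] = None) -> float:
    """Return seconds until the token's `exp` claim (negative if expired).

    Assumption: the payload contains `exp`. The signature is NOT verified;
    this only reads the payload locally.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return claims["exp"] - current


def auth_headers(token: str) -> dict:
    """Build the two headers listed above for authenticated requests."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

If `jwt_seconds_remaining(token)` is close to zero or negative, obtain a new token before calling the API rather than waiting for a 401 response.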
The Flow TTS API provides four main endpoints for text-to-speech synthesis and voice management.
Base URL: https://api.realtime-ai.chat
POST /api/tts/synthesize
Convert text to speech with high-quality neural voices. Returns the complete audio in a single response.
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Required | Text to synthesize (max 5000 characters) |
| ttsConfig | object | Required | TTS configuration object (see below) |
| ttsConfig.TTSType | string | Required | Fixed value: "flow" |
| ttsConfig.VoiceId | string | Required | Voice ID from voice library |
| ttsConfig.Model | string | Optional | TTS model name (default: "flow_01_turbo") |
| ttsConfig.Speed | number | Optional | Speech speed (0.5-2.0, default: 1.0) |
| ttsConfig.Volume | number | Optional | Volume level (0-10, default: 1.0) |
| ttsConfig.Pitch | number | Optional | Pitch adjustment (-12 to 12, default: 0) |
| ttsConfig.Language | string | Optional | Language code (zh/en/ja/ko/yue), strongly recommended |
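The parameter ranges above can be checked on the client before a request is sent. This is a minimal sketch, not part of any official SDK: `build_tts_config` and `build_synthesize_body` are hypothetical helper names, and rejecting a volume of exactly 0 is an assumption based on the (0, 10] range given in the model configuration section.

```python
from typing import Optional

# Language codes listed in the parameter table.
SUPPORTED_LANGUAGES = {"zh", "en", "ja", "ko", "yue"}


def build_tts_config(
    voice_id: str,
    language: Optional[str] = None,
    speed: float = 1.0,
    volume: float = 1.0,
    pitch: float = 0,
    model: str = "flow_01_turbo",
) -> dict:
    """Validate values against the documented ranges and return a ttsConfig dict."""
    if not voice_id:
        raise ValueError("VoiceId is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("Speed must be between 0.5 and 2.0")
    if not 0 < volume <= 10:  # assumption: 0 itself is excluded
        raise ValueError("Volume must be greater than 0 and at most 10")
    if not -12 <= pitch <= 12:
        raise ValueError("Pitch must be between -12 and 12")
    if language is not None and language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")

    config = {
        "TTSType": "flow",  # fixed value per the table
        "VoiceId": voice_id,
        "Model": model,
        "Speed": speed,
        "Volume": volume,
        "Pitch": pitch,
    }
    if language is not None:
        config["Language"] = language
    return config


def build_synthesize_body(text: str, tts_config: dict) -> dict:
    """Wrap text and ttsConfig into a request body, enforcing the 5000-character limit."""
    if not text:
        raise ValueError("text is required")
    if len(text) > 5000:
        raise ValueError("text exceeds 5000 characters")
    return {"text": text, "ttsConfig": tts_config}
```

The returned dictionary can be passed as the JSON body of a synthesis request.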
{
  "code": "success",
  "message": "TTS synthesis completed successfully",
  "data": {
    "audio": "base64_encoded_audio_data...",
    "format": "wav",
    "sampleRate": 24000,
    "duration": 3.5
  },
  "quota": {
    "daily": 100,
    "used": 42,
    "remaining": 58
  }
}
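Since the audio arrives base64-encoded inside the JSON body, it has to be decoded before it can be played or saved. The sketch below assumes the response has exactly the shape shown above; `extract_audio` and `save_audio` are hypothetical helper names.

```python
import base64


def extract_audio(payload: dict) -> bytes:
    """Return the decoded audio bytes from a synthesize response payload."""
    if payload.get("code") != "success":
        raise RuntimeError(f"{payload.get('code')}: {payload.get('message')}")
    return base64.b64decode(payload["data"]["audio"])


def save_audio(payload: dict, stem: str = "output") -> str:
    """Write the audio to '<stem>.<format>' and return the file name."""
    audio = extract_audio(payload)
    # Fall back to "wav" if the format field is absent (an assumption).
    path = f"{stem}.{payload['data'].get('format', 'wav')}"
    with open(path, "wb") as f:
        f.write(audio)
    return path
```

With the `requests` library, `save_audio(response.json())` would write the result of a synthesis call to `output.wav`.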
curl -X POST https://api.realtime-ai.chat/api/tts/synthesize \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test.",
    "ttsConfig": {
      "TTSType": "flow",
      "VoiceId": "v-female-R2s4N9qJ",
      "Model": "flow_01_turbo",
      "Speed": 1.0,
      "Volume": 5.0,
      "Language": "en"
    }
  }'
POST /api/tts/synthesize-stream
Stream synthesized audio in real time using Server-Sent Events (SSE). Audio chunks are delivered as they are generated.
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Required | Text to synthesize (max 5000 characters) |
| ttsConfig | object | Required | TTS configuration (same structure as /api/tts/synthesize) |
data: {
  "Type": "audio",
  "ChunkId": 1,
  "Audio": "base64_audio_chunk...",
  "IsEnd": false
}
data: {
  "Type": "audio",
  "ChunkId": 2,
  "Audio": "base64_audio_chunk...",
  "IsEnd": true
}
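The events above can be turned into raw audio bytes with a small parser. This is a sketch under one stated assumption: on the wire, each event is a single `data:` line holding compact JSON (the multi-line layout above is taken to be pretty-printing for readability). The function name `iter_audio_chunks` is hypothetical, and how the decoded chunks should be joined into a playable file is not specified in the source.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from SSE lines until an event has IsEnd set."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        event = json.loads(line[len("data:"):])
        if event.get("Type") == "audio" and event.get("Audio"):
            yield base64.b64decode(event["Audio"])
        if event.get("IsEnd"):
            break  # the final chunk has been delivered
```

With the `requests` library, the lines could come from `requests.post(url, headers=headers, json=body, stream=True).iter_lines(decode_unicode=True)`.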
const response = await fetch('https://api.realtime-ai.chat/api/tts/synthesize-stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_JWT_TOKEN',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Hello, this is streaming TTS.',
    ttsConfig: {
      TTSType: 'flow',
      VoiceId: 'v-female-R2s4N9qJ',
      Model: 'flow_01_turbo',
      Language: 'en'
    }
  })
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  const chunk = decoder.decode(value, { stream: true });
  // Process SSE events
  console.log(chunk);
}
Clone a custom voice by uploading a 4-12 second audio sample. Returns a unique voice ID for future synthesis.
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | string | Required | Base64 encoded audio (WAV, 16kHz mono) |
| voiceId | string | Optional | Custom voice ID (auto-generated if not provided) |
| name | string | Optional | Display name for the cloned voice |
{
  "code": "success",
  "message": "Voice cloned successfully",
  "data": {
    "voiceId": "clone-abc123xyz",
    "name": "My Custom Voice",
    "duration": 8.5,
    "createdAt": "2025-01-15T10:30:00Z"
  },
  "quota": {
    "daily": 100,
    "used": 52,
    "remaining": 48
  }
}
Note: Voice cloning consumes 10 quota points per request. Audio must be 4-12 seconds long and contain clear, single-speaker speech.
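Since each cloning request costs 10 points, it may be worth validating the sample locally first. The sketch below uses Python's standard `wave` module to check the documented constraints (WAV, 16 kHz, mono, 4-12 seconds) and builds the request body. `build_clone_payload` is a hypothetical helper; the endpoint path is not given in this section, so sending the payload is left out.

```python
import base64
import io
import wave
from typing import Optional


def build_clone_payload(
    wav_bytes: bytes,
    name: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> dict:
    """Check the sample against the documented limits and return the request body."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    if channels != 1 or rate != 16000:
        raise ValueError(f"expected 16 kHz mono, got {rate} Hz with {channels} channel(s)")
    if not 4 <= duration <= 12:
        raise ValueError(f"sample is {duration:.1f} s; it must be 4-12 s long")

    payload = {"audio": base64.b64encode(wav_bytes).decode("ascii")}
    if voice_id:
        payload["voiceId"] = voice_id  # optional; auto-generated if omitted
    if name:
        payload["name"] = name  # optional display name
    return payload
```

The check cannot confirm that the recording contains clear, single-speaker speech; that still has to be verified by listening.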
GET /api/tts/voices
Retrieve the list of available pre-built voices. No authentication required.
{
  "voices": [
    {
      "id": "v-female-R2s4N9qJ",
      "name": "小芮",
      "description": "女声客服",
      "language": "zh-CN",
      "gender": "female"
    },
    {
      "id": "male-qn-qingse",
      "name": "青涩青年音色",
      "description": "男声",
      "language": "zh-CN",
      "gender": "male"
    }
  ]
}
curl https://api.realtime-ai.chat/api/tts/voices
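Once the voice list has been fetched, picking a voice usually means filtering by language and gender. This is a minimal sketch that assumes the response shape shown above; `filter_voices` is a hypothetical helper, and matching on a language prefix (so that "zh" matches "zh-CN") is a design choice of this example, not documented behavior.

```python
from typing import Optional


def filter_voices(
    payload: dict,
    language: Optional[str] = None,
    gender: Optional[str] = None,
) -> list:
    """Return the voices whose language starts with `language` and whose gender matches."""
    matches = []
    for voice in payload.get("voices", []):
        voice_language = voice.get("language", "").lower()
        if language and not voice_language.startswith(language.lower()):
            continue
        if gender and voice.get("gender") != gender:
            continue
        matches.append(voice)
    return matches
```

For example, `filter_voices(requests.get(url).json(), language="zh", gender="female")` would keep only the female Chinese voices from the response.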
Flow TTS is powered by the flow_01_turbo model, specifically optimized for conversational scenarios.
The flow_01_turbo model supports the following configuration options:
{
  "TTSType": "flow",        // Required: Fixed value "flow"
  "VoiceId": "xxxx",        // Required: Premium voice ID or cloned voice ID
  "Model": "flow_01_turbo", // Optional: Default is flow_01_turbo
  "Speed": 1.0,             // Optional: Speech speed [0.5, 2.0], default 1.0
  "Volume": 1.0,            // Optional: Volume (0, 10], default 1.0
  "Pitch": 0,               // Optional: Pitch [-12, 12], default 0
  "Language": "zh"          // Strongly recommended: language code (zh/en/ja/ko/yue)
}
💡 Tip: Always specify the Language parameter to ensure optimal pronunciation and natural pauses. Use "yue" for Cantonese.
Flow TTS offers a diverse collection of high-quality neural voices across multiple languages and speaking styles. All voices are powered by the flow_01_turbo model for natural, expressive speech.
Our most popular voices for common use cases:
| Voice ID | Name | Language | Gender | Description |
|---|---|---|---|---|
| v-female-R2s4N9qJ | 小芮 | Chinese | Female | Professional customer service voice |
| male-qn-qingse | 青涩青年 | Chinese | Male | Young, energetic male voice |
| v-en-female-amy | Amy | English (US) | Female | Warm, friendly American accent |
| v-en-male-brian | Brian | English (UK) | Male | Professional British narrator |
To explore the complete voice library with audio samples, visit the TTS Studio or call the GET /api/tts/voices endpoint.
Flow TTS uses a quota-based pricing model. Each API request consumes quota points based on the operation type.
| Operation | Quota Cost | Notes |
|---|---|---|
| Text to Speech | 1 point | Per synthesis request (max 5000 chars) |
| Streaming TTS | 1 point | Per SSE session |
| Voice Clone | 10 points | Per voice cloning request |
| List Voices | 0 points | Free, no authentication required |
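The costs in the table make it straightforward to estimate whether a planned batch fits in the remaining quota. The sketch below is illustrative arithmetic only; the operation keys (`synthesize`, `stream`, `clone`, `list_voices`) are hypothetical labels for the four table rows.

```python
# Points per request, taken from the table above.
QUOTA_COST = {
    "synthesize": 1,   # Text to Speech
    "stream": 1,       # Streaming TTS, per SSE session
    "clone": 10,       # Voice Clone
    "list_voices": 0,  # List Voices
}


def quota_needed(**operations: int) -> int:
    """Return the total points for the given number of requests per operation."""
    total = 0
    for name, count in operations.items():
        if name not in QUOTA_COST:
            raise KeyError(f"unknown operation: {name}")
        if count < 0:
            raise ValueError("request counts cannot be negative")
        total += QUOTA_COST[name] * count
    return total


def fits_in_quota(remaining: int, **operations: int) -> bool:
    """Return True if the planned requests fit in the remaining quota."""
    return quota_needed(**operations) <= remaining
```

For instance, 30 synthesis requests, 5 streaming sessions, and 2 voice clones cost 30 + 5 + 20 = 55 points.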
The API uses standard HTTP status codes and returns detailed error information in JSON format.
| Status Code | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid request parameters |
| 401 | unauthorized | Missing or invalid JWT token |
| 429 | quota_exceeded | Daily quota limit reached |
| 500 | internal_error | Server error, please retry or contact support |
{
  "code": "quota_exceeded",
  "message": "Daily quota limit reached. Resets at 00:00 UTC.",
  "quota": {
    "daily": 100,
    "used": 100,
    "remaining": 0,
    "resetAt": "2025-01-16T00:00:00Z"
  }
}
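Client code can wrap these error bodies in an exception so that callers can branch on the error code. The following is a sketch, assuming every error response has the `code` and `message` fields shown above and that `quota.resetAt` is an ISO 8601 UTC timestamp. `FlowTTSError` and `error_from_response` are hypothetical names, and treating only status 500 as retryable is an interpretation of the table, not documented behavior.

```python
from datetime import datetime, timezone
from typing import Optional


class FlowTTSError(Exception):
    """Error built from a JSON error body like the one shown above."""

    def __init__(self, status: int, code: str, message: str, quota: Optional[dict] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.quota = quota or {}

    @property
    def retryable(self) -> bool:
        # Only 500 is described as worth retrying; 400, 401, and 429 need a
        # corrected request, a fresh token, or a quota reset first.
        return self.status == 500

    def seconds_until_reset(self, now: Optional[datetime] = None) -> Optional[float]:
        """Return seconds until quota.resetAt, or None if the field is absent."""
        reset_at = self.quota.get("resetAt")
        if reset_at is None:
            return None
        # fromisoformat() rejects a trailing "Z" before Python 3.11.
        reset = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
        current = now or datetime.now(timezone.utc)
        return max(0.0, (reset - current).total_seconds())


def error_from_response(status: int, payload: dict) -> FlowTTSError:
    """Convert an HTTP status and parsed JSON error body into a FlowTTSError."""
    return FlowTTSError(
        status,
        payload.get("code", "unknown"),
        payload.get("message", ""),
        payload.get("quota"),
    )
```

A caller might raise `error_from_response(response.status_code, response.json())` whenever the status code is 400 or higher.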
To ensure fair usage and system stability, the following rate limits apply:
Rate limit information is included in response headers:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1642348800
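These headers let a client pause before it hits the limit. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but the source does not state. `seconds_until_rate_limit_reset` is a hypothetical helper name.

```python
import time
from typing import Mapping, Optional


def seconds_until_rate_limit_reset(
    headers: Mapping[str, str],
    now: Optional[float] = None,
) -> float:
    """Return how long to wait before the next request (0.0 if requests remain)."""
    # A missing header is treated as "requests remain" (an assumption).
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset = int(headers["X-RateLimit-Reset"])  # assumed: Unix epoch seconds
    current = time.time() if now is None else now
    return max(0.0, reset - current)
```

A client could call `time.sleep(seconds_until_rate_limit_reset(response.headers))` between requests.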
Official SDKs and community libraries to accelerate your development:
Official Python library with async support
NPM package for server-side JavaScript
High-performance Go module
Direct HTTP integration for any language
More SDKs and complete examples for multiple languages and scenarios are available in our GitHub repository.
Use the streaming endpoint (/api/tts/synthesize-stream) for long texts to reduce latency.
Need help with integration? Our support team is here to assist you.