API Version 2.0

Flow TTS API Documentation

Build next-generation voice applications with Tencent Cloud's Flow TTS API. It is powered by the flow_01_turbo model and offers latency as low as 300 ms, high-quality voices, streaming support, and voice cloning.

Developer Quickstart

Learn the basics and make your first request with the Flow TTS API in minutes.

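import base64

import requests

url = "https://api.realtime-ai.chat/api/tts/synthesize"
headers = {
    "Authorization": "Bearer YOUR_JWT_TOKEN",
    "Content-Type": "application/json"
}
data = {
    "text": "你好,世界!这是 Flow TTS。",  # "Hello, world! This is Flow TTS."
    "ttsConfig": {
        "TTSType": "flow",
        "VoiceId": "v-female-R2s4N9qJ",
        "Model": "flow_01_turbo",
        "Language": "zh"
    }
}

response = requests.post(url, headers=headers, json=data, timeout=30)
response.raise_for_status()  # fail on 4xx/5xx instead of saving an error body

# The endpoint returns JSON; the audio is Base64-encoded in data.audio.
audio = base64.b64decode(response.json()["data"]["audio"])

with open("output.wav", "wb") as f:
    f.write(audio)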

🔐 Authentication

All API requests require authentication with a JWT (JSON Web Token), except where an endpoint states otherwise. You can obtain a token by logging in to your account at app.realtime-ai.chat.

Getting Your JWT Token

To authenticate your API requests:

  1. Log in to your account at app.realtime-ai.chat
  2. Open browser DevTools (F12) and run: console.log(window.SupabaseAuthInject.getSession()?.access_token)
  3. Copy the JWT token from the console output
  4. Include the token in all API requests using the Authorization header

Request Headers

Include the following headers in all authenticated requests:

Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: application/json

Token Expiration

Tokens expire after 1 hour. If you receive a 401 Unauthorized error, reload app.realtime-ai.chat in your browser and obtain a new token.
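To avoid discovering an expired token through a 401, a client can read the expiry time from the token itself. The sketch below assumes the token carries the standard `exp` claim (Unix seconds), which the source does not confirm; `jwt_expiry` and `is_expired` are hypothetical helper names. It only decodes the payload and does not verify the signature.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) from a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])  # assumption: the token includes an exp claim


def is_expired(token: str, now: Optional[float] = None, leeway: int = 30) -> bool:
    """True if the token expires within `leeway` seconds of `now`."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - leeway <= now
```

Calling `is_expired(token)` before each request lets the client prompt for a fresh token slightly ahead of the 1-hour limit.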

📡 API Endpoints

The Flow TTS API provides four main endpoints for text-to-speech synthesis and voice management.

Base URL

https://api.realtime-ai.chat

POST /api/tts/synthesize

Convert text to speech with high-quality neural voices. Returns the complete audio in a single response.

Request Body

Parameter           Type    Required  Description
text                string  Required  Text to synthesize (max 5000 characters)
ttsConfig           object  Required  TTS configuration object (see below)
ttsConfig.TTSType   string  Required  Fixed value: "flow"
ttsConfig.VoiceId   string  Required  Voice ID from voice library
ttsConfig.Model     string  Optional  TTS model name (default: "flow_01_turbo")
ttsConfig.Speed     number  Optional  Speech speed (0.5-2.0, default: 1.0)
ttsConfig.Volume    number  Optional  Volume level (0-10, default: 1.0)
ttsConfig.Pitch     number  Optional  Pitch adjustment (-12 to 12, default: 0)
ttsConfig.Language  string  Optional  Language code (zh/en/ja/ko/yue), strongly recommended
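Because `text` is capped at 5000 characters, longer input has to be split across several requests. The following is a minimal sketch of one way to do that, preferring sentence boundaries. `split_text` is a hypothetical helper, and it assumes the limit counts Unicode code points as Python's `len()` does, which the source does not specify.

```python
import re
from typing import List

MAX_CHARS = 5000  # per-request limit stated in the parameter table


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after common Latin and CJK sentence terminators, keeping the terminator.
    sentences = re.split(r"(?<=[.!?。!?])", text)
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own synthesis request, so a long document consumes one quota point per chunk.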

Response

{
  "code": "success",
  "message": "TTS synthesis completed successfully",
  "data": {
    "audio": "base64_encoded_audio_data...",
    "format": "wav",
    "sampleRate": 24000,
    "duration": 3.5
  },
  "quota": {
    "daily": 100,
    "used": 42,
    "remaining": 58
  }
}
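The response envelope above can be unpacked with a few lines of Python. This is a sketch based only on the fields shown in the sample; `extract_audio` is a hypothetical helper, and treating any `code` other than "success" as a failure is an assumption.

```python
import base64
from typing import Optional, Tuple


def extract_audio(body: dict) -> Tuple[bytes, Optional[int]]:
    """Return (audio_bytes, remaining_quota) from a synthesize response body."""
    if body.get("code") != "success":
        raise RuntimeError(f"TTS failed: {body.get('code')}: {body.get('message')}")
    # data.audio holds the Base64-encoded audio shown in the sample response.
    audio = base64.b64decode(body["data"]["audio"])
    remaining = body.get("quota", {}).get("remaining")
    return audio, remaining
```

The returned bytes can be written to a file whose extension follows `data.format` ("wav" in the sample), and `remaining` can be used to stop before the daily quota runs out.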

Example Request

curl -X POST https://api.realtime-ai.chat/api/tts/synthesize \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test.",
    "ttsConfig": {
      "TTSType": "flow",
      "VoiceId": "v-female-R2s4N9qJ",
      "Model": "flow_01_turbo",
      "Speed": 1.0,
      "Volume": 5.0,
      "Language": "en"
    }
  }'

POST /api/tts/synthesize-stream

Stream synthesized audio in real time using Server-Sent Events (SSE). Audio chunks are delivered as they are generated.

Request Body

Parameter  Type    Required  Description
text       string  Required  Text to synthesize (max 5000 characters)
ttsConfig  object  Required  TTS configuration (same structure as /api/tts/synthesize)

SSE Event Format

data: {
  "Type": "audio",
  "ChunkId": 1,
  "Audio": "base64_audio_chunk...",
  "IsEnd": false
}

data: {
  "Type": "audio",
  "ChunkId": 2,
  "Audio": "base64_audio_chunk...",
  "IsEnd": true
}
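A client has to reassemble these events from network chunks that can end mid-event. The sketch below buffers incoming text, splits on the blank line that separates SSE events, and decodes the audio. `iter_sse_events` and `collect_audio` are hypothetical helpers. Joining the decoded chunks in `ChunkId` order is an assumption: the source does not say whether the chunks concatenate into a playable file.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_sse_events(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from a stream of decoded SSE text chunks."""
    buffer = ""

    def parse(raw_event: str):
        # Drop the "data:" field prefix; other lines are kept as JSON continuation.
        lines = [line[5:].lstrip() if line.startswith("data:") else line
                 for line in raw_event.split("\n")]
        payload = "\n".join(lines).strip()
        return json.loads(payload) if payload else None

    for chunk in chunks:
        buffer = (buffer + chunk).replace("\r\n", "\n")
        # Events end with a blank line; an incomplete tail stays buffered.
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            event = parse(raw_event)
            if event is not None:
                yield event
    # Flush a final event that was not followed by a blank line.
    event = parse(buffer)
    if event is not None:
        yield event


def collect_audio(events: Iterable[dict]) -> bytes:
    """Concatenate decoded audio chunks in ChunkId order until IsEnd is true."""
    parts = {}
    for event in events:
        if event.get("Type") != "audio":
            continue
        parts[event["ChunkId"]] = base64.b64decode(event["Audio"])
        if event.get("IsEnd"):
            break
    return b"".join(parts[k] for k in sorted(parts))
```

With the `requests` library, the text chunks could come from `response.iter_content(decode_unicode=True)` on a request made with `stream=True`.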

Example Request (JavaScript)

const response = await fetch('https://api.realtime-ai.chat/api/tts/synthesize-stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_JWT_TOKEN',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Hello, this is streaming TTS.',
    ttsConfig: {
      TTSType: 'flow',
      VoiceId: 'v-female-R2s4N9qJ',
      Model: 'flow_01_turbo',
      Language: 'en'
    }
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

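  // { stream: true } keeps multi-byte characters intact across chunk boundaries.
  const chunk = decoder.decode(value, { stream: true });
  // A chunk may hold a partial SSE event or several events;
  // buffer the text and split on blank lines before parsing.
  console.log(chunk);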
}

POST /api/voice/clone

Clone a custom voice by uploading a 4-12 second audio sample. Returns a unique voice ID for future synthesis.

Request Body

Parameter  Type    Required  Description
audio      string  Required  Base64-encoded audio (WAV, 16kHz mono)
voiceId    string  Optional  Custom voice ID (auto-generated if not provided)
name       string  Optional  Display name for the cloned voice

Response

{
  "code": "success",
  "message": "Voice cloned successfully",
  "data": {
    "voiceId": "clone-abc123xyz",
    "name": "My Custom Voice",
    "duration": 8.5,
    "createdAt": "2025-01-15T10:30:00Z"
  },
  "quota": {
    "daily": 100,
    "used": 52,
    "remaining": 48
  }
}

Note: Voice cloning consumes 10 quota points per request. Audio must be 4-12 seconds long and contain clear, single-speaker speech.
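Since a cloning request costs 10 quota points, it may be worth checking the sample locally before uploading it. The sketch below uses Python's standard `wave` module to verify the documented constraints (WAV, 16 kHz, mono, 4-12 seconds) and then builds the JSON body. `build_clone_request` is a hypothetical helper, and whether the server applies exactly these checks is an assumption.

```python
import base64
import io
import wave


def build_clone_request(wav_bytes: bytes, name=None, voice_id=None) -> dict:
    """Validate a WAV sample against the documented limits and build the JSON body."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if channels != 1 or rate != 16000:
        raise ValueError(f"expected 16 kHz mono, got {rate} Hz, {channels} channel(s)")
    if not 4.0 <= duration <= 12.0:
        raise ValueError(f"sample must be 4-12 seconds long, got {duration:.2f}s")
    body = {"audio": base64.b64encode(wav_bytes).decode("ascii")}
    # Optional fields are only sent when provided.
    if name is not None:
        body["name"] = name
    if voice_id is not None:
        body["voiceId"] = voice_id
    return body
```

The returned dict can be passed as the JSON body of a POST to /api/voice/clone with the usual Authorization header. The check cannot detect background noise or multiple speakers, so those still need a listen.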

GET /api/tts/voices

Retrieve the list of available pre-built voices. No authentication required.

Response

{
  "voices": [
    {
      "id": "v-female-R2s4N9qJ",
      "name": "小芮",
      "description": "女声客服",
      "language": "zh-CN",
      "gender": "female"
    },
    {
      "id": "male-qn-qingse",
      "name": "青涩青年音色",
      "description": "男声",
      "language": "zh-CN",
      "gender": "male"
    }
  ]
}

Example Request

curl https://api.realtime-ai.chat/api/tts/voices
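Once the listing is fetched, picking a voice is a matter of filtering on the fields shown in the sample response. A small sketch follows; `find_voices` is a hypothetical helper, and matching on the language prefix (so that "zh" selects "zh-CN") is a design choice, not documented API behavior.

```python
from typing import List, Optional


def find_voices(listing: dict, language: Optional[str] = None,
                gender: Optional[str] = None) -> List[str]:
    """Return voice IDs from a /api/tts/voices response matching the given filters."""
    matches = []
    for voice in listing.get("voices", []):
        # Match on the language prefix so "zh" also selects "zh-CN".
        if language and not voice.get("language", "").lower().startswith(language.lower()):
            continue
        if gender and voice.get("gender") != gender:
            continue
        matches.append(voice["id"])
    return matches
```

The selected ID is what goes into `ttsConfig.VoiceId` in a synthesis request.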

🚀 TTS Model: flow_01_turbo

Flow TTS is powered by the flow_01_turbo model, specifically optimized for conversational scenarios.

Key Features

Configuration Parameters

The flow_01_turbo model supports the following configuration options:

{
  "TTSType": "flow",          // Required: Fixed value "flow"
  "VoiceId": "xxxx",          // Required: Premium voice ID or cloned voice ID
  "Model": "flow_01_turbo",   // Optional: Default is flow_01_turbo
  "Speed": 1.0,               // Optional: Speech speed [0.5, 2.0], default 1.0
  "Volume": 1.0,              // Optional: Volume (0, 10], default 1.0
  "Pitch": 0,                 // Optional: Pitch [-12, 12], default 0
  "Language": "zh"            // Strongly recommended: language code (zh/en/ja/ko/yue)
}

💡 Tip: Always specify the Language parameter to ensure optimal pronunciation and natural pauses. Use "yue" for Cantonese.
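Checking these ranges on the client avoids spending a round trip on a request that would be rejected. The sketch below builds a `ttsConfig` object and enforces the intervals listed above. `make_tts_config` is a hypothetical helper, and how the server itself reacts to out-of-range values is not stated in the source.

```python
def make_tts_config(voice_id, language=None, speed=1.0, volume=1.0, pitch=0,
                    model="flow_01_turbo"):
    """Build a ttsConfig dict, rejecting values outside the documented ranges."""
    if not voice_id:
        raise ValueError("VoiceId is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("Speed must be in [0.5, 2.0]")
    if not 0 < volume <= 10:
        raise ValueError("Volume must be in (0, 10]")
    if not -12 <= pitch <= 12:
        raise ValueError("Pitch must be in [-12, 12]")
    config = {
        "TTSType": "flow",  # fixed value
        "VoiceId": voice_id,
        "Model": model,
        "Speed": speed,
        "Volume": volume,
        "Pitch": pitch,
    }
    # Language is optional but strongly recommended, so include it when given.
    if language is not None:
        config["Language"] = language
    return config
```

For example, `make_tts_config("v-female-R2s4N9qJ", language="yue")` produces a Cantonese configuration with default speed, volume, and pitch.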

🎙️ Voice Library

Flow TTS offers a diverse collection of high-quality neural voices across multiple languages and speaking styles. All voices are powered by the flow_01_turbo model for natural, expressive speech.

Featured Voices

Our most popular voices for common use cases:

Voice ID           Name      Language      Gender  Description
v-female-R2s4N9qJ  小芮      Chinese       Female  Professional customer service voice
male-qn-qingse     青涩青年  Chinese       Male    Young, energetic male voice
v-en-female-amy    Amy       English (US)  Female  Warm, friendly American accent
v-en-male-brian    Brian     English (UK)  Male    Professional British narrator

To explore the complete voice library with audio samples, visit the TTS Studio or call the GET /api/tts/voices endpoint.

💎 Quota & Pricing

Flow TTS uses a quota-based pricing model. Each API request consumes quota points based on the operation type.

Quota Consumption

Operation       Quota Cost  Notes
Text to Speech  1 point     Per synthesis request (max 5000 chars)
Streaming TTS   1 point     Per SSE session
Voice Clone     10 points   Per voice cloning request
List Voices     0 points    Free, no authentication required
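The costs in this table make it straightforward to estimate whether a planned workload fits the remaining daily quota. A minimal sketch follows; the operation keys in `QUOTA_COST` are hypothetical labels chosen for this example, not API identifiers.

```python
# Quota points per operation, taken from the table above.
QUOTA_COST = {
    "synthesize": 1,         # Text to Speech, per request
    "synthesize_stream": 1,  # Streaming TTS, per SSE session
    "clone": 10,             # Voice Clone, per request
    "list_voices": 0,        # List Voices, free
}


def quota_needed(plan: dict) -> int:
    """Total quota points for a plan mapping operation name to request count."""
    return sum(QUOTA_COST[op] * count for op, count in plan.items())


def fits_in_quota(plan: dict, remaining: int) -> bool:
    """True if the whole plan can run within the remaining quota."""
    return quota_needed(plan) <= remaining
```

For instance, 40 synthesis requests plus 3 voice clones need 40 + 30 = 70 points, which fits the Free tier's 100 points per day only if no more than 30 have been used already.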

Pricing Tiers

Free: 100 quota points per day
  • All TTS voices
  • Streaming support
  • Voice cloning
  • Email support
  • Daily quota reset
Max: 2000 quota points per day
  • Everything in Pro
  • Dedicated support
  • Custom integration
  • On-premise option
  • Volume discounts

Contact Sales for Enterprise

⚠️ Error Handling

The API uses standard HTTP status codes and returns detailed error information in JSON format.

Common Error Codes

Status Code  Error Code       Description
400          invalid_request  Missing or invalid request parameters
401          unauthorized     Missing or invalid JWT token
429          quota_exceeded   Daily quota limit reached
500          internal_error   Server error, please retry or contact support

Error Response Format

{
  "code": "quota_exceeded",
  "message": "Daily quota limit reached. Resets at 00:00 UTC.",
  "quota": {
    "daily": 100,
    "used": 100,
    "remaining": 0,
    "resetAt": "2025-01-16T00:00:00Z"
  }
}
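One way to handle this format in client code is to turn the error body into an exception that records when the quota resets. The sketch below relies only on the fields shown above. `FlowTTSError`, `error_from_response`, and `seconds_until_reset` are hypothetical names, and treating only status 500 as retryable is an interpretation of the error table.

```python
from datetime import datetime, timezone


class FlowTTSError(Exception):
    """Error built from the documented JSON error body (hypothetical helper class)."""

    def __init__(self, status, code, message, reset_at=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.reset_at = reset_at  # timezone-aware datetime, or None

    @property
    def retryable(self):
        # Only 500 is described as worth retrying; 400/401/429 need a change first.
        return self.status == 500


def error_from_response(status, body):
    """Build a FlowTTSError from an HTTP status and the parsed JSON error body."""
    reset_at = None
    raw = body.get("quota", {}).get("resetAt")
    if raw:
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
        reset_at = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return FlowTTSError(status, body.get("code", "unknown"),
                        body.get("message", ""), reset_at)


def seconds_until_reset(error, now=None):
    """Seconds until the quota resets, or None if the error carries no resetAt."""
    if error.reset_at is None:
        return None
    now = now or datetime.now(timezone.utc)
    return max(0.0, (error.reset_at - now).total_seconds())
```

A caller can raise `error_from_response(response.status_code, response.json())` for any non-2xx response and, on `quota_exceeded`, schedule the next attempt using `seconds_until_reset`.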

🚦 Rate Limits

To ensure fair usage and system stability, the following rate limits apply:

Rate limit information is included in response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1642348800
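These headers let a client pause before it runs into the limit. The sketch below computes how long to wait from a response's headers. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but the source does not state; `rate_limit_wait` is a hypothetical helper.

```python
import time


def rate_limit_wait(headers, now=None) -> float:
    """Seconds to wait before the next request; 0.0 while requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is a Unix timestamp in seconds.
    reset = int(headers["X-RateLimit-Reset"])
    now = time.time() if now is None else now
    return max(0.0, reset - now)
```

Header lookups on a plain dict are case-sensitive. The `headers` attribute of a `requests` response is case-insensitive and can be passed in directly, followed by `time.sleep(rate_limit_wait(response.headers))`.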

📦 SDKs & Libraries

Official SDKs and community libraries to accelerate your development:

More SDKs and complete examples for multiple languages and scenarios are available in our GitHub repository.

✨ Best Practices

Performance Optimization

Voice Cloning Tips

Security

💬 Support

Need help with integration? Our support team is here to assist you.