API Documentation - Flow TTS Developer Hub

Developer Quickstart

Learn the basics and make your first request with the Flow TTS API in minutes.

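import base64

import requests

url = "https://api.realtime-ai.chat/api/tts/synthesize"
headers = {
    "Authorization": "Bearer YOUR_JWT_TOKEN",
    "Content-Type": "application/json"
}
data = {
    # Sample Chinese text: "Hello, world! This is Flow TTS."
    "text": "你好，世界！这是 Flow TTS。",
    "ttsConfig": {
        "TTSType": "flow",
        "VoiceId": "v-female-R2s4N9qJ",
        "Model": "flow_01_turbo",
        "Language": "zh"
    }
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # stop here on 4xx/5xx responses

# The endpoint returns JSON; the audio is base64-encoded in data.audio.
audio = base64.b64decode(response.json()["data"]["audio"])

with open("output.wav", "wb") as f:
    f.write(audio)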

Authentication

Learn how to authenticate API requests with JWT tokens

API Endpoints

Explore all available TTS and voice cloning endpoints

Voice Library

Browse 100+ high-quality pre-built voices

Quota & Pricing

Understand API quotas and upgrade options

API Status

Check real-time API uptime and performance

Support

Get help from our developer support team

🔐 Authentication

All API requests require authentication using a JWT (JSON Web Token). You can obtain a JWT token by logging into your account at app.realtime-ai.chat.

Getting Your JWT Token

To authenticate your API requests:

Log in to your account at app.realtime-ai.chat
Open browser DevTools (F12) and run: console.log(window.SupabaseAuthInject.getSession()?.access_token)
Copy the JWT token from the console output
Include the token in all API requests using the Authorization header

Request Headers

Include the following headers in all authenticated requests:

Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: application/json

Token Expiration

JWT tokens expire after 1 hour. If you receive a 401 Unauthorized error, refresh your browser and obtain a new token.
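
Rather than waiting for a 401, a client can check how long a token has left before sending a request. The sketch below reads the standard JWT `exp` claim from the token payload without verifying the signature. It assumes the token issued by app.realtime-ai.chat carries an `exp` claim, which this page does not confirm; the helper names and the 60-second margin are illustrative.

```python
import base64
import json
import time


def jwt_seconds_remaining(token, now=None):
    """Return seconds until the token's `exp` claim (signature is NOT verified)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time() if now is None else now
    return claims["exp"] - now  # assumes an `exp` claim is present


def needs_refresh(token, margin_seconds=60, now=None):
    """True when the token has expired or will expire within the margin."""
    return jwt_seconds_remaining(token, now) <= margin_seconds
```

Calling `needs_refresh(token)` before each request lets the application prompt for a new token ahead of the 1-hour expiry instead of failing mid-request.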

📡 API Endpoints

The Flow TTS API provides four main endpoints for text-to-speech synthesis and voice management.

Base URL

https://api.realtime-ai.chat

POST /api/tts/synthesize

Convert text to speech with high-quality neural voices. Returns the complete audio in a single JSON response, base64-encoded.

Request Body

Parameter	Type	Required	Description
`text`	string	Required	Text to synthesize (max 5000 characters)
`ttsConfig`	object	Required	TTS configuration object (see below)
`ttsConfig.TTSType`	string	Required	Fixed value: "flow"
`ttsConfig.VoiceId`	string	Required	Voice ID from voice library
`ttsConfig.Model`	string	Optional	TTS model name (default: "flow_01_turbo")
`ttsConfig.Speed`	number	Optional	Speech speed (0.5-2.0, default: 1.0)
`ttsConfig.Volume`	number	Optional	Volume level (greater than 0, up to 10; default: 1.0)
`ttsConfig.Pitch`	number	Optional	Pitch adjustment (-12 to 12, default: 0)
`ttsConfig.Language`	string	Optional	Language code (zh/en/ja/ko/yue), strongly recommended
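
Validating parameters on the client avoids spending a round trip on a 400 response. The following is a minimal sketch that builds the request body from the table above and checks the documented ranges. The function name is hypothetical, the volume range is taken as greater than 0 and up to 10, and it assumes the 5000-character limit counts characters the way Python's `len` does.

```python
ALLOWED_LANGUAGES = {"zh", "en", "ja", "ko", "yue"}


def build_synthesize_payload(text, voice_id, language=None, speed=1.0,
                             volume=1.0, pitch=0, model="flow_01_turbo"):
    """Build the JSON body for POST /api/tts/synthesize, checking documented limits."""
    if not text or len(text) > 5000:
        raise ValueError("text must contain 1 to 5000 characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("Speed must be between 0.5 and 2.0")
    if not 0 < volume <= 10:
        raise ValueError("Volume must be greater than 0 and at most 10")
    if not -12 <= pitch <= 12:
        raise ValueError("Pitch must be between -12 and 12")
    if language is not None and language not in ALLOWED_LANGUAGES:
        raise ValueError(f"Language must be one of {sorted(ALLOWED_LANGUAGES)}")

    config = {
        "TTSType": "flow",  # fixed value per the table above
        "VoiceId": voice_id,
        "Model": model,
        "Speed": speed,
        "Volume": volume,
        "Pitch": pitch,
    }
    if language is not None:
        config["Language"] = language
    return {"text": text, "ttsConfig": config}
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.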

Response

{
  "code": "success",
  "message": "TTS synthesis completed successfully",
  "data": {
    "audio": "base64_encoded_audio_data...",
    "format": "wav",
    "sampleRate": 24000,
    "duration": 3.5
  },
  "quota": {
    "daily": 100,
    "used": 42,
    "remaining": 58
  }
}
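
The response envelope above can be unpacked in a few lines. This sketch takes the already-parsed JSON body, checks the `code` field, decodes the base64 audio, and returns the remaining quota alongside it. The function name is illustrative, and treating any `code` other than "success" as a failure is an assumption.

```python
import base64


def extract_audio(body):
    """Return (audio_bytes, remaining_quota) from a parsed synthesize response."""
    if body.get("code") != "success":
        raise RuntimeError(
            f"synthesis failed: {body.get('code')}: {body.get('message')}"
        )
    audio_bytes = base64.b64decode(body["data"]["audio"])
    # The quota block may be absent; fall back to None rather than raising.
    remaining = body.get("quota", {}).get("remaining")
    return audio_bytes, remaining
```

With `requests`, the input would typically be `response.json()`, and the returned bytes can be written to a `.wav` file in binary mode.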

Example Request

curl -X POST https://api.realtime-ai.chat/api/tts/synthesize \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test.",
    "ttsConfig": {
      "TTSType": "flow",
      "VoiceId": "v-female-R2s4N9qJ",
      "Model": "flow_01_turbo",
      "Speed": 1.0,
      "Volume": 5.0,
      "Language": "en"
    }
  }'

POST /api/tts/synthesize-stream

Stream audio synthesis in real-time using Server-Sent Events (SSE). Get audio chunks as they're generated.

Request Body

Parameter	Type	Required	Description
`text`	string	Required	Text to synthesize (max 5000 characters)
`ttsConfig`	object	Required	TTS configuration (same structure as /api/tts/synthesize)

SSE Event Format

data: {
  "Type": "audio",
  "ChunkId": 1,
  "Audio": "base64_audio_chunk...",
  "IsEnd": false
}

data: {
  "Type": "audio",
  "ChunkId": 2,
  "Audio": "base64_audio_chunk...",
  "IsEnd": true
}
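
A client consumes this stream by reading `data:` lines and decoding each chunk as it arrives. The sketch below is a minimal parser over an iterable of text lines. It assumes each event's JSON arrives on a single `data:` line (the multi-line layout above is taken to be for readability) and that the stream ends with the event whose `IsEnd` is true; the function name is hypothetical.

```python
import base64
import json


def iter_audio_chunks(lines):
    """Yield (chunk_id, audio_bytes) for each audio event in an SSE line stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        event = json.loads(line[len("data:"):])
        if event.get("Type") != "audio":
            continue
        yield event["ChunkId"], base64.b64decode(event["Audio"])
        if event.get("IsEnd"):
            break  # final chunk received
```

With `requests`, the lines could come from `requests.post(..., stream=True)` followed by `response.iter_lines(decode_unicode=True)`. Whether the decoded chunks can simply be concatenated into one playable file is not stated here, so verify that against real output before relying on it.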

Example Request (JavaScript)

const response = await fetch('https://api.realtime-ai.chat/api/tts/synthesize-stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_JWT_TOKEN',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Hello, this is streaming TTS.',
    ttsConfig: {
      TTSType: 'flow',
      VoiceId: 'v-female-R2s4N9qJ',
      Model: 'flow_01_turbo',
      Language: 'en'
    }
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  const chunk = decoder.decode(value, { stream: true });
  // Process SSE events
  console.log(chunk);
}

POST /api/voice/clone

Clone a custom voice by uploading a 4-12 second audio sample. Returns a unique voice ID for future synthesis.

Request Body

Parameter	Type	Required	Description
`audio`	string	Required	Base64 encoded audio (WAV, 16kHz mono)
`voiceId`	string	Optional	Custom voice ID (auto-generated if not provided)
`name`	string	Optional	Display name for the cloned voice

Response

{
  "code": "success",
  "message": "Voice cloned successfully",
  "data": {
    "voiceId": "clone-abc123xyz",
    "name": "My Custom Voice",
    "duration": 8.5,
    "createdAt": "2025-01-15T10:30:00Z"
  },
  "quota": {
    "daily": 100,
    "used": 52,
    "remaining": 48
  }
}

Note: Voice cloning consumes 10 quota points per request. Audio must be 4-12 seconds long and contain clear, single-speaker speech.
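
Because each cloning request costs 10 quota points, it is worth checking a sample locally before uploading it. The sketch below uses Python's standard `wave` module to confirm the duration and format, then builds the request body. It assumes the `audio` field expects the whole WAV file (header included) as base64, which this page does not spell out; the function name is illustrative.

```python
import base64
import io
import wave


def build_clone_payload(wav_bytes, name=None, voice_id=None):
    """Validate a WAV sample and build the body for POST /api/voice/clone."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    if not 4 <= duration <= 12:
        raise ValueError(f"sample is {duration:.1f}s; must be 4-12 seconds")
    if channels != 1 or rate != 16000:
        raise ValueError("sample must be 16kHz mono WAV")

    payload = {"audio": base64.b64encode(wav_bytes).decode("ascii")}
    if name:
        payload["name"] = name
    if voice_id:
        payload["voiceId"] = voice_id
    return payload
```

Read the sample with `open(path, "rb").read()` and pass the bytes in. The check cannot detect background noise or multiple speakers, so those still need a manual listen.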

GET /api/tts/voices

Retrieve the list of available pre-built voices. No authentication required.

Response

{
  "voices": [
    {
      "id": "v-female-R2s4N9qJ",
      "name": "小芮",
      "description": "女声客服",
      "language": "zh-CN",
      "gender": "female"
    },
    {
      "id": "male-qn-qingse",
      "name": "青涩青年音色",
      "description": "男声",
      "language": "zh-CN",
      "gender": "male"
    }
  ]
}

Example Request

curl https://api.realtime-ai.chat/api/tts/voices
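
Note that the listing reports languages as tags such as "zh-CN", while `ttsConfig.Language` takes short codes such as "zh". A small helper can filter the listing by the short code. This is a sketch that assumes every language tag in the listing begins with the corresponding short code; the function name is hypothetical.

```python
def voices_for_language(listing, language_code):
    """Return voices whose language tag starts with the given short code."""
    return [
        voice
        for voice in listing.get("voices", [])
        if voice.get("language", "").lower().startswith(language_code.lower())
    ]
```

For the sample response above, `voices_for_language(listing, "zh")` returns both voices, and `voices_for_language(listing, "en")` returns an empty list.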

🚀 TTS Model: flow_01_turbo

Flow TTS is powered by the flow_01_turbo model, specifically optimized for conversational scenarios.

Key Features

Ultra-Low Latency: As low as 300ms, ideal for real-time conversations and live dubbing
High Quality: Natural, human-like speech with strong expressiveness and colloquial tone
Multi-Language Support: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
Streaming Support: SSE-based streaming so playback can begin while audio is still being generated
Voice Cloning: Create custom voices from 4-12 second audio samples

Configuration Parameters

The flow_01_turbo model supports the following configuration options:

{
  "TTSType": "flow",          // Required: Fixed value "flow"
  "VoiceId": "xxxx",          // Required: Premium voice ID or cloned voice ID
  "Model": "flow_01_turbo",   // Optional: Default is flow_01_turbo
  "Speed": 1.0,               // Optional: Speech speed [0.5, 2.0], default 1.0
  "Volume": 1.0,              // Optional: Volume (0, 10], default 1.0
  "Pitch": 0,                 // Optional: Pitch [-12, 12], default 0
  "Language": "zh"            // Strongly recommended: language code (zh/en/ja/ko/yue)
}

💡 Tip: Always specify the Language parameter to ensure optimal pronunciation and natural pauses. Use "yue" for Cantonese.

🎙️ Voice Library

Flow TTS offers a diverse collection of high-quality neural voices across multiple languages and speaking styles. All voices are powered by the flow_01_turbo model for natural, expressive speech.

Featured Voices

Our most popular voices for common use cases:

Voice ID	Name	Language	Gender	Description
`v-female-R2s4N9qJ`	小芮	Chinese	Female	Professional customer service voice
`male-qn-qingse`	青涩青年	Chinese	Male	Young, energetic male voice
`v-en-female-amy`	Amy	English (US)	Female	Warm, friendly American accent
`v-en-male-brian`	Brian	English (UK)	Male	Professional British narrator

To explore the complete voice library with audio samples, visit the TTS Studio or call the GET /api/tts/voices endpoint.

💎 Quota & Pricing

Flow TTS uses a quota-based pricing model. Each API request consumes quota points based on the operation type.

Quota Consumption

Operation	Quota Cost	Notes
Text to Speech	1 point	Per synthesis request (max 5000 chars)
Streaming TTS	1 point	Per SSE session
Voice Clone	10 points	Per voice cloning request
List Voices	0 points	Free, no authentication required
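
The costs in this table make it straightforward to budget a day's usage. The sketch below encodes them and estimates the points a workload will consume. Splitting text longer than 5000 characters into several requests is an assumption about how a client would handle oversize input, not documented API behavior, and the function names are illustrative.

```python
import math

# Quota points per operation, taken from the table above.
QUOTA_COST = {"synthesize": 1, "stream": 1, "clone": 10, "list_voices": 0}


def synthesis_points(text_length, max_chars=5000):
    """Points needed to synthesize text, one request per max_chars-sized piece."""
    return math.ceil(text_length / max_chars) * QUOTA_COST["synthesize"]


def estimate_points(synth_requests=0, stream_sessions=0, clones=0):
    """Total quota points for a mix of operations."""
    return (
        synth_requests * QUOTA_COST["synthesize"]
        + stream_sessions * QUOTA_COST["stream"]
        + clones * QUOTA_COST["clone"]
    )
```

For example, 42 synthesis requests plus one voice clone cost 42 + 10 = 52 points.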

Pricing Tiers

Tier	Daily Quota	Features
Free	100 points	All TTS voices, streaming support, voice cloning, email support, daily quota reset
Pro	500 points	Everything in Free, plus priority support, custom voices, advanced analytics, SLA guarantee
Max	2000 points	Everything in Pro, plus dedicated support, custom integration, on-premise option, volume discounts

Contact Sales for Enterprise

⚠️ Error Handling

The API uses standard HTTP status codes and returns detailed error information in JSON format.

Common Error Codes

Status Code	Error Code	Description
400	invalid_request	Missing or invalid request parameters
401	unauthorized	Missing or invalid JWT token
429	quota_exceeded	Daily quota limit reached
500	internal_error	Server error, please retry or contact support

Error Response Format

{
  "code": "quota_exceeded",
  "message": "Daily quota limit reached. Resets at 00:00 UTC.",
  "quota": {
    "daily": 100,
    "used": 100,
    "remaining": 0,
    "resetAt": "2025-01-16T00:00:00Z"
  }
}
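
When the quota is exhausted, the `resetAt` field tells the client how long to pause. This sketch computes the wait in seconds from a parsed error body. It assumes `resetAt` is always an ISO 8601 UTC timestamp in the form shown above; the function name is hypothetical.

```python
from datetime import datetime, timezone


def seconds_until_reset(error_body, now=None):
    """Return seconds until the quota resets, based on quota.resetAt."""
    reset_at = error_body["quota"]["resetAt"]
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    reset = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset - now).total_seconds())
```

A scheduler can use this value to defer queued jobs until after the reset instead of retrying requests that are certain to fail.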

🚦 Rate Limits

To ensure fair usage and system stability, the following rate limits apply:

Requests per minute: 60 requests
Concurrent requests: 5 simultaneous connections
Daily quota: Based on your pricing tier (100/500/2000)
Maximum text length: 5000 characters per request
Voice clone audio: 4-12 seconds duration

Rate limit information is included in response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1642348800
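
These headers let a client throttle itself before hitting the limit. The sketch below returns how long to wait before the next request. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this page does not state; the function name is illustrative.

```python
import time


def rate_limit_wait(headers, now=None):
    """Return seconds to wait before the next request, or 0.0 if none is needed."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is a Unix timestamp in seconds.
    reset = int(headers.get("X-RateLimit-Reset", 0))
    now = time.time() if now is None else now
    return max(0.0, reset - now)
```

With `requests`, pass `response.headers`, which matches header names case-insensitively; a plain dictionary must use the exact casing shown above.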

📦 SDKs & Libraries

Official SDKs and community libraries to accelerate your development:

Python SDK

Official Python library with async support

Node.js SDK

NPM package for server-side JavaScript

Go SDK

High-performance Go module

REST API

Direct HTTP integration for any language

More SDKs and complete examples for multiple languages and scenarios are available in our GitHub repository.

✨ Best Practices

Performance Optimization

Use streaming TTS (/api/tts/synthesize-stream) for long texts to reduce latency
Cache generated audio files to avoid redundant API calls
Implement exponential backoff for retries on 429/500 errors
Use connection pooling for high-volume applications
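
The retry advice above can be sketched as a small wrapper. It retries on 429 and 500 responses, doubling the delay each time and adding random jitter so that parallel clients do not retry in lockstep. The function name, attempt count, and delay values are illustrative choices, not values recommended by the API.

```python
import random
import time

RETRYABLE_STATUS = {429, 500}


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        last_attempt = attempt == max_attempts - 1
        if response.status_code not in RETRYABLE_STATUS or last_attempt:
            return response
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add up to 50% jitter on top of the exponential delay.
        sleep(delay + random.uniform(0, delay / 2))
    return response
```

Here `send` is any zero-argument callable returning an object with a `status_code` attribute, for example `lambda: session.post(url, headers=headers, json=data)` on a shared `requests.Session`, which also provides connection pooling. A 429 caused by an exhausted daily quota will not clear until the reset time, so backoff mainly helps with per-minute limits and transient server errors.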

Voice Cloning Tips

Provide clear, noise-free audio samples (4-12 seconds)
Use single-speaker recordings without background music
Ensure consistent audio quality (16kHz mono recommended)
Test cloned voices with diverse text before production use

Security

Never expose JWT tokens in client-side code or public repositories
Do not store tokens long-term: they expire after 1 hour, so obtain a fresh token rather than reusing an old one
Use HTTPS for all API requests
Implement proper error handling to avoid leaking sensitive information
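
One common way to keep a token out of source code is to read it from the environment at runtime. The sketch below builds the request headers that way. The environment variable name `FLOW_TTS_JWT` is a hypothetical choice, not something the API defines.

```python
import os


def auth_headers(env_var="FLOW_TTS_JWT"):
    """Build request headers from a token stored in an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        # Keep the message generic so no token material ends up in logs.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

The token then lives only in the process environment or a secrets manager and never appears in a repository.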

💬 Support

Need help with integration? Our support team is here to assist you.

Email Support

Get help from our team within 24 hours

Community Forum

Join discussions with other developers

Status Page

Check API uptime and incident reports
