TTSFM API Documentation

Overview

TTSFM is a free Text-to-Speech API service that provides OpenAI-compatible endpoints using the openai.fm service. It supports multiple voices, audio formats, speed adjustment, and automatic text splitting for long content.

Base URL: http://p.789ai.top/

Key Features

🎤 Multiple Voices: 6 voices (alloy, echo, fable, onyx, nova, shimmer)
🎵 Audio Formats: MP3, WAV (always available) + OPUS, AAC, FLAC, PCM (require ffmpeg)
🤖 OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API
⚡ Speed Adjustment: 0.25x to 4.0x playback speed (requires ffmpeg)
🔄 Format Conversion: Real ffmpeg-based audio format conversion
✨ Auto-Combine: Automatic splitting and combining for long text
🐳 Docker Images: Full (with ffmpeg) and Slim (without ffmpeg) variants
📊 Capabilities API: Runtime feature detection and availability checking

Version 3.4.1: WebSocket connection fixes, improved Socket.IO configuration, and enhanced error handling!

Docker Image Variants

TTSFM provides two Docker image variants to suit different needs:

Full Image (Recommended)

dbcccc/ttsfm:latest

Includes:

ffmpeg for audio processing
All 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
Speed adjustment (0.25x - 4.0x)
Format conversion
MP3 auto-combine for long text

Size: ~200MB

Slim Image (Lightweight)

dbcccc/ttsfm:slim

Includes:

Basic TTS functionality
2 audio formats (MP3, WAV only)
No speed adjustment
No format conversion
WAV auto-combine for long text

Size: ~100MB

Automatic Detection: The system automatically detects which variant is running and provides appropriate error messages when features are unavailable. Use the /api/capabilities endpoint to check available features.

Authentication

API key authentication is optional and disabled by default. When enabled, include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Set the environment variable REQUIRE_API_KEY=true to enable API key protection.
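The header rule above can be sketched in Python with the standard library only. The helper names build_auth_headers and build_speech_request are illustrative and not part of TTSFM; the request is built but not sent, so the sketch works whether or not the server has REQUIRE_API_KEY enabled.

```python
import json
import urllib.request

BASE_URL = "http://p.789ai.top"  # base URL given in this documentation


def build_auth_headers(api_key=None):
    """Return request headers, adding the Bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def build_speech_request(text, voice="alloy", api_key=None):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    body = json.dumps({"model": "tts-1", "input": text, "voice": voice})
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body.encode("utf-8"),
        headers=build_auth_headers(api_key),
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen and write the response body to a file.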

OpenAI-Compatible API

TTSFM provides a drop-in replacement for OpenAI's Text-to-Speech API. Use the /v1/audio/speech endpoint with the same request format.

POST /v1/audio/speech

Generate speech from text using OpenAI-compatible format.

Request Body

{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}

Parameters

model (string): Model ID. Any value is accepted; requests are always served by openai.fm
input (string, required): Text to convert to speech
voice (string, required): Voice ID (alloy, echo, fable, onyx, nova, shimmer)
response_format (string): Audio format (mp3, wav, opus, aac, flac, pcm). Default: mp3
speed (number): Playback speed 0.25-4.0 (requires ffmpeg). Default: 1.0

Response

Returns the audio file with the appropriate Content-Type header.

Response Headers

Content-Type: MIME type of the audio format
X-Requested-Speed: The speed value requested
X-Speed-Applied: Whether speed adjustment was applied (true/false)
X-Chunks-Combined: Number of chunks combined (for long text)
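A client may want to turn the response headers listed above into typed values. The sketch below assumes that X-Requested-Speed parses as a float, X-Chunks-Combined as an integer, and X-Speed-Applied as the literal string true or false; the exact wire format is an assumption, and summarize_speech_headers is a hypothetical helper name.

```python
def summarize_speech_headers(headers):
    """Interpret the documented response headers from /v1/audio/speech.

    `headers` is any mapping of header name to string value.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    speed = lowered.get("x-requested-speed")
    chunks = lowered.get("x-chunks-combined")
    return {
        "content_type": lowered.get("content-type"),
        "requested_speed": float(speed) if speed is not None else None,
        # Assumption: the server sends the literal text "true" or "false".
        "speed_applied": lowered.get("x-speed-applied", "").strip().lower() == "true",
        "chunks_combined": int(chunks) if chunks is not None else None,
    }
```

Comparing requested_speed with speed_applied is one way to notice that a speed change was requested but not applied.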

System Capabilities

Check which features are available in the current Docker image variant using the capabilities endpoint.

GET /api/capabilities

Get system capabilities and available features.

Response Example

{
  "ffmpeg_available": true,
  "image_variant": "full",
  "features": {
    "speed_adjustment": true,
    "format_conversion": true,
    "mp3_auto_combine": true,
    "basic_formats": true
  },
  "supported_formats": ["mp3", "wav", "opus", "aac", "flac", "pcm"]
}

Tip: Use this endpoint to check feature availability before making requests. In slim images, advanced features will return helpful error messages.
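As a minimal sketch of the tip above, the functions below read a parsed /api/capabilities response and decide which format and speed a request can safely use. The function names are illustrative, and the slim-image response shape is assumed to mirror the full-image example shown above.

```python
def choose_format(capabilities, preferred, fallback="mp3"):
    """Return `preferred` if the server lists it, otherwise `fallback`."""
    supported = capabilities.get("supported_formats", [])
    return preferred if preferred in supported else fallback


def can_adjust_speed(capabilities):
    """True when the capabilities response reports speed adjustment."""
    return bool(capabilities.get("features", {}).get("speed_adjustment", False))


def safe_speed(capabilities, requested):
    """Fall back to normal speed when the server cannot adjust it."""
    return requested if can_adjust_speed(capabilities) else 1.0
```

A client would typically fetch the capabilities once at startup and reuse the parsed dictionary for every request.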

Speed Adjustment

Adjust audio playback speed from 0.25x (slower) to 4.0x (faster). This feature requires ffmpeg and is only available in the full Docker image.

Speed Values

0.25 - 4x slower
0.5 - 2x slower
1.0 - Normal speed (default)
1.5 - 1.5x faster
2.0 - 2x faster
4.0 - 4x faster

Example Request

curl -X POST http://p.789ai.top/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello!",
    "voice": "alloy",
    "speed": 1.5
  }' --output speech.mp3

Slim Image: Speed adjustment is not available in the slim Docker image. Requests with speed != 1.0 will return an error with instructions to use the full image.
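The speed rules above can be checked on the client before a request is sent. This is a sketch, not the server's own validation: check_speed and expected_duration are hypothetical helpers, and the duration estimate simply assumes that playback time scales inversely with speed.

```python
MIN_SPEED = 0.25
MAX_SPEED = 4.0


def check_speed(speed, ffmpeg_available):
    """Validate a speed value against the documented range and image variant."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if speed != 1.0 and not ffmpeg_available:
        # The slim image rejects any speed other than 1.0.
        raise RuntimeError("speed adjustment requires ffmpeg (full image)")
    return speed


def expected_duration(seconds_at_normal_speed, speed):
    """Estimate the audio length after adjustment (0.25 means 4x slower)."""
    return seconds_at_normal_speed / speed
```

For example, a 10-second clip requested at 0.25 would last about 40 seconds, and the same clip at 2.0 about 5 seconds.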

Format Conversion

TTSFM supports 6 audio formats with real ffmpeg-based conversion for high-quality output.

Format	MIME Type	Availability	Description
`mp3`	audio/mpeg	Always	Direct from openai.fm, best compatibility
`wav`	audio/wav	Always	Direct from openai.fm, uncompressed
`opus`	audio/opus	Full Image	Converted from WAV, internet streaming
`aac`	audio/aac	Full Image	Converted from WAV, digital audio
`flac`	audio/flac	Full Image	Converted from WAV, lossless compression
`pcm`	audio/pcm	Full Image	Converted from WAV, raw samples at 24kHz

Note: MP3 and WAV are available in both full and slim images. OPUS, AAC, FLAC, and PCM require ffmpeg and are only available in the full image.
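The format table can be mirrored as a small lookup on the client side. The MIME types and ffmpeg requirements are copied from the table above; the FORMATS constant and mime_type_for function are illustrative names, not part of the TTSFM package.

```python
# format -> (MIME type, requires ffmpeg), as listed in the table above
FORMATS = {
    "mp3": ("audio/mpeg", False),
    "wav": ("audio/wav", False),
    "opus": ("audio/opus", True),
    "aac": ("audio/aac", True),
    "flac": ("audio/flac", True),
    "pcm": ("audio/pcm", True),
}


def mime_type_for(fmt, ffmpeg_available=True):
    """Return the MIME type for a format, or raise if it cannot be produced."""
    key = fmt.lower()
    if key not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    mime, needs_ffmpeg = FORMATS[key]
    if needs_ffmpeg and not ffmpeg_available:
        raise RuntimeError(f"format '{key}' requires ffmpeg (full image)")
    return mime
```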

Long Text Handling

TTSFM automatically handles long text by splitting it into chunks and combining the audio output. The openai.fm service has a limit of approximately 1000 characters per request.

Automatic Splitting: Text longer than the limit is automatically split at sentence boundaries, processed separately, and combined into a single audio file.

Python Package Options

max_length: Maximum characters per chunk (default: 1000)
validate_length: Raise error if text exceeds limit (default: False)
preserve_words: Split at word boundaries (default: True)
auto_combine: Automatically combine chunks (default: True)

Note: MP3 auto-combine requires ffmpeg (full image only). WAV auto-combine works in both full and slim images.
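To illustrate the splitting behavior described above, here is a minimal sketch that packs whole sentences into chunks of at most max_length characters and falls back to word boundaries for an oversized sentence. It is not the package's actual implementation; split_text and _split_words are hypothetical names, and the sentence detection is a simple punctuation-based heuristic.

```python
import re


def _split_words(sentence, max_length):
    """Break one oversized sentence at word boundaries."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_length or not current:
            # A single word longer than max_length is kept whole.
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_text(text, max_length=1000):
    """Greedily pack sentences into chunks no longer than max_length."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) <= max_length:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_length)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_length:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own request, and the resulting audio combined in order.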

Python Package

Install the TTSFM Python package for easy integration into your Python applications.

pip install ttsfm

Basic Usage

from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech
response = client.generate_speech(
    text="Hello, world!",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    speed=1.0
)

# Save to file
response.save_to_file("output.mp3")

Long Text Support

Automatically split and combine long text:

# Auto-combine mode (single file output)
response = client.generate_speech(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    auto_combine=True  # Default
)
response.save_to_file("combined.mp3")

# Manual chunks mode (multiple files)
responses = client.generate_speech_long_text(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3
)
for i, resp in enumerate(responses, 1):
    resp.save_to_file(f"part_{i:03d}.mp3")

Async Client

from ttsfm import AsyncTTSClient, Voice, AudioFormat
import asyncio

async def main():
    client = AsyncTTSClient()
    response = await client.generate_speech(
        text="Hello, async world!",
        voice=Voice.ALLOY,
        response_format=AudioFormat.MP3
    )
    response.save_to_file("async_output.mp3")

asyncio.run(main())

Documentation: For the complete API reference, see the project's GitHub repository.

WebSocket Streaming

Stream audio generation in real time over WebSocket for a better user experience with long text.

Try it: Visit the WebSocket Demo page to see it in action.

WebSocket Endpoint

ws://p.789ai.top/ws/generate

Message Format

{
  "text": "Your text here",
  "voice": "alloy",
  "format": "mp3",
  "speed": 1.0
}

Response Events

start: Generation started
chunk: Audio chunk ready (base64 encoded)
complete: All chunks sent
error: Error occurred
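The message format and event list above suggest a simple client-side state machine. The sketch below builds the request message and collects audio from incoming events without depending on any WebSocket library. The payload shape is an assumption: the documentation names the events but not their fields, so the audio key (base64 data) and message key (error text) are hypothetical, as are build_message and AudioCollector.

```python
import base64
import json


def build_message(text, voice="alloy", fmt="mp3", speed=1.0):
    """Serialize a generation request in the documented message format."""
    return json.dumps({"text": text, "voice": voice, "format": fmt, "speed": speed})


class AudioCollector:
    """Accumulate decoded audio chunks as response events arrive."""

    def __init__(self):
        self.parts = []
        self.done = False
        self.error = None

    def handle(self, event, payload=None):
        payload = payload or {}
        if event == "start":
            self.parts.clear()
        elif event == "chunk":
            # Assumption: the base64 audio is stored under an "audio" key.
            self.parts.append(base64.b64decode(payload["audio"]))
        elif event == "complete":
            self.done = True
        elif event == "error":
            # Assumption: the error text is stored under a "message" key.
            self.error = payload.get("message", "unknown error")
        else:
            raise ValueError(f"unexpected event: {event}")

    def audio(self):
        """Return all received bytes in arrival order."""
        return b"".join(self.parts)
```

Joining the raw bytes is only a placeholder; whether concatenated chunks form a valid file depends on the audio format and on how the server segments the stream.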

Error Handling

TTSFM provides clear error messages with helpful hints for troubleshooting.

Common Error Codes

Code	Description	Solution
`ffmpeg_required`	Feature requires ffmpeg (not available in slim image)	Use full Docker image: `dbcccc/ttsfm:latest`
`invalid_voice`	Voice ID not recognized	Use one of: alloy, echo, fable, onyx, nova, shimmer
`invalid_format`	Audio format not supported	Use: mp3, wav, opus, aac, flac, or pcm
`invalid_speed`	Speed value out of range	Use value between 0.25 and 4.0
`text_too_long`	Text exceeds maximum length	Enable auto_combine or split text manually

Example Error Response

{
  "error": {
    "message": "Format 'opus' requires ffmpeg. Available formats: mp3, wav",
    "type": "feature_unavailable_error",
    "code": "ffmpeg_required",
    "hint": "Use the full Docker image (dbcccc/ttsfm:latest) instead of the slim variant.",
    "available_formats": ["mp3", "wav"]
  }
}
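A client can convert an error body like the one above into a Python exception. This is a minimal sketch that reads only the fields shown in the example response; TTSFMError and parse_error are illustrative names and are not claimed to exist in the ttsfm package.

```python
import json


class TTSFMError(Exception):
    """Client-side wrapper for a TTSFM error response."""

    def __init__(self, message, code=None, hint=None, available_formats=None):
        super().__init__(message)
        self.code = code
        self.hint = hint
        self.available_formats = available_formats or []


def parse_error(body):
    """Build a TTSFMError from a JSON error body (str or bytes)."""
    data = json.loads(body)
    err = data.get("error", {})
    return TTSFMError(
        err.get("message", "unknown error"),
        code=err.get("code"),
        hint=err.get("hint"),
        available_formats=err.get("available_formats"),
    )
```

A caller could catch the ffmpeg_required code and retry with one of the formats listed in available_formats.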
