Overview

TTSFM is a free Text-to-Speech API that exposes OpenAI-compatible endpoints backed by openai.fm. It supports multiple voices, audio formats, speed adjustment, and automatic splitting of long text.

Base URL: http://p.789ai.top/

Key Features

  • 🎤 Multiple Voices: 6 voices (alloy, echo, fable, onyx, nova, shimmer)
  • 🎵 Audio Formats: MP3 and WAV (always available), plus OPUS, AAC, FLAC, and PCM (require ffmpeg)
  • 🤖 OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API
  • ⚡ Speed Adjustment: 0.25x to 4.0x playback speed (requires ffmpeg)
  • 🔄 Format Conversion: ffmpeg-based conversion between audio formats
  • ✨ Auto-Combine: Automatic splitting and combining of long text
  • 🐳 Docker Images: Full (with ffmpeg) and Slim (without ffmpeg) variants
  • 📊 Capabilities API: Runtime detection of available features

Version 3.4.1: WebSocket connection fixes, improved Socket.IO configuration, and enhanced error handling.

Docker Image Variants

TTSFM provides two Docker image variants to suit different needs:

Full Image (Recommended)

dbcccc/ttsfm:latest

Includes:

  • ffmpeg for audio processing
  • All 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
  • Speed adjustment (0.25x - 4.0x)
  • Format conversion
  • MP3 auto-combine for long text

Size: ~200MB

Slim Image (Lightweight)

dbcccc/ttsfm:slim

Includes:

  • Basic TTS functionality
  • 2 audio formats (MP3, WAV only)
  • No speed adjustment
  • No format conversion
  • WAV auto-combine for long text

Size: ~100MB

Automatic Detection: TTSFM detects which variant is running and returns a descriptive error when a requested feature is unavailable. Use the /api/capabilities endpoint to check which features are available.

Authentication

API key authentication is optional and disabled by default. When enabled, include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Set the environment variable REQUIRE_API_KEY=true to enable API key protection.
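The following minimal sketch builds an authenticated request using only Python's standard library. The helper name build_speech_request is hypothetical. It builds the request without sending it and attaches the Bearer token only when a key is supplied, because key checking is off unless the server sets REQUIRE_API_KEY=true.

```python
import json
import urllib.request


def build_speech_request(base_url, payload, api_key=None):
    """Build (without sending) a POST request to /v1/audio/speech.

    Hypothetical helper. The Authorization header is added only when
    api_key is given, since API key checking is optional on the server.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_speech_request(
    "http://p.789ai.top/",
    {"model": "tts-1", "input": "Hello!", "voice": "alloy"},
    api_key="YOUR_API_KEY",
)
# To send it: audio = urllib.request.urlopen(req).read()
```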

OpenAI-Compatible API

TTSFM provides a drop-in replacement for OpenAI's Text-to-Speech API. Use the /v1/audio/speech endpoint with the same request format.

POST /v1/audio/speech

Generate speech from text using the OpenAI-compatible request format.

Request Body
{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}
Parameters
  • model (string): Model ID (any value is accepted; requests are always served by openai.fm)
  • input (string, required): Text to convert to speech
  • voice (string, required): Voice ID (alloy, echo, fable, onyx, nova, shimmer)
  • response_format (string): Audio format (mp3, wav, opus, aac, flac, pcm). Default: mp3
  • speed (number): Playback speed 0.25-4.0 (requires ffmpeg). Default: 1.0
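A client-side check of these parameters can catch invalid values before a request is sent. The sketch below uses a hypothetical helper, make_speech_payload, that mirrors the documented voices, formats, and speed range. The server still performs its own validation. In the slim image, the server also rejects formats other than mp3/wav and speeds other than 1.0.

```python
# Allowed values taken from the parameter list above.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
FORMATS = ("mp3", "wav", "opus", "aac", "flac", "pcm")


def make_speech_payload(text, voice, response_format="mp3", speed=1.0, model="tts-1"):
    """Return a /v1/audio/speech request body after basic client-side checks.

    Hypothetical helper: error messages are prefixed with the documented
    error codes so callers can match on them.
    """
    if not text:
        raise ValueError("input is required")
    if voice not in VOICES:
        raise ValueError(f"invalid_voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"invalid_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"invalid_speed: {speed}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


body = make_speech_payload("Hello, world!", "alloy")
```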
Response

Returns the audio file, with a Content-Type header matching the requested format.

Response Headers
  • Content-Type: MIME type of the audio format
  • X-Requested-Speed: The speed value requested
  • X-Speed-Applied: Whether speed adjustment was applied (true/false)
  • X-Chunks-Combined: Number of chunks combined (for long text)
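The custom headers can be turned into a small summary after a request. This sketch assumes two things that are not stated above: X-Speed-Applied is the literal string "true" or "false", and a missing X-Chunks-Combined header means the text fit in a single chunk. The helper name is hypothetical. It accepts any mapping with .get(), including the headers object on a urllib response.

```python
def describe_speech_response(headers):
    """Summarize TTSFM's response headers (hypothetical helper).

    Assumptions: X-Speed-Applied is "true"/"false", and an absent
    X-Chunks-Combined header means only one chunk was generated.
    """
    chunks = headers.get("X-Chunks-Combined")
    return {
        "content_type": headers.get("Content-Type"),
        "requested_speed": float(headers.get("X-Requested-Speed", "1.0")),
        "speed_applied": str(headers.get("X-Speed-Applied", "false")).lower() == "true",
        "chunks_combined": int(chunks) if chunks is not None else 1,
    }


info = describe_speech_response({
    "Content-Type": "audio/mpeg",
    "X-Requested-Speed": "1.5",
    "X-Speed-Applied": "true",
    "X-Chunks-Combined": "3",
})
```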

System Capabilities

Check which features are available in the current Docker image variant using the capabilities endpoint.

GET /api/capabilities

Get system capabilities and available features.

Response Example
{
  "ffmpeg_available": true,
  "image_variant": "full",
  "features": {
    "speed_adjustment": true,
    "format_conversion": true,
    "mp3_auto_combine": true,
    "basic_formats": true
  },
  "supported_formats": ["mp3", "wav", "opus", "aac", "flac", "pcm"]
}
Tip: Use this endpoint to check feature availability before making requests. In the slim image, requests for unavailable features return an error that explains what is missing.
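A client can use the capabilities response to decide whether a planned request will succeed. The sketch below uses a hypothetical check_request helper. The SLIM dictionary is an assumed slim-image response, extrapolated from the full-image example above rather than copied from a real server.

```python
def check_request(capabilities, response_format="mp3", speed=1.0):
    """Return (ok, reason) for a planned request, given /api/capabilities output."""
    if response_format not in capabilities.get("supported_formats", []):
        return False, f"format {response_format!r} is not supported by this image"
    features = capabilities.get("features", {})
    if speed != 1.0 and not features.get("speed_adjustment", False):
        return False, "speed adjustment requires the full image"
    return True, "ok"


# ASSUMED slim-image response (not taken from a live server). In practice:
# capabilities = json.load(urllib.request.urlopen(BASE_URL + "/api/capabilities"))
SLIM = {
    "ffmpeg_available": False,
    "image_variant": "slim",
    "features": {
        "speed_adjustment": False,
        "format_conversion": False,
        "mp3_auto_combine": False,
        "basic_formats": True,
    },
    "supported_formats": ["mp3", "wav"],
}
```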

Speed Adjustment

Adjust audio playback speed from 0.25x (slower) to 4.0x (faster). This feature requires ffmpeg and is only available in the full Docker image.

Speed Values
  • 0.25 - 4x slower
  • 0.5 - 2x slower
  • 1.0 - Normal speed (default)
  • 1.5 - 1.5x faster
  • 2.0 - 2x faster
  • 4.0 - 4x faster
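Because the speed value scales tempo, the output duration is the normal-speed duration divided by the speed. The small helper below (hypothetical) makes the list above concrete. It assumes the adjustment changes tempo only and that the documented 0.25 to 4.0 range applies.

```python
def adjusted_duration(seconds_at_normal_speed, speed):
    """Duration after a tempo change, assuming speed scales time only.

    speed=0.25 makes audio 4x longer; speed=2.0 halves its length.
    """
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return seconds_at_normal_speed / speed


# A 12-second clip at each of the listed speed values.
for s in (0.25, 0.5, 1.0, 1.5, 2.0, 4.0):
    print(f"{s:>4}x -> {adjusted_duration(12.0, s):.1f}s")
```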
Example Request
curl -X POST http://p.789ai.top/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello!",
    "voice": "alloy",
    "speed": 1.5
  }' --output speech.mp3
Slim Image: Speed adjustment is not available in the slim Docker image. Requests with speed != 1.0 will return an error with instructions to use the full image.

Format Conversion

TTSFM supports 6 audio formats. MP3 and WAV come directly from openai.fm. The other four are produced by converting WAV with ffmpeg.

Format  MIME Type   Availability  Description
mp3     audio/mpeg  Always        Direct from openai.fm, best compatibility
wav     audio/wav   Always        Direct from openai.fm, uncompressed
opus    audio/opus  Full image    Converted from WAV, suited to internet streaming
aac     audio/aac   Full image    Converted from WAV, lossy compression
flac    audio/flac  Full image    Converted from WAV, lossless compression
pcm     audio/pcm   Full image    Converted from WAV, raw samples at 24 kHz
Note: MP3 and WAV are available in both full and slim images. OPUS, AAC, FLAC, and PCM require ffmpeg and are only available in the full image.
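The format table can also be expressed as data, which is useful for picking a file extension or checking what a given image can return. The names AUDIO_FORMATS, available_formats, and mime_type below are illustrative, not part of the ttsfm package.

```python
# The format table above as data: name -> (MIME type, requires ffmpeg).
AUDIO_FORMATS = {
    "mp3": ("audio/mpeg", False),
    "wav": ("audio/wav", False),
    "opus": ("audio/opus", True),
    "aac": ("audio/aac", True),
    "flac": ("audio/flac", True),
    "pcm": ("audio/pcm", True),
}


def available_formats(ffmpeg_available):
    """Formats the server can return, depending on whether ffmpeg is present."""
    return [name for name, (_, needs_ffmpeg) in AUDIO_FORMATS.items()
            if ffmpeg_available or not needs_ffmpeg]


def mime_type(fmt):
    """MIME type documented for the given format name."""
    return AUDIO_FORMATS[fmt][0]
```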

Long Text Handling

TTSFM automatically handles long text by splitting it into chunks and combining the audio output. The openai.fm service has a limit of approximately 1000 characters per request.

Automatic Splitting: Text longer than the limit is automatically split at sentence boundaries, processed separately, and combined into a single audio file.
Python Package Options
  • max_length: Maximum characters per chunk (default: 1000)
  • validate_length: Raise error if text exceeds limit (default: False)
  • preserve_words: Split at word boundaries (default: True)
  • auto_combine: Automatically combine chunks (default: True)
Note: MP3 auto-combine requires ffmpeg (full image only). WAV auto-combine works in both full and slim images.
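The sketch below shows sentence-first splitting that respects the max_length and preserve_words options described above. It illustrates the technique and is not TTSFM's actual implementation. The sentence regex is a rough assumption: it splits after ".", "!", or "?" followed by whitespace. Sentences longer than max_length fall back to word boundaries, or to hard cuts when preserve_words is False.

```python
import re


def split_text(text, max_length=1000, preserve_words=True):
    """Split text into chunks of at most max_length characters (sketch)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Try to append the sentence to the chunk being built.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_length:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        # A single sentence may still be too long: cut it down.
        while len(sentence) > max_length:
            cut = sentence.rfind(" ", 0, max_length + 1) if preserve_words else -1
            if cut <= 0:
                cut = max_length  # no usable space: hard cut
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```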

Python Package

Install the TTSFM Python package for easy integration into your Python applications.

pip install ttsfm
Basic Usage
from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech
response = client.generate_speech(
    text="Hello, world!",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    speed=1.0
)

# Save to file
response.save_to_file("output.mp3")
Long Text Support

Automatically split and combine long text:

# Auto-combine mode (single file output)
response = client.generate_speech(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    auto_combine=True  # Default
)
response.save_to_file("combined.mp3")

# Manual chunks mode (multiple files)
responses = client.generate_speech_long_text(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3
)
for i, resp in enumerate(responses, 1):
    resp.save_to_file(f"part_{i:03d}.mp3")
Async Client
from ttsfm import AsyncTTSClient, Voice, AudioFormat
import asyncio

async def main():
    client = AsyncTTSClient()
    response = await client.generate_speech(
        text="Hello, async world!",
        voice=Voice.ALLOY,
        response_format=AudioFormat.MP3
    )
    response.save_to_file("async_output.mp3")

asyncio.run(main())
Documentation: For the complete API reference, see the TTSFM GitHub repository.

WebSocket Streaming

Stream audio in real time over a WebSocket connection. Audio is delivered in chunks as it is generated, so clients can start handling long text before the whole request finishes.

Try it: Visit the WebSocket Demo page to see it in action.
WebSocket Endpoint
ws://p.789ai.top/ws/generate
Message Format
{
  "text": "Your text here",
  "voice": "alloy",
  "format": "mp3",
  "speed": 1.0
}
Response Events
  • start: Generation started
  • chunk: Audio chunk ready (base64 encoded)
  • complete: All chunks sent
  • error: Error occurred
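The message format and events above suggest a simple client loop: send one JSON message, then decode "chunk" events until "complete". The event field names used here ("type", "data", "message") are assumptions, because the exact event payload is not documented above. The transport itself (plain WebSocket or Socket.IO) is left out. Joining raw bytes is only a sketch. Whether the joined chunks form a valid file depends on the format, so saving each chunk separately may be safer.

```python
import base64
import json


def make_ws_message(text, voice="alloy", fmt="mp3", speed=1.0):
    """Serialize a request in the documented WebSocket message format."""
    return json.dumps({"text": text, "voice": voice, "format": fmt, "speed": speed})


def collect_audio(events):
    """Join base64 audio from 'chunk' events, stopping at 'complete'.

    ASSUMPTION: each event is a dict such as {"type": "chunk", "data": "<base64>"};
    the real event field names may differ.
    """
    audio = bytearray()
    for event in events:
        kind = event.get("type")
        if kind == "chunk":
            audio.extend(base64.b64decode(event["data"]))
        elif kind == "error":
            raise RuntimeError(event.get("message", "generation failed"))
        elif kind == "complete":
            break
        # "start" and unknown events are ignored.
    return bytes(audio)
```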

Error Handling

TTSFM provides clear error messages with helpful hints for troubleshooting.

Common Error Codes
Code             Description                                            Solution
ffmpeg_required  Feature requires ffmpeg (not available in slim image)  Use the full Docker image: dbcccc/ttsfm:latest
invalid_voice    Voice ID not recognized                                Use one of: alloy, echo, fable, onyx, nova, shimmer
invalid_format   Audio format not supported                             Use one of: mp3, wav, opus, aac, flac, pcm
invalid_speed    Speed value out of range                               Use a value between 0.25 and 4.0
text_too_long    Text exceeds maximum length                            Enable auto_combine or split the text manually
Example Error Response
{
  "error": {
    "message": "Format 'opus' requires ffmpeg. Available formats: mp3, wav",
    "type": "feature_unavailable_error",
    "code": "ffmpeg_required",
    "hint": "Use the full Docker image (dbcccc/ttsfm:latest) instead of the slim variant.",
    "available_formats": ["mp3", "wav"]
  }
}
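A client can turn an error body like the one above into a Python exception that exposes code, hint, and available_formats. TTSFMError and raise_for_tts_error are hypothetical names. With urllib, you would typically pass the body of an HTTPError (exc.read()). Bodies that are not JSON, such as audio bytes, are ignored.

```python
import json


class TTSFMError(Exception):
    """Carries the fields of a TTSFM error body (message, code, hint, ...)."""

    def __init__(self, error):
        super().__init__(error.get("message", "unknown error"))
        self.code = error.get("code")
        self.error_type = error.get("type")
        self.hint = error.get("hint")
        self.available_formats = error.get("available_formats", [])


def raise_for_tts_error(body):
    """Raise TTSFMError if body (str or bytes) is a JSON error response."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None
    if isinstance(payload, dict) and isinstance(payload.get("error"), dict):
        raise TTSFMError(payload["error"])
    return None
```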