Overview
TTSFM is a free Text-to-Speech API service that provides OpenAI-compatible endpoints using the openai.fm service. It supports multiple voices, audio formats, speed adjustment, and automatic text splitting for long content.
http://p.789ai.top/
Key Features
- Multiple Voices: 6 voices (alloy, echo, fable, onyx, nova, shimmer)
- Audio Formats: MP3, WAV (always available) + OPUS, AAC, FLAC, PCM (requires ffmpeg)
- OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API
- Speed Adjustment: 0.25x to 4.0x playback speed (requires ffmpeg)
- Format Conversion: Real ffmpeg-based audio format conversion
- Auto-Combine: Automatic splitting and combining for long text
- Docker Images: Full (with ffmpeg) and Slim (without ffmpeg) variants
- Capabilities API: Runtime feature detection and availability checking
Docker Image Variants
TTSFM provides two Docker image variants to suit different needs:
dbcccc/ttsfm:latest
Includes:
- ffmpeg for audio processing
- All 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
- Speed adjustment (0.25x - 4.0x)
- Format conversion
- MP3 auto-combine for long text
Size: ~200MB
dbcccc/ttsfm:slim
Includes:
- Basic TTS functionality
- 2 audio formats (MP3, WAV only)
- No speed adjustment
- No format conversion
- WAV auto-combine for long text
Size: ~100MB
Use the /api/capabilities endpoint to check which features are available in the running image.
Authentication
API key authentication is optional and disabled by default. When enabled, include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Set the REQUIRE_API_KEY=true environment variable to enable API key protection.
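Because authentication is optional, a client can add the Authorization header only when a key is configured. The following is a minimal sketch of that idea; the helper name build_headers and the TTSFM_API_KEY environment variable are assumptions made for this example, not part of TTSFM.

```python
import os


def build_headers(api_key=None):
    """Return JSON request headers, adding a Bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# TTSFM_API_KEY is a hypothetical variable name chosen for this sketch.
headers = build_headers(os.environ.get("TTSFM_API_KEY"))
```

With no key set, the same headers work against a server that runs with the default (unauthenticated) configuration.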
OpenAI-Compatible API
TTSFM provides a drop-in replacement for OpenAI's Text-to-Speech API. Use the /v1/audio/speech endpoint with the same request format.
POST /v1/audio/speech
Generate speech from text using OpenAI-compatible format.
Request Body
{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}
Parameters
- model (string): Model ID (any value accepted, uses openai.fm)
- input (string, required): Text to convert to speech
- voice (string, required): Voice ID (alloy, echo, fable, onyx, nova, shimmer)
- response_format (string): Audio format (mp3, wav, opus, aac, flac, pcm). Default: mp3
- speed (number): Playback speed 0.25-4.0 (requires ffmpeg). Default: 1.0
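As an illustration of the request format, the sketch below validates the parameters on the client side and builds (but does not send) a POST request using only the Python standard library. The function name build_speech_request and the base URL http://localhost:8000 are assumptions for this example; substitute the address of your own deployment.

```python
import json
import urllib.request

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm"}


def build_speech_request(base_url, text, voice, response_format="mp3", speed=1.0):
    """Build a POST request for /v1/audio/speech without sending it."""
    if not text:
        raise ValueError("input is required")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The base URL below is a placeholder for this sketch.
req = build_speech_request("http://localhost:8000", "Hello, world!", "alloy")
# To send it: audio = urllib.request.urlopen(req).read()
```

Validating locally is optional, since the server reports its own errors, but it avoids a round trip for obviously invalid input.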
Response
Returns the audio file with the appropriate Content-Type header.
Response Headers
- Content-Type: MIME type of the audio format
- X-Requested-Speed: The speed value requested
- X-Speed-Applied: Whether speed adjustment was applied (true/false)
- X-Chunks-Combined: Number of chunks combined (for long text)
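These headers let a client confirm what the server actually did, for example whether a requested speed was applied. The sketch below turns a header mapping into typed values. The function name is hypothetical, and treating a missing header as None is an assumption of this example rather than documented behavior.

```python
def summarize_speech_headers(headers):
    """Convert the documented response headers into typed Python values."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    speed = lowered.get("x-requested-speed")
    chunks = lowered.get("x-chunks-combined")
    return {
        "content_type": lowered.get("content-type"),
        "requested_speed": float(speed) if speed is not None else None,
        "speed_applied": lowered.get("x-speed-applied", "").lower() == "true",
        "chunks_combined": int(chunks) if chunks is not None else None,
    }


summary = summarize_speech_headers({
    "Content-Type": "audio/mpeg",
    "X-Requested-Speed": "1.5",
    "X-Speed-Applied": "false",
})
# summary["speed_applied"] is False: the request asked for 1.5x but it was not applied.
```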
System Capabilities
Check which features are available in the current Docker image variant using the capabilities endpoint.
GET /api/capabilities
Get system capabilities and available features.
Response Example
{
  "ffmpeg_available": true,
  "image_variant": "full",
  "features": {
    "speed_adjustment": true,
    "format_conversion": true,
    "mp3_auto_combine": true,
    "basic_formats": true
  },
  "supported_formats": ["mp3", "wav", "opus", "aac", "flac", "pcm"]
}
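A client can use this response to degrade gracefully on the slim image. The sketch below works on an already-parsed capabilities dictionary. The helper names are hypothetical, and the slim-image response shown is an assumption that mirrors the documented fields.

```python
def pick_format(capabilities, preferred, fallback="mp3"):
    """Return the preferred format if the server supports it, else the fallback."""
    supported = capabilities.get("supported_formats", [])
    return preferred if preferred in supported else fallback


def speed_supported(capabilities):
    """Report whether the server advertises speed adjustment."""
    return bool(capabilities.get("features", {}).get("speed_adjustment", False))


# Assumed shape of a slim-image response, modeled on the example above.
slim = {
    "ffmpeg_available": False,
    "image_variant": "slim",
    "features": {"speed_adjustment": False, "format_conversion": False},
    "supported_formats": ["mp3", "wav"],
}

chosen = pick_format(slim, "opus")  # falls back to "mp3"
can_speed = speed_supported(slim)   # False
```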
Speed Adjustment
Adjust audio playback speed from 0.25x (slower) to 4.0x (faster). This feature requires ffmpeg and is only available in the full Docker image.
Speed Values
- 0.25: 4x slower
- 0.5: 2x slower
- 1.0: Normal speed (default)
- 1.5: 1.5x faster
- 2.0: 2x faster
- 4.0: 4x faster
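To make the effect of these values concrete, the sketch below estimates the resulting clip length. It assumes speed adjustment is a pure tempo change, so the duration scales by 1/speed; the function name is hypothetical.

```python
def adjusted_duration(seconds, speed):
    """Estimate clip length after speed adjustment (duration / speed)."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return seconds / speed


slow = adjusted_duration(10, 0.25)  # 40.0 seconds (4x slower)
fast = adjusted_duration(10, 4.0)   # 2.5 seconds (4x faster)
```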
Example Request
curl -X POST http://p.789ai.top/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello!",
    "voice": "alloy",
    "speed": 1.5
  }' --output speech.mp3
Format Conversion
TTSFM supports 6 audio formats with real ffmpeg-based conversion for high-quality output.
| Format | MIME Type | Availability | Description |
|---|---|---|---|
| mp3 | audio/mpeg | Always | Direct from openai.fm, best compatibility |
| wav | audio/wav | Always | Direct from openai.fm, uncompressed |
| opus | audio/opus | Full Image | Converted from WAV, internet streaming |
| aac | audio/aac | Full Image | Converted from WAV, digital audio |
| flac | audio/flac | Full Image | Converted from WAV, lossless compression |
| pcm | audio/pcm | Full Image | Converted from WAV, raw samples at 24kHz |
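The table can be mirrored as a small client-side lookup, for instance to set a Content-Type when re-serving a file or to predict whether a format needs the full image. This is a sketch based only on the table above; the names are hypothetical.

```python
# MIME types and availability copied from the format table.
MIME_TYPES = {
    "mp3": "audio/mpeg",
    "wav": "audio/wav",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
}
ALWAYS_AVAILABLE = {"mp3", "wav"}


def describe_format(fmt, ffmpeg_available):
    """Return (mime_type, available) for a format name."""
    fmt = fmt.lower()
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported format: {fmt}")
    available = fmt in ALWAYS_AVAILABLE or ffmpeg_available
    return MIME_TYPES[fmt], available


info = describe_format("opus", ffmpeg_available=False)  # ("audio/opus", False)
```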
Long Text Handling
TTSFM automatically handles long text by splitting it into chunks and combining the audio output. The openai.fm service has a limit of approximately 1000 characters per request.
Python Package Options
- max_length: Maximum characters per chunk (default: 1000)
- validate_length: Raise error if text exceeds limit (default: False)
- preserve_words: Split at word boundaries (default: True)
- auto_combine: Automatically combine chunks (default: True)
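To illustrate what splitting at word boundaries under a max_length limit can look like, here is a minimal greedy splitter. It is a sketch of the general idea only and is not the algorithm the ttsfm package uses; the function name split_text is hypothetical.

```python
def split_text(text, max_length=1000):
    """Greedily pack whitespace-separated words into chunks of at most max_length."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit cannot be kept whole, so hard-split it.
        while len(word) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_length])
            word = word[max_length:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_length:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


parts = split_text("one two three four", max_length=7)
# parts == ["one two", "three", "four"]
```

Each chunk could then be sent as a separate request and the resulting audio concatenated, which is the step auto_combine performs for you.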
Python Package
Install the TTSFM Python package for easy integration into your Python applications.
pip install ttsfm
Basic Usage
from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech
response = client.generate_speech(
    text="Hello, world!",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    speed=1.0
)

# Save to file
response.save_to_file("output.mp3")
Long Text Support
Automatically split and combine long text:
# Auto-combine mode (single file output)
response = client.generate_speech(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    auto_combine=True  # Default
)
response.save_to_file("combined.mp3")

# Manual chunks mode (multiple files)
responses = client.generate_speech_long_text(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3
)
for i, resp in enumerate(responses, 1):
    resp.save_to_file(f"part_{i:03d}.mp3")
Async Client
from ttsfm import AsyncTTSClient, Voice, AudioFormat
import asyncio

async def main():
    client = AsyncTTSClient()
    response = await client.generate_speech(
        text="Hello, async world!",
        voice=Voice.ALLOY,
        response_format=AudioFormat.MP3
    )
    response.save_to_file("async_output.mp3")

asyncio.run(main())
WebSocket Streaming
Stream generated audio in real time over WebSocket for a better user experience with long text.
WebSocket Endpoint
ws://p.789ai.top/ws/generate
Message Format
{
  "text": "Your text here",
  "voice": "alloy",
  "format": "mp3",
  "speed": 1.0
}
Response Events
- start: Generation started
- chunk: Audio chunk ready (base64 encoded)
- complete: All chunks sent
- error: Error occurred
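The event sequence above suggests a simple client loop: accumulate decoded chunks until complete, and stop on error. The sketch below processes messages that have already been received, so it is independent of any WebSocket library. The document names the events but not their wire format, so the JSON field names "type", "data" and "message" are assumptions of this example.

```python
import base64
import json


def collect_audio(messages):
    """Assemble audio bytes from a sequence of JSON-encoded event messages."""
    audio = bytearray()
    for raw in messages:
        event = json.loads(raw)
        kind = event.get("type")  # assumed field name
        if kind == "chunk":
            # Chunks are documented as base64 encoded; "data" is an assumed field name.
            audio.extend(base64.b64decode(event["data"]))
        elif kind == "error":
            raise RuntimeError(event.get("message", "generation failed"))
        elif kind == "complete":
            break
        # "start" carries no audio, so it is ignored here.
    return bytes(audio)


sample = [
    json.dumps({"type": "start"}),
    json.dumps({"type": "chunk", "data": base64.b64encode(b"abc").decode("ascii")}),
    json.dumps({"type": "chunk", "data": base64.b64encode(b"def").decode("ascii")}),
    json.dumps({"type": "complete"}),
]
audio = collect_audio(sample)  # b"abcdef"
```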
Error Handling
TTSFM provides clear error messages with helpful hints for troubleshooting.
Common Error Codes
| Code | Description | Solution |
|---|---|---|
| ffmpeg_required | Feature requires ffmpeg (not available in slim image) | Use full Docker image: dbcccc/ttsfm:latest |
| invalid_voice | Voice ID not recognized | Use one of: alloy, echo, fable, onyx, nova, shimmer |
| invalid_format | Audio format not supported | Use: mp3, wav, opus, aac, flac, or pcm |
| invalid_speed | Speed value out of range | Use value between 0.25 and 4.0 |
| text_too_long | Text exceeds maximum length | Enable auto_combine or split text manually |
Example Error Response
{
  "error": {
    "message": "Format 'opus' requires ffmpeg. Available formats: mp3, wav",
    "type": "feature_unavailable_error",
    "code": "ffmpeg_required",
    "hint": "Use the full Docker image (dbcccc/ttsfm:latest) instead of the slim variant.",
    "available_formats": ["mp3", "wav"]
  }
}
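A client can parse this structure to decide how to recover, for example by retrying with one of the listed formats. The sketch below is one possible approach; the SpeechAPIError class and the helper names are hypothetical and are not part of the ttsfm package.

```python
import json


class SpeechAPIError(Exception):
    """Hypothetical exception carrying the fields of a TTSFM error response."""

    def __init__(self, message, code=None, hint=None, available_formats=None):
        super().__init__(message)
        self.code = code
        self.hint = hint
        self.available_formats = available_formats or []


def parse_error(body):
    """Build a SpeechAPIError from a JSON error response body."""
    err = json.loads(body).get("error", {})
    return SpeechAPIError(
        err.get("message", "unknown error"),
        code=err.get("code"),
        hint=err.get("hint"),
        available_formats=err.get("available_formats"),
    )


def fallback_format(error):
    """Suggest a format to retry with when ffmpeg is missing, else None."""
    if error.code == "ffmpeg_required" and error.available_formats:
        return error.available_formats[0]
    return None


body = json.dumps({"error": {
    "message": "Format 'opus' requires ffmpeg. Available formats: mp3, wav",
    "type": "feature_unavailable_error",
    "code": "ffmpeg_required",
    "available_formats": ["mp3", "wav"],
}})
retry_with = fallback_format(parse_error(body))  # "mp3"
```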