Audio API Reference
Generate natural-sounding speech from text (TTS) and transcribe audio to text (STT) using OpenAI’s Whisper model. Both endpoints follow the OpenAI Audio API format.
Text-to-Speech (TTS)
Convert text into lifelike spoken audio.
Endpoint
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | TTS model (tts-1 or tts-1-hd) |
| input | string | Yes | - | Text to synthesize (max 4096 chars) |
| voice | string | Yes | - | Voice to use |
| response_format | string | No | mp3 | Audio format |
| speed | number | No | 1.0 | Speaking speed (0.25 to 4.0) |
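The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch: `build_tts_payload` is a hypothetical helper (not part of any SDK), and the limits and defaults are copied from the table.

```python
TTS_MODELS = {"tts-1", "tts-1-hd"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_tts_payload(model, text, voice, response_format="mp3", speed=1.0):
    """Validate TTS parameters and return the JSON request body as a dict.

    Hypothetical helper: the checks mirror the documented limits, but the
    server remains the authority on what it accepts.
    """
    if model not in TTS_MODELS:
        raise ValueError(f"unknown TTS model: {model!r}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Calling `build_tts_payload("tts-1", "Hello", "alloy")` fills in the documented defaults (`mp3`, `1.0`) for the optional fields.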
Supported Voices
| Voice | Description | Best For |
|---|---|---|
| alloy | Neutral, balanced | General purpose |
| echo | Warm, conversational | Podcasts, narration |
| fable | Expressive, dramatic | Storytelling |
| onyx | Deep, authoritative | Professional content |
| nova | Friendly, upbeat | Marketing, tutorials |
| shimmer | Clear, gentle | Audiobooks, meditation |
Supported Formats
| Format | MIME Type | Description |
|---|---|---|
| mp3 | audio/mpeg | Compressed, widely compatible |
| opus | audio/opus | Optimized for streaming |
| aac | audio/aac | Good for mobile |
| flac | audio/flac | Lossless quality |
| wav | audio/wav | Uncompressed |
| pcm | audio/pcm | Raw audio |
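When saving or serving the returned audio, the format table maps directly to a lookup. This is a small sketch; `mime_type_for` is a hypothetical helper name, and the mapping is copied from the table above.

```python
# Format name -> MIME type, as listed in the Supported Formats table.
TTS_MIME_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
    "pcm": "audio/pcm",
}


def mime_type_for(response_format="mp3"):
    """Return the MIME type for a response_format value (default: mp3)."""
    try:
        return TTS_MIME_TYPES[response_format]
    except KeyError:
        raise ValueError(f"unsupported format: {response_format!r}") from None
```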
Models
| Model | Quality | Latency | Price |
|---|---|---|---|
| tts-1 | Standard | Fast | $0.015/1K chars |
| tts-1-hd | High Definition | Slower | $0.030/1K chars |
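The per-character pricing makes cost easy to estimate up front. The sketch below assumes billing is strictly proportional to character count with no rounding or minimum charge; that billing rule is an assumption, and `estimate_tts_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Prices per 1,000 characters, from the Models table.
TTS_PRICE_PER_1K_CHARS = {
    "tts-1": Decimal("0.015"),
    "tts-1-hd": Decimal("0.030"),
}


def estimate_tts_cost(model, text):
    """Estimate the USD cost of synthesizing `text` with `model`.

    Assumption: cost scales linearly with len(text); actual billing
    granularity is not specified here.
    """
    return TTS_PRICE_PER_1K_CHARS[model] * len(text) / 1000
```

For example, 2,000 characters cost 2 × $0.015 = $0.03 with tts-1, or 2 × $0.030 = $0.06 with tts-1-hd.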
Request Examples
cURL
JavaScript
Python
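A minimal sketch of a TTS request using only the standard library. The base URL is a placeholder, and the `/audio/speech` path and Bearer-token authentication are assumptions borrowed from the OpenAI Audio API format; substitute the real endpoint and credentials for your deployment.

```python
import json
import urllib.request

# Placeholder base URL (assumption); replace with the real API host.
BASE_URL = "https://api.example.com/v1"


def build_speech_request(api_key, text, voice="alloy", model="tts-1"):
    """Build (but do not send) a POST request for speech synthesis."""
    body = json.dumps({"model": model, "input": text, "voice": voice})
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",  # path assumed from the OpenAI format
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme assumed
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, path="speech.mp3"):
    """Send the request and write the binary audio response to `path`."""
    request = build_speech_request(api_key, text)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.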
Response
Returns binary audio data with these headers:
| Header | Description |
|---|---|
| Content-Type | Audio MIME type (e.g., audio/mpeg) |
| X-Perf-Request-Id | Request tracking ID |
| X-Perf-Model-Used | Model that generated the audio |
| X-Perf-Cost-Usd | Generation cost |
| X-Perf-Latency-Ms | Generation latency |
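The metadata headers can be collected into a single record for logging. This sketch assumes the cost and latency headers carry plain decimal numbers; the value formats are not specified here, and `parse_perf_headers` is a hypothetical helper.

```python
def parse_perf_headers(headers):
    """Extract Content-Type and X-Perf-* metadata from response headers.

    `headers` is any mapping of header name to value. Lookup is
    case-insensitive because HTTP header names are.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("x-perf-cost-usd")
    latency = lowered.get("x-perf-latency-ms")
    return {
        "content_type": lowered.get("content-type"),
        "request_id": lowered.get("x-perf-request-id"),
        "model_used": lowered.get("x-perf-model-used"),
        # Assumption: both values are plain numeric strings.
        "cost_usd": float(cost) if cost is not None else None,
        "latency_ms": float(latency) if latency is not None else None,
    }
```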
Speech-to-Text (Transcription)
Transcribe audio files into text using OpenAI’s Whisper model.
Endpoint
Request Body (multipart/form-data)
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | - | Audio file to transcribe |
| model | string | Yes | - | Model (whisper-1) |
| language | string | No | auto | ISO-639-1 language code |
| response_format | string | No | json | Output format |
Supported Audio Formats
- MP3 (.mp3)
- MP4 (.mp4, .m4a)
- MPEG (.mpeg, .mpga)
- WAV (.wav)
- WebM (.webm)
- OGG (.ogg)
- FLAC (.flac)
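Checking the file extension locally avoids uploading a file that would be rejected as an unsupported format. This is a sketch with a hypothetical helper name; it inspects the extension only and does not verify the actual file contents.

```python
from pathlib import Path

# Extensions from the Supported Audio Formats list.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".m4a", ".mpeg", ".mpga", ".wav", ".webm", ".ogg", ".flac",
}


def is_supported_audio(path):
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```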
Response Formats
| Format | Description |
|---|---|
| json | Simple JSON with text field |
| text | Plain text only |
| srt | SubRip subtitle format |
| vtt | WebVTT subtitle format |
| verbose_json | Detailed JSON with timestamps |
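Because the response body is JSON for some formats and plain text for others, client code has to branch on the requested `response_format`. The sketch below uses a hypothetical helper, `decode_transcription`, and assumes UTF-8 bodies; it does not assume any verbose_json field names beyond what the server returns.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def decode_transcription(body, response_format="json"):
    """Decode a transcription response body.

    Returns a dict for JSON formats and a str for text-based formats.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8")  # assumption: responses are UTF-8
    if response_format in JSON_FORMATS:
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        return body
    raise ValueError(f"unknown response_format: {response_format!r}")
```

With the default `json` format, the transcript is then available as `decode_transcription(body)["text"]`.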
Pricing
| Model | Price |
|---|---|
| whisper-1 | $0.006/minute of audio |
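Transcription cost follows from audio duration. The sketch below assumes billing is prorated by the second with no rounding; how partial minutes are actually billed is not stated here, and `estimate_transcription_cost` is a hypothetical helper.

```python
from decimal import Decimal

WHISPER_PRICE_PER_MINUTE = Decimal("0.006")  # from the Pricing table


def estimate_transcription_cost(duration_seconds):
    """Estimate the USD cost of transcribing audio of the given length.

    `duration_seconds` should be an int (whole seconds).
    Assumption: cost is prorated linearly per second.
    """
    return WHISPER_PRICE_PER_MINUTE * Decimal(duration_seconds) / 60
```

For example, a 10-minute recording (600 seconds) costs 10 × $0.006 = $0.06.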
Request Examples
cURL
JavaScript
Python
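A minimal sketch of a transcription upload that builds the multipart/form-data body by hand with the standard library. The base URL is a placeholder, and the `/audio/transcriptions` path and Bearer-token authentication are assumptions borrowed from the OpenAI Audio API format.

```python
import json
import urllib.request
import uuid

# Placeholder base URL (assumption); replace with the real API host.
BASE_URL = "https://api.example.com/v1"


def build_transcription_request(api_key, filename, audio_bytes,
                                model="whisper-1", response_format="json"):
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain text form fields.
    for name, value in (("model", model), ("response_format", response_format)):
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    # The audio file itself, sent as opaque bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + audio_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",  # path assumed from the OpenAI format
        data=b"".join(chunks),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme assumed
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(api_key, path):
    """Upload the file at `path` and return the parsed JSON response."""
    with open(path, "rb") as audio:
        request = build_transcription_request(api_key, path, audio.read())
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

An HTTP client library that handles multipart encoding would shorten this considerably; the manual version is shown only to make the request shape explicit.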
Response (JSON format)
Response (verbose_json format)
Error Responses
400 Bad Request
400 File Too Large
400 Unsupported Format
Rate Limits
| Tier | TTS Requests/Minute | Transcription Minutes/Day |
|---|---|---|
| Free | 10 | 10 minutes |
| Pro | 60 | 500 minutes |
| Enterprise | Custom | Custom |
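A client can stay under the TTS requests-per-minute limit by throttling itself. The sketch below is a simple sliding-window limiter; how the server actually measures its window is not specified here, so treat the 60-second sliding window as an assumption. `SlidingWindowLimiter` is a hypothetical class.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._sent = deque()  # timestamps of requests inside the window

    def try_acquire(self):
        """Record a request and return True if one is allowed right now."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

For the Free tier, `SlidingWindowLimiter(10)` permits ten TTS requests per minute; callers can sleep briefly and retry when `try_acquire()` returns False.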
Related Endpoints
- Chat API - Text generation with audio input support
- Image Generation - Generate images
- Video Generation - Generate video content