Audio API Reference

Generate natural-sounding speech from text (TTS) and transcribe audio to text (STT). Transcription uses OpenAI’s Whisper model. Both endpoints follow the OpenAI Audio API format.

Text-to-Speech (TTS)

Convert text into lifelike spoken audio.

Endpoint

POST https://api.withperf.pro/v1/audio/speech

Request Body

Parameter	Type	Required	Default	Description
`model`	string	Yes	-	TTS model (`tts-1` or `tts-1-hd`)
`input`	string	Yes	-	Text to synthesize (max 4096 chars)
`voice`	string	Yes	-	Voice to use (see Supported Voices)
`response_format`	string	No	`mp3`	Audio format (see Supported Formats)
`speed`	number	No	`1.0`	Speaking speed (0.25 to 4.0)
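
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_speech_payload` is a hypothetical helper name, and it assumes the 4096-character limit counts characters the way Python's `len` does.

```python
ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
MAX_INPUT_CHARS = 4096          # from the `input` row of the table
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # from the `speed` row of the table


def build_speech_payload(model, text, voice, response_format="mp3", speed=1.0):
    """Return a JSON-serializable request body, or raise ValueError.

    Hypothetical helper: it mirrors the documented parameter table only.
    """
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if not text:
        raise ValueError("input text is required")
    # Assumption: the limit is measured in Python characters (code points).
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`. The server remains the authority on what is valid; this only catches obvious mistakes early.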

Supported Voices

Voice	Description	Best For
`alloy`	Neutral, balanced	General purpose
`echo`	Warm, conversational	Podcasts, narration
`fable`	Expressive, dramatic	Storytelling
`onyx`	Deep, authoritative	Professional content
`nova`	Friendly, upbeat	Marketing, tutorials
`shimmer`	Clear, gentle	Audiobooks, meditation

Supported Formats

Format	MIME Type	Description
`mp3`	audio/mpeg	Compressed, widely compatible
`opus`	audio/opus	Optimized for streaming
`aac`	audio/aac	Good for mobile
`flac`	audio/flac	Lossless quality
`wav`	audio/wav	Uncompressed
`pcm`	audio/pcm	Raw audio
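
The format table can be carried into client code as a lookup, for example to pick an output filename and to confirm that the response `Content-Type` matches what was requested. This is a sketch under assumptions: the helper names are hypothetical, and using the format name as the file extension is a convention chosen here, not something the API specifies.

```python
# Format -> MIME type, copied from the Supported Formats table.
FORMAT_MIME_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
    "pcm": "audio/pcm",
}


def output_filename(stem, response_format="mp3"):
    """Build a filename such as 'speech.mp3' (extension choice is an assumption)."""
    if response_format not in FORMAT_MIME_TYPES:
        raise ValueError(f"unsupported response_format: {response_format}")
    return f"{stem}.{response_format}"


def matches_requested_format(content_type, response_format):
    """True if a Content-Type header value matches the requested format."""
    expected = FORMAT_MIME_TYPES[response_format]
    # Ignore any parameters after ';' and compare case-insensitively.
    return content_type.split(";")[0].strip().lower() == expected
```

A mismatch (for example `application/json` when `mp3` was requested) usually means the body is an error payload rather than audio.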

Models

Model	Quality	Latency	Price
`tts-1`	Standard	Fast	$0.015/1K chars
`tts-1-hd`	High Definition	Slower	$0.030/1K chars
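
To budget a job before sending it, the per-character prices above can be applied to the input length. The sketch below assumes billing is strictly proportional to the number of characters, with no rounding up to whole 1K blocks; the table does not say how partial blocks are billed, and `estimate_tts_cost` is a hypothetical helper.

```python
from decimal import Decimal

# USD per 1,000 characters, from the Models table.
TTS_PRICE_PER_1K_CHARS = {
    "tts-1": Decimal("0.015"),
    "tts-1-hd": Decimal("0.030"),
}


def estimate_tts_cost(text, model="tts-1"):
    """Estimate the cost in USD (assumes proportional per-character billing)."""
    return TTS_PRICE_PER_1K_CHARS[model] * len(text) / 1000
```

`Decimal` is used so that small amounts are not distorted by binary floating point. The `X-Perf-Cost-Usd` response header reports the cost actually charged.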

Request Examples

cURL

curl -X POST https://api.withperf.pro/v1/audio/speech \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Welcome to Perf AI. We make AI routing simple and cost-effective.",
    "voice": "nova",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

JavaScript

const response = await fetch('https://api.withperf.pro/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_live_abc123',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'tts-1',
    input: 'Welcome to Perf AI. We make AI routing simple and cost-effective.',
    voice: 'nova'
  })
});

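// Error responses carry JSON, not audio, so check the status first
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
// Play the audio (for example, new Audio(audioUrl).play()) or offer it as a download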

Python

import requests

response = requests.post(
    'https://api.withperf.pro/v1/audio/speech',
    headers={
        'Authorization': 'Bearer pk_live_abc123',
        'Content-Type': 'application/json'
    },
    json={
        'model': 'tts-1',
        'input': 'Welcome to Perf AI. We make AI routing simple and cost-effective.',
        'voice': 'nova'
    }
)

# Fail on HTTP errors instead of writing an error body to disk
response.raise_for_status()

with open('speech.mp3', 'wb') as f:
    f.write(response.content)

Response

Returns binary audio data with these headers:

Header	Description
`Content-Type`	Audio MIME type (e.g., `audio/mpeg`)
`X-Perf-Request-Id`	Request tracking ID
`X-Perf-Model-Used`	Model that generated the audio
`X-Perf-Cost-Usd`	Generation cost
`X-Perf-Latency-Ms`	Generation latency
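
These headers can be collected into a dictionary for logging or cost tracking. The sketch below assumes the cost and latency values arrive as plain decimal strings, which the table does not state; `parse_perf_headers` is a hypothetical helper.

```python
def parse_perf_headers(headers):
    """Extract the documented response headers from a mapping of header names."""

    def _number(name):
        # Assumption: numeric headers are plain decimal strings such as "0.0015".
        raw = headers.get(name)
        if raw is None:
            return None
        try:
            return float(raw)
        except ValueError:
            return None

    return {
        "content_type": headers.get("Content-Type"),
        "request_id": headers.get("X-Perf-Request-Id"),
        "model_used": headers.get("X-Perf-Model-Used"),
        "cost_usd": _number("X-Perf-Cost-Usd"),
        "latency_ms": _number("X-Perf-Latency-Ms"),
    }
```

With the `requests` library, `parse_perf_headers(response.headers)` works regardless of header casing because `response.headers` is case-insensitive. A plain `dict` needs the exact names shown in the table.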

Speech-to-Text (Transcription)

Transcribe audio files into text using OpenAI’s Whisper model.

Endpoint

POST https://api.withperf.pro/v1/audio/transcriptions

Request Body (multipart/form-data)

Parameter	Type	Required	Default	Description
`file`	file	Yes	-	Audio file to transcribe
`model`	string	Yes	-	Model (`whisper-1`)
`language`	string	No	auto	ISO 639-1 language code (for example, `en`); auto-detected if omitted
`response_format`	string	No	`json`	Output format (see Response Formats)

Supported Audio Formats

MP3 (.mp3)
MP4 (.mp4, .m4a)
MPEG (.mpeg, .mpga)
WAV (.wav)
WebM (.webm)
OGG (.ogg)
FLAC (.flac)

Max file size: 25 MB
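
A quick local check of extension and size avoids uploading a file the endpoint will reject. This is a sketch only: `check_audio_upload` is a hypothetical helper, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the documentation does not spell out.

```python
from pathlib import PurePath

# Extensions from the Supported Audio Formats list.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".m4a", ".mpeg", ".mpga", ".wav", ".webm", ".ogg", ".flac",
}
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_audio_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems
```

For a file on disk, call it as `check_audio_upload(path.name, path.stat().st_size)` with a `pathlib.Path`. The check looks only at the file name, not at the actual audio encoding.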

Response Formats

Format	Description
`json`	Simple JSON with `text` field
`text`	Plain text only
`srt`	SubRip subtitle format
`vtt`	WebVTT subtitle format
`verbose_json`	Detailed JSON with timestamps
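
Because the response body changes shape with `response_format`, client code needs to branch on the format it requested. The sketch below assumes that `text`, `srt`, and `vtt` responses are returned as a plain-text body rather than wrapped in JSON; that matches the descriptions above but is not stated explicitly. `extract_transcript` is a hypothetical helper.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
PLAIN_FORMATS = {"text", "srt", "vtt"}


def extract_transcript(body, response_format="json"):
    """Return the transcript (or subtitle document) from a raw response body string."""
    if response_format in JSON_FORMATS:
        # Both JSON variants expose the full transcript in the `text` field.
        return json.loads(body)["text"]
    if response_format in PLAIN_FORMATS:
        # Assumption: these formats are returned as-is in the body.
        return body
    raise ValueError(f"unknown response_format: {response_format}")
```

With `requests`, pass `response.text` as `body`.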

Pricing

Model	Price
`whisper-1`	$0.006/minute of audio
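
The per-minute price converts to a per-request cost once the audio duration is known. The sketch below assumes billing is proportional to duration in seconds with no rounding up to whole minutes; the two sample responses in this reference (12.5 s costing 0.00125 and 2.5 s costing 0.00025) are consistent with that. `estimate_transcription_cost` is a hypothetical helper.

```python
from decimal import Decimal

WHISPER_PRICE_PER_MINUTE = Decimal("0.006")  # USD, from the Pricing table


def estimate_transcription_cost(duration_seconds):
    """Estimate the cost in USD (assumes proportional per-second billing)."""
    # Convert through str() so a float such as 12.5 becomes an exact Decimal.
    return WHISPER_PRICE_PER_MINUTE * Decimal(str(duration_seconds)) / 60
```

For a 12.5-second clip this yields `Decimal('0.00125')`, matching the `cost_usd` field in the sample JSON response.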

Request Examples

cURL

curl -X POST https://api.withperf.pro/v1/audio/transcriptions \
  -H "Authorization: Bearer pk_live_abc123" \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1" \
  -F "language=en" \
  -F "response_format=json"

JavaScript

// audioFile is a File or Blob, for example from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('model', 'whisper-1');
formData.append('language', 'en');

const response = await fetch('https://api.withperf.pro/v1/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_live_abc123'
  },
  body: formData
});

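if (!response.ok) {
  throw new Error(`Transcription request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.text);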

Python

import requests

with open('meeting.mp3', 'rb') as audio_file:
    response = requests.post(
        'https://api.withperf.pro/v1/audio/transcriptions',
        headers={
            'Authorization': 'Bearer pk_live_abc123'
        },
        files={
            'file': ('meeting.mp3', audio_file, 'audio/mpeg')
        },
        data={
            'model': 'whisper-1',
            'language': 'en'
        }
    )

# Raise on HTTP errors before reading the transcript
response.raise_for_status()
data = response.json()
print(data['text'])

Response (JSON format)

{
  "text": "Welcome to the weekly team meeting. Today we'll discuss our Q1 goals and the upcoming product launch.",
  "perf": {
    "request_id": "req_trans_abc123",
    "model_used": "whisper-1",
    "audio_duration_seconds": 12.5,
    "cost_usd": 0.00125,
    "latency_ms": 2341
  }
}

Response (verbose_json format)

{
  "text": "Welcome to the weekly team meeting.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Welcome to the weekly team meeting."
    }
  ],
  "language": "en",
  "perf": {
    "request_id": "req_trans_abc123",
    "model_used": "whisper-1",
    "audio_duration_seconds": 2.5,
    "cost_usd": 0.00025,
    "latency_ms": 890
  }
}
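
The `segments` array carries enough timing information to build subtitles on the client. The endpoint can already return `srt` directly, so the sketch below is only useful when a `verbose_json` response is already in hand. The function names are hypothetical, and the code relies only on the `start`, `end`, and `text` fields shown above.

```python
def _srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Convert verbose_json segments into SubRip (SRT) text."""
    blocks = []
    # SRT cue numbers start at 1, while segment ids start at 0.
    for number, segment in enumerate(segments, start=1):
        start = _srt_timestamp(segment["start"])
        end = _srt_timestamp(segment["end"])
        blocks.append(f"{number}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Applied to the sample response above, this produces a single cue running from `00:00:00,000` to `00:00:02,500`.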

Error Responses

400 Bad Request

{
  "error": {
    "type": "invalid_request",
    "message": "input text is required",
    "param": "input"
  }
}

400 File Too Large

{
  "error": {
    "type": "invalid_request",
    "message": "File size exceeds maximum of 25MB"
  }
}

400 Unsupported Format

{
  "error": {
    "type": "invalid_request",
    "message": "Unsupported audio format. Supported: mp3, mp4, mpeg, mpga, m4a, wav, webm"
  }
}
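
All three error bodies share the same `error` envelope, so a client can map them to a single exception type. This is a minimal sketch: `PerfAPIError` and `error_from_response` are hypothetical names, and `param` is treated as optional because only the first example includes it.

```python
class PerfAPIError(Exception):
    """Hypothetical exception wrapping the documented error envelope."""

    def __init__(self, status_code, error_type, message, param=None):
        super().__init__(f"{status_code} {error_type}: {message}")
        self.status_code = status_code
        self.error_type = error_type
        self.message = message
        self.param = param


def error_from_response(status_code, body):
    """Build a PerfAPIError from an HTTP status code and a decoded JSON body."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    return PerfAPIError(
        status_code,
        error.get("type", "unknown"),
        error.get("message", ""),
        error.get("param"),  # absent in the file-size and format examples
    )
```

A caller using `requests` might write `raise error_from_response(response.status_code, response.json())` when `response.ok` is false.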

Rate Limits

Tier	TTS Requests/Min	Transcription Min/Day
Free	10	10 minutes
Pro	60	500 minutes
Enterprise	Custom	Custom
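
A client can stay under the TTS requests-per-minute limit by tracking its own send times. The sketch below uses a sliding 60-second window. How the server actually measures the window (fixed or sliding) is not documented here, so treat this as a conservative assumption; the class name is hypothetical.

```python
import time
from collections import deque


class RequestsPerMinuteLimiter:
    """Client-side sliding-window limiter (assumed window model, not the server's)."""

    def __init__(self, max_per_minute, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self._clock = clock      # injectable for testing
        self._sent = deque()     # timestamps of recent requests

    def seconds_until_allowed(self):
        """Return 0.0 if a request may be sent now, else the wait in seconds."""
        now = self._clock()
        # Drop timestamps that have left the 60-second window.
        while self._sent and now - self._sent[0] >= 60:
            self._sent.popleft()
        if len(self._sent) < self.max_per_minute:
            return 0.0
        return 60 - (now - self._sent[0])

    def record(self):
        """Call once for every request actually sent."""
        self._sent.append(self._clock())
```

For the Free tier, create `RequestsPerMinuteLimiter(10)`, call `time.sleep(limiter.seconds_until_allowed())` before each request, and call `limiter.record()` after sending.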

Related Endpoints

Chat API - Text generation with audio input support
Image Generation - Generate images
Video Generation - Generate video content
