Audio API Reference

Generate natural-sounding speech from text (text-to-speech, TTS) and transcribe audio to text (speech-to-text, STT). Transcription uses OpenAI’s Whisper model. Both endpoints follow the OpenAI Audio API format.

Text-to-Speech (TTS)

Convert text into lifelike spoken audio.

Endpoint

POST https://api.withperf.pro/v1/audio/speech

Request Body

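| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | TTS model (tts-1 or tts-1-hd) |
| input | string | Yes | - | Text to synthesize (max 4096 chars) |
| voice | string | Yes | - | Voice to use |
| response_format | string | No | mp3 | Audio format |
| speed | number | No | 1.0 | Speaking speed (0.25 to 4.0) |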
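
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the helper name `validate_tts_request` is hypothetical, and it assumes the 4096-character limit counts characters the way Python's `len` does.

```python
ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_request(payload: dict) -> list[str]:
    """Return a list of problems found in a TTS request body (empty if none)."""
    problems = []

    # model, input and voice are the three required parameters
    for field in ("model", "input", "voice"):
        if not payload.get(field):
            problems.append(f"{field} is required")

    model = payload.get("model")
    if model and model not in ALLOWED_MODELS:
        problems.append(f"unknown model: {model}")

    # Assumption: the limit is measured in characters, as len() counts them
    text = payload.get("input") or ""
    if len(text) > MAX_INPUT_CHARS:
        problems.append(f"input is {len(text)} chars, limit is {MAX_INPUT_CHARS}")

    # speed is optional and defaults to 1.0
    speed = payload.get("speed", 1.0)
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED} to {MAX_SPEED}")

    return problems
```

An empty list means the payload passes these local checks. The server may still reject it for reasons not covered here, such as an unknown voice.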

Supported Voices

| Voice | Description | Best For |
| --- | --- | --- |
| alloy | Neutral, balanced | General purpose |
| echo | Warm, conversational | Podcasts, narration |
| fable | Expressive, dramatic | Storytelling |
| onyx | Deep, authoritative | Professional content |
| nova | Friendly, upbeat | Marketing, tutorials |
| shimmer | Clear, gentle | Audiobooks, meditation |

Supported Formats

| Format | MIME Type | Description |
| --- | --- | --- |
| mp3 | audio/mpeg | Compressed, widely compatible |
| opus | audio/opus | Optimized for streaming |
| aac | audio/aac | Good for mobile |
| flac | audio/flac | Lossless quality |
| wav | audio/wav | Uncompressed |
| pcm | audio/pcm | Raw audio |

Models

| Model | Quality | Latency | Price |
| --- | --- | --- | --- |
| tts-1 | Standard | Fast | $0.015/1K chars |
| tts-1-hd | High Definition | Slower | $0.030/1K chars |
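
As a rough illustration of the pricing above, the cost of a request can be estimated from the input length. This sketch assumes billing is prorated per character rather than rounded up to whole 1K blocks, which this reference does not state. The function name `estimate_tts_cost` is hypothetical.

```python
# Prices in USD per 1,000 input characters, taken from the Models table
PRICE_PER_1K_CHARS = {"tts-1": 0.015, "tts-1-hd": 0.030}


def estimate_tts_cost(text: str, model: str = "tts-1") -> float:
    """Estimate the USD cost of synthesizing `text` with the given model.

    Assumption: cost scales linearly with character count (no rounding
    up to the next 1K block).
    """
    return len(text) / 1000 * PRICE_PER_1K_CHARS[model]
```

Under that assumption, a 2,000-character input costs about $0.03 with tts-1 and about $0.06 with tts-1-hd. The X-Perf-Cost-Usd response header reports the actual charge.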

Request Examples

cURL

curl -X POST https://api.withperf.pro/v1/audio/speech \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Welcome to Perf AI. We make AI routing simple and cost-effective.",
    "voice": "nova",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

JavaScript

const response = await fetch('https://api.withperf.pro/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_live_abc123',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'tts-1',
    input: 'Welcome to Perf AI. We make AI routing simple and cost-effective.',
    voice: 'nova'
  })
});

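// Fail early so an error body is not treated as audio
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
// Play the audio (e.g. new Audio(audioUrl).play()) or offer it as a download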

Python

import requests

response = requests.post(
    'https://api.withperf.pro/v1/audio/speech',
    headers={
        'Authorization': 'Bearer pk_live_abc123',
        'Content-Type': 'application/json'
    },
    json={
        'model': 'tts-1',
        'input': 'Welcome to Perf AI. We make AI routing simple and cost-effective.',
        'voice': 'nova'
    }
)

# Raise on 4xx/5xx so an error body is never written to the audio file
response.raise_for_status()

with open('speech.mp3', 'wb') as f:
    f.write(response.content)

Response

Returns binary audio data with these headers:

| Header | Description |
| --- | --- |
| Content-Type | Audio MIME type (e.g., audio/mpeg) |
| X-Perf-Request-Id | Request tracking ID |
| X-Perf-Model-Used | Model that generated the audio |
| X-Perf-Cost-Usd | Generation cost |
| X-Perf-Latency-Ms | Generation latency |
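
The metadata headers listed above can be collected into a dictionary for logging or cost tracking. This is a sketch only: `parse_perf_headers` is a hypothetical helper, and it assumes the cost and latency headers carry plain decimal numbers, which this reference does not specify.

```python
def parse_perf_headers(headers) -> dict:
    """Collect the Perf metadata headers from a TTS response into a dict.

    `headers` is any mapping with a .get() method, for example the
    `response.headers` object from the requests library.
    """

    def as_float(name):
        # Assumption: the header value is a plain decimal number
        value = headers.get(name)
        return float(value) if value is not None else None

    return {
        "content_type": headers.get("Content-Type"),
        "request_id": headers.get("X-Perf-Request-Id"),
        "model_used": headers.get("X-Perf-Model-Used"),
        "cost_usd": as_float("X-Perf-Cost-Usd"),
        "latency_ms": as_float("X-Perf-Latency-Ms"),
    }
```

With the Python request example, this would be called as `parse_perf_headers(response.headers)`. Missing headers come back as `None` instead of raising.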

Speech-to-Text (Transcription)

Transcribe audio files into text using OpenAI’s Whisper model.

Endpoint

POST https://api.withperf.pro/v1/audio/transcriptions

Request Body (multipart/form-data)

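| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file | file | Yes | - | Audio file to transcribe |
| model | string | Yes | - | Model (whisper-1) |
| language | string | No | auto | ISO-639-1 language code |
| response_format | string | No | json | Output format |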

Supported Audio Formats

  • MP3 (.mp3)
  • MP4 (.mp4, .m4a)
  • MPEG (.mpeg, .mpga)
  • WAV (.wav)
  • WebM (.webm)
  • OGG (.ogg)
  • FLAC (.flac)
Max file size: 25 MB
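
A file can be checked against the format list and size limit above before it is uploaded, which avoids a round trip that would end in a 400 error. In this sketch the helper name `audio_upload_problems` is hypothetical, the check looks only at the file extension (not the actual contents), and "25 MB" is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os

SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".m4a", ".mpeg", ".mpga", ".wav", ".webm", ".ogg", ".flac",
}
# Assumption: the documented 25 MB limit is interpreted as 25 MiB
MAX_FILE_BYTES = 25 * 1024 * 1024


def audio_upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return reasons the file would likely be rejected (empty if none)."""
    problems = []

    # Compare the extension case-insensitively, so "TALK.WAV" is accepted
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported audio extension: {ext or '(none)'}")

    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")

    return problems
```

For a file on disk, the size can be passed as `os.path.getsize(path)`.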

Response Formats

| Format | Description |
| --- | --- |
| json | Simple JSON with text field |
| text | Plain text only |
| srt | SubRip subtitle format |
| vtt | WebVTT subtitle format |
| verbose_json | Detailed JSON with timestamps |

Pricing

| Model | Price |
| --- | --- |
| whisper-1 | $0.006/minute of audio |
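
The per-minute price can be turned into a quick cost estimate from the audio duration. This sketch assumes the charge is prorated by the second, which is consistent with the sample responses in this reference (12.5 seconds of audio is reported as 0.00125 USD). The function name `estimate_transcription_cost` is hypothetical.

```python
PRICE_PER_MINUTE_USD = 0.006  # whisper-1, from the Pricing table


def estimate_transcription_cost(duration_seconds: float) -> float:
    """Estimate the USD cost of transcribing audio of the given length.

    Assumption: billing is prorated per second, not rounded up to
    whole minutes.
    """
    return duration_seconds / 60 * PRICE_PER_MINUTE_USD
```

Under this assumption a one-hour recording costs roughly $0.36.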

Request Examples

cURL

curl -X POST https://api.withperf.pro/v1/audio/transcriptions \
  -H "Authorization: Bearer pk_live_abc123" \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1" \
  -F "language=en" \
  -F "response_format=json"

JavaScript

const formData = new FormData();
// audioFile is a File or Blob, e.g. from an <input type="file"> element
formData.append('file', audioFile);
formData.append('model', 'whisper-1');
formData.append('language', 'en');

const response = await fetch('https://api.withperf.pro/v1/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_live_abc123'
  },
  body: formData
});

const data = await response.json();
console.log(data.text);

Python

import requests

with open('meeting.mp3', 'rb') as audio_file:
    response = requests.post(
        'https://api.withperf.pro/v1/audio/transcriptions',
        headers={
            'Authorization': 'Bearer pk_live_abc123'
        },
        files={
            'file': ('meeting.mp3', audio_file, 'audio/mpeg')
        },
        data={
            'model': 'whisper-1',
            'language': 'en'
        }
    )

# Raise on 4xx/5xx before reading the transcription
response.raise_for_status()

data = response.json()
print(data['text'])

Response (JSON format)

{
  "text": "Welcome to the weekly team meeting. Today we'll discuss our Q1 goals and the upcoming product launch.",
  "perf": {
    "request_id": "req_trans_abc123",
    "model_used": "whisper-1",
    "audio_duration_seconds": 12.5,
    "cost_usd": 0.00125,
    "latency_ms": 2341
  }
}

Response (verbose_json format)

{
  "text": "Welcome to the weekly team meeting.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Welcome to the weekly team meeting."
    }
  ],
  "language": "en",
  "perf": {
    "request_id": "req_trans_abc123",
    "model_used": "whisper-1",
    "audio_duration_seconds": 2.5,
    "cost_usd": 0.00025,
    "latency_ms": 890
  }
}
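
The `segments` array in a verbose_json response carries enough timing information to build subtitles on the client. The sketch below converts segments to SubRip text, assuming each segment has `start` and `end` in seconds and a `text` field, as in the sample above. The function names are hypothetical. When only subtitles are needed, requesting `response_format=srt` directly is simpler.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Convert verbose_json segments into SubRip (SRT) subtitle text."""
    blocks = []
    # SRT cue numbers start at 1, while segment ids start at 0
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}")
    # Cues are separated by a blank line
    return "\n\n".join(blocks) + "\n"
```

Given the parsed response as `data`, the subtitles would be produced with `segments_to_srt(data["segments"])`.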

Error Responses

400 Bad Request

{
  "error": {
    "type": "invalid_request",
    "message": "input text is required",
    "param": "input"
  }
}

400 File Too Large

{
  "error": {
    "type": "invalid_request",
    "message": "File size exceeds maximum of 25MB"
  }
}

400 Unsupported Format

{
  "error": {
    "type": "invalid_request",
    "message": "Unsupported audio format. Supported: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac"
  }
}
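
All of the error bodies above share the same `error` object, so a client can map them onto a single exception type. The following is a sketch under that assumption. `PerfAPIError` and `raise_for_perf_error` are hypothetical names, not part of any official SDK, and status codes other than 400 are not documented in this section.

```python
class PerfAPIError(Exception):
    """Error returned by the audio endpoints, built from the JSON error body."""

    def __init__(self, status_code, error_type, message, param=None):
        super().__init__(f"{status_code} {error_type}: {message}")
        self.status_code = status_code
        self.error_type = error_type
        self.message = message
        self.param = param  # only present on some errors, e.g. "input"


def raise_for_perf_error(status_code: int, body: dict) -> None:
    """Raise PerfAPIError if the response indicates a failure."""
    if status_code < 400:
        return
    # Assumption: every error body has the {"error": {...}} shape shown above
    error = body.get("error", {})
    raise PerfAPIError(
        status_code,
        error.get("type", "unknown"),
        error.get("message", "no message provided"),
        error.get("param"),
    )
```

With the requests library this would be called as `raise_for_perf_error(response.status_code, response.json())` before the body is used.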

Rate Limits

| Tier | TTS Requests/Min | Transcription Minutes/Day |
| --- | --- | --- |
| Free | 10 | 10 minutes |
| Pro | 60 | 500 minutes |
| Enterprise | Custom | Custom |
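
To stay under the TTS requests-per-minute limit, a client can throttle itself before sending. The sketch below uses a sliding 60-second window. This reference does not say how the server counts requests (sliding or fixed window), so treat it as a conservative client-side guard only. The class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._sent = deque()  # timestamps of recently allowed requests

    def try_acquire(self) -> bool:
        """Return True and record the request if it fits in the window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            self._sent.append(now)
            return True
        return False
```

On the Free tier this would be created as `SlidingWindowLimiter(10)`, with `try_acquire()` called before each TTS request and the request delayed whenever it returns `False`.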