# Audio API

> Text-to-speech (TTS) and speech-to-text (STT) transcription

Generate natural-sounding speech from text (TTS), and transcribe audio to text (STT) with OpenAI's Whisper model. Both endpoints follow the OpenAI Audio API format.

## Text-to-Speech (TTS)

Convert text into lifelike spoken audio.

### Endpoint

```
POST https://api.withperf.pro/v1/audio/speech
```

### Request Body

| Parameter         | Type   | Required | Default | Description                         |
| ----------------- | ------ | -------- | ------- | ----------------------------------- |
| `model`           | string | Yes      | -       | TTS model (`tts-1` or `tts-1-hd`)   |
| `input`           | string | Yes      | -       | Text to synthesize (max 4096 chars) |
| `voice`           | string | Yes      | -       | Voice to use                        |
| `response_format` | string | No       | `mp3`   | Audio format                        |
| `speed`           | number | No       | `1.0`   | Speaking speed (0.25 to 4.0)        |
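
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_speech_payload` is a hypothetical helper, and it assumes that "chars" means Python string length (the API may count differently).

```python
MAX_INPUT_CHARS = 4096            # "max 4096 chars" from the table above
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # speed range from the table above
TTS_MODELS = {"tts-1", "tts-1-hd"}


def build_speech_payload(text, voice, model="tts-1",
                         response_format="mp3", speed=1.0):
    """Hypothetical helper: return a request body dict or raise ValueError."""
    if model not in TTS_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not text:
        raise ValueError("input text is required")
    # Assumption: the limit is counted in Python characters (code points).
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} chars; max is {MAX_INPUT_CHARS}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be passed as the JSON body of the request, for example via the `json=` argument of `requests.post`.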

### Supported Voices

| Voice     | Description          | Best For               |
| --------- | -------------------- | ---------------------- |
| `alloy`   | Neutral, balanced    | General purpose        |
| `echo`    | Warm, conversational | Podcasts, narration    |
| `fable`   | Expressive, dramatic | Storytelling           |
| `onyx`    | Deep, authoritative  | Professional content   |
| `nova`    | Friendly, upbeat     | Marketing, tutorials   |
| `shimmer` | Clear, gentle        | Audiobooks, meditation |

### Supported Formats

| Format | MIME Type  | Description                   |
| ------ | ---------- | ----------------------------- |
| `mp3`  | audio/mpeg | Compressed, widely compatible |
| `opus` | audio/opus | Optimized for streaming       |
| `aac`  | audio/aac  | Good for mobile               |
| `flac` | audio/flac | Lossless quality              |
| `wav`  | audio/wav  | Uncompressed                  |
| `pcm`  | audio/pcm  | Raw audio                     |
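
The format table can be turned into a small lookup for naming output files and sanity-checking the `Content-Type` of a response. This is a sketch under two assumptions: that the file extension matches the format name, and that the server may append parameters (such as `; charset=...`) to the MIME type. Both helper names are hypothetical.

```python
# MIME types copied from the Supported Formats table.
FORMAT_MIME_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
    "pcm": "audio/pcm",
}


def output_filename(stem, response_format="mp3"):
    """Build a file name; assumes the extension equals the format name."""
    if response_format not in FORMAT_MIME_TYPES:
        raise ValueError(f"unsupported format: {response_format!r}")
    return f"{stem}.{response_format}"


def content_type_matches(response_format, content_type):
    """Compare a Content-Type header value with the expected MIME type."""
    expected = FORMAT_MIME_TYPES[response_format]
    # Drop any parameters after ';' and ignore case.
    return content_type.split(";")[0].strip().lower() == expected
```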

### Models

| Model      | Quality         | Latency | Price            |
| ---------- | --------------- | ------- | ---------------- |
| `tts-1`    | Standard        | Fast    | \$0.015/1K chars |
| `tts-1-hd` | High Definition | Slower  | \$0.030/1K chars |
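
For budgeting, the per-character prices above can be applied to a piece of text before sending it. The sketch below assumes billing is strictly linear in character count, with no rounding, minimum charge, or tier discount; the actual charge is whatever the API reports. `estimate_tts_cost` is a hypothetical helper.

```python
from decimal import Decimal

# USD per 1,000 characters, from the Models table.
PRICE_PER_1K_CHARS = {
    "tts-1": Decimal("0.015"),
    "tts-1-hd": Decimal("0.030"),
}


def estimate_tts_cost(text, model="tts-1"):
    """Estimate cost in USD, assuming linear per-character pricing."""
    return PRICE_PER_1K_CHARS[model] * len(text) / 1000
```

`Decimal` is used so that small amounts are not distorted by binary floating-point rounding.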

### Request Examples

#### cURL

```bash
curl -X POST https://api.withperf.pro/v1/audio/speech \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Welcome to Perf AI. We make AI routing simple and cost-effective.",
    "voice": "nova",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
```

#### JavaScript

```javascript
const response = await fetch('https://api.withperf.pro/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_live_abc123',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'tts-1',
    input: 'Welcome to Perf AI. We make AI routing simple and cost-effective.',
    voice: 'nova'
  })
});

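// Stop on 4xx/5xx so an error body is not treated as audio
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
// Play the audio (e.g., new Audio(audioUrl).play()) or offer it as a download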
```

#### Python

```python
import requests

response = requests.post(
    'https://api.withperf.pro/v1/audio/speech',
    headers={
        'Authorization': 'Bearer pk_live_abc123',
        'Content-Type': 'application/json'
    },
    json={
        'model': 'tts-1',
        'input': 'Welcome to Perf AI. We make AI routing simple and cost-effective.',
        'voice': 'nova'
    }
)

# Raise on 4xx/5xx so an error body is not saved as audio
response.raise_for_status()

with open('speech.mp3', 'wb') as f:
    f.write(response.content)
```

### Response

Returns binary audio data with these headers:

| Header              | Description                          |
| ------------------- | ------------------------------------ |
| `Content-Type`      | Audio MIME type (e.g., `audio/mpeg`) |
| `X-Perf-Request-Id` | Request tracking ID                  |
| `X-Perf-Model-Used` | Model that generated the audio       |
| `X-Perf-Cost-Usd`   | Generation cost                      |
| `X-Perf-Latency-Ms` | Generation latency                   |
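
Because the body is binary audio, request metadata is only available through these headers. The sketch below collects them into a dict. It assumes that `X-Perf-Cost-Usd` and `X-Perf-Latency-Ms` carry plain decimal numbers, which the table does not state; `parse_perf_headers` is a hypothetical helper.

```python
def parse_perf_headers(headers):
    """Extract Perf metadata from a mapping of header name -> string value."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("x-perf-cost-usd")
    latency = lowered.get("x-perf-latency-ms")
    return {
        "content_type": lowered.get("content-type"),
        "request_id": lowered.get("x-perf-request-id"),
        "model_used": lowered.get("x-perf-model-used"),
        # Assumption: both values are plain numerals such as "0.00099".
        "cost_usd": float(cost) if cost is not None else None,
        "latency_ms": float(latency) if latency is not None else None,
    }
```

With the `requests` library, `parse_perf_headers(response.headers)` works directly, since `response.headers` is a mapping.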

***

## Speech-to-Text (Transcription)

Transcribe audio files into text using OpenAI's Whisper model.

### Endpoint

```
POST https://api.withperf.pro/v1/audio/transcriptions
```

### Request Body (multipart/form-data)

| Parameter         | Type   | Required | Default | Description              |
| ----------------- | ------ | -------- | ------- | ------------------------ |
| `file`            | file   | Yes      | -       | Audio file to transcribe |
| `model`           | string | Yes      | -       | Model (`whisper-1`)      |
| `language`        | string | No       | auto    | ISO 639-1 language code  |
| `response_format` | string | No       | `json`  | Output format            |

### Supported Audio Formats

* MP3 (`.mp3`)
* MP4 (`.mp4`, `.m4a`)
* MPEG (`.mpeg`, `.mpga`)
* WAV (`.wav`)
* WebM (`.webm`)
* OGG (`.ogg`)
* FLAC (`.flac`)

**Max file size:** 25 MB
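
A quick local check can catch unsupported or oversized files before an upload is attempted. The sketch below is illustrative only: it judges the format by file extension alone, and it treats "25 MB" as 25,000,000 bytes, which is the more conservative reading (the server may allow 25 MiB). `check_audio_file` is a hypothetical helper.

```python
import os

# Extensions from the Supported Audio Formats list.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".m4a", ".mpeg", ".mpga",
    ".wav", ".webm", ".ogg", ".flac",
}
# Assumption: "25 MB" is read as decimal megabytes (the stricter bound).
MAX_FILE_BYTES = 25_000_000


def check_audio_file(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks OK."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; max is {MAX_FILE_BYTES}")
    return problems
```

For a file on disk, the size can be obtained with `os.path.getsize(path)`.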

### Response Formats

| Format         | Description                   |
| -------------- | ----------------------------- |
| `json`         | Simple JSON with `text` field |
| `text`         | Plain text only               |
| `srt`          | SubRip subtitle format        |
| `vtt`          | WebVTT subtitle format        |
| `verbose_json` | Detailed JSON with timestamps |
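
Since the body shape depends on `response_format`, client code needs to branch on it when reading the result. The following sketch assumes that `json` and `verbose_json` bodies are JSON objects with a `text` field, and that `text`, `srt`, and `vtt` bodies arrive as plain strings. `decode_transcription_body` is a hypothetical helper.

```python
import json


def decode_transcription_body(body, response_format="json"):
    """Return the transcript (or subtitle document) from a raw response body."""
    if response_format in ("json", "verbose_json"):
        # Both JSON variants carry the full transcript in "text".
        return json.loads(body)["text"]
    if response_format == "text":
        return body.strip()
    if response_format in ("srt", "vtt"):
        # Subtitle formats are returned unchanged so timing cues are preserved.
        return body
    raise ValueError(f"unknown response_format: {response_format!r}")
```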

### Pricing

| Model       | Price                   |
| ----------- | ----------------------- |
| `whisper-1` | \$0.006/minute of audio |
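
The per-minute price can be converted into a cost estimate from audio duration. The sketch below assumes billing is prorated by the second with no rounding or minimum charge; this matches the `cost_usd` values in the response examples on this page (12.5 seconds gives 0.00125), but the billing rule itself is not stated. `estimate_transcription_cost` is a hypothetical helper.

```python
from decimal import Decimal

WHISPER_PRICE_PER_MINUTE = Decimal("0.006")  # USD, from the Pricing table


def estimate_transcription_cost(duration_seconds):
    """Estimate cost in USD, assuming per-second proration."""
    # Convert through str() so floats like 12.5 become exact decimals.
    seconds = Decimal(str(duration_seconds))
    return WHISPER_PRICE_PER_MINUTE * seconds / 60
```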

### Request Examples

#### cURL

```bash
curl -X POST https://api.withperf.pro/v1/audio/transcriptions \
  -H "Authorization: Bearer pk_live_abc123" \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1" \
  -F "language=en" \
  -F "response_format=json"
```

#### JavaScript

```javascript
// audioFile is a File or Blob, e.g. from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('model', 'whisper-1');
formData.append('language', 'en');

const response = await fetch('https://api.withperf.pro/v1/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_live_abc123'
  },
  body: formData
});

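// Stop on 4xx/5xx before reading the transcript
if (!response.ok) {
  throw new Error(`Transcription request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.text);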
```

#### Python

```python
import requests

with open('meeting.mp3', 'rb') as audio_file:
    response = requests.post(
        'https://api.withperf.pro/v1/audio/transcriptions',
        headers={
            'Authorization': 'Bearer pk_live_abc123'
        },
        files={
            'file': ('meeting.mp3', audio_file, 'audio/mpeg')
        },
        data={
            'model': 'whisper-1',
            'language': 'en'
        }
    )

# Raise on 4xx/5xx before reading the transcript
response.raise_for_status()
data = response.json()
print(data['text'])
```

### Response (JSON format)

```json
{
  "text": "Welcome to the weekly team meeting. Today we'll discuss our Q1 goals and the upcoming product launch.",
  "perf": {
    "request_id": "req_trans_abc123",
    "model_used": "whisper-1",
    "audio_duration_seconds": 12.5,
    "cost_usd": 0.00125,
    "latency_ms": 2341
  }
}
```

### Response (verbose\_json format)

```json
{
  "text": "Welcome to the weekly team meeting.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Welcome to the weekly team meeting."
    }
  ],
  "language": "en",
  "perf": {
    "request_id": "req_trans_abc123",
    "model_used": "whisper-1",
    "audio_duration_seconds": 2.5,
    "cost_usd": 0.00025,
    "latency_ms": 890
  }
}
```
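
The `segments` array makes it straightforward to print a timestamped transcript. The sketch below relies only on the `start`, `end`, and `text` fields shown in the example above; both helper names are hypothetical.

```python
def format_timestamp(seconds):
    """Format a duration in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(result):
    """Turn a parsed verbose_json response into timestamped lines."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines
```

For the example response above, `segment_lines` returns a single line: `[00:00.000 - 00:02.500] Welcome to the weekly team meeting.`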

## Error Responses

### 400 Bad Request

```json
{
  "error": {
    "type": "invalid_request",
    "message": "input text is required",
    "param": "input"
  }
}
```

### 400 File Too Large

```json
{
  "error": {
    "type": "invalid_request",
    "message": "File size exceeds maximum of 25MB"
  }
}
```

### 400 Unsupported Format

```json
{
  "error": {
    "type": "invalid_request",
    "message": "Unsupported audio format. Supported: mp3, mp4, mpeg, mpga, m4a, wav, webm"
  }
}
```
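
All three errors share the same envelope: an `error` object with `type`, `message`, and an optional `param`. A client can map that envelope to an exception, as in the sketch below. `PerfAPIError` and `error_from_response` are hypothetical names, and the fallback values for missing fields are assumptions.

```python
class PerfAPIError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, status, error_type, message, param=None):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message
        self.param = param


def error_from_response(status, body):
    """Build a PerfAPIError from a status code and a parsed JSON body."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    return PerfAPIError(
        status,
        error.get("type", "unknown"),   # fallback is an assumption
        error.get("message", ""),
        error.get("param"),             # absent in the file-size and format errors
    )
```

With `requests`, this could be used as `raise error_from_response(response.status_code, response.json())` when `response.ok` is false.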

## Rate Limits

| Tier       | TTS Requests/Minute | Transcription Minutes/Day |
| ---------- | ------------------- | ------------------------- |
| Free       | 10                  | 10 minutes                |
| Pro        | 60                  | 500 minutes               |
| Enterprise | Custom              | Custom                    |
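
To stay under the per-minute TTS limit, a client can throttle itself before sending. The sketch below uses a sliding 60-second window; whether the server counts requests in a rolling or a fixed window is not documented here, so treat this as one reasonable assumption. `SlidingWindowLimiter` is a hypothetical class.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._sent = deque()        # timestamps of recent requests

    def wait_time(self):
        """Seconds to wait before another request fits in the window."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Note that a request was just sent."""
        self._sent.append(self.clock())
```

A typical loop calls `time.sleep(limiter.wait_time())` before each request and `limiter.record()` right after sending it, with `max_requests=10` on the Free tier or `60` on Pro.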

## Related Endpoints

* [Chat API](./chat) - Text generation with audio input support
* [Image Generation](./images) - Generate images
* [Video Generation](./video) - Generate video content
