# Voice Agents

> Build conversational voice AI agents with Perf

Build real-time conversational voice agents powered by your custom instructions, knowledge base, and content safety policies. Voice agents handle speech recognition, natural language understanding, response generation, and text-to-speech — all through a single WebSocket connection.

## How It Works

```
Your App → WebSocket → Perf → LLM + TTS + STT
                ↕
         Audio streaming
       (PCM16, 16kHz, mono)
```

1. Your application opens a WebSocket connection to Perf
2. Perf establishes a real-time voice pipeline (speech-to-text, LLM, text-to-speech)
3. Your app streams microphone audio to Perf and receives agent audio and transcripts in return
4. Content safety policies are evaluated on every turn

## Quick Start

### 1. Create a Voice Agent

In the [Perf Dashboard](https://dashboard.withperf.pro/dashboard/voice/agents), click **Create Agent** and configure:

* **Name** — A label for your agent (e.g. "Customer Support")
* **System Prompt** — Instructions that define the agent's behavior
* **Voice** — Choose from available voices
* **First Message** — What the agent says when a conversation starts
* **Content Policy** (optional) — Attach a policy for PII redaction, blocked terms, or custom safety criteria

### 2. Add the SDK

The fastest way to integrate is the [PerfVoice JavaScript SDK](./sdk):

```html
<script src="https://api.withperf.pro/v1/voice/sdk.js"></script>
<script>
  const voice = new PerfVoice({
    apiKey: 'YOUR_API_KEY',
    agentId: 'YOUR_AGENT_ID',
  });

  voice.on('transcript', (role, text) => {
    console.log(role + ': ' + text);
  });

  document.getElementById('startBtn').onclick = () => voice.start();
  document.getElementById('stopBtn').onclick = () => voice.stop();
</script>
```

That's it. The SDK handles microphone capture, audio encoding, the WebSocket protocol, audio playback, interruptions, and ping/pong keepalive.

### 3. Test It

Click your start button, allow microphone access, and speak. You should hear the agent respond and see transcripts in the console.
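
If you want more than console output while testing, the `transcript` callback from step 2 can feed a small in-memory log. The sketch below is only an illustration: `createTranscriptLog` is a hypothetical helper that is not part of the PerfVoice SDK, and the role strings `'user'` and `'agent'` are assumed values, so check the SDK reference for what your callback actually receives.

```javascript
// Hypothetical helper (not part of the PerfVoice SDK): collects transcript
// events so they can be rendered or exported later.
function createTranscriptLog() {
  const entries = [];
  return {
    entries,
    // Matches the (role, text) callback signature used in the Quick Start.
    // Uses the closed-over array rather than `this`, so it is safe to pass
    // directly as a callback.
    handler(role, text) {
      entries.push({ role, text, at: Date.now() });
    },
    // Renders the conversation as one "role: text" line per entry.
    toText() {
      return entries.map((e) => `${e.role}: ${e.text}`).join('\n');
    },
  };
}

const transcriptLog = createTranscriptLog();

// In the browser, with the `voice` instance from step 2:
//   voice.on('transcript', transcriptLog.handler);

// Simulated events; the role names here are assumptions.
transcriptLog.handler('user', 'Hello');
transcriptLog.handler('agent', 'Hi, how can I help?');
console.log(transcriptLog.toText());
```

Keeping the entries as objects rather than pre-formatted strings leaves room to add timestamps or per-role styling in your own UI.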

## Features

| Feature                   | Description                                              |
| ------------------------- | -------------------------------------------------------- |
| **Real-time streaming**   | Sub-second latency from speech to agent response         |
| **Interruption handling** | Users can interrupt the agent mid-sentence               |
| **Custom voices**         | Choose from multiple voice options                       |
| **Knowledge base (RAG)**  | Attach document collections for grounded answers         |
| **Web search**            | Enable real-time web search for up-to-date information   |
| **Content safety**        | PII detection, blocked terms, custom criteria            |
| **Loop detection**        | Automatic detection and breaking of conversational loops |
| **Transcripts**           | Real-time agent and user transcripts via events          |

## Integration Options

| Method             | Best For                             | Docs                              |
| ------------------ | ------------------------------------ | --------------------------------- |
| **JavaScript SDK** | Web apps, fastest integration        | [SDK Reference](./sdk)            |
| **Raw WebSocket**  | Full control, custom audio pipelines | [WebSocket Protocol](./websocket) |
| **Python**         | Server-side, IVR systems, telephony  | [Python Integration](./python)    |

## Authentication

Voice agent connections require two parameters:

| Parameter  | Description                                    |
| ---------- | ---------------------------------------------- |
| `api_key`  | Your project API key (format: `pk_live_...`)   |
| `agent_id` | The voice agent ID (from the dashboard or API) |

These are passed as query parameters on the WebSocket URL:

```
wss://api.withperf.pro/v1/voice/conversation?api_key=YOUR_API_KEY&agent_id=YOUR_AGENT_ID
```
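
When assembling this URL by hand, it helps to let the standard `URL` API handle query-string encoding. The following is a minimal sketch; `buildVoiceUrl` is a hypothetical helper name, and only the endpoint and the two parameter names come from this page.

```javascript
// Hypothetical helper: builds the conversation URL from the two required
// parameters, percent-encoding the values where needed.
function buildVoiceUrl(apiKey, agentId) {
  if (!apiKey || !agentId) {
    throw new Error('Both apiKey and agentId are required');
  }
  const url = new URL('wss://api.withperf.pro/v1/voice/conversation');
  url.searchParams.set('api_key', apiKey);
  url.searchParams.set('agent_id', agentId);
  return url.toString();
}

const exampleUrl = buildVoiceUrl('YOUR_API_KEY', 'YOUR_AGENT_ID');
console.log(exampleUrl);

// In an environment with a WebSocket implementation, the result can be
// passed straight to the constructor:
//   const socket = new WebSocket(exampleUrl);
```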

## Audio Format

All audio is streamed as **PCM 16-bit, 16 kHz, mono, little-endian**:

| Property    | Value                           |
| ----------- | ------------------------------- |
| Encoding    | PCM signed 16-bit integer       |
| Sample rate | 16,000 Hz                       |
| Channels    | 1 (mono)                        |
| Byte order  | Little-endian                   |
| Transport   | Base64-encoded in JSON messages |
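
For a raw WebSocket integration you would do this conversion yourself. The sketch below shows one way to turn floating-point samples in the range -1 to 1 (the form the Web Audio API produces) into base64-encoded PCM16 and back. All function names are hypothetical, the input is assumed to be already resampled to 16 kHz mono, and the JSON message that wraps the base64 string is not covered here; see the WebSocket protocol page for that.

```javascript
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit samples

// Converts float samples in [-1, 1] to little-endian signed 16-bit PCM bytes.
function floatToPcm16(samples) {
  const bytes = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range input
    const value = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(value), true); // true = little-endian
  }
  return bytes;
}

// Converts little-endian PCM16 bytes back to float samples for playback.
function pcm16ToFloat(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.byteLength / BYTES_PER_SAMPLE));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * BYTES_PER_SAMPLE, true) / 0x8000;
  }
  return out;
}

// Base64 helpers that work in Node (Buffer) and in browsers (btoa/atob).
function bytesToBase64(bytes) {
  if (typeof Buffer !== 'undefined') return Buffer.from(bytes).toString('base64');
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

function base64ToBytes(b64) {
  if (typeof Buffer !== 'undefined') return Buffer.from(b64, 'base64');
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Number of raw PCM bytes for a given duration at 16 kHz, 16-bit, mono.
function pcm16ByteLength(durationMs) {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * durationMs) / 1000;
}

const encoded = bytesToBase64(floatToPcm16([0, 1, -1]));
const decoded = pcm16ToFloat(base64ToBytes(encoded));
console.log(encoded, Array.from(decoded));
console.log(pcm16ByteLength(20)); // bytes in a 20 ms frame
```

At this format one second of audio is 32,000 bytes before base64 encoding, and base64 adds roughly a third on top of that, which is worth keeping in mind when choosing how much audio to send per message.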

## Content Safety

Voice agents support the same content safety policies as the rest of the Perf platform:

* **Blocked terms** — Prevent specific words or phrases in agent responses
* **PII detection** — Detect and redact personally identifiable information
* **Custom criteria** — Define LLM-evaluated safety rules (e.g. "Agent must not provide medical advice")
* **Filler phrases** — Play natural filler audio while safety evaluation runs

Configure policies in the [Dashboard](https://dashboard.withperf.pro/dashboard/policies) and attach them to your voice agent.

## Next Steps

* [JavaScript SDK Reference](./sdk) — Full SDK API documentation
* [WebSocket Protocol](./websocket) — Raw WebSocket integration for advanced use cases
* [Python Integration](./python) — Server-side Python integration
* [Content Policies](../api-reference/policies) — Configure safety policies
