# WebSocket Protocol

> Raw WebSocket integration for voice agents — full protocol reference

This page documents the raw WebSocket protocol for Perf Voice Agents. Use this if you need full control over the audio pipeline or are integrating from a platform where the [JavaScript SDK](./sdk) isn't available.

> For most web applications, the [JavaScript SDK](./sdk) is the recommended approach — it handles all of the protocol details described here.

## Connection

### Endpoint

```
wss://api.withperf.pro/v1/voice/conversation
```

### Query Parameters

| Parameter  | Type   | Required | Description                          |
| ---------- | ------ | -------- | ------------------------------------ |
| `api_key`  | string | Yes      | Your project API key (`pk_live_...`) |
| `agent_id` | string | Yes      | Voice agent ID                       |

### Example

```javascript
const ws = new WebSocket(
  'wss://api.withperf.pro/v1/voice/conversation?api_key=YOUR_API_KEY&agent_id=YOUR_AGENT_ID'
);
```
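
If the key or agent ID could contain characters that are not URL-safe, it may be safer to build the query string programmatically. A minimal sketch, assuming only the endpoint and the two parameters documented above; `buildVoiceUrl` is a hypothetical helper name, not part of any Perf SDK:

```javascript
// Hypothetical helper (not part of the Perf API): builds the connection URL
// with both query parameters percent-encoded.
const VOICE_ENDPOINT = 'wss://api.withperf.pro/v1/voice/conversation';

function buildVoiceUrl(apiKey, agentId) {
  const url = new URL(VOICE_ENDPOINT);
  url.searchParams.set('api_key', apiKey);
  url.searchParams.set('agent_id', agentId);
  return url.toString();
}

// Usage in a browser:
// const ws = new WebSocket(buildVoiceUrl('YOUR_API_KEY', 'YOUR_AGENT_ID'));
```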

## Protocol Flow

```
Client                                        Perf
  |                                             |
  |-------------- WebSocket OPEN -------------->|
  |                                             |--- connects to voice pipeline
  |<-- conversation_initiation_metadata --------|
  |                                             |
  |--- user_audio_chunk ----------------------->|  (repeat: stream mic audio)
  |<-- audio -----------------------------------|  (agent speaks back)
  |<-- agent_response --------------------------|  (agent transcript)
  |<-- user_transcript -------------------------|  (user transcript)
  |                                             |
  |<-- ping ------------------------------------|  (keepalive)
  |--- pong ----------------------------------->|
  |                                             |
  |<-- interruption ----------------------------|  (user spoke over agent)
  |                                             |
  |-------------- WebSocket CLOSE ------------->|
```

> **Important:** Do not send audio until you receive the `conversation_initiation_metadata` message. Sending audio before initialization will cause the connection to close with code `1008`.
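
One way to enforce this rule is to route every outgoing chunk through a small gate that stays closed until the initiation message arrives. The sketch below is illustrative only: `createAudioGate` and its method names are hypothetical, and `send` stands in for `ws.send`.

```javascript
// Hypothetical wrapper: drops audio chunks until the init message has arrived.
function createAudioGate(send) {
  let ready = false;
  return {
    // Call this for every parsed server message.
    onServerMessage(msg) {
      if (msg.type === 'conversation_initiation_metadata') ready = true;
    },
    // Returns true if the chunk was sent, false if it was dropped.
    sendChunk(base64Chunk) {
      if (!ready) return false;
      send(JSON.stringify({ user_audio_chunk: base64Chunk }));
      return true;
    },
  };
}

// Usage in a browser:
// const gate = createAudioGate((text) => ws.send(text));
// ws.onmessage = (event) => gate.onServerMessage(JSON.parse(event.data));
```

Dropping early chunks (rather than queueing them) keeps the sketch simple; whether buffering is preferable depends on how much leading audio your application can afford to lose.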

## Messages: Client → Server

### Send Audio

Stream microphone audio as base64-encoded PCM16 chunks:

```json
{
  "user_audio_chunk": "<base64-encoded PCM16 audio>"
}
```

**Audio format:** PCM 16-bit signed integer, 16 kHz, mono, little-endian. Base64-encode the raw bytes.

**Recommended chunk size:** 2048 samples (128 ms at 16 kHz, or 4096 bytes of raw PCM before base64 encoding).
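
To make the format concrete, the sketch below encodes Float32 samples as base64 PCM16 and writes each sample with an explicit little-endian flag, so the result does not depend on platform byte order. The function names are hypothetical; only the format itself comes from this page.

```javascript
const SAMPLE_RATE = 16000;

// Duration of a chunk in milliseconds.
function chunkDurationMs(sampleCount, sampleRate = SAMPLE_RATE) {
  return (sampleCount / sampleRate) * 1000;
}

// Convert Float32 samples in [-1, 1] to base64-encoded little-endian PCM16.
function encodePcm16Base64(float32Samples) {
  const view = new DataView(new ArrayBuffer(float32Samples.length * 2));
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    const int16 = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(i * 2, int16, true); // true = little-endian
  }
  const bytes = new Uint8Array(view.buffer);
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// A 2048-sample chunk lasts 128 ms and occupies 4096 raw bytes,
// which base64 expands to 5464 characters.
```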

#### JavaScript Example: Capture and Send Microphone Audio

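```javascript
// Assumes `ws` is the WebSocket opened in the Connection example.
// Set `ready` to true when `conversation_initiation_metadata` arrives.
let ready = false;

const stream = await navigator.mediaDevices.getUserMedia({
  audio: { sampleRate: 16000, channelCount: 1, echoCancellation: true, noiseSuppression: true }
});

const audioCtx = new AudioContext({ sampleRate: 16000 });
const source = audioCtx.createMediaStreamSource(stream);
// ScriptProcessorNode is deprecated in favor of AudioWorklet but remains widely supported.
const processor = audioCtx.createScriptProcessor(2048, 1, 1);

processor.onaudioprocess = (e) => {
  // Drop audio until the socket is open and the session is initialized
  if (ws.readyState !== WebSocket.OPEN || !ready) return;

  // Convert Float32 samples in [-1, 1] to 16-bit signed integers
  const input = e.inputBuffer.getChannelData(0);
  const pcm16 = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }

  // Base64-encode the raw PCM bytes
  const bytes = new Uint8Array(pcm16.buffer);
  let binary = '';
  for (let j = 0; j < bytes.length; j++) {
    binary += String.fromCharCode(bytes[j]);
  }

  ws.send(JSON.stringify({ user_audio_chunk: btoa(binary) }));
};

// Connect the processor to the destination so that onaudioprocess fires
// (required in some browsers)
source.connect(processor);
processor.connect(audioCtx.destination);
```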

### Pong (Keepalive Response)

Reply to every `ping` with a `pong` that echoes the `event_id` from the ping's `ping_event`. Unanswered pings cause the connection to drop after roughly 30 seconds.

```json
{
  "type": "pong",
  "event_id": "<event_id from the ping>"
}
```
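
As a small illustration, the reply can be derived directly from the parsed `ping` message. `buildPong` is a hypothetical helper name; the field names are the ones documented on this page.

```javascript
// Hypothetical helper: returns the JSON text of the pong for a parsed ping,
// or null if the message is not a well-formed ping.
function buildPong(pingMessage) {
  const eventId = pingMessage?.ping_event?.event_id;
  if (pingMessage?.type !== 'ping' || eventId === undefined) return null;
  return JSON.stringify({ type: 'pong', event_id: eventId });
}

// Usage in a browser:
// const pong = buildPong(JSON.parse(event.data));
// if (pong) ws.send(pong);
```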

## Messages: Server → Client

### `conversation_initiation_metadata`

Sent once after connection is established. Signals that the voice pipeline is ready.

```json
{
  "type": "conversation_initiation_metadata",
  "conversation_initiation_metadata_event": {
    "conversation_id": "conv_abc123",
    "agent_output_audio_format": "pcm_16000"
  }
}
```

| Field                       | Description                               |
| --------------------------- | ----------------------------------------- |
| `conversation_id`           | Unique session identifier                 |
| `agent_output_audio_format` | Output audio format (usually `pcm_16000`) |

**Start sending audio only after receiving this message.**
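
Because the output format is only "usually" `pcm_16000`, it can be worth reading it from the message rather than hard-coding a sample rate. The sketch below assumes the format string follows a `pcm_<sample rate>` pattern; that pattern is an assumption extrapolated from the single documented value, and `readInitMetadata` is a hypothetical helper name.

```javascript
// Hypothetical helper: extracts the session fields from the init message.
// ASSUMPTION: format strings look like "pcm_<sampleRate>"; only "pcm_16000"
// is documented, so unknown formats yield outputSampleRate = null.
function readInitMetadata(message) {
  if (message?.type !== 'conversation_initiation_metadata') return null;
  const event = message.conversation_initiation_metadata_event ?? {};
  const format = event.agent_output_audio_format ?? null;
  const match = format ? /^pcm_(\d+)$/.exec(format) : null;
  return {
    conversationId: event.conversation_id ?? null,
    outputFormat: format,
    outputSampleRate: match ? Number(match[1]) : null,
  };
}
```

A caller could treat a `null` sample rate as a signal to stop and log the unexpected format rather than play audio at the wrong speed.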

### `audio`

Agent speech audio as base64-encoded PCM16. When `agent_output_audio_format` is `pcm_16000`, the format matches the input: 16 kHz, mono, little-endian.

```json
{
  "type": "audio",
  "audio_event": {
    "audio_base_64": "<base64-encoded PCM16 audio>"
  }
}
```

#### JavaScript Example: Play Agent Audio

```javascript
// Assumes `audioCtx` is the 16 kHz AudioContext created in the capture example.
let nextPlayTime = 0;
const sources = [];

function playAudio(base64) {
  // Decode base64 into raw bytes
  const bin = atob(base64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);

  // Interpret the bytes as PCM16 (ignoring any trailing odd byte) and scale to [-1, 1)
  const pcm16 = new Int16Array(bytes.buffer, 0, Math.floor(bytes.length / 2));
  const float32 = new Float32Array(pcm16.length);
  for (let i = 0; i < pcm16.length; i++) float32[i] = pcm16[i] / 32768;

  const buffer = audioCtx.createBuffer(1, float32.length, 16000);
  buffer.getChannelData(0).set(float32);

  const src = audioCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(audioCtx.destination);

  // Schedule sequentially to avoid gaps
  const now = audioCtx.currentTime;
  if (nextPlayTime < now) nextPlayTime = now;
  src.start(nextPlayTime);
  nextPlayTime += buffer.duration;

  // Track for interruption cleanup
  sources.push(src);
  src.onended = () => {
    const idx = sources.indexOf(src);
    if (idx !== -1) sources.splice(idx, 1);
  };
}
```
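
The example above reads samples through `Int16Array`, which uses the platform's byte order. Since the protocol specifies little-endian PCM16, a decode step that states the byte order explicitly may be preferable in non-browser runtimes. A minimal sketch, with `decodePcm16Base64` as a hypothetical helper name:

```javascript
// Hypothetical helper: base64 PCM16 (little-endian) -> Float32Array in [-1, 1).
function decodePcm16Base64(base64) {
  const binary = atob(base64);
  const sampleCount = Math.floor(binary.length / 2); // ignore a trailing odd byte
  const view = new DataView(new ArrayBuffer(sampleCount * 2));
  for (let i = 0; i < sampleCount * 2; i++) view.setUint8(i, binary.charCodeAt(i));

  const float32 = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    float32[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return float32;
}
```

The returned array can be copied into an `AudioBuffer` exactly as `playAudio` does with its `float32` variable.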

### `agent_response`

The agent's text response (transcript of what the agent is saying).

```json
{
  "type": "agent_response",
  "agent_response_event": {
    "agent_response": "Hello! How can I help you today?"
  }
}
```

### `user_transcript`

Transcript of what the user said.

```json
{
  "type": "user_transcript",
  "user_transcription_event": {
    "user_transcript": "I'd like to check on my order status."
  }
}
```

### `interruption`

Sent when the user speaks while the agent is talking. **You must stop all currently playing agent audio immediately** to avoid the agent's voice overlapping with the new response.

```json
{
  "type": "interruption"
}
```

```javascript
// Handle interruption — stop all playing audio
sources.forEach(s => { try { s.stop(); } catch (e) {} });
sources.length = 0;
nextPlayTime = 0;
```

### `ping`

Keepalive ping. You must respond with a `pong` to keep the connection alive.

```json
{
  "type": "ping",
  "ping_event": {
    "event_id": 12345
  }
}
```

## Complete JavaScript Example

A full working implementation using raw WebSocket (no SDK):

```javascript
const WS_URL = 'wss://api.withperf.pro/v1/voice/conversation';
const API_KEY = 'YOUR_API_KEY';
const AGENT_ID = 'YOUR_AGENT_ID';

let ws, audioCtx, micStream, ready = false, nextPlay = 0, sources = [];

async function startVoice() {
  // 1. Get microphone
  micStream = await navigator.mediaDevices.getUserMedia({
    audio: { sampleRate: 16000, channelCount: 1, echoCancellation: true, noiseSuppression: true }
  });
  audioCtx = new AudioContext({ sampleRate: 16000 });
  const mic = audioCtx.createMediaStreamSource(micStream);
  const proc = audioCtx.createScriptProcessor(2048, 1, 1);

  // 2. Stream mic audio as base64 PCM16
  proc.onaudioprocess = (e) => {
    if (!ws || ws.readyState !== WebSocket.OPEN || !ready) return;
    const input = e.inputBuffer.getChannelData(0);
    const pcm = new Int16Array(input.length);
    for (let i = 0; i < input.length; i++) {
      const s = Math.max(-1, Math.min(1, input[i]));
      pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    const bytes = new Uint8Array(pcm.buffer);
    let bin = '';
    for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
    ws.send(JSON.stringify({ user_audio_chunk: btoa(bin) }));
  };
  mic.connect(proc);
  proc.connect(audioCtx.destination);

  // 3. Connect WebSocket (encode query values so special characters survive)
  ws = new WebSocket(
    WS_URL + '?api_key=' + encodeURIComponent(API_KEY) + '&agent_id=' + encodeURIComponent(AGENT_ID)
  );

  ws.onmessage = (event) => {
    if (typeof event.data !== 'string') return;
    const data = JSON.parse(event.data);

    switch (data.type) {
      case 'conversation_initiation_metadata':
        ready = true;
        console.log('Session:', data.conversation_initiation_metadata_event?.conversation_id);
        break;
      case 'audio':
        if (data.audio_event?.audio_base_64) playAudio(data.audio_event.audio_base_64);
        break;
      case 'agent_response':
        console.log('Agent:', data.agent_response_event?.agent_response);
        break;
      case 'user_transcript':
        console.log('You:', data.user_transcription_event?.user_transcript);
        break;
      case 'interruption':
        sources.forEach(s => { try { s.stop(); } catch (e) {} });
        sources = []; nextPlay = 0;
        break;
      case 'ping':
        ws.send(JSON.stringify({ type: 'pong', event_id: data.ping_event?.event_id }));
        break;
    }
  };

  ws.onclose = () => stopVoice();
}

// 4. Play agent audio (base64 PCM16 → AudioBuffer)
function playAudio(base64) {
  if (!audioCtx) return;
  const bin = atob(base64), bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
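  // Ignore a trailing odd byte, which would otherwise make the Int16Array constructor throw
  const pcm = new Int16Array(bytes.buffer, 0, Math.floor(bytes.length / 2));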
  const f32 = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) f32[i] = pcm[i] / 32768;
  const buf = audioCtx.createBuffer(1, f32.length, 16000);
  buf.getChannelData(0).set(f32);
  const src = audioCtx.createBufferSource();
  src.buffer = buf;
  src.connect(audioCtx.destination);
  const now = audioCtx.currentTime;
  if (nextPlay < now) nextPlay = now;
  src.start(nextPlay);
  nextPlay += buf.duration;
  sources.push(src);
  src.onended = () => { const i = sources.indexOf(src); if (i !== -1) sources.splice(i, 1); };
}

// 5. Cleanup
function stopVoice() {
  ready = false;
  sources.forEach(s => { try { s.stop(); } catch (e) {} });
  sources = []; nextPlay = 0;
  if (micStream) { micStream.getTracks().forEach(t => t.stop()); micStream = null; }
  if (audioCtx) { audioCtx.close().catch(() => {}); audioCtx = null; }
  if (ws) { ws.close(); ws = null; }
}
```

## WebSocket Close Codes

| Code   | Meaning                                                      |
| ------ | ------------------------------------------------------------ |
| `1000` | Normal closure (client or server initiated)                  |
| `1008` | Policy violation (e.g., sending audio before initialization) |
| `1011` | Server error (internal pipeline failure)                     |
| `4001` | Authentication failed (invalid API key)                      |
| `4004` | Agent not found (invalid `agent_id`)                         |
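
For logging or user-facing errors, the table above can be turned into a lookup. This is a sketch only: `describeClose` is a hypothetical helper, and the descriptions are copied from the table rather than from any server-provided reason string.

```javascript
// Close codes documented on this page, mapped to human-readable descriptions.
const CLOSE_CODES = {
  1000: 'Normal closure',
  1008: 'Policy violation (for example, audio sent before initialization)',
  1011: 'Server error (internal pipeline failure)',
  4001: 'Authentication failed (invalid API key)',
  4004: 'Agent not found (invalid agent_id)',
};

// Hypothetical helper: falls back to a generic message for undocumented codes.
function describeClose(code) {
  return CLOSE_CODES[code] ?? `Unrecognized close code ${code}`;
}

// Usage in a browser:
// ws.onclose = (event) => console.warn(describeClose(event.code));
```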

## Troubleshooting

| Symptom                             | Cause                                                    | Fix                                                       |
| ----------------------------------- | -------------------------------------------------------- | --------------------------------------------------------- |
| Immediate disconnect with code 1008 | Sending audio before `conversation_initiation_metadata`  | Wait for the init message before streaming audio          |
| No audio from agent                 | Playing audio as binary instead of decoding base64 PCM16 | Decode base64 → Int16Array → Float32Array → AudioBuffer   |
| Agent speaks over itself            | Not handling `interruption` events                       | Stop all scheduled AudioBufferSourceNodes on interruption |
| Connection drops after ~30s         | Not responding to `ping` messages                        | Send `pong` response with the `event_id` from each ping   |
| Audio is garbled                    | Wrong sample rate or encoding                            | Ensure PCM16, 16 kHz, mono, little-endian                 |

## Related

* [Voice Agents Overview](./overview) — Architecture and features
* [JavaScript SDK](./sdk) — Recommended for web apps
* [Python Integration](./python) — Server-side integration
