Voice Agents

Build real-time conversational voice agents powered by your custom instructions, knowledge base, and content safety policies. Voice agents handle speech recognition, natural language understanding, response generation, and text-to-speech — all through a single WebSocket connection.

How It Works

Your App → WebSocket → Perf → LLM + TTS + STT
                ↕
         Audio streaming
       (PCM16, 16kHz, mono)

Your application opens a WebSocket connection to Perf
Perf establishes a real-time voice pipeline (speech-to-text, LLM, text-to-speech)
Your app streams microphone audio to Perf and receives agent audio and transcripts back
Content safety policies are evaluated on every turn

Quick Start

1. Create a Voice Agent

In the Perf Dashboard, click Create Agent and configure:

Name — A label for your agent (e.g. “Customer Support”)
System Prompt — Instructions that define the agent’s behavior
Voice — Choose from available voices
First Message — What the agent says when a conversation starts
Content Policy (optional) — Attach a policy for PII redaction, blocked terms, or custom safety criteria

2. Add the SDK

The fastest way to integrate is the PerfVoice JavaScript SDK:

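<script src="https://api.withperf.pro/v1/voice/sdk.js"></script>
<script>
  // Create a client bound to your project API key and voice agent.
  const voice = new PerfVoice({
    apiKey: 'YOUR_API_KEY',
    agentId: 'YOUR_AGENT_ID',
  });

  // Log each transcript line together with the speaker's role.
  voice.on('transcript', (role, text) => {
    console.log(role + ': ' + text);
  });

  // Wire the conversation to two buttons. Elements with the ids
  // "startBtn" and "stopBtn" must exist before this script runs.
  document.getElementById('startBtn').onclick = () => voice.start();
  document.getElementById('stopBtn').onclick = () => voice.stop();
</script>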

That’s it. The SDK handles microphone capture, audio encoding, WebSocket protocol, audio playback, interruptions, and ping/pong keepalive.
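If you want to keep transcripts rather than only log them, a small collector can be passed to the `transcript` event. The following is a minimal sketch, not part of the SDK: `createTranscriptLog` is a hypothetical helper, the role strings (`'user'`, `'agent'`) are assumptions, and it assumes each event carries a complete utterance. Check the SDK Reference for the actual event semantics.

```javascript
// Hypothetical helper (not part of the PerfVoice SDK): collects transcript
// events into an in-memory list. Role values like 'user' and 'agent' are
// assumptions made for illustration.
function createTranscriptLog() {
  const entries = [];
  return {
    entries,
    // Matches the (role, text) signature used by the 'transcript' event.
    handler(role, text) {
      entries.push({ role, text, at: Date.now() });
    },
    // Render the conversation as "role: text" lines.
    toText() {
      return entries.map((e) => `${e.role}: ${e.text}`).join('\n');
    },
  };
}

// Usage in the browser, alongside the Quick Start snippet:
//   const log = createTranscriptLog();
//   voice.on('transcript', log.handler);
//   // later: console.log(log.toText());
```

The handler closes over `entries` instead of relying on `this`, so it can be passed directly as a callback.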

3. Test It

Click your start button, allow microphone access, and speak. You should hear the agent respond and see transcripts in the console.

Features

| Feature | Description |
| --- | --- |
| Real-time streaming | Sub-second latency from speech to agent response |
| Interruption handling | Users can interrupt the agent mid-sentence |
| Custom voices | Choose from multiple voice options |
| Knowledge base (RAG) | Attach document collections for grounded answers |
| Web search | Enable real-time web search for up-to-date information |
| Content safety | PII detection, blocked terms, custom criteria |
| Loop detection | Automatic detection and breaking of conversational loops |
| Transcripts | Real-time agent and user transcripts via events |

Integration Options

| Method | Best For | Docs |
| --- | --- | --- |
| JavaScript SDK | Web apps, fastest integration | SDK Reference |
| Raw WebSocket | Full control, custom audio pipelines | WebSocket Protocol |
| Python | Server-side, IVR systems, telephony | Python Integration |

Authentication

Voice agent connections require two parameters:

| Parameter | Description |
| --- | --- |
| `api_key` | Your project API key (format: `pk_live_...`) |
| `agent_id` | The voice agent ID (from the dashboard or API) |

These are passed as query parameters on the WebSocket URL:

wss://api.withperf.pro/v1/voice/conversation?api_key=YOUR_API_KEY&agent_id=YOUR_AGENT_ID
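When connecting without the SDK, the URL can be assembled programmatically so that both values are percent-encoded correctly. This is a minimal sketch: `buildVoiceUrl` is a hypothetical helper name, and only the endpoint and the two query parameters shown above come from this page.

```javascript
// Hypothetical helper: builds the conversation URL from the two documented
// query parameters, api_key and agent_id.
function buildVoiceUrl(apiKey, agentId) {
  if (!apiKey || !agentId) {
    throw new Error('Both apiKey and agentId are required');
  }
  const url = new URL('wss://api.withperf.pro/v1/voice/conversation');
  // searchParams encodes the values, so unusual characters stay safe.
  url.searchParams.set('api_key', apiKey);
  url.searchParams.set('agent_id', agentId);
  return url.toString();
}

// Usage (in a browser, or any runtime that provides a global WebSocket):
//   const socket = new WebSocket(buildVoiceUrl('YOUR_API_KEY', 'YOUR_AGENT_ID'));
```

The messages exchanged after the socket opens are described in the WebSocket Protocol reference and are not shown here.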

Audio Format

All audio is streamed as 16-bit PCM at 16 kHz, mono, little-endian:

| Property | Value |
| --- | --- |
| Encoding | PCM signed 16-bit integer |
| Sample rate | 16,000 Hz |
| Channels | 1 (mono) |
| Byte order | Little-endian |
| Transport | Base64-encoded in JSON messages |
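At this format, raw audio runs at 16,000 samples × 2 bytes × 1 channel = 32,000 bytes per second, and Base64 expands that by a factor of 4/3 to roughly 42,700 characters per second. For custom pipelines, the conversion between Web Audio style Float32 samples and Base64 PCM16 might look like the sketch below. The function names are hypothetical, and the JSON message that carries the Base64 string is defined in the WebSocket Protocol reference, not here. The sketch also assumes the input is already sampled at 16 kHz.

```javascript
// Convert Float32 samples in [-1, 1] to PCM signed 16-bit little-endian bytes.
function floatToPcm16(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range input
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return new Uint8Array(view.buffer);
}

// Base64-encode raw bytes with btoa (available in browsers and Node 16+).
function bytesToBase64(bytes) {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Decode Base64 PCM16 little-endian audio back to Float32 samples.
function base64ToFloat(b64) {
  const binary = atob(b64);
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) view.setUint8(i, binary.charCodeAt(i));
  const out = new Float32Array(binary.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = view.getInt16(i * 2, true) / 0x8000;
  return out;
}
```

The explicit `true` argument to `setInt16` and `getInt16` selects little-endian byte order regardless of the host platform.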

Content Safety

Voice agents support the same content safety policies as the rest of the Perf platform:

Blocked terms — Prevent specific words or phrases in agent responses
PII detection — Detect and redact personally identifiable information
Custom criteria — Define LLM-evaluated safety rules (e.g. “Agent must not provide medical advice”)
Filler phrases — Play natural filler audio while safety evaluation runs

Configure policies in the Dashboard and attach them to your voice agent.

Next Steps

JavaScript SDK Reference — Full SDK API documentation
WebSocket Protocol — Raw WebSocket integration for advanced use cases
Python Integration — Server-side Python integration
Content Policies — Configure safety policies
