Voice Agents
Build real-time conversational voice agents powered by your custom instructions, knowledge base, and content safety policies. Voice agents handle speech recognition, natural language understanding, response generation, and text-to-speech, all through a single WebSocket connection.
How It Works
- Your application opens a WebSocket connection to Perf
- Perf establishes a real-time voice pipeline (speech-to-text, LLM, text-to-speech)
- Your app streams microphone audio to Perf and receives agent audio and transcripts back
- Content safety policies are evaluated on every turn
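The receiving half of the flow above can be sketched as a small message router. This is a minimal illustration, not the Perf protocol: the `type`, `data`, `role`, and `text` field names and the `TurnState` helper are assumptions made for the example, so check the WebSocket Protocol page for the real message schema.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Collects what the agent sends back during a conversation."""
    audio: bytearray = field(default_factory=bytearray)  # raw PCM bytes to play
    transcripts: list = field(default_factory=list)      # (role, text) pairs


def handle_message(raw: str, state: TurnState) -> None:
    """Route one incoming JSON message into the turn state.

    All field names here are hypothetical placeholders.
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "audio":
        # Audio arrives Base64-encoded inside JSON; decode to raw PCM bytes.
        state.audio.extend(base64.b64decode(msg["data"]))
    elif kind == "transcript":
        state.transcripts.append((msg["role"], msg["text"]))
    # Unknown message types are ignored so new server events
    # do not break an older client.
```

Keeping the router free of network code means it can be exercised with plain strings before it is attached to a live connection.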
Quick Start
1. Create a Voice Agent
In the Perf Dashboard, click Create Agent and configure:
- Name — A label for your agent (e.g. “Customer Support”)
- System Prompt — Instructions that define the agent’s behavior
- Voice — Choose from available voices
- First Message — What the agent says when a conversation starts
- Content Policy (optional) — Attach a policy for PII redaction, blocked terms, or custom safety criteria
2. Add the SDK
The fastest way to integrate is the PerfVoice JavaScript SDK; see the JavaScript SDK Reference for the full API.
3. Test It
Click your start button, allow microphone access, and speak. You should hear the agent respond and see transcripts in the console.
Features
| Feature | Description |
|---|---|
| Real-time streaming | Sub-second latency from speech to agent response |
| Interruption handling | Users can interrupt the agent mid-sentence |
| Custom voices | Choose from multiple voice options |
| Knowledge base (RAG) | Attach document collections for grounded answers |
| Web search | Enable real-time web search for up-to-date information |
| Content safety | PII detection, blocked terms, custom criteria |
| Loop detection | Automatic detection and breaking of conversational loops |
| Transcripts | Real-time agent and user transcripts via events |
Integration Options
| Method | Best For | Docs |
|---|---|---|
| JavaScript SDK | Web apps, fastest integration | SDK Reference |
| Raw WebSocket | Full control, custom audio pipelines | WebSocket Protocol |
| Python | Server-side, IVR systems, telephony | Python Integration |
Authentication
Voice agent connections require two parameters:
| Parameter | Description |
|---|---|
| api_key | Your project API key (format: pk_live_...) |
| agent_id | The voice agent ID (from the dashboard or API) |
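As a rough sketch of how the two parameters might be attached to a connection, the helper below builds a WebSocket URL with both values in the query string. The host, path, and the choice of query-string transport are assumptions for illustration only; the actual endpoint and parameter placement are defined in the WebSocket Protocol documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint used only for this example.
VOICE_WS_BASE = "wss://voice.example.invalid/v1/agent"


def build_voice_url(api_key: str, agent_id: str, base: str = VOICE_WS_BASE) -> str:
    """Return a connection URL carrying the two required parameters."""
    # The documented key format is pk_live_..., so reject anything else early.
    if not api_key.startswith("pk_live_"):
        raise ValueError("api_key should look like pk_live_...")
    if not agent_id:
        raise ValueError("agent_id is required")
    # urlencode escapes characters that are unsafe in a query string.
    query = urlencode({"api_key": api_key, "agent_id": agent_id})
    return f"{base}?{query}"
```

Validating both values before connecting turns a misconfigured key into an immediate local error rather than a rejected connection.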
Audio Format
All audio is streamed as PCM 16-bit, 16 kHz, mono, little-endian:
| Property | Value |
|---|---|
| Encoding | PCM signed 16-bit integer |
| Sample rate | 16,000 Hz |
| Channels | 1 (mono) |
| Byte order | Little-endian |
| Transport | Base64-encoded in JSON messages |
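To make the format concrete, here is a minimal sketch that converts floating-point samples to signed 16-bit little-endian PCM, wraps them in a Base64 JSON message, and works out how many bytes a frame occupies. The JSON field names (`type`, `data`) and the 20 ms frame duration are assumptions for illustration; only the encoding, sample rate, channel count, and byte order come from the table above.

```python
import base64
import json
import struct

SAMPLE_RATE_HZ = 16_000   # from the audio format table
BYTES_PER_SAMPLE = 2      # signed 16-bit integer
FRAME_MS = 20             # assumed chunk duration, not specified by the source


def floats_to_pcm16le(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))            # clip out-of-range input
        ints.append(int(round(s * 32767)))
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def audio_message(pcm: bytes) -> str:
    """Wrap PCM bytes in a JSON message (field names are hypothetical)."""
    payload = base64.b64encode(pcm).decode("ascii")
    return json.dumps({"type": "audio", "data": payload})


def frame_bytes(frame_ms: int = FRAME_MS) -> int:
    """Number of PCM bytes in one mono frame of the given duration."""
    return SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE
```

At this format one second of audio is 16,000 samples × 2 bytes = 32,000 bytes before Base64 encoding, which adds roughly one third on top.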
Content Safety
Voice agents support the same content safety policies as the rest of the Perf platform:
- Blocked terms — Prevent specific words or phrases in agent responses
- PII detection — Detect and redact personally identifiable information
- Custom criteria — Define LLM-evaluated safety rules (e.g. “Agent must not provide medical advice”)
- Filler phrases — Play natural filler audio while safety evaluation runs
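To illustrate what blocked-term matching and PII redaction mean in practice, the sketch below checks a response for whole-word blocked phrases and masks email addresses. It is a conceptual example only and does not describe how Perf evaluates policies; the function names, the email pattern, and the placeholder text are all assumptions.

```python
import re

# Deliberately simple pattern; real PII detection covers far more cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_blocked_terms(text: str, blocked: list) -> list:
    """Return the blocked terms that occur in text as whole words or phrases."""
    hits = []
    for term in blocked:
        # \b anchors keep "ass" from matching inside "classic".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def redact_emails(text: str) -> str:
    """Replace each email address with a fixed placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)
```

Custom criteria such as “Agent must not provide medical advice” cannot be expressed as patterns like these, which is why the policy describes them as LLM-evaluated.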
Next Steps
- JavaScript SDK Reference — Full SDK API documentation
- WebSocket Protocol — Raw WebSocket integration for advanced use cases
- Python Integration — Server-side Python integration
- Content Policies — Configure safety policies