Streaming API Reference
The Streaming API provides real-time, token-by-token responses for building responsive chat interfaces and interactive applications. The streaming format is OpenAI-compatible, using Server-Sent Events (SSE).
Endpoint
Authentication
Include your API key in the `Authorization` header as a bearer token, e.g. `Authorization: Bearer YOUR_API_KEY`.
How It Works
The Streaming API returns responses using Server-Sent Events (SSE), sending text chunks as they’re generated rather than waiting for the complete response.
Benefits
- Lower perceived latency: Users see responses immediately
- Better UX: Progressive rendering feels more responsive
- Real-time feedback: Stop generation early if needed
- Streaming UI: Perfect for chat interfaces
Request Body
Same as the Chat API, but responses stream incrementally. This includes full support for multimodal content (images, audio, video, documents) in messages.
Example Request
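A request body mirroring the Chat API, with streaming enabled. The model name below is a placeholder; substitute a model your account has access to:

```json
{
  "model": "MODEL_NAME",
  "messages": [
    {"role": "user", "content": "Write a haiku about streams."}
  ],
  "stream": true
}
```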
Response Format
The response uses the OpenAI-compatible Server-Sent Events (SSE) format:
Event Types
Content Chunk
Sent for each token or group of tokens. The first chunk includes the `role`; subsequent chunks contain only `content`:
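In the OpenAI-compatible format, a content chunk looks roughly like this (the `id` and content values are illustrative):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
```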
Final Chunk
Sent when generation is complete, with `finish_reason: "stop"`:
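The final chunk carries an empty delta and the finish reason (values illustrative):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
```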
Done Signal
After the final chunk, a `[DONE]` message indicates the stream is complete:
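The terminating event is a literal sentinel, not JSON:

```
data: [DONE]
```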
Client Implementation
JavaScript/TypeScript
React Hook
Python
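A minimal synchronous client sketch using only the standard library. The endpoint URL and model name are assumptions, not confirmed values; the SSE parsing follows the OpenAI-compatible chunk format described above:

```python
import json
import urllib.request

# Assumed endpoint -- substitute the real value from the Endpoint section.
API_URL = "https://api.withperf.pro/v1/chat/completions"

def iter_sse_content(lines):
    """Parse OpenAI-compatible SSE lines and yield content deltas."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

def stream_chat(api_key, messages, model="MODEL_NAME"):  # placeholder model
    """Open a streaming chat request and yield text as it arrives."""
    body = json.dumps({"model": model, "messages": messages, "stream": True})
    req = urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        yield from iter_sse_content(resp)
```

Keeping the parser separate from the transport makes it easy to unit-test against canned SSE lines.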
Python Async
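An async variant can wrap the same parsing logic around any async HTTP client's line iterator (for example httpx's `response.aiter_lines()` or aiohttp's `response.content`); the sketch below only shows the transport-agnostic consumer:

```python
import asyncio
import json

async def stream_content(line_iter):
    """Consume an async iterator of SSE lines, yielding content deltas."""
    async for raw in line_iter:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blanks and keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]
```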
Go
React Component Example
Error Handling
Connection Errors
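One common pattern, sketched here as an assumption rather than a documented SDK feature, is to retry the initial connection with jittered exponential backoff; a drop mid-stream is usually better surfaced to the user than silently re-requested:

```python
import random
import time

def with_retries(open_stream, max_attempts=3, base_delay=0.5):
    """Retry opening a stream on connection errors with jittered
    exponential backoff. open_stream is any zero-argument callable
    that establishes the streaming request."""
    for attempt in range(max_attempts):
        try:
            return open_stream()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts; propagate the error
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```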
Timeout Handling
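Real timeout enforcement belongs at the socket level (e.g. the `timeout` argument to `urllib.request.urlopen`, or your HTTP client's read timeout). As a complementary, hypothetical sketch, a watchdog can flag stalls between chunks after the fact:

```python
import time

class ChunkTimeout(Exception):
    """Raised when the gap between streamed chunks exceeds the limit."""

def iter_with_timeout(chunks, max_gap=30.0, clock=time.monotonic):
    """Yield chunks, raising ChunkTimeout when the wall-clock gap
    between consecutive chunks exceeds max_gap seconds. The check
    runs only after a chunk arrives, so pair it with a socket-level
    read timeout for blocking reads."""
    last = clock()
    for chunk in chunks:
        now = clock()
        if now - last > max_gap:
            raise ChunkTimeout(f"no chunk for {now - last:.1f}s")
        last = now
        yield chunk
```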
Performance Optimization
Chunking Strategy
Perf optimizes chunk size to balance latency and throughput:
- Small prompts: Sends tokens individually for fastest perceived speed
- Large generations: Batches tokens for network efficiency
- Adaptive: Adjusts based on connection quality
Buffering
For smoother UI updates, buffer chunks briefly before rendering instead of painting every token.
Rate Limits
Same limits as the Chat API:
| Tier | Requests/Minute | Concurrent Streams |
|---|---|---|
| Free | 60 | 3 |
| Pro | 300 | 10 |
| Enterprise | Custom | Custom |
Best Practices
1. Show Loading State
2. Handle Stream Interruption
Allow users to stop generation mid-stream.
3. Graceful Degradation
Fall back to a non-streaming request if the client cannot consume SSE.
4. Optimize for Mobile
Account for varying connection quality on mobile networks.
Comparison: Streaming vs Non-Streaming
| Feature | Streaming | Non-Streaming |
|---|---|---|
| First token latency | ~200ms | ~2-5s |
| Perceived speed | Immediate | Delayed |
| Implementation complexity | Medium | Low |
| Network efficiency | Same | Same |
| Error recovery | More complex | Simple |
| Best for | Chat UIs, long responses | Batch processing, short responses |
Related Endpoints
- Chat API - Non-streaming version
- Metrics API - Analytics and monitoring
- Logs API - Debugging and audit trails
Support
- Documentation: docs.withperf.pro
- Email: support@withperf.pro
- Examples: github.com/perf/examples