Chat API Reference
The Chat API is Perf’s primary endpoint for text generation. It automatically routes your request to the optimal model based on task type, complexity, and your cost constraints. The response format is OpenAI-compatible, making it easy to integrate with existing applications.
Endpoint
Authentication
Include your API key in the Authorization header.
Request Body
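As an illustration, here is a minimal request assembled in Python. The endpoint URL, the Bearer scheme, and the `pk_example` key are assumptions for the sketch; substitute the values from your Perf account.

```python
import json

# Hypothetical endpoint URL -- the real base URL comes from your Perf account.
URL = "https://api.withperf.pro/v1/chat/completions"

def build_request(api_key, messages):
    """Assemble the headers and JSON body for a chat completion call."""
    headers = {
        # Bearer scheme assumed; check your dashboard for the exact format.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"messages": messages})
    return headers, body

headers, body = build_request(
    "pk_example",
    [{"role": "user", "content": "Summarize this article"}],
)
# POST `body` with `headers` to URL using any HTTP client
# (urllib.request, requests, httpx, ...).
```

Note that `messages` is the only required parameter; everything else is optional.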
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| messages | array | Array of message objects with role and content |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_cost_per_call | number | none | Maximum cost in USD for this request. If exceeded, Perf will try to use a cheaper model. |
Message Object
Multimodal Content (Vision)
The content field can be a string for text-only messages, or an array of content parts for multimodal messages (images, audio, video, documents).
Content Part Types
| Type | Format | Description |
|---|---|---|
| text | { type: "text", text: "..." } | Text content |
| image_url | { type: "image_url", image_url: { url: "...", detail?: "low" \| "high" \| "auto" } } | Image (base64 data URL or HTTPS URL) |
| input_audio | { type: "input_audio", input_audio: { data: "...", format: "wav" \| "mp3" } } | Base64-encoded audio |
| video_url | { type: "video_url", video_url: { url: "..." } } | Video URL |
| document | { type: "document", document: { type: "pdf", data: "...", name?: "..." } } | Base64-encoded document |
Vision Request Example
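A sketch of a vision message built from the content-part formats above, mixing a text part with a remote image URL (the URL is illustrative):

```python
import json

# A user message mixing a text part and an image_url part, following the
# content-part formats in the table above.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/photo.jpg",
                "detail": "auto",  # optional: "low", "high", or "auto"
            },
        },
    ],
}

payload = json.dumps({"messages": [message]})
```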
Base64 Image Example
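The same image_url part can carry a base64 data URL instead of an HTTPS URL. A sketch of building one (the byte string below is a placeholder standing in for a real PNG read from disk):

```python
import base64

# In a real call you would read the bytes from disk, e.g.:
#     raw = open("photo.png", "rb").read()
raw = b"\x89PNG\r\n\x1a\n"  # placeholder bytes standing in for a real PNG

b64 = base64.b64encode(raw).decode("ascii")
data_url = f"data:image/png;base64,{b64}"

part = {"type": "image_url", "image_url": {"url": data_url}}
```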
Request Example
Response
Success Response (200 OK)
The response follows the OpenAI Chat Completion format.
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Always "chat.completion" |
| created | number | Unix timestamp of when the completion was created |
| model | string | The model that processed your request |
| choices | array | Array of completion choices |
| choices[].message.content | string | The generated text response |
| choices[].finish_reason | string | Why the model stopped ("stop", "length", etc.) |
| usage.prompt_tokens | number | Input tokens consumed |
| usage.completion_tokens | number | Output tokens generated |
| usage.total_tokens | number | Total tokens used |
| perf.task_type | string | Detected task type |
| perf.complexity | number | Estimated complexity (0-1) |
| perf.model_selected | string | Model chosen by router |
| perf.latency_ms | number | Response time in milliseconds |
| perf.fallback_used | boolean | Whether a fallback model was used |
| perf.validation_passed | boolean | Whether output passed quality checks |
| perf.policy_evaluation | object | Policy evaluation results (if policies configured) |
| perf.content_evaluation | object | Content evaluation results (if content policies configured) |
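Pulling the generated text and routing metadata out of a response shaped like the field table above (the sample payload and model name are illustrative, not real output):

```python
import json

# Sample response body; values are illustrative.
raw = """{
  "id": "cmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "example-model",
  "choices": [{"message": {"role": "assistant", "content": "Hello!"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
  "perf": {"task_type": "writing", "complexity": 0.2,
           "model_selected": "example-model", "latency_ms": 420,
           "fallback_used": false, "validation_passed": true}
}"""

resp = json.loads(raw)
text = resp["choices"][0]["message"]["content"]   # the generated response
routing = resp["perf"]                            # Perf routing metadata
```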
Response Headers
Perf includes additional metadata in response headers:
| Header | Description |
|---|---|
| X-Perf-Model | The model that processed the request |
| X-Perf-Fallback | true if a fallback model was used |
| X-Perf-Latency-Ms | Response time in milliseconds |
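Reading the X-Perf-* headers client-side, sketched against a plain dict standing in for your HTTP client's header mapping (values illustrative):

```python
# Headers as returned by your HTTP client; values are illustrative.
resp_headers = {
    "X-Perf-Model": "example-model",
    "X-Perf-Fallback": "false",
    "X-Perf-Latency-Ms": "420",
}

# Header values arrive as strings, so parse them explicitly.
fallback_used = resp_headers.get("X-Perf-Fallback") == "true"
latency_ms = int(resp_headers.get("X-Perf-Latency-Ms", "0"))
```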
Policy Evaluation (Pro+)
When routing policies are configured for your project, the response includes policy evaluation details:
| Field | Type | Description |
|---|---|---|
| result | string | Overall result: allow, warn, soft_block, hard_block |
| violations | array | List of policy violations (if any) |
| modifications | array | Changes made by policies (model overrides, etc.) |
| policies_evaluated | array | IDs of policies that were checked |
| policies_matched | array | IDs of policies that triggered |
- allow - Request proceeds normally
- warn - Request proceeds with a warning logged
- soft_block - Request proceeds with modifications applied
- hard_block - Request rejected with a 403 error
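A sketch of acting on perf.policy_evaluation client-side (the policy ID shown is hypothetical; hard_block requests are rejected with a 403 before a body is returned, so that branch is defensive):

```python
def handle_policy(policy_evaluation):
    """Map a perf.policy_evaluation result to a client-side action."""
    result = policy_evaluation.get("result", "allow")
    if result in ("allow", "warn"):
        return "proceed"          # warn was already logged server-side
    if result == "soft_block":
        # Modifications (e.g. model overrides) were applied by the router.
        return "proceed_modified"
    return "rejected"             # hard_block (normally a 403, no body)

action = handle_policy({
    "result": "soft_block",
    "modifications": [{"type": "model_override"}],
    "policies_matched": ["pol_example"],  # hypothetical policy ID
})
```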
Content Evaluation (Pro+)
When content policies are configured (PII detection, term filtering), the response includes content evaluation details:
| Field | Type | Description |
|---|---|---|
| result | string | Overall result: allow, warn, redact, block |
| phase | string | Evaluation phase: post_response |
| pii_detected | boolean | Whether PII was detected in the output |
| pii_count | integer | Number of PII items detected |
| redacted | boolean | Whether content was redacted |
| criteria_passed | integer | Number of criteria passed (if using LLM-as-judge) |
| criteria_failed | integer | Number of criteria failed |
| latency_ms | integer | Content evaluation latency in milliseconds |
- allow - Content passes all checks
- warn - Content flagged but returned
- redact - PII/terms redacted from the output (e.g., john@example.com → [REDACTED])
- block - Content blocked, error returned
Detectable PII types:
- ssn - Social Security numbers
- credit_card - Credit card numbers (with Luhn validation)
- email - Email addresses
- phone_us - US phone numbers
- ip_address - IP addresses
- date_of_birth - Dates of birth
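A sketch of inspecting perf.content_evaluation before using a response (the helper name and returned shape are our own, not part of the API):

```python
def summarize_content_eval(content_evaluation):
    """Flag responses whose output was redacted or blocked."""
    result = content_evaluation.get("result", "allow")
    if result == "block":
        # Blocked content comes back as an error rather than usable output.
        raise ValueError("content blocked by policy")
    return {
        "redacted": content_evaluation.get("redacted", False),
        "pii_count": content_evaluation.get("pii_count", 0),
        "needs_review": result in ("warn", "redact"),
    }

info = summarize_content_eval({
    "result": "redact", "phase": "post_response",
    "pii_detected": True, "pii_count": 2, "redacted": True,
})
```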
Task Types
Perf automatically detects your task type for optimal routing:
| Task Type | Description | Example |
|---|---|---|
| extraction | Extracting structured data | "Extract email from text" |
| classification | Categorizing or labeling | "Classify sentiment" |
| summarization | Condensing information | "Summarize this article" |
| reasoning | Logic and analysis | "Solve this math problem" |
| code | Code generation/explanation | "Write a binary search" |
| writing | Creative or professional writing | "Write a blog post" |
| vision | Image understanding | Requests with image content parts |
| audio | Audio understanding | Requests with audio content parts |
Generation Intent Detection
The Chat API intelligently detects when your prompt is requesting media generation (images, video, audio) and automatically routes to the appropriate generation model.
Example
- Image Generation API - DALL-E, Stable Diffusion, Flux, and more
- Video Generation API - Veo, Runway, Luma, Pika
- Audio API - Text-to-speech and transcription
Cost Control
Budget Enforcement
When you set max_cost_per_call, Perf will:
- Estimate the cost for the optimal model
- If estimated cost > budget, select a cheaper alternative
- Process with the selected model
- Include a cost_warning in the perf object if budget was a factor
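The steps above can be sketched from the client's side: set the cap in the request body, then check the perf object for a cost_warning (the helper is our own, not part of the API):

```python
import json

# Cap this request at one cent; Perf downgrades to a cheaper model
# rather than failing when the optimal model would exceed the budget.
body = json.dumps({
    "messages": [{"role": "user", "content": "Summarize this article"}],
    "max_cost_per_call": 0.01,
})

def budget_was_a_factor(response):
    """True if the router set cost_warning in the perf object."""
    return "cost_warning" in response.get("perf", {})
```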
Quality Validation
Perf automatically validates outputs and retries if needed.
Validation Checks
- JSON format correctness (for extraction/classification tasks)
- Refusal detection (“I cannot assist with that…”)
- Incomplete response detection
Retry Logic
If validation fails:
- Retry with the same model (max 1 retry)
- If still failing, escalate to fallback model
- Return best available result
Multi-Turn Conversations
Include conversation history in the messages array. For long conversations, Perf:
- Summarizes long conversation history to fit context windows
- Maintains semantic coherence
- Optimizes for cost by compressing older messages
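A multi-turn messages array might look like the sketch below; the client simply appends each turn, and Perf handles compression of older history server-side:

```python
# Full conversation history is sent on each request.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is a binary search?"},
    {"role": "assistant", "content": "A divide-and-conquer lookup on sorted data."},
    {"role": "user", "content": "Show it in Python."},  # newest turn
]
```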
Error Responses
400 Bad Request
401 Unauthorized
429 Too Many Requests
500 Internal Server Error
503 Service Unavailable
Structured Output
For extraction and classification tasks, Perf automatically detects when JSON output is needed and routes to models that excel at structured output. To get JSON output, simply ask for it in your prompt.
Rate Limits
| Tier | Requests/Minute | Requests/Day |
|---|---|---|
| Free | 60 | 1,000 |
| Pro | 300 | 100,000 |
| Enterprise | Custom | Custom |
Best Practices
1. Set Appropriate Budgets
2. Use System Messages
Guide model behavior with system messages.
3. Optimize for Task Type
Be explicit about the task for better routing.
4. Handle Errors Gracefully
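One way to handle the transient errors from the list above (429, 500, 503) is exponential backoff; a sketch where `send` stands in for your actual HTTP call:

```python
import time

RETRYABLE = {429, 500, 503}  # transient statuses from the error list above

def call_with_backoff(send, max_attempts=3, base_delay=1.0):
    """Retry a request on transient errors with exponential backoff.

    `send` is a zero-argument callable returning (status_code, body) --
    a stand-in for your actual HTTP call.
    """
    delay = base_delay
    status, body = send()
    for _ in range(max_attempts - 1):
        if status not in RETRYABLE:
            break
        time.sleep(delay)
        delay *= 2
        status, body = send()
    return status, body
```

Non-retryable errors (400, 401, 403) are returned immediately, since retrying them cannot succeed.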
SDK Support
Official SDKs coming soon:
- Python SDK
- Node.js SDK
- Go SDK
- Ruby SDK
Related Endpoints
- Streaming API - For real-time responses
- Metrics API - For analytics and monitoring
- Logs API - For debugging and audit trails
Support
- Documentation: docs.withperf.pro
- Email: support@withperf.pro
- Status: status.withperf.pro