Chat API Reference
The Chat API is Perf’s primary endpoint for text generation. It automatically routes your request to the optimal model based on task type, complexity, and your cost constraints.Endpoint
Authentication
Include your API key in the Authorization header:Request Body
Required Parameters
| Parameter | Type | Description |
|---|---|---|
messages | array | Array of message objects with role and content |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
max_cost_per_call | number | 0.01 | Maximum cost in USD for this request |
temperature | number | 0.7 | Sampling temperature (0-2) |
max_tokens | number | 2048 | Maximum tokens to generate |
top_p | number | 1.0 | Nucleus sampling parameter |
frequency_penalty | number | 0.0 | Penalize repeated tokens (-2 to 2) |
presence_penalty | number | 0.0 | Penalize new topics (-2 to 2) |
stop | array | null | Stop sequences to end generation |
response_format | string | "text" | Output format: "text" or "json" |
user_id | string | null | Your user identifier for analytics |
metadata | object | {} | Custom metadata for tracking |
Message Object
Request Example
Response
Success Response (200 OK)
Response Fields
| Field | Type | Description |
|---|---|---|
model_used | string | The model that processed your request |
output | string | The generated text response |
billing.cost_usd | number | Actual cost of this request |
billing.cost_warning | boolean | True if cost exceeded max_cost_per_call |
tokens.input | number | Input tokens consumed |
tokens.output | number | Output tokens generated |
tokens.total | number | Total tokens used |
metadata.call_id | string | Unique identifier for this call |
metadata.task_type | string | Detected task type |
metadata.complexity_score | number | Estimated complexity (0-1) |
metadata.routing_reason | string | Why this model was selected |
metadata.latency_ms | number | Response time in milliseconds |
metadata.fallback_used | boolean | Whether fallback model was used |
metadata.validation_passed | boolean | Whether output passed quality checks |
metadata.timestamp | string | ISO 8601 timestamp |
Task Types
Perf automatically detects your task type for optimal routing:| Task Type | Description | Example |
|---|---|---|
extraction | Extracting structured data | ”Extract email from text” |
classification | Categorizing or labeling | ”Classify sentiment” |
summarization | Condensing information | ”Summarize this article” |
reasoning | Logic and analysis | ”Solve this math problem” |
code | Code generation/explanation | ”Write a binary search” |
writing | Creative or professional writing | ”Write a blog post” |
general | General conversation | ”Hello, how are you?” |
Cost Control
Budget Enforcement
When you setmax_cost_per_call, Perf will:
- Estimate the cost for the optimal model
- If estimated cost > budget, select a cheaper alternative
- Process with the selected model
- Set
cost_warning: trueif actual cost exceeds budget
Quality Validation
Perf automatically validates outputs and retries if needed:Validation Checks
- JSON format correctness (when
response_format: "json") - Refusal detection (“I cannot assist with that…”)
- Incomplete response detection
- Quality disclaimer detection (“As an AI…”)
Retry Logic
If validation fails:- Retry with the same model (max 1 retry)
- If still failing, escalate to fallback model
- Return best available result
Multi-Turn Conversations
Include conversation history in themessages array:
- Summarizes long conversation history to fit context windows
- Maintains semantic coherence
- Optimizes for cost by compressing older messages
Error Responses
400 Bad Request
401 Unauthorized
429 Too Many Requests
500 Internal Server Error
503 Service Unavailable
Advanced Usage
Structured Output
Request JSON output for easy parsing:Custom Metadata
Track requests with custom metadata:Temperature Control
Adjust creativity vs consistency:Rate Limits
| Tier | Requests/Minute | Requests/Day |
|---|---|---|
| Free | 60 | 1,000 |
| Pro | 300 | 100,000 |
| Enterprise | Custom | Custom |
Best Practices
1. Set Appropriate Budgets
2. Use System Messages
Guide model behavior with system messages:3. Optimize for Task Type
Be explicit about the task for better routing:4. Handle Errors Gracefully
SDK Support
Official SDKs coming soon:- Python SDK
- Node.js SDK
- Go SDK
- Ruby SDK
Related Endpoints
- Streaming API - For real-time responses
- Metrics API - For analytics and monitoring
- Logs API - For debugging and audit trails
Support
- Documentation: docs.withperf.pro
- Email: [email protected]
- Status: status.withperf.pro