# Chat API

> Complete Chat API reference documentation

The Chat API is Perf's primary endpoint for text generation. It automatically routes your request to the optimal model based on task type, complexity, and your cost constraints.

## Endpoint

```
POST https://api.withperf.pro/v1/chat
```

## Authentication

Include your API key in the Authorization header:

```
Authorization: Bearer YOUR_API_KEY
```

## Request Body

### Required Parameters

| Parameter  | Type  | Description                                        |
| ---------- | ----- | -------------------------------------------------- |
| `messages` | array | Array of message objects with `role` and `content` |

### Optional Parameters

| Parameter           | Type   | Default  | Description                          |
| ------------------- | ------ | -------- | ------------------------------------ |
| `max_cost_per_call` | number | `0.01`   | Maximum cost in USD for this request |
| `temperature`       | number | `0.7`    | Sampling temperature (0-2)           |
| `max_tokens`        | number | `2048`   | Maximum tokens to generate           |
| `top_p`             | number | `1.0`    | Nucleus sampling parameter           |
| `frequency_penalty` | number | `0.0`    | Penalize repeated tokens (-2 to 2)   |
| `presence_penalty`  | number | `0.0`    | Penalize tokens that already appeared, encouraging new topics (-2 to 2) |
| `stop`              | array  | `null`   | Stop sequences to end generation     |
| `response_format`   | string | `"text"` | Output format: `"text"` or `"json"`  |
| `user_id`           | string | `null`   | Your user identifier for analytics   |
| `metadata`          | object | `{}`     | Custom metadata for tracking         |
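
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Perf SDK: `check_optional_params` and `RANGES` are hypothetical names, and only the limits stated in the table above are enforced.

```python
# Hypothetical client-side check; the ranges come from the table above.
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_optional_params(params: dict) -> list[str]:
    """Return a list of problems found in the optional parameters."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value}")
    fmt = params.get("response_format", "text")
    if fmt not in ("text", "json"):
        problems.append(f'response_format must be "text" or "json", got {fmt!r}')
    return problems


# Two problems are reported: temperature out of range, unknown response_format.
print(check_optional_params({"temperature": 2.5, "response_format": "xml"}))
```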

### Message Object

```json
{
  "role": "user" | "assistant" | "system",
  "content": "string"
}
```
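
A quick shape check for the `messages` array might look like the sketch below. `validate_messages` is a hypothetical helper; rejecting an empty array is an assumption, since the reference only states that `messages` is required.

```python
VALID_ROLES = {"user", "assistant", "system"}


def validate_messages(messages) -> None:
    """Raise ValueError if `messages` does not match the documented shape."""
    # Assumption: an empty array is treated the same as a missing one.
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty array")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"messages[{i}] must be an object")
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"messages[{i}].role must be one of {sorted(VALID_ROLES)}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}].content must be a string")


validate_messages([{"role": "user", "content": "Hello"}])  # passes silently
```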

## Request Example

```bash
curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that extracts structured data."
      },
      {
        "role": "user",
        "content": "Extract name, email, and phone from: John Doe, contact at john@example.com or call 555-1234"
      }
    ],
    "max_cost_per_call": 0.005,
    "response_format": "json",
    "temperature": 0.3
  }'
```
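
The same request can be assembled in Python with only the standard library. This is a sketch rather than an official client: `build_chat_request` and `API_URL` are names chosen here for illustration, and the network call is left commented out because it needs a valid key.

```python
import json
import urllib.request

API_URL = "https://api.withperf.pro/v1/chat"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Chat API."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that extracts structured data."},
        {"role": "user", "content": "Extract name, email, and phone from: John Doe, contact at john@example.com or call 555-1234"},
    ],
    "max_cost_per_call": 0.005,
    "response_format": "json",
    "temperature": 0.3,
}
request = build_chat_request("YOUR_API_KEY", payload)

# To send it (requires network access and a valid key):
# with urllib.request.urlopen(request, timeout=30) as resp:
#     body = json.load(resp)
```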

## Response

### Success Response (200 OK)

```json
{
  "model_used": "gpt-4o-mini",
  "output": "{\n  \"name\": \"John Doe\",\n  \"email\": \"john@example.com\",\n  \"phone\": \"555-1234\"\n}",
  "billing": {
    "cost_usd": 0.00023,
    "cost_warning": false
  },
  "tokens": {
    "input": 47,
    "output": 28,
    "total": 75
  },
  "metadata": {
    "call_id": "call_abc123xyz",
    "task_type": "extraction",
    "complexity_score": 0.3,
    "routing_reason": "Optimal for structured data extraction",
    "latency_ms": 342,
    "fallback_used": false,
    "validation_passed": true,
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
```

### Response Fields

| Field                        | Type    | Description                               |
| ---------------------------- | ------- | ----------------------------------------- |
| `model_used`                 | string  | The model that processed your request     |
| `output`                     | string  | The generated text response               |
| `billing.cost_usd`           | number  | Actual cost of this request               |
| `billing.cost_warning`       | boolean | True if cost exceeded `max_cost_per_call` |
| `tokens.input`               | number  | Input tokens consumed                     |
| `tokens.output`              | number  | Output tokens generated                   |
| `tokens.total`               | number  | Total tokens used                         |
| `metadata.call_id`           | string  | Unique identifier for this call           |
| `metadata.task_type`         | string  | Detected task type                        |
| `metadata.complexity_score`  | number  | Estimated complexity (0-1)                |
| `metadata.routing_reason`    | string  | Why this model was selected               |
| `metadata.latency_ms`        | number  | Response time in milliseconds             |
| `metadata.fallback_used`     | boolean | Whether a fallback model was used         |
| `metadata.validation_passed` | boolean | Whether output passed quality checks      |
| `metadata.timestamp`         | string  | ISO 8601 timestamp                        |
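
When `response_format` is `"json"`, `output` is still a string, so it has to be decoded before use. The sketch below uses a trimmed copy of the success response shown earlier; `read_json_output` is a hypothetical helper name.

```python
import json

# Trimmed copy of the success response shown earlier.
sample = {
    "model_used": "gpt-4o-mini",
    "output": "{\n  \"name\": \"John Doe\",\n  \"email\": \"john@example.com\",\n  \"phone\": \"555-1234\"\n}",
    "billing": {"cost_usd": 0.00023, "cost_warning": False},
    "metadata": {"fallback_used": False, "validation_passed": True},
}


def read_json_output(body: dict) -> dict:
    """Decode `output` (a JSON-encoded string) and print any warnings."""
    if body["billing"]["cost_warning"]:
        print(f"warning: cost {body['billing']['cost_usd']} USD exceeded the budget")
    if not body["metadata"]["validation_passed"]:
        print("warning: output did not pass quality checks")
    return json.loads(body["output"])


record = read_json_output(sample)
print(record["email"])  # john@example.com
```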

## Task Types

Perf automatically detects your task type for optimal routing:

| Task Type        | Description                      | Example                   |
| ---------------- | -------------------------------- | ------------------------- |
| `extraction`     | Extracting structured data       | "Extract email from text" |
| `classification` | Categorizing or labeling         | "Classify sentiment"      |
| `summarization`  | Condensing information           | "Summarize this article"  |
| `reasoning`      | Logic and analysis               | "Solve this math problem" |
| `code`           | Code generation/explanation      | "Write a binary search"   |
| `writing`        | Creative or professional writing | "Write a blog post"       |
| `general`        | General conversation             | "Hello, how are you?"     |

## Cost Control

### Budget Enforcement

When you set `max_cost_per_call`, Perf will:

1. Estimate the cost for the optimal model
2. If the estimated cost exceeds the budget, select a cheaper alternative
3. Process with the selected model
4. Set `cost_warning: true` if actual cost exceeds budget

```json
{
  "messages": [...],
  "max_cost_per_call": 0.001  // 0.1 cents maximum
}
```

Response when budget is tight:

```json
{
  "model_used": "gpt-4o-mini",
  "billing": {
    "cost_usd": 0.00012,
    "cost_warning": false
  },
  "metadata": {
    "routing_reason": "Budget-optimized selection (GPT-4o would exceed limit)"
  }
}
```
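
The budget enforcement steps can be pictured with the sketch below. It is an illustration only, not Perf's actual routing code: the `Candidate` type, the cost figures, and the behavior when no model fits the budget are all assumptions, and the estimate stands in for the actual cost when computing the warning flag.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    model: str
    estimated_cost_usd: float  # illustrative numbers, not real pricing


def select_model(candidates: list[Candidate], budget_usd: float) -> tuple[Candidate, str]:
    """Pick the first candidate (ordered best-first) whose estimate fits the budget."""
    preferred = candidates[0]
    for candidate in candidates:
        if candidate.estimated_cost_usd <= budget_usd:
            if candidate is preferred:
                return candidate, "Optimal model fits budget"
            return candidate, f"Budget-optimized selection ({preferred.model} would exceed limit)"
    # Assumption: if nothing fits, use the cheapest model and let the caller flag it.
    cheapest = min(candidates, key=lambda c: c.estimated_cost_usd)
    return cheapest, "No model fits budget; cheapest selected"


budget = 0.001
candidates = [Candidate("gpt-4o", 0.004), Candidate("gpt-4o-mini", 0.00012)]
chosen, reason = select_model(candidates, budget)
# The estimate is used as a stand-in for the actual cost here.
cost_warning = chosen.estimated_cost_usd > budget
print(chosen.model, "|", reason, "|", cost_warning)
```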

## Quality Validation

Perf automatically validates outputs and retries if needed:

### Validation Checks

* JSON format correctness (when `response_format: "json"`)
* Refusal detection ("I cannot assist with that...")
* Incomplete response detection
* Quality disclaimer detection ("As an AI...")

### Retry Logic

If validation fails:

1. Retry with the same model (max 1 retry)
2. If still failing, escalate to fallback model
3. Return best available result
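
The retry steps above can be sketched as follows. This is a hypothetical illustration of the flow, not Perf's implementation: the model calls and the validator are passed in as plain functions, and "best available" is assumed to mean the fallback output even if it also fails validation.

```python
import json
from typing import Callable


def generate_with_validation(
    primary: Callable[[], str],
    fallback: Callable[[], str],
    is_valid: Callable[[str], bool],
    max_retries: int = 1,
) -> dict:
    """Try the primary model (plus retries), then escalate to the fallback."""
    for _ in range(1 + max_retries):  # first attempt + retries on the same model
        output = primary()
        if is_valid(output):
            return {"output": output, "fallback_used": False, "validation_passed": True}
    # Assumption: the fallback output is returned even when it fails validation.
    output = fallback()
    return {"output": output, "fallback_used": True, "validation_passed": is_valid(output)}


def is_json(text: str) -> bool:
    """Validation check for `response_format: "json"`."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


result = generate_with_validation(
    primary=lambda: "As an AI, I cannot produce that.",  # always fails the JSON check
    fallback=lambda: '{"ok": true}',
    is_valid=is_json,
)
print(result)
```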

## Multi-Turn Conversations

Include conversation history in the `messages` array:

```json
{
  "messages": [
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is the process..."},
    {"role": "user", "content": "How does it differ from cellular respiration?"}
  ]
}
```

Perf automatically:

* Summarizes long conversation history to fit context windows
* Maintains semantic coherence
* Optimizes for cost by compressing older messages
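
On the client side, the history only needs to be appended to after each turn. A minimal sketch, assuming the `output` field of each response is stored as the assistant message; the `Conversation` class is a hypothetical helper.

```python
from typing import Optional


class Conversation:
    """Keeps the running `messages` array for a multi-turn chat."""

    def __init__(self, system_prompt: Optional[str] = None):
        self.messages: list[dict] = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, content: str) -> dict:
        """Append a user turn and return the request body to send."""
        self.messages.append({"role": "user", "content": content})
        return {"messages": list(self.messages)}

    def add_assistant(self, output: str) -> None:
        """Record the `output` field of a response as the assistant turn."""
        self.messages.append({"role": "assistant", "content": output})


chat = Conversation()
body = chat.add_user("What is photosynthesis?")
chat.add_assistant("Photosynthesis is the process...")  # would come from the API response
body = chat.add_user("How does it differ from cellular respiration?")
print([m["role"] for m in body["messages"]])
```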

## Error Responses

### 400 Bad Request

```json
{
  "error": {
    "type": "invalid_request",
    "message": "messages array is required",
    "param": "messages"
  }
}
```

### 401 Unauthorized

```json
{
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key"
  }
}
```

### 429 Too Many Requests

```json
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded",
    "retry_after": 30
  }
}
```

Response headers:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1673456789
Retry-After: 30
```
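
A client can turn these headers into a wait time. The sketch below assumes `Retry-After` is given in whole seconds and `X-RateLimit-Reset` is a Unix timestamp in seconds; the reference does not state either format explicitly. `seconds_to_wait` is a hypothetical helper, and it expects exact header casing when given a plain dict.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Prefer Retry-After; otherwise derive the wait from X-RateLimit-Reset."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))  # assumed to be seconds
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        now = time.time() if now is None else now
        return max(0.0, float(reset) - now)  # assumed to be a Unix timestamp
    return 0.0


print(seconds_to_wait({"Retry-After": "30", "X-RateLimit-Reset": "1673456789"}))  # 30.0
```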

### 500 Internal Server Error

```json
{
  "error": {
    "type": "server_error",
    "message": "An internal error occurred",
    "request_id": "req_abc123"
  }
}
```

### 503 Service Unavailable

```json
{
  "error": {
    "type": "service_unavailable",
    "message": "All providers are currently experiencing issues",
    "retry_after": 60
  }
}
```

## Advanced Usage

### Structured Output

Request JSON output for easy parsing:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "List 3 benefits of exercise in JSON format with fields: benefit, description, category"
    }
  ],
  "response_format": "json",
  "temperature": 0.2
}
```

### Custom Metadata

Track requests with custom metadata:

```json
{
  "messages": [...],
  "user_id": "user_12345",
  "metadata": {
    "session_id": "sess_abc",
    "feature": "chat_support",
    "experiment_variant": "v2"
  }
}
```

View this metadata in your dashboard analytics.

### Temperature Control

Adjust creativity vs consistency:

```json
{
  "messages": [...],
  "temperature": 0.0  // Deterministic (good for data extraction)
}
```

```json
{
  "messages": [...],
  "temperature": 1.5  // Creative (good for brainstorming)
}
```

## Rate Limits

| Tier       | Requests/Minute | Requests/Day |
| ---------- | --------------- | ------------ |
| Free       | 60              | 1,000        |
| Pro        | 300             | 100,000      |
| Enterprise | Custom          | Custom       |
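
To stay under the per-minute limit without waiting for a 429, a client can throttle itself. The following is one possible sketch using a sliding 60-second window; `MinuteThrottle` is a hypothetical helper, and the daily cap is not handled.

```python
import time
from collections import deque
from typing import Callable


class MinuteThrottle:
    """Sliding-window limiter: at most `limit` calls in any 60-second window."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock
        self.sent = deque()  # timestamps of recent calls

    def delay_needed(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # drop calls that left the window
        if len(self.sent) < self.limit:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())


throttle = MinuteThrottle(limit=60)  # Free tier: 60 requests/minute
wait = throttle.delay_needed()
if wait > 0:
    time.sleep(wait)
throttle.record()  # call right after sending each request
```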

## Best Practices

### 1. Set Appropriate Budgets

```json
{
  "max_cost_per_call": 0.001  // Simple extraction
}
```

```json
{
  "max_cost_per_call": 0.05   // Complex reasoning
}
```

### 2. Use System Messages

Guide model behavior with system messages:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant. Keep responses under 50 words."
    },
    {
      "role": "user",
      "content": "Explain gravity"
    }
  ]
}
```

### 3. Optimize for Task Type

Be explicit about the task for better routing:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "EXTRACT the following data as JSON: name, age, location from: 'Sarah is 28 and lives in Seattle'"
    }
  ],
  "response_format": "json"
}
```

### 4. Handle Errors Gracefully

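```python
import time

import requests

url = "https://api.withperf.pro/v1/chat"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {"messages": [{"role": "user", "content": "Explain gravity"}]}

try:
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    status = e.response.status_code
    if status == 429:
        # Rate limited: wait for the server-suggested delay before retrying
        retry_after = int(e.response.headers.get("Retry-After", 60))
        time.sleep(retry_after)
    elif status >= 500:
        # Server or provider issue: retry the same request later
        pass
    else:
        # Other client errors (400, 401) will not succeed on retry
        raise
```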
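
For repeated failures, a retry loop with exponential backoff is a common pattern. The sketch below is one way to write it, under some assumptions: `call_with_backoff` is a hypothetical helper, treating 429, 500, and 503 as retryable is a choice made here, and the HTTP call is injected as a function so the example runs without network access.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500, 503}  # assumption: these statuses are worth retrying


def call_with_backoff(
    send: Callable[[], Tuple[int, dict, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it succeeds or attempts run out.

    `send` returns (status_code, headers, json_body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with status {status}: {body}")
        # Exponential delay with jitter, but never shorter than Retry-After.
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        delay = max(delay, float(headers.get("Retry-After", 0)))
        sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


# Stubbed demo: one rate-limited response, then a success.
responses = iter([(429, {"Retry-After": "2"}, {}), (200, {}, {"output": "ok"})])
waits = []
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

With `requests`, `send` could wrap `requests.post(...)` and return `(r.status_code, r.headers, r.json())`.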

## SDK Support

Official SDKs coming soon:

* Python SDK
* Node.js SDK
* Go SDK
* Ruby SDK

## Related Endpoints

* [Streaming API](./streaming) - For real-time responses
* [Metrics API](./metrics) - For analytics and monitoring
* [Logs API](./logs) - For debugging and audit trails

## Support

* **Documentation**: [docs.withperf.pro](https://docs.withperf.pro)
* **Email**: [support@withperf.pro](mailto:support@withperf.pro)
* **Status**: [status.withperf.pro](https://status.withperf.pro)
