# Chat API

> Complete Chat API reference documentation

The Chat API is Perf's primary endpoint for text generation. It automatically routes your request to the optimal model based on task type, complexity, and your cost constraints.

The response format is **OpenAI-compatible**, making it easy to integrate with existing applications.

## Endpoint

```
POST https://api.withperf.pro/v1/chat
```

## Authentication

Include your API key in the Authorization header:

```
Authorization: Bearer YOUR_API_KEY
```

## Request Body

### Required Parameters

| Parameter  | Type  | Description                                        |
| ---------- | ----- | -------------------------------------------------- |
| `messages` | array | Array of message objects with `role` and `content` |

### Optional Parameters

| Parameter           | Type      | Default | Description                                                                                                    |
| ------------------- | --------- | ------- | -------------------------------------------------------------------------------------------------------------- |
| `max_cost_per_call` | number    | none    | Maximum cost in USD for this request. If the estimated cost exceeds it, Perf selects a cheaper model.         |
| `document_id`       | string    | none    | ID of an uploaded document to use as context. The document content is automatically injected into the request. |
| `document_ids`      | string\[] | none    | Array of document IDs to use as context. Use when referencing multiple documents in a single request.          |
| `schema`            | object    | none    | Inline JSON Schema to validate and enforce on the response. See [Schema Enforcement](./schemas).               |
| `schema_id`         | string    | none    | ID or slug of a saved schema. See [Schema Enforcement](./schemas).                                             |
| `schema_strict`     | boolean   | false   | When `true`, disables auto-repair and fails on any schema mismatch.                                            |

### Message Object

```json theme={null}
{
  "role": "user" | "assistant" | "system",
  "content": "string" | ContentPart[]
}
```

### Multimodal Content (Vision)

The `content` field can be a string for text-only messages, or an array of content parts for multimodal messages (images, audio, video, documents).

#### Content Part Types

| Type          | Format                                                                                 | Description                          |
| ------------- | -------------------------------------------------------------------------------------- | ------------------------------------ |
| `text`        | `{ type: "text", text: "..." }`                                                        | Text content                         |
| `image_url`   | `{ type: "image_url", image_url: { url: "...", detail?: "low" \| "high" \| "auto" } }` | Image (base64 data URL or HTTPS URL) |
| `input_audio` | `{ type: "input_audio", input_audio: { data: "...", format: "wav" \| "mp3" } }`        | Base64-encoded audio                 |
| `video_url`   | `{ type: "video_url", video_url: { url: "..." } }`                                     | Video URL                            |
| `document`    | `{ type: "document", document: { type: "pdf", data: "...", name?: "..." } }`           | Base64-encoded document              |

#### Vision Request Example

```json theme={null}
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg",
            "detail": "high"
          }
        }
      ]
    }
  ]
}
```

#### Base64 Image Example

```json theme={null}
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this screenshot" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KGgo..."
          }
        }
      ]
    }
  ]
}
```

Perf automatically routes vision requests to models with image understanding capabilities (GPT-4o, Claude 3.5 Sonnet, Gemini Pro Vision).
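
The base64 variant above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper names `image_part_from_bytes` and `vision_message` are illustrative and not part of any Perf SDK.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png", detail: str = "auto") -> dict:
    """Build an `image_url` content part that carries a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}", "detail": detail},
    }


def vision_message(prompt: str, image_bytes: bytes) -> dict:
    """Combine a text part and an image part into a single user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part_from_bytes(image_bytes),
        ],
    }


# The first eight bytes of a PNG file stand in for real image data here.
message = vision_message("Describe this screenshot", b"\x89PNG\r\n\x1a\n")
print(message["content"][1]["image_url"]["url"])
```

In practice the bytes would come from a file, for example `open("screenshot.png", "rb").read()`.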

## Request Example

```bash theme={null}
curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that extracts structured data."
      },
      {
        "role": "user",
        "content": "Extract name, email, and phone from: John Doe, contact at john@example.com or call 555-1234"
      }
    ],
    "max_cost_per_call": 0.005
  }'
```
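
A rough Python equivalent of the curl request, using only the standard library. This is a sketch: `build_chat_request` and `send` are hypothetical helper names, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.withperf.pro/v1/chat"


def build_chat_request(
    api_key: str,
    messages: list,
    max_cost_per_call: Optional[float] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Chat API."""
    body = {"messages": messages}
    if max_cost_per_call is not None:
        body["max_cost_per_call"] = max_cost_per_call
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "YOUR_API_KEY",
    [{"role": "user", "content": "Extract name and email from: John Doe, john@example.com"}],
    max_cost_per_call=0.005,
)
```

Calling `send(request)` performs the network call; `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.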

## Response

### Success Response (200 OK)

The response follows the **OpenAI Chat Completion format**:

```json theme={null}
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1705312200,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n  \"name\": \"John Doe\",\n  \"email\": \"john@example.com\",\n  \"phone\": \"555-1234\"\n}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 28,
    "total_tokens": 75
  },
  "perf": {
    "task_type": "extraction",
    "complexity": 0.3,
    "model_selected": "gpt-4o-mini",
    "latency_ms": 342,
    "fallback_used": false,
    "validation_passed": true
  }
}
```

### Response Fields

| Field                       | Type    | Description                                                        |
| --------------------------- | ------- | ------------------------------------------------------------------ |
| `id`                        | string  | Unique identifier for the completion                               |
| `object`                    | string  | Always `"chat.completion"`                                         |
| `created`                   | number  | Unix timestamp of when the completion was created                  |
| `model`                     | string  | The model that processed your request                              |
| `choices`                   | array   | Array of completion choices                                        |
| `choices[].message.content` | string  | The generated text response                                        |
| `choices[].finish_reason`   | string  | Why the model stopped (`"stop"`, `"length"`, etc.)                 |
| `usage.prompt_tokens`       | number  | Input tokens consumed                                              |
| `usage.completion_tokens`   | number  | Output tokens generated                                            |
| `usage.total_tokens`        | number  | Total tokens used                                                  |
| `perf.task_type`            | string  | Detected task type                                                 |
| `perf.complexity`           | number  | Estimated complexity (0-1)                                         |
| `perf.model_selected`       | string  | Model chosen by the router                                         |
| `perf.latency_ms`           | number  | Response time in milliseconds                                      |
| `perf.fallback_used`        | boolean | Whether a fallback model was used                                  |
| `perf.validation_passed`    | boolean | Whether output passed quality checks                               |
| `perf.document_id`          | string  | ID of the document used as context (if `document_id` was provided) |
| `perf.policy_evaluation`    | object  | Policy evaluation results (if policies configured)                 |
| `perf.content_evaluation`   | object  | Content evaluation results (if content policies configured)        |

### Response Headers

Perf includes additional metadata in response headers:

| Header              | Description                          |
| ------------------- | ------------------------------------ |
| `X-Perf-Model`      | The model that processed the request |
| `X-Perf-Fallback`   | `true` if a fallback model was used  |
| `X-Perf-Latency-Ms` | Response time in milliseconds        |

### Policy Evaluation (Pro+)

When routing policies are configured for your project, the response includes policy evaluation details:

```json theme={null}
{
  "perf": {
    "policy_evaluation": {
      "result": "allow",
      "violations": [],
      "modifications": [
        {
          "type": "model_override",
          "from": "gpt-4o",
          "to": "gemini-2.5-flash",
          "reason": "Budget Mode policy - prefer cheaper models"
        }
      ],
      "policies_evaluated": ["policy_abc123"],
      "policies_matched": ["policy_abc123"]
    }
  }
}
```

| Field                | Type   | Description                                                 |
| -------------------- | ------ | ----------------------------------------------------------- |
| `result`             | string | Overall result: `allow`, `warn`, `soft_block`, `hard_block` |
| `violations`         | array  | List of policy violations (if any)                          |
| `modifications`      | array  | Changes made by policies (model overrides, etc.)            |
| `policies_evaluated` | array  | IDs of policies that were checked                           |
| `policies_matched`   | array  | IDs of policies that triggered                              |

**Policy Results:**

* `allow` - Request proceeds normally
* `warn` - Request proceeds with warning logged
* `soft_block` - Request proceeds with modifications applied
* `hard_block` - Request rejected with 403 error

See [Policies API](./policies) for available policy templates and configuration.
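
A client might inspect `policy_evaluation` to log model overrides or surface violations. The following sketch works on the documented fields only; the function name and the shape of its return value are illustrative.

```python
# Results under which the request still proceeds, per the list above.
PROCEEDING_RESULTS = {"allow", "warn", "soft_block"}


def summarize_policy_evaluation(perf: dict) -> dict:
    """Summarize the policy evaluation found in a response's `perf` object."""
    evaluation = perf.get("policy_evaluation")
    if evaluation is None:
        # No policies configured for the project.
        return {"proceeded": True, "overrides": [], "violations": []}
    overrides = [
        (change["from"], change["to"])
        for change in evaluation.get("modifications", [])
        if change.get("type") == "model_override"
    ]
    return {
        "proceeded": evaluation["result"] in PROCEEDING_RESULTS,
        "overrides": overrides,
        "violations": evaluation.get("violations", []),
    }


perf = {
    "policy_evaluation": {
        "result": "allow",
        "violations": [],
        "modifications": [
            {
                "type": "model_override",
                "from": "gpt-4o",
                "to": "gemini-2.5-flash",
                "reason": "Budget Mode policy - prefer cheaper models",
            }
        ],
        "policies_evaluated": ["policy_abc123"],
        "policies_matched": ["policy_abc123"],
    }
}

summary = summarize_policy_evaluation(perf)
print(summary["overrides"])
```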

### Content Evaluation (Pro+)

When content policies are configured (PII detection, term filtering), the response includes content evaluation details:

```json theme={null}
{
  "perf": {
    "content_evaluation": {
      "result": "allow",
      "phase": "post_response",
      "pii_detected": true,
      "pii_count": 2,
      "redacted": true,
      "criteria_passed": 0,
      "criteria_failed": 0,
      "latency_ms": 15
    }
  }
}
```

| Field             | Type    | Description                                        |
| ----------------- | ------- | -------------------------------------------------- |
| `result`          | string  | Overall result: `allow`, `warn`, `redact`, `block` |
| `phase`           | string  | Evaluation phase: `post_response`                  |
| `pii_detected`    | boolean | Whether PII was detected in output                 |
| `pii_count`       | integer | Number of PII items detected                       |
| `redacted`        | boolean | Whether content was redacted                       |
| `criteria_passed` | integer | Number of criteria passed (if using LLM-as-judge)  |
| `criteria_failed` | integer | Number of criteria failed                          |
| `latency_ms`      | integer | Content evaluation latency                         |

**Content Results:**

* `allow` - Content passes all checks
* `warn` - Content flagged but returned
* `redact` - PII/terms redacted from output (e.g., `john@example.com` → `[REDACTED]`)
* `block` - Content blocked, error returned

**Supported PII Types:**

* `ssn` - Social Security Numbers
* `credit_card` - Credit card numbers (with Luhn validation)
* `email` - Email addresses
* `phone_us` - US phone numbers
* `ip_address` - IP addresses
* `date_of_birth` - Dates of birth

## Task Types

Perf automatically detects your task type for optimal routing:

| Task Type        | Description                      | Example                           |
| ---------------- | -------------------------------- | --------------------------------- |
| `extraction`     | Extracting structured data       | "Extract email from text"         |
| `classification` | Categorizing or labeling         | "Classify sentiment"              |
| `summarization`  | Condensing information           | "Summarize this article"          |
| `reasoning`      | Logic and analysis               | "Solve this math problem"         |
| `code`           | Code generation/explanation      | "Write a binary search"           |
| `writing`        | Creative or professional writing | "Write a blog post"               |
| `vision`         | Image understanding              | Requests with image content parts |
| `audio`          | Audio understanding              | Requests with audio content parts |

## Generation Intent Detection

The Chat API detects when a prompt requests media generation (images, video, or audio) and automatically routes it to an appropriate generation model.

### Example

```bash theme={null}
curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a sunset over mountains"
      }
    ]
  }'
```

Perf classifies this as an image generation request, routes it to DALL-E, and returns the generated image URL in the response.

For more control over generation parameters (model selection, dimensions, quality), use the dedicated generation endpoints:

* [Image Generation API](./images) - DALL-E, Stable Diffusion, Flux, and more
* [Video Generation API](./video) - Veo, Runway, Luma, Pika
* [Audio API](./audio) - Text-to-speech and transcription

## Document Context

Reference uploaded documents directly in your chat requests. Perf automatically retrieves the document content and injects it as context for the AI model.

This is ideal for extracting structured data from PDFs, answering questions about uploaded files, or any task that requires grounding the AI response in specific document content.

### How It Works

1. Upload a document via `POST /v1/documents` (see [Documents API](./tools#documents--rag))
2. Wait for the document status to become `ready`
3. Pass the `document_id` in your chat request
4. Perf retrieves the document content and includes it as context

For small documents (under \~40 pages), the full content is injected. For larger documents, Perf uses semantic search (RAG) to find and inject the most relevant sections based on your message.
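
Step 2 usually means polling. The sketch below keeps the status lookup abstract: `fetch_status` is a caller-supplied function (hypothetical, since the status endpoint is described in the Documents API rather than here), and the `"failed"` status string is an assumption.

```python
import time
from typing import Callable


def wait_until_ready(
    document_id: str,
    fetch_status: Callable[[str], str],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the document status is `ready`, or raise on failure/timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(document_id)
        if status == "ready":
            return
        if status == "failed":  # assumed status name for a failed document
            raise RuntimeError(f"Document {document_id} failed processing")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Document {document_id} not ready after {timeout_s}s")
        sleep(poll_interval_s)


def chat_body_with_document(prompt: str, document_id: str) -> dict:
    """Build the chat request body that references the document (step 3)."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "document_id": document_id,
    }
```

Once `wait_until_ready` returns, the body from `chat_body_with_document` can be posted to the Chat API.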

### Single Document

```bash theme={null}
curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Extract all technical specifications from this datasheet"
      }
    ],
    "document_id": "doc_abc123"
  }'
```

### Multiple Documents

```bash theme={null}
curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Compare the warranty terms across these contracts"
      }
    ],
    "document_ids": ["doc_abc123", "doc_def456"]
  }'
```

### Document + Schema (Structured Extraction)

Combine `document_id` with `schema_id` to extract structured data from documents. Upload a PDF, define a schema, and get validated JSON back.

```bash theme={null}
curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Extract all specifications from this solar panel datasheet"
      }
    ],
    "document_id": "doc_abc123",
    "schema_id": "solar-panel-specs"
  }'
```

Response with structured, schema-validated output:

```json theme={null}
{
  "choices": [{
    "message": {
      "content": "{\"manufacturer\": \"Vikram Solar\", \"model\": \"Somera Grand 580\", \"max_power_w\": 580, \"efficiency_pct\": 22.53, \"voltage_mpp_v\": 44.65, \"weight_kg\": 28.5, \"dimensions_mm\": \"2278x1134x30\", \"warranty_years\": 30}"
    }
  }],
  "perf": {
    "document_id": "doc_abc123",
    "schema_validation": {
      "enabled": true,
      "passed": true
    }
  }
}
```

### Document Error Responses

| Status | Condition           | Description                                                         |
| ------ | ------------------- | ------------------------------------------------------------------- |
| 400    | Document not found  | The `document_id` does not exist or does not belong to your project |
| 400    | Document failed     | The document failed processing and cannot be used                   |
| 409    | Document processing | The document is still being processed. Retry after a few seconds.   |

## Cost Control

### Budget Enforcement

When you set `max_cost_per_call`, Perf will:

1. Estimate the cost for the optimal model
2. If estimated cost > budget, select a cheaper alternative
3. Process with the selected model
4. Include a `cost_warning` in the `perf` object if budget was a factor

```json theme={null}
{
  "messages": [...],
  "max_cost_per_call": 0.001
}
```
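
The selection steps above can be pictured with a small sketch. Everything specific here is an assumption made for illustration: the model names, the cost estimates, the boolean form of `cost_warning`, and the choice to fall back to the cheapest model when nothing fits the budget. It does not describe Perf's actual router.

```python
from typing import Optional

# Hypothetical per-request cost estimates in USD, best model first.
CANDIDATES = [
    ("large-model", 0.0040),
    ("medium-model", 0.0012),
    ("small-model", 0.0003),
]


def select_model(max_cost_per_call: Optional[float]) -> dict:
    """Pick the first candidate whose estimated cost fits the budget."""
    optimal_name, optimal_cost = CANDIDATES[0]
    # Steps 1-2: estimate the optimal model's cost and compare it to the budget.
    if max_cost_per_call is None or optimal_cost <= max_cost_per_call:
        return {"model": optimal_name, "cost_warning": False}
    # Step 2 (continued): walk down to the first cheaper model that fits.
    for name, cost in CANDIDATES[1:]:
        if cost <= max_cost_per_call:
            return {"model": name, "cost_warning": True}
    # Assumption: if nothing fits, use the cheapest model and still flag it.
    return {"model": CANDIDATES[-1][0], "cost_warning": True}


print(select_model(0.001))
print(select_model(0.005))
```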

## Quality Validation

Perf automatically validates outputs and retries when validation fails.

### Validation Checks

* JSON format correctness (for extraction/classification tasks)
* Refusal detection ("I cannot assist with that...")
* Incomplete response detection

### Retry Logic

If validation fails:

1. Retry with the same model (max 1 retry)
2. If still failing, escalate to fallback model
3. Return best available result
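
The retry sequence can be sketched as follows. This is a conceptual illustration, not Perf's implementation: `primary` and `fallback` are stand-ins for model calls, and the validator only covers the JSON format check from the list above.

```python
import json
from typing import Callable


def looks_like_json(text: str) -> bool:
    """Validation check: does the output parse as JSON?"""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def generate_with_validation(
    primary: Callable[[], str],
    fallback: Callable[[], str],
    is_valid: Callable[[str], bool] = looks_like_json,
) -> dict:
    """Try the primary model (with one retry), then escalate to the fallback."""
    # Step 1: the first attempt plus at most one retry on the same model.
    for _ in range(2):
        output = primary()
        if is_valid(output):
            return {"content": output, "fallback_used": False, "validation_passed": True}
    # Step 2: still failing, so escalate to the fallback model.
    output = fallback()
    # Step 3: return the best available result, valid or not.
    return {"content": output, "fallback_used": True, "validation_passed": is_valid(output)}
```

The returned keys mirror `perf.fallback_used` and `perf.validation_passed` in the response.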

## Multi-Turn Conversations

Include conversation history in the `messages` array:

```json theme={null}
{
  "messages": [
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is the process..."},
    {"role": "user", "content": "How does it differ from cellular respiration?"}
  ]
}
```

Perf automatically:

* Summarizes long conversation history to fit context windows
* Maintains semantic coherence
* Optimizes for cost by compressing older messages
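
On the client side, this means keeping the history and appending each turn before the next request. A minimal sketch, with `Conversation` as a hypothetical helper class:

```python
from typing import Optional


class Conversation:
    """Accumulates messages so each request carries the full history."""

    def __init__(self, system_prompt: Optional[str] = None) -> None:
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        # Store the model's reply so the next turn has context.
        self.messages.append({"role": "assistant", "content": text})

    def request_body(self) -> dict:
        # Copy the list so later turns do not mutate an already-built body.
        return {"messages": list(self.messages)}


chat = Conversation()
chat.add_user("What is photosynthesis?")
chat.add_assistant("Photosynthesis is the process...")
chat.add_user("How does it differ from cellular respiration?")
body = chat.request_body()
print(len(body["messages"]))
```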

## Error Responses

### 400 Bad Request

```json theme={null}
{
  "error": {
    "type": "invalid_request",
    "message": "messages array is required",
    "param": "messages"
  }
}
```

### 401 Unauthorized

```json theme={null}
{
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key"
  }
}
```

### 429 Too Many Requests

```json theme={null}
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded",
    "retry_after": 30
  }
}
```

Response headers:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1673456789
Retry-After: 30
```
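
A client can derive its wait time from these headers. The sketch below assumes `Retry-After` is given in seconds and `X-RateLimit-Reset` is a Unix timestamp in seconds, as the example values suggest; `seconds_until_retry` is an illustrative name.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Prefer Retry-After; otherwise derive the wait from X-RateLimit-Reset."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    # Neither header present: fall back to a short default pause.
    return 1.0


wait = seconds_until_retry(
    {
        "X-RateLimit-Limit": "60",
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": "1673456789",
        "Retry-After": "30",
    }
)
print(wait)
```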

### 500 Internal Server Error

```json theme={null}
{
  "error": {
    "type": "server_error",
    "message": "An internal error occurred",
    "request_id": "req_abc123"
  }
}
```

### 503 Service Unavailable

```json theme={null}
{
  "error": {
    "type": "service_unavailable",
    "message": "All providers are currently experiencing issues",
    "retry_after": 60
  }
}
```

## Structured Output

For extraction and classification tasks, Perf automatically detects when JSON output is needed and routes to models that excel at structured output.

To get JSON output, simply ask for it in your prompt:

```json theme={null}
{
  "messages": [
    {
      "role": "user",
      "content": "List 3 benefits of exercise. Return as JSON with fields: benefit, description, category"
    }
  ]
}
```

Perf will detect this is an extraction task and route accordingly.
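
The JSON arrives as a string in `choices[].message.content`, so the client still has to parse it. The sketch below also strips a surrounding Markdown code fence, on the assumption that a model may occasionally wrap its JSON in one; `parse_json_content` is a hypothetical helper.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_json_content(content: str):
    """Parse JSON from a message's content, tolerating a code fence around it."""
    text = content.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag).
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence and anything after it.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


content = "{\n  \"name\": \"John Doe\",\n  \"email\": \"john@example.com\",\n  \"phone\": \"555-1234\"\n}"
record = parse_json_content(content)
print(record["email"])
```

`json.JSONDecodeError` is raised when the content is not valid JSON, which is a reasonable point to retry or to use [Schema Enforcement](./schemas) instead.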

## Rate Limits

| Tier       | Requests/Minute | Requests/Day |
| ---------- | --------------- | ------------ |
| Free       | 60              | 1,000        |
| Pro        | 300             | 100,000      |
| Enterprise | Custom          | Custom       |
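
To stay under a per-minute limit, a client can throttle itself before sending. The following is a sketch of a sliding-window limiter; the class is illustrative, and the server's own accounting may differ from this client-side estimate.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Tracks send times and reports how long to wait before the next request."""

    def __init__(
        self,
        max_requests: int,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self._sent = deque()

    def wait_time(self) -> float:
        """Seconds until a request is allowed (0.0 if one can be sent now)."""
        now = self.clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self._sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self._sent.append(self.clock())


limiter = SlidingWindowLimiter(max_requests=60)  # Free tier: 60 requests/minute
print(limiter.wait_time())
```

Before each request, sleep for `limiter.wait_time()` seconds, send, then call `limiter.record()`.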

## Best Practices

### 1. Set Appropriate Budgets

Use a low budget for simple extraction:

```json theme={null}
{
  "max_cost_per_call": 0.001
}
```

Allow a higher budget for complex reasoning:

```json theme={null}
{
  "max_cost_per_call": 0.05
}
```

### 2. Use System Messages

Guide model behavior with system messages:

```json theme={null}
{
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant. Keep responses under 50 words."
    },
    {
      "role": "user",
      "content": "Explain gravity"
    }
  ]
}
```

### 3. Optimize for Task Type

Be explicit about the task for better routing:

```json theme={null}
{
  "messages": [
    {
      "role": "user",
      "content": "EXTRACT the following data as JSON: name, age, location from: 'Sarah is 28 and lives in Seattle'"
    }
  ],
  "response_format": "json"
}
```

### 4. Handle Errors Gracefully

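```python theme={null}
import time

import requests

url = "https://api.withperf.pro/v1/chat"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {"messages": [{"role": "user", "content": "Explain gravity"}]}

try:
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError:
    if response.status_code == 429:
        # Rate limited: wait for the period given in Retry-After, then retry
        retry_after = int(response.headers.get("Retry-After", 60))
        time.sleep(retry_after)
    elif response.status_code >= 500:
        # Server or provider issue: retry later, ideally with exponential backoff
        pass
    else:
        # Other 4xx errors indicate a problem with the request itself
        raise
```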
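
For 5xx responses, exponential backoff spaces retries out so that a struggling provider is not hit repeatedly. The sketch below computes "full jitter" delays; the function name and default values are illustrative choices, not Perf recommendations.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    max_retries: int = 4,
    base_s: float = 1.0,
    cap_s: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one delay per retry: a random fraction of min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        # Jitter spreads out clients that would otherwise retry in lockstep.
        yield rng() * ceiling


for delay in backoff_delays():
    print(f"would sleep {delay:.2f}s before the next retry")
```

In a real loop, call `time.sleep(delay)` between attempts and stop as soon as a request succeeds.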

## SDK Support

Official SDKs are coming soon:

* Python SDK
* Node.js SDK
* Go SDK
* Ruby SDK

## Related Endpoints

* [Schema Enforcement](./schemas) - Validate and auto-repair LLM outputs
* [Tools API](./tools) - Documents/RAG, web search, and memory
* [Streaming API](./streaming) - For real-time responses
* [Metrics API](./metrics) - For analytics and monitoring
* [Logs API](./logs) - For debugging and audit trails

## Support

* **Documentation**: [docs.withperf.pro](https://docs.withperf.pro)
* **Email**: [support@withperf.pro](mailto:support@withperf.pro)
* **Status**: [status.withperf.pro](https://status.withperf.pro)
