Chat API Reference

The Chat API is Perf’s primary endpoint for text generation. It automatically routes your request to the optimal model based on task type, complexity, and your cost constraints. The response format is OpenAI-compatible, making it easy to integrate with existing applications.

Endpoint

POST https://api.withperf.pro/v1/chat

Authentication

Include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Request Body

Required Parameters

Parameter	Type	Description
`messages`	array	Array of message objects with `role` and `content`

Optional Parameters

Parameter	Type	Default	Description
`max_cost_per_call`	number	none	Maximum cost in USD for this request. If the estimated cost of the optimal model exceeds this value, Perf selects a cheaper model instead.

Message Object

{
  "role": "user" | "assistant" | "system",
  "content": "string" | ContentPart[]
}
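As an illustration, the request body can be assembled in Python before sending. This is a minimal sketch that uses only the fields documented above; the helper names `make_message` and `build_payload` are hypothetical and not part of any Perf SDK.

```python
from typing import Any, Optional

VALID_ROLES = {"user", "assistant", "system"}


def make_message(role: str, content: Any) -> dict:
    """Build one message object; content is a string or a list of content parts."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if not isinstance(content, (str, list)):
        raise TypeError("content must be a string or a list of content parts")
    return {"role": role, "content": content}


def build_payload(messages: list, max_cost_per_call: Optional[float] = None) -> dict:
    """Assemble the request body; max_cost_per_call is omitted when not set."""
    if not messages:
        raise ValueError("messages array is required")
    payload = {"messages": messages}
    if max_cost_per_call is not None:
        payload["max_cost_per_call"] = max_cost_per_call
    return payload


payload = build_payload(
    [
        make_message("system", "You are a helpful assistant."),
        make_message("user", "Hello"),
    ],
    max_cost_per_call=0.005,
)
```

Validating roles locally catches malformed messages before they cost a round trip to the API.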

Multimodal Content (Vision)

The content field can be a string for text-only messages, or an array of content parts for multimodal messages (images, audio, video, documents).

Content Part Types

Type	Format	Description
`text`	`{ type: "text", text: "..." }`	Text content
`image_url`	`{ type: "image_url", image_url: { url: "...", detail?: "low" \| "high" \| "auto" } }`	Image (base64 data URL or HTTPS URL)
`input_audio`	`{ type: "input_audio", input_audio: { data: "...", format: "wav" \| "mp3" } }`	Base64-encoded audio
`video_url`	`{ type: "video_url", video_url: { url: "..." } }`	Video URL
`document`	`{ type: "document", document: { type: "pdf", data: "...", name?: "..." } }`	Base64-encoded document

Vision Request Example

{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg",
            "detail": "high"
          }
        }
      ]
    }
  ]
}

Base64 Image Example

{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this screenshot" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KGgo..."
          }
        }
      ]
    }
  ]
}
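To produce a base64 data URL like the one above from raw image bytes, something along these lines should work. It is a sketch only: the helper name `image_part_from_bytes` is hypothetical, and the eight-byte PNG signature stands in for real image data.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png", detail: str = "auto") -> dict:
    """Encode raw image bytes as an image_url content part with a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}", "detail": detail},
    }


# The first eight bytes of every PNG file, used here as stand-in image data.
png_signature = b"\x89PNG\r\n\x1a\n"
part = image_part_from_bytes(png_signature)

message = {
    "role": "user",
    "content": [{"type": "text", "text": "Describe this screenshot"}, part],
}
```

In practice the bytes would come from reading a file in binary mode, for example `open(path, "rb").read()`.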

Perf automatically routes vision requests to models with image understanding capabilities (GPT-4o, Claude 3.5 Sonnet, Gemini Pro Vision).

Request Example

curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that extracts structured data."
      },
      {
        "role": "user",
        "content": "Extract name, email, and phone from: John Doe, contact at john@example.com or call 555-1234"
      }
    ],
    "max_cost_per_call": 0.005
  }'

Response

Success Response (200 OK)

The response follows the OpenAI Chat Completion format:

{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1705312200,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n  \"name\": \"John Doe\",\n  \"email\": \"john@example.com\",\n  \"phone\": \"555-1234\"\n}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 28,
    "total_tokens": 75
  },
  "perf": {
    "task_type": "extraction",
    "complexity": 0.3,
    "model_selected": "gpt-4o-mini",
    "latency_ms": 342,
    "fallback_used": false,
    "validation_passed": true
  }
}

Response Fields

Field	Type	Description
`id`	string	Unique identifier for the completion
`object`	string	Always `"chat.completion"`
`created`	number	Unix timestamp of when the completion was created
`model`	string	The model that processed your request
`choices`	array	Array of completion choices
`choices[].message.content`	string	The generated text response
`choices[].finish_reason`	string	Why the model stopped (`"stop"`, `"length"`, etc.)
`usage.prompt_tokens`	number	Input tokens consumed
`usage.completion_tokens`	number	Output tokens generated
`usage.total_tokens`	number	Total tokens used
`perf.task_type`	string	Detected task type
`perf.complexity`	number	Estimated complexity (0-1)
`perf.model_selected`	string	Model chosen by router
`perf.latency_ms`	number	Response time in milliseconds
`perf.fallback_used`	boolean	Whether fallback model was used
`perf.validation_passed`	boolean	Whether output passed quality checks
`perf.policy_evaluation`	object	Policy evaluation results (if policies configured)
`perf.content_evaluation`	object	Content evaluation results (if content policies configured)
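
The fields above can be read from the decoded JSON body as ordinary dictionary lookups. The following sketch uses a trimmed copy of the sample response; `summarize_response` is a hypothetical helper name, and treating the `perf` object as optional is a defensive assumption rather than documented behavior.

```python
def summarize_response(response: dict) -> dict:
    """Pull the generated text and routing metadata out of a decoded response body."""
    choice = response["choices"][0]
    perf = response.get("perf", {})  # assumption: tolerate a missing perf object
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "model": response["model"],
        "total_tokens": response["usage"]["total_tokens"],
        "task_type": perf.get("task_type"),
        "fallback_used": perf.get("fallback_used", False),
    }


# Trimmed version of the sample success response shown earlier.
sample = {
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "{\"name\": \"John Doe\"}"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 47, "completion_tokens": 28, "total_tokens": 75},
    "perf": {"task_type": "extraction", "model_selected": "gpt-4o-mini", "fallback_used": False},
}

summary = summarize_response(sample)
```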

Response Headers

Perf includes additional metadata in response headers:

Header	Description
`X-Perf-Model`	The model that processed the request
`X-Perf-Fallback`	`true` if a fallback model was used
`X-Perf-Latency-Ms`	Response time in milliseconds

Policy Evaluation (Pro+)

When routing policies are configured for your project, the response includes policy evaluation details:

{
  "perf": {
    "policy_evaluation": {
      "result": "allow",
      "violations": [],
      "modifications": [
        {
          "type": "model_override",
          "from": "gpt-4o",
          "to": "gemini-2.5-flash",
          "reason": "Budget Mode policy - prefer cheaper models"
        }
      ],
      "policies_evaluated": ["policy_abc123"],
      "policies_matched": ["policy_abc123"]
    }
  }
}

Field	Type	Description
`result`	string	Overall result: `allow`, `warn`, `soft_block`, `hard_block`
`violations`	array	List of policy violations (if any)
`modifications`	array	Changes made by policies (model overrides, etc.)
`policies_evaluated`	array	IDs of policies that were checked
`policies_matched`	array	IDs of policies that triggered

Policy Results:

allow - Request proceeds normally
warn - Request proceeds with warning logged
soft_block - Request proceeds with modifications applied
hard_block - Request rejected with 403 error

See Policies API for available policy templates and configuration.
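
A client might condense the policy evaluation into a few fields it can act on. The sketch below assumes the response shape shown above; `summarize_policy_evaluation` is a hypothetical helper, not part of the API.

```python
def summarize_policy_evaluation(perf: dict) -> dict:
    """Condense perf.policy_evaluation into a few fields an application can act on."""
    evaluation = perf.get("policy_evaluation")
    if evaluation is None:
        # No routing policies are configured for this project.
        return {"result": None, "proceeded": True, "overrides": []}
    overrides = [
        (m["from"], m["to"], m["reason"])
        for m in evaluation.get("modifications", [])
        if m.get("type") == "model_override"
    ]
    return {
        "result": evaluation["result"],
        # Every documented result except hard_block lets the request proceed.
        "proceeded": evaluation["result"] != "hard_block",
        "overrides": overrides,
    }


perf = {
    "policy_evaluation": {
        "result": "allow",
        "violations": [],
        "modifications": [
            {
                "type": "model_override",
                "from": "gpt-4o",
                "to": "gemini-2.5-flash",
                "reason": "Budget Mode policy - prefer cheaper models",
            }
        ],
        "policies_evaluated": ["policy_abc123"],
        "policies_matched": ["policy_abc123"],
    }
}

policy_summary = summarize_policy_evaluation(perf)
```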

Content Evaluation (Pro+)

When content policies are configured (PII detection, term filtering), the response includes content evaluation details:

{
  "perf": {
    "content_evaluation": {
      "result": "allow",
      "phase": "post_response",
      "pii_detected": true,
      "pii_count": 2,
      "redacted": true,
      "criteria_passed": 0,
      "criteria_failed": 0,
      "latency_ms": 15
    }
  }
}

Field	Type	Description
`result`	string	Overall result: `allow`, `warn`, `redact`, `block`
`phase`	string	Evaluation phase: `post_response`
`pii_detected`	boolean	Whether PII was detected in output
`pii_count`	integer	Number of PII items detected
`redacted`	boolean	Whether content was redacted
`criteria_passed`	integer	Number of criteria passed (if using LLM-as-judge)
`criteria_failed`	integer	Number of criteria failed
`latency_ms`	integer	Content evaluation latency

Content Results:

allow - Content passes all checks
warn - Content flagged but returned
redact - PII/terms redacted from output (e.g., john@example.com → [REDACTED])
block - Content blocked, error returned

Supported PII Types:

ssn - Social Security Numbers
credit_card - Credit card numbers (with Luhn validation)
email - Email addresses
phone_us - US phone numbers
ip_address - IP addresses
date_of_birth - Dates of birth
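
For readers unfamiliar with the Luhn validation mentioned for `credit_card`, here is the standard checksum in Python. It illustrates the general algorithm only; how Perf's detector applies it internally is not documented here, and `luhn_valid` is a hypothetical name.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


is_card_like = luhn_valid("4111 1111 1111 1111")
```

The checksum filters out most random 16-digit strings, which reduces false positives compared with matching on digit count alone.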

Task Types

Perf automatically detects your task type for optimal routing:

Task Type	Description	Example
`extraction`	Extracting structured data	"Extract email from text"
`classification`	Categorizing or labeling	"Classify sentiment"
`summarization`	Condensing information	"Summarize this article"
`reasoning`	Logic and analysis	"Solve this math problem"
`code`	Code generation/explanation	"Write a binary search"
`writing`	Creative or professional writing	"Write a blog post"
`vision`	Image understanding	Requests with image content parts
`audio`	Audio understanding	Requests with audio content parts

Generation Intent Detection

The Chat API intelligently detects when your prompt is requesting media generation (images, video, audio) and automatically routes to the appropriate generation model.

Example

curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a sunset over mountains"
      }
    ]
  }'

Perf understands this is an image generation request and routes to DALL-E, returning the generated image URL in the response. For more control over generation parameters (model selection, dimensions, quality), use the dedicated generation endpoints:

Image Generation API - DALL-E, Stable Diffusion, Flux, and more
Video Generation API - Veo, Runway, Luma, Pika
Audio API - Text-to-speech and transcription

Cost Control

Budget Enforcement

When you set max_cost_per_call, Perf will:

Estimate the cost for the optimal model
If estimated cost > budget, select a cheaper alternative
Process with the selected model
Include a cost_warning in the perf object if budget was a factor

{
  "messages": [...],
  "max_cost_per_call": 0.001
}
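
The selection steps above can be sketched as follows. This is not Perf's actual router: the model names and cost estimates are invented, `cost_warning` is modeled as a boolean, and falling back to the cheapest candidate when nothing fits the budget is an assumption.

```python
from typing import List, Optional, Tuple


def select_model(candidates: List[Tuple[str, float]], budget: Optional[float]) -> dict:
    """Pick a model from (name, estimated_cost_usd) pairs ordered best-first."""
    optimal_name, optimal_cost = candidates[0]
    if budget is None or optimal_cost <= budget:
        # The optimal model fits the budget, so budget was not a factor.
        return {"model": optimal_name, "cost_warning": False}
    # Otherwise take the most preferred alternative that fits the budget.
    for name, cost in candidates[1:]:
        if cost <= budget:
            return {"model": name, "cost_warning": True}
    # Assumption: if nothing fits, use the cheapest candidate.
    cheapest_name, _ = min(candidates, key=lambda item: item[1])
    return {"model": cheapest_name, "cost_warning": True}


# Hypothetical candidates with made-up per-request cost estimates.
CANDIDATES = [("model-large", 0.004), ("model-medium", 0.0012), ("model-small", 0.0003)]

choice = select_model(CANDIDATES, budget=0.001)
```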

Quality Validation

Perf automatically validates outputs and retries if needed:

Validation Checks

JSON format correctness (for extraction/classification tasks)
Refusal detection (“I cannot assist with that…”)
Incomplete response detection

Retry Logic

If validation fails:

Retry with the same model (max 1 retry)
If still failing, escalate to fallback model
Return best available result
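
The retry flow above happens on Perf's side, but its shape can be illustrated with a small sketch. The function names are hypothetical, the models are stand-in callables, and returning the fallback output as the "best available result" is an assumption.

```python
import json
from typing import Callable, Tuple


def is_valid_json(text: str) -> bool:
    """One of the validation checks: does the output parse as JSON?"""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def generate_with_validation(
    primary: Callable[[], str],
    fallback: Callable[[], str],
    validate: Callable[[str], bool],
    max_retries: int = 1,
) -> Tuple[str, bool, bool]:
    """Return (output, fallback_used, validation_passed)."""
    # The first attempt plus up to max_retries retries on the same model.
    for _ in range(1 + max_retries):
        output = primary()
        if validate(output):
            return output, False, True
    # Still failing: escalate to the fallback model.
    output = fallback()
    return output, True, validate(output)


calls = []


def refusing_model() -> str:
    calls.append("primary")
    return "I cannot assist with that"


def fallback_model() -> str:
    calls.append("fallback")
    return '{"ok": true}'


result = generate_with_validation(refusing_model, fallback_model, is_valid_json)
```

The last two elements of the returned tuple correspond to the `fallback_used` and `validation_passed` fields of the `perf` object.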

Multi-Turn Conversations

Include conversation history in the messages array:

{
  "messages": [
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is the process..."},
    {"role": "user", "content": "How does it differ from cellular respiration?"}
  ]
}

Perf automatically:

Summarizes long conversation history to fit context windows
Maintains semantic coherence
Optimizes for cost by compressing older messages
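
On the client side, keeping the history current means appending each user message and assistant reply after every call. A minimal sketch, with `append_turn` as a hypothetical helper and a hand-written response standing in for a real API call:

```python
def append_turn(history: list, user_text: str, response: dict) -> list:
    """Return a new history with the user message and the assistant reply appended."""
    reply = response["choices"][0]["message"]
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": reply["content"]},
    ]


history = []
# Stand-in for the decoded body of a real Chat API response.
fake_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Photosynthesis is the process..."}}
    ]
}
history = append_turn(history, "What is photosynthesis?", fake_response)

# The next request sends the accumulated history plus the new user message.
next_payload = {
    "messages": history
    + [{"role": "user", "content": "How does it differ from cellular respiration?"}]
}
```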

Error Responses

400 Bad Request

{
  "error": {
    "type": "invalid_request",
    "message": "messages array is required",
    "param": "messages"
  }
}

401 Unauthorized

{
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key"
  }
}

429 Too Many Requests

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded",
    "retry_after": 30
  }
}

Response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1673456789
Retry-After: 30
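
A client can turn these headers into a wait time. The sketch below assumes `Retry-After` is given in seconds and `X-RateLimit-Reset` is a Unix timestamp in seconds, which matches the sample values but is not stated explicitly; `retry_delay` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], now: Optional[float] = None, default: float = 60.0) -> float:
    """Return how many seconds to wait before retrying a rate-limited request."""
    if now is None:
        now = time.time()
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Assumption: the value is a number of seconds, not an HTTP date.
        return max(0.0, float(retry_after))
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Assumption: the value is a Unix timestamp in seconds.
        return max(0.0, float(reset) - now)
    return default


delay = retry_delay({"X-RateLimit-Remaining": "0", "Retry-After": "30"})
```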

500 Internal Server Error

{
  "error": {
    "type": "server_error",
    "message": "An internal error occurred",
    "request_id": "req_abc123"
  }
}

503 Service Unavailable

{
  "error": {
    "type": "service_unavailable",
    "message": "All providers are currently experiencing issues",
    "retry_after": 60
  }
}
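
Since every error body shares the same `error` envelope, a client can wrap it in a single exception type. This is a sketch under assumptions: `PerfAPIError` and `error_from_body` are hypothetical names, and treating 429 and 5xx responses as retryable is a common convention rather than documented Perf guidance.

```python
class PerfAPIError(Exception):
    """Hypothetical exception wrapping the `error` object of a Perf error response."""

    def __init__(self, status_code: int, error: dict):
        super().__init__(f"{status_code} {error.get('type')}: {error.get('message')}")
        self.status_code = status_code
        self.error_type = error.get("type")
        self.param = error.get("param")              # present on 400 responses
        self.retry_after = error.get("retry_after")  # present on 429 and 503
        self.request_id = error.get("request_id")    # present on 500

    @property
    def retryable(self) -> bool:
        # Assumption: 429 and 5xx are worth retrying; 400 and 401 are not.
        return self.status_code == 429 or self.status_code >= 500


def error_from_body(status_code: int, body: dict) -> PerfAPIError:
    """Build an exception from an HTTP status code and a decoded error body."""
    return PerfAPIError(status_code, body.get("error", {}))


err = error_from_body(
    429,
    {"error": {"type": "rate_limit_exceeded", "message": "Rate limit exceeded", "retry_after": 30}},
)
```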

Structured Output

For extraction and classification tasks, Perf automatically detects when JSON output is needed and routes to models that excel at structured output. To get JSON output, simply ask for it in your prompt:

{
  "messages": [
    {
      "role": "user",
      "content": "List 3 benefits of exercise. Return as JSON with fields: benefit, description, category"
    }
  ]
}

Perf will detect this is an extraction task and route accordingly.
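
Because the JSON arrives as a string in `choices[].message.content`, the client still has to parse it. A minimal sketch follows; `parse_json_content` is a hypothetical helper, and stripping a Markdown code fence is a defensive assumption about how some models format output, not documented Perf behavior.

```python
import json
from typing import Any

FENCE = "`" * 3  # a Markdown code fence marker


def parse_json_content(content: str) -> Any:
    """Parse JSON from a model reply, tolerating a surrounding Markdown code fence."""
    text = content.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


reply = '{\n  "name": "John Doe",\n  "email": "john@example.com",\n  "phone": "555-1234"\n}'
record = parse_json_content(reply)
```

`json.loads` raises `json.JSONDecodeError` if the reply is not valid JSON, so callers may want to catch that and retry or report the failure.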

Rate Limits

Tier	Requests/Minute	Requests/Day
Free	60	1,000
Pro	300	100,000
Enterprise	Custom	Custom

Best Practices

1. Set Appropriate Budgets

Simple extraction:

{
  "max_cost_per_call": 0.001
}

Complex reasoning:

{
  "max_cost_per_call": 0.05
}

2. Use System Messages

Guide model behavior with system messages:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant. Keep responses under 50 words."
    },
    {
      "role": "user",
      "content": "Explain gravity"
    }
  ]
}

3. Optimize for Task Type

Be explicit about the task for better routing:

{
  "messages": [
    {
      "role": "user",
      "content": "EXTRACT the following data as JSON: name, age, location from: 'Sarah is 28 and lives in Seattle'"
    }
  ],
  "response_format": "json"
}

4. Handle Errors Gracefully

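import time

import requests

url = "https://api.withperf.pro/v1/chat"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {"messages": [{"role": "user", "content": "Explain gravity"}]}

try:
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    status = e.response.status_code
    if status == 429:
        # Rate limited: wait for the period given in Retry-After, then retry
        retry_after = int(e.response.headers.get("Retry-After", 60))
        time.sleep(retry_after)
    elif status >= 500:
        # Server or provider issue: retry later, ideally with exponential backoff
        pass
    else:
        raise  # Other client errors (400, 401) will not succeed on retry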

SDK Support

Official SDKs coming soon:

Python SDK
Node.js SDK
Go SDK
Ruby SDK

Related Endpoints

Streaming API - For real-time responses
Metrics API - For analytics and monitoring
Logs API - For debugging and audit trails

Support

Documentation: docs.withperf.pro
Email: support@withperf.pro
Status: status.withperf.pro
