Chat API Reference

The Chat API is Perf’s primary endpoint for text generation. It automatically routes your request to the optimal model based on task type, complexity, and your cost constraints. The response format is OpenAI-compatible, making it easy to integrate with existing applications.

Endpoint

POST https://api.withperf.pro/v1/chat

Authentication

Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY

Request Body

Required Parameters

Parameter | Type | Description
messages | array | Array of message objects with role and content

Optional Parameters

Parameter | Type | Default | Description
max_cost_per_call | number | none | Maximum cost in USD for this request. If the estimated cost exceeds this value, Perf will try to use a cheaper model.

Message Object

{
  "role": "user" | "assistant" | "system",
  "content": "string" | ContentPart[]
}

Multimodal Content (Vision)

The content field can be a string for text-only messages, or an array of content parts for multimodal messages (images, audio, video, documents).

Content Part Types

Type | Format | Description
text | { type: "text", text: "..." } | Text content
image_url | { type: "image_url", image_url: { url: "...", detail?: "low" | "high" | "auto" } } | Image (base64 data URL or HTTPS URL)
input_audio | { type: "input_audio", input_audio: { data: "...", format: "wav" | "mp3" } } | Base64-encoded audio
video_url | { type: "video_url", video_url: { url: "..." } } | Video URL
document | { type: "document", document: { type: "pdf", data: "...", name?: "..." } } | Base64-encoded document

Vision Request Example

{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg",
            "detail": "high"
          }
        }
      ]
    }
  ]
}

Base64 Image Example

{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this screenshot" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KGgo..."
          }
        }
      ]
    }
  ]
}
Perf automatically routes vision requests to models with image understanding capabilities (GPT-4o, Claude 3.5 Sonnet, Gemini Pro Vision).
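
For local files, the base64 data URL has to be assembled by the caller. The following is a minimal sketch of that step using only the Python standard library; the helper names `image_part_from_bytes` and `vision_message` are illustrative and not part of any Perf SDK.

```python
import base64


def image_part_from_bytes(data, mime_type="image/png", detail=None):
    """Build an image_url content part holding a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    image_url = {"url": f"data:{mime_type};base64,{encoded}"}
    if detail is not None:  # "low", "high", or "auto", per the table above
        image_url["detail"] = detail
    return {"type": "image_url", "image_url": image_url}


def vision_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message combining one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part_from_bytes(image_bytes, mime_type),
        ],
    }
```

The result of `vision_message("Describe this screenshot", open("shot.png", "rb").read())` can be placed directly in the `messages` array.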

Request Example

curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that extracts structured data."
      },
      {
        "role": "user",
        "content": "Extract name, email, and phone from: John Doe, contact at john@example.com or call 555-1234"
      }
    ],
    "max_cost_per_call": 0.005
  }'
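
The same request can be sent from Python. This is a minimal sketch using only the standard library (the `requests` package would work equally well); `build_chat_request` and `send_chat_request` are illustrative helper names, not an official client.

```python
import json
import urllib.request

API_URL = "https://api.withperf.pro/v1/chat"


def build_chat_request(api_key, messages, max_cost_per_call=None):
    """Return (headers, payload) for a Chat API call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"messages": messages}
    if max_cost_per_call is not None:
        payload["max_cost_per_call"] = max_cost_per_call
    return headers, payload


def send_chat_request(api_key, messages, max_cost_per_call=None, timeout=30):
    """POST the request and return the decoded JSON response body."""
    headers, payload = build_chat_request(api_key, messages, max_cost_per_call)
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before it is sent.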

Response

Success Response (200 OK)

The response follows the OpenAI Chat Completion format:
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1705312200,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n  \"name\": \"John Doe\",\n  \"email\": \"john@example.com\",\n  \"phone\": \"555-1234\"\n}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 28,
    "total_tokens": 75
  },
  "perf": {
    "task_type": "extraction",
    "complexity": 0.3,
    "model_selected": "gpt-4o-mini",
    "latency_ms": 342,
    "fallback_used": false,
    "validation_passed": true
  }
}

Response Fields

Field | Type | Description
id | string | Unique identifier for the completion
object | string | Always "chat.completion"
created | number | Unix timestamp of when the completion was created
model | string | The model that processed your request
choices | array | Array of completion choices
choices[].message.content | string | The generated text response
choices[].finish_reason | string | Why the model stopped ("stop", "length", etc.)
usage.prompt_tokens | number | Input tokens consumed
usage.completion_tokens | number | Output tokens generated
usage.total_tokens | number | Total tokens used
perf.task_type | string | Detected task type
perf.complexity | number | Estimated complexity (0-1)
perf.model_selected | string | Model chosen by router
perf.latency_ms | number | Response time in milliseconds
perf.fallback_used | boolean | Whether fallback model was used
perf.validation_passed | boolean | Whether output passed quality checks
perf.policy_evaluation | object | Policy evaluation results (if policies configured)
perf.content_evaluation | object | Content evaluation results (if content policies configured)
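
As an illustration of how these fields fit together, the sketch below pulls the generated text and the routing metadata out of a decoded response body. The function name `summarize_response` is hypothetical; the sketch assumes the `perf` object may be absent and reads it defensively.

```python
def summarize_response(body):
    """Collect the generated text plus routing metadata from a response body."""
    choice = body["choices"][0]
    perf = body.get("perf", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "model": body["model"],
        "total_tokens": body["usage"]["total_tokens"],
        "task_type": perf.get("task_type"),
        "fallback_used": perf.get("fallback_used", False),
        "validation_passed": perf.get("validation_passed"),
    }
```

A caller might log `fallback_used` and `validation_passed` alongside the text to track how often routing falls back.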

Response Headers

Perf includes additional metadata in response headers:
Header | Description
X-Perf-Model | The model that processed the request
X-Perf-Fallback | true if a fallback model was used
X-Perf-Latency-Ms | Response time in milliseconds

Policy Evaluation (Pro+)

When routing policies are configured for your project, the response includes policy evaluation details:
{
  "perf": {
    "policy_evaluation": {
      "result": "allow",
      "violations": [],
      "modifications": [
        {
          "type": "model_override",
          "from": "gpt-4o",
          "to": "gemini-2.5-flash",
          "reason": "Budget Mode policy - prefer cheaper models"
        }
      ],
      "policies_evaluated": ["policy_abc123"],
      "policies_matched": ["policy_abc123"]
    }
  }
}
Field | Type | Description
result | string | Overall result: allow, warn, soft_block, hard_block
violations | array | List of policy violations (if any)
modifications | array | Changes made by policies (model overrides, etc.)
policies_evaluated | array | IDs of policies that were checked
policies_matched | array | IDs of policies that triggered
Policy Results:
  • allow - Request proceeds normally
  • warn - Request proceeds with warning logged
  • soft_block - Request proceeds with modifications applied
  • hard_block - Request rejected with 403 error
See Policies API for available policy templates and configuration.
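
A client may want to surface what a policy changed. The sketch below turns a `perf` object into human-readable notes. The function name `policy_notes` is hypothetical, and because the shape of individual `violations` entries is not documented here, the sketch simply stringifies them.

```python
def policy_notes(perf):
    """Return a list of notes describing what routing policies did."""
    evaluation = perf.get("policy_evaluation")
    if evaluation is None:  # no policies configured for this project
        return []
    notes = []
    if evaluation["result"] in ("warn", "soft_block"):
        notes.append(f"policy result: {evaluation['result']}")
    for change in evaluation.get("modifications", []):
        if change.get("type") == "model_override":
            notes.append(
                f"model {change['from']} -> {change['to']}: {change['reason']}"
            )
    for violation in evaluation.get("violations", []):
        # Entry shape is undocumented here, so it is rendered as-is.
        notes.append(f"violation: {violation}")
    return notes
```

A `hard_block` result does not appear in this path, since such requests are rejected with a 403 error instead of a normal response.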

Content Evaluation (Pro+)

When content policies are configured (PII detection, term filtering), the response includes content evaluation details:
{
  "perf": {
    "content_evaluation": {
      "result": "allow",
      "phase": "post_response",
      "pii_detected": true,
      "pii_count": 2,
      "redacted": true,
      "criteria_passed": 0,
      "criteria_failed": 0,
      "latency_ms": 15
    }
  }
}
Field | Type | Description
result | string | Overall result: allow, warn, redact, block
phase | string | Evaluation phase: post_response
pii_detected | boolean | Whether PII was detected in output
pii_count | integer | Number of PII items detected
redacted | boolean | Whether content was redacted
criteria_passed | integer | Number of criteria passed (if using LLM-as-judge)
criteria_failed | integer | Number of criteria failed
latency_ms | integer | Content evaluation latency
Content Results:
  • allow - Content passes all checks
  • warn - Content flagged but returned
  • redact - PII/terms redacted from output (e.g., john@example.com → [REDACTED])
  • block - Content blocked, error returned
Supported PII Types:
  • ssn - Social Security Numbers
  • credit_card - Credit card numbers (with Luhn validation)
  • email - Email addresses
  • phone_us - US phone numbers
  • ip_address - IP addresses
  • date_of_birth - Dates of birth
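
For readers unfamiliar with the Luhn validation mentioned for `credit_card`, the sketch below shows the standard Luhn checksum. It illustrates the general algorithm only; how Perf applies it internally is not described in this reference.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces, dashes, and any other non-digit characters are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A checksum like this filters out most random digit strings, so fewer arbitrary 16-digit numbers are flagged as card numbers.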

Task Types

Perf automatically detects your task type for optimal routing:
Task Type | Description | Example
extraction | Extracting structured data | “Extract email from text”
classification | Categorizing or labeling | “Classify sentiment”
summarization | Condensing information | “Summarize this article”
reasoning | Logic and analysis | “Solve this math problem”
code | Code generation/explanation | “Write a binary search”
writing | Creative or professional writing | “Write a blog post”
vision | Image understanding | Requests with image content parts
audio | Audio understanding | Requests with audio content parts

Generation Intent Detection

The Chat API intelligently detects when your prompt is requesting media generation (images, video, audio) and automatically routes to the appropriate generation model.

Example

curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a sunset over mountains"
      }
    ]
  }'
Perf recognizes this as an image generation request and routes it to DALL-E, returning the generated image URL in the response. For more control over generation parameters (model selection, dimensions, quality), use the dedicated generation endpoints.

Cost Control

Budget Enforcement

When you set max_cost_per_call, Perf will:
  1. Estimate the cost for the optimal model
  2. If estimated cost > budget, select a cheaper alternative
  3. Process with the selected model
  4. Include a cost_warning in the perf object if budget was a factor
{
  "messages": [...],
  "max_cost_per_call": 0.001
}
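
The selection steps above can be sketched as follows. This is an illustration of the described behavior, not Perf's actual router: the candidate list, the per-1K-token prices, and the choice to fall back to the cheapest model when nothing fits the budget are all assumptions.

```python
def select_model(candidates, estimated_tokens, max_cost_per_call=None):
    """Pick a model under an optional budget.

    candidates: list of (name, usd_per_1k_tokens), ordered best-first.
    Returns (name, estimated_cost_usd, cost_warning).
    """
    def estimate(price_per_1k):
        return price_per_1k * estimated_tokens / 1000

    best_name, best_price = candidates[0]
    if max_cost_per_call is None or estimate(best_price) <= max_cost_per_call:
        return best_name, estimate(best_price), False
    # The optimal model is over budget: take the first cheaper one that fits.
    for name, price in candidates[1:]:
        if estimate(price) <= max_cost_per_call:
            return name, estimate(price), True
    # Assumption: if nothing fits, use the cheapest candidate and flag it.
    name, price = min(candidates, key=lambda item: item[1])
    return name, estimate(price), True
```

With hypothetical prices of 0.01 and 0.0005 USD per 1K tokens and a 200-token estimate, a budget of 0.001 selects the cheaper model and sets the warning flag, while a budget of 0.005 keeps the first choice.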

Quality Validation

Perf automatically validates outputs and retries if needed:

Validation Checks

  • JSON format correctness (for extraction/classification tasks)
  • Refusal detection (“I cannot assist with that…”)
  • Incomplete response detection

Retry Logic

If validation fails:
  1. Retry with the same model (max 1 retry)
  2. If still failing, escalate to fallback model
  3. Return best available result
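
The retry sequence can be sketched as below. The function and validator names are hypothetical, the validator is a toy stand-in for the checks listed above, and "best available result" is interpreted here as the fallback model's output, which is an assumption.

```python
import json


def looks_valid(output):
    """Toy validator: rejects obvious refusals and non-JSON text."""
    if output.strip().lower().startswith("i cannot"):
        return False
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def generate_with_validation(call_model, validate, primary, fallback):
    """Return (model, output, fallback_used, validation_passed).

    call_model(model_name) -> str is supplied by the caller.
    """
    # Initial attempt plus at most one retry on the primary model.
    for _ in range(2):
        output = call_model(primary)
        if validate(output):
            return primary, output, False, True
    # Still failing: escalate to the fallback model.
    output = call_model(fallback)
    # Assumption: the fallback output is returned even if it fails validation.
    return fallback, output, True, validate(output)
```

The last two tuple elements correspond to the `fallback_used` and `validation_passed` fields reported in the `perf` object.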

Multi-Turn Conversations

Include conversation history in the messages array:
{
  "messages": [
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is the process..."},
    {"role": "user", "content": "How does it differ from cellular respiration?"}
  ]
}
Perf automatically:
  • Summarizes long conversation history to fit context windows
  • Maintains semantic coherence
  • Optimizes for cost by compressing older messages
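
On the client side, this means keeping the history and resending it with each turn. A minimal sketch, with illustrative helper names:

```python
def add_exchange(history, user_text, assistant_text):
    """Return a new history list with one user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


def next_request(history, user_text):
    """Build the request body for the next user turn."""
    return {"messages": history + [{"role": "user", "content": user_text}]}
```

After each response, pass the assistant's reply to `add_exchange` so the next call to `next_request` carries the full conversation.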

Error Responses

400 Bad Request

{
  "error": {
    "type": "invalid_request",
    "message": "messages array is required",
    "param": "messages"
  }
}

401 Unauthorized

{
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key"
  }
}

429 Too Many Requests

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded",
    "retry_after": 30
  }
}
Response headers:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1673456789
Retry-After: 30
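
A client can derive its wait time from these headers. The sketch below assumes `Retry-After` is given in seconds (as in the example above) and that `X-RateLimit-Reset` is a Unix timestamp in seconds; the function name and the 60-second default are illustrative.

```python
import time


def seconds_until_retry(headers, now=None):
    """Seconds to wait after a 429, based on the rate-limit headers."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        now = time.time() if now is None else now
        return max(0.0, float(reset) - now)
    return 60.0  # assumed default when neither header is present
```

Header lookups on a plain `dict` are case-sensitive; the header mappings returned by HTTP client libraries are typically case-insensitive.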

500 Internal Server Error

{
  "error": {
    "type": "server_error",
    "message": "An internal error occurred",
    "request_id": "req_abc123"
  }
}

503 Service Unavailable

{
  "error": {
    "type": "service_unavailable",
    "message": "All providers are currently experiencing issues",
    "retry_after": 60
  }
}
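
The error bodies above share one shape, so a client can wrap them in a single exception type. The class below is a hypothetical sketch; which error types are safe to retry is an assumption based on the examples shown, not a documented guarantee.

```python
class PerfAPIError(Exception):
    """Hypothetical exception wrapping a Chat API error body."""

    # Assumption: these error types are transient and worth retrying.
    RETRYABLE_TYPES = {"rate_limit_exceeded", "server_error", "service_unavailable"}

    def __init__(self, status_code, body):
        error = body.get("error", {})
        self.status_code = status_code
        self.type = error.get("type", "unknown")
        self.message = error.get("message", "")
        self.param = error.get("param")              # shown in the 400 example
        self.retry_after = error.get("retry_after")  # shown in the 429 and 503 examples
        self.request_id = error.get("request_id")    # shown in the 500 example
        super().__init__(f"{status_code} {self.type}: {self.message}")

    @property
    def retryable(self):
        return self.type in self.RETRYABLE_TYPES
```

Including `request_id` in logs gives a concrete reference when reporting a 500 error.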

Structured Output

For extraction and classification tasks, Perf automatically detects when JSON output is needed and routes to models that excel at structured output. To get JSON output, simply ask for it in your prompt:
{
  "messages": [
    {
      "role": "user",
      "content": "List 3 benefits of exercise. Return as JSON with fields: benefit, description, category"
    }
  ]
}
Perf will detect this is an extraction task and route accordingly.
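
Because the JSON arrives as a string in `choices[].message.content`, the client still has to decode it. A minimal sketch, with an illustrative function name, that returns `None` when the content is not valid JSON:

```python
import json


def parse_json_content(body):
    """Decode the first choice's content as JSON, or return None on failure."""
    text = body["choices"][0]["message"]["content"]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Returning `None` lets the caller decide whether to retry, re-prompt, or fall back to treating the content as plain text.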

Rate Limits

Tier | Requests/Minute | Requests/Day
Free | 60 | 1,000
Pro | 300 | 100,000
Enterprise | Custom | Custom
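
To stay under the per-minute limit without relying on 429 responses, a client can throttle itself. The sketch below is a simple sliding-window limiter; the class name is hypothetical, it tracks only the per-minute limit, and it is not thread-safe.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Sliding-window limiter for a requests-per-minute quota."""

    def __init__(self, max_per_minute, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.sent = deque()  # timestamps of requests in the last 60 seconds

    def wait_time(self):
        """Seconds to wait before the next request (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            return 0.0
        return 60 - (now - self.sent[0])

    def record(self):
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

On the Free tier this would be constructed as `MinuteRateLimiter(60)`; call `wait_time()` before each request, sleep for the returned duration, then call `record()` after sending.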

Best Practices

1. Set Appropriate Budgets

Simple extraction:
{
  "max_cost_per_call": 0.001
}
Complex reasoning:
{
  "max_cost_per_call": 0.05
}

2. Use System Messages

Guide model behavior with system messages:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant. Keep responses under 50 words."
    },
    {
      "role": "user",
      "content": "Explain gravity"
    }
  ]
}

3. Optimize for Task Type

Be explicit about the task for better routing:
{
  "messages": [
    {
      "role": "user",
      "content": "EXTRACT the following data as JSON: name, age, location from: 'Sarah is 28 and lives in Seattle'"
    }
  ],
  "response_format": "json"
}

4. Handle Errors Gracefully

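import time

import requests

url = "https://api.withperf.pro/v1/chat"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {"messages": [{"role": "user", "content": "Explain gravity"}]}

try:
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError:
    if response.status_code == 429:
        # Rate limited: wait for the interval given in Retry-After, then retry
        retry_after = int(response.headers.get("Retry-After", 60))
        time.sleep(retry_after)
    elif response.status_code >= 500:
        # Server or provider issue: retry later, ideally with exponential backoff
        pass
except requests.exceptions.RequestException:
    # Network failure or timeout: no response object is available here
    pass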
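
The retry handling can be taken one step further with exponential backoff. The sketch below uses full jitter and honors `Retry-After` when the server sends it; the function names, the base delay, the 60-second cap, and the attempt limit are all illustrative choices. The `send` callable is supplied by the caller and returns `(status_code, headers, body)`.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes precedence
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()  # full jitter: uniform in [0, ceiling)


def call_with_retries(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt, retry_after=headers.get("Retry-After")))
```

Jitter spreads retries from many clients over time, so they do not all hit the API again at the same instant.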

SDK Support

Official SDKs coming soon:
  • Python SDK
  • Node.js SDK
  • Go SDK
  • Ruby SDK

Support