Chat API Reference

The Chat API is Perf’s primary endpoint for text generation. It automatically routes your request to the optimal model based on task type, complexity, and your cost constraints.

Endpoint

POST https://api.withperf.pro/v1/chat

Authentication

Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY

Request Body

Required Parameters

Parameter | Type | Description
messages | array | Array of message objects, each with a role and content

Optional Parameters

Parameter | Type | Default | Description
max_cost_per_call | number | 0.01 | Maximum cost in USD for this request
temperature | number | 0.7 | Sampling temperature (0-2)
max_tokens | number | 2048 | Maximum tokens to generate
top_p | number | 1.0 | Nucleus sampling parameter
frequency_penalty | number | 0.0 | Penalize repeated tokens (-2 to 2)
presence_penalty | number | 0.0 | Penalize tokens that have already appeared, which encourages new topics (-2 to 2)
stop | array | null | Stop sequences that end generation
response_format | string | "text" | Output format: "text" or "json"
user_id | string | null | Your user identifier for analytics
metadata | object | {} | Custom metadata for tracking

Message Object

{
  "role": "user" | "assistant" | "system",
  "content": "string"
}

Request Example

curl -X POST https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that extracts structured data."
      },
      {
        "role": "user",
        "content": "Extract name, email, and phone from: John Doe, contact at [email protected] or call 555-1234"
      }
    ],
    "max_cost_per_call": 0.005,
    "response_format": "json",
    "temperature": 0.3
  }'
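
The same request can be built from Python. The following is a minimal sketch that uses only the standard library; `build_chat_request` and `send_chat_request` are illustrative helper names, not part of any official SDK, and the key is passed in as a plain argument.

```python
import json
import urllib.request

API_URL = "https://api.withperf.pro/v1/chat"


def build_chat_request(api_key, messages, **options):
    """Build (but do not send) a POST request for the Chat API."""
    payload = {"messages": messages, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Optional parameters such as `max_cost_per_call=0.005` or `response_format="json"` are passed as keyword arguments to `build_chat_request` and end up as top-level keys in the request body.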

Response

Success Response (200 OK)

{
  "model_used": "gpt-4o-mini",
  "output": "{\n  \"name\": \"John Doe\",\n  \"email\": \"[email protected]\",\n  \"phone\": \"555-1234\"\n}",
  "billing": {
    "cost_usd": 0.00023,
    "cost_warning": false
  },
  "tokens": {
    "input": 47,
    "output": 28,
    "total": 75
  },
  "metadata": {
    "call_id": "call_abc123xyz",
    "task_type": "extraction",
    "complexity_score": 0.3,
    "routing_reason": "Optimal for structured data extraction",
    "latency_ms": 342,
    "fallback_used": false,
    "validation_passed": true,
    "timestamp": "2024-01-15T10:30:00Z"
  }
}

Response Fields

Field | Type | Description
model_used | string | The model that processed your request
output | string | The generated text response
billing.cost_usd | number | Actual cost of this request
billing.cost_warning | boolean | True if cost exceeded max_cost_per_call
tokens.input | number | Input tokens consumed
tokens.output | number | Output tokens generated
tokens.total | number | Total tokens used
metadata.call_id | string | Unique identifier for this call
metadata.task_type | string | Detected task type
metadata.complexity_score | number | Estimated complexity (0-1)
metadata.routing_reason | string | Why this model was selected
metadata.latency_ms | number | Response time in milliseconds
metadata.fallback_used | boolean | Whether fallback model was used
metadata.validation_passed | boolean | Whether output passed quality checks
metadata.timestamp | string | ISO 8601 timestamp
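
For logging or cost tracking, it can help to flatten the nested response into a handful of fields. The sketch below assumes the response has already been decoded into a Python dict with the field names listed above; `CallSummary` and `summarize_call` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSummary:
    call_id: str
    model_used: str
    cost_usd: float
    over_budget: bool
    total_tokens: int
    fallback_used: bool


def summarize_call(response):
    """Flatten the nested response dict into the fields worth logging."""
    billing = response["billing"]
    metadata = response["metadata"]
    return CallSummary(
        call_id=metadata["call_id"],
        model_used=response["model_used"],
        cost_usd=billing["cost_usd"],
        over_budget=billing["cost_warning"],
        total_tokens=response["tokens"]["total"],
        fallback_used=metadata["fallback_used"],
    )
```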

Task Types

Perf automatically detects your task type for optimal routing:
Task Type | Description | Example
extraction | Extracting structured data | "Extract email from text"
classification | Categorizing or labeling | "Classify sentiment"
summarization | Condensing information | "Summarize this article"
reasoning | Logic and analysis | "Solve this math problem"
code | Code generation/explanation | "Write a binary search"
writing | Creative or professional writing | "Write a blog post"
general | General conversation | "Hello, how are you?"

Cost Control

Budget Enforcement

When you set max_cost_per_call, Perf will:
  1. Estimate the cost for the optimal model
  2. If estimated cost > budget, select a cheaper alternative
  3. Process with the selected model
  4. Set cost_warning: true if actual cost exceeds budget
{
  "messages": [...],
  "max_cost_per_call": 0.001  // 0.1 cents maximum
}
Response when budget is tight:
{
  "model_used": "gpt-4o-mini",
  "billing": {
    "cost_usd": 0.00012,
    "cost_warning": false
  },
  "metadata": {
    "routing_reason": "Budget-optimized selection (GPT-4o would exceed limit)"
  }
}
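
The budget steps above can be sketched in a few lines. This is not Perf's actual routing code: the `ModelOption` type, the per-token prices in the usage note, and the choice to fall back to the cheapest model when nothing fits the budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_1k_tokens: float  # hypothetical blended price, illustration only


def estimate_cost(model, expected_tokens):
    """Step 1: estimate what a call would cost on this model."""
    return model.usd_per_1k_tokens * expected_tokens / 1000


def select_model(ranked_models, expected_tokens, max_cost_per_call):
    """Steps 2-3: pick the best-ranked model whose estimate fits the budget.

    ranked_models is ordered best first. If no model fits, the cheapest one
    is returned (an assumption), and the actual cost may exceed the budget.
    """
    for model in ranked_models:
        if estimate_cost(model, expected_tokens) <= max_cost_per_call:
            return model
    return min(ranked_models, key=lambda m: estimate_cost(m, expected_tokens))


def cost_warning(actual_cost_usd, max_cost_per_call):
    """Step 4: flag the call when the actual cost went over the budget."""
    return actual_cost_usd > max_cost_per_call
```

With two made-up options, `ModelOption("large", 0.01)` and `ModelOption("small", 0.0005)`, a 1,000-token request and a budget of 0.001 selects `small`, because the estimate for `large` (0.01 USD) is over the limit.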

Quality Validation

Perf automatically validates outputs and retries if needed:

Validation Checks

  • JSON format correctness (when response_format: "json")
  • Refusal detection (“I cannot assist with that…”)
  • Incomplete response detection
  • Quality disclaimer detection (“As an AI…”)

Retry Logic

If validation fails:
  1. Retry with the same model (max 1 retry)
  2. If still failing, escalate to fallback model
  3. Return best available result
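
The validate-then-retry flow can be sketched as follows. This is a simplified stand-in and not Perf's implementation: `passes_validation` uses crude phrase matching in place of real refusal and disclaimer detection, `primary` and `fallback` are hypothetical callables that return output text, and "best available result" is reduced to "whatever the fallback returned".

```python
import json


def passes_validation(output, response_format="text"):
    """Simplified versions of the validation checks."""
    text = output.strip()
    if not text:
        return False  # treat empty output as an incomplete response
    if response_format == "json":
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return False
    lowered = text.lower()
    # Placeholder for refusal and quality-disclaimer detection.
    return not lowered.startswith(("i cannot assist", "as an ai"))


def generate_with_retry(primary, fallback, response_format="text", max_retries=1):
    """Call primary, retry it up to max_retries times, then use fallback.

    Returns (output, fallback_used, validation_passed).
    """
    for _ in range(1 + max_retries):
        output = primary()
        if passes_validation(output, response_format):
            return output, False, True
    output = fallback()
    return output, True, passes_validation(output, response_format)
```

The returned flags correspond to `metadata.fallback_used` and `metadata.validation_passed` in the response, which is how a caller can tell after the fact that a retry or escalation happened.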

Multi-Turn Conversations

Include conversation history in the messages array:
{
  "messages": [
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is the process..."},
    {"role": "user", "content": "How does it differ from cellular respiration?"}
  ]
}
Perf automatically:
  • Summarizes long conversation history to fit context windows
  • Maintains semantic coherence
  • Optimizes for cost by compressing older messages
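
On the client side, the only requirement is to resend the full history on every turn. A small helper like the one below keeps that bookkeeping in one place; the `Conversation` class is a hypothetical convenience wrapper, not part of the API.

```python
class Conversation:
    """Accumulates message objects in the shape the messages parameter expects."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt is not None:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

    def to_payload(self, **options):
        # Copy the list so later turns do not mutate an already-built payload.
        return {"messages": list(self.messages), **options}
```

After each call, append the `output` field of the response with `add_assistant` before adding the next user message, so the next request carries the whole exchange.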

Error Responses

400 Bad Request

{
  "error": {
    "type": "invalid_request",
    "message": "messages array is required",
    "param": "messages"
  }
}

401 Unauthorized

{
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key"
  }
}

429 Too Many Requests

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded",
    "retry_after": 30
  }
}
Response headers:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1673456789
Retry-After: 30
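
These headers can be turned into typed values before deciding how long to wait. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but the reference does not state, and it takes a plain dict with the header names spelled exactly as shown.

```python
from datetime import datetime, timezone


def parse_rate_limit_headers(headers):
    """Convert rate-limit response headers into typed values."""
    reset_epoch = int(headers["X-RateLimit-Reset"])  # assumed Unix seconds
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset_at": datetime.fromtimestamp(reset_epoch, tz=timezone.utc),
        "retry_after_seconds": int(headers.get("Retry-After", 0)),
    }
```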

500 Internal Server Error

{
  "error": {
    "type": "server_error",
    "message": "An internal error occurred",
    "request_id": "req_abc123"
  }
}

503 Service Unavailable

{
  "error": {
    "type": "service_unavailable",
    "message": "All providers are currently experiencing issues",
    "retry_after": 60
  }
}

Advanced Usage

Structured Output

Request JSON output for easy parsing:
{
  "messages": [
    {
      "role": "user",
      "content": "List 3 benefits of exercise in JSON format with fields: benefit, description, category"
    }
  ],
  "response_format": "json",
  "temperature": 0.2
}
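
Because `output` is returned as a string, the JSON still has to be decoded and checked on the client. The sketch below assumes the model answers this prompt with a JSON array of objects; that shape is not guaranteed by the API, so the hypothetical `parse_items` helper raises `ValueError` when the output does not match.

```python
import json

REQUIRED_FIELDS = {"benefit", "description", "category"}


def parse_items(output, required_fields=REQUIRED_FIELDS):
    """Decode the output string and check each item has the required fields."""
    try:
        items = json.loads(output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of objects")
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
        missing = required_fields - set(item)
        if missing:
            raise ValueError(f"item {index} is missing fields: {sorted(missing)}")
    return items
```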

Custom Metadata

Track requests with custom metadata:
{
  "messages": [...],
  "user_id": "user_12345",
  "metadata": {
    "session_id": "sess_abc",
    "feature": "chat_support",
    "experiment_variant": "v2"
  }
}
View this metadata in your dashboard analytics.

Temperature Control

Adjust creativity vs consistency:
{
  "messages": [...],
  "temperature": 0.0  // Deterministic (good for data extraction)
}
{
  "messages": [...],
  "temperature": 1.5  // Creative (good for brainstorming)
}

Rate Limits

Tier | Requests/Minute | Requests/Day
Free | 60 | 1,000
Pro | 300 | 100,000
Enterprise | Custom | Custom
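
To stay under the per-minute limit without waiting for a 429, a client can throttle itself. The following is one possible sliding-window limiter, offered as a sketch: `RequestThrottle` is a hypothetical helper, the default of 60 requests per 60 seconds matches the Free tier above, and the daily limit is not tracked.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = deque()  # timestamps of requests inside the current window

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self.sent[0])

    def record(self):
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

Call `wait_time()` before each request, sleep for the returned number of seconds if it is non-zero, then call `record()` once the request has been sent.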

Best Practices

1. Set Appropriate Budgets

{
  "max_cost_per_call": 0.001  // Simple extraction
}
{
  "max_cost_per_call": 0.05   // Complex reasoning
}

2. Use System Messages

Guide model behavior with system messages:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant. Keep responses under 50 words."
    },
    {
      "role": "user",
      "content": "Explain gravity"
    }
  ]
}

3. Optimize for Task Type

Be explicit about the task for better routing:
{
  "messages": [
    {
      "role": "user",
      "content": "EXTRACT the following data as JSON: name, age, location from: 'Sarah is 28 and lives in Seattle'"
    }
  ],
  "response_format": "json"
}

4. Handle Errors Gracefully

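import time

import requests

url = "https://api.withperf.pro/v1/chat"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {"messages": [{"role": "user", "content": "Explain gravity"}]}

try:
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError:
    if response.status_code == 429:
        # Rate limited: wait for the interval given in Retry-After, then retry
        retry_after = int(response.headers.get("Retry-After", 60))
        time.sleep(retry_after)
    elif response.status_code >= 500:
        # Server or provider issue: retry the same request after a delay
        pass
    else:
        # 400 and 401 will not succeed on retry, so surface them to the caller
        raise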
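
A fuller retry loop with exponential backoff might look like the sketch below. It is transport-agnostic: `send` is a hypothetical zero-argument callable that performs the HTTP call and returns `(status_code, headers, body)`. Treating 429, 500, and 503 as retryable, and the delay constants, are assumptions rather than documented guidance.

```python
import time

RETRYABLE_STATUSES = {429, 500, 503}


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try; attempt is 0-based."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    return min(cap, base * 2 ** attempt)


def call_with_retries(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns (status_code, body) from the last attempt.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        retry_after = headers.get("Retry-After")
        sleep(backoff_delay(attempt, int(retry_after) if retry_after else None))
```

Passing `sleep` as a parameter keeps the loop easy to test, since a test can record the requested delays instead of actually waiting.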

SDK Support

Official SDKs coming soon:
  • Python SDK
  • Node.js SDK
  • Go SDK
  • Ruby SDK

Support