
Quickstart Guide

Get up and running with Perf in under 5 minutes.

Prerequisites

  • An account at withperf.pro
  • Your API key (available in the dashboard)
  • Basic knowledge of REST APIs

Step 1: Get Your API Key

  1. Sign up at withperf.pro/signup
  2. Navigate to Settings → API Keys
  3. Click Generate New Key
  4. Copy your key (format: pk_live_... for production, pk_test_... for testing)
Important: Store your API key securely. Never commit it to version control.
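As a minimal sketch of that advice, read the key from an environment variable instead of hard-coding it. The `PERF_API_KEY` name here is our own choice, not something Perf requires; set it however your deployment manages secrets:

```python
import os

# PERF_API_KEY is an assumed variable name; the fallback key is only
# for local testing and should never reach production.
api_key = os.environ.get("PERF_API_KEY", "pk_test_your_key_here")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```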

Step 2: Make Your First Request

Using cURL

curl https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What are the three primary colors?"
      }
    ]
  }'

Response

{
  "model_used": "gpt-4o-mini",
  "output": "The three primary colors are red, blue, and yellow.",
  "billing": {
    "cost_usd": 0.00012
  },
  "tokens": {
    "input": 15,
    "output": 12,
    "total": 27
  },
  "metadata": {
    "task_type": "general",
    "latency_ms": 234,
    "routing_reason": "Optimal for simple queries"
  }
}

Step 3: Add Cost Controls

Control costs by setting a budget per request:
curl https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Write a comprehensive analysis of climate change impacts..."
      }
    ],
    "max_cost_per_call": 0.005
  }'
Perf will automatically select a model that stays within your budget. If the optimal model exceeds your limit, we’ll use the best alternative and include a cost warning in the response.
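Once a response comes back, you can compare the billed cost against the budget you sent. A minimal Python sketch, assuming the response shape shown in Step 2 (`billing.cost_usd`); the `over_budget` helper is ours, not part of the API:

```python
# over_budget is our own helper, based on the response shape in Step 2.
def over_budget(data, max_cost_per_call):
    return data["billing"]["cost_usd"] > max_cost_per_call

sample = {"output": "...", "billing": {"cost_usd": 0.00012}}
print(over_budget(sample, 0.005))  # False
```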

Step 4: Use Streaming for Real-Time Responses

For chat applications, use streaming to show responses as they’re generated:
curl https://api.withperf.pro/v1/chat/stream \
  -H "Authorization: Bearer pk_test_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing"
      }
    ]
  }'
The response uses Server-Sent Events (SSE) format:
data: {"chunk": "Quantum", "done": false}

data: {"chunk": " computing", "done": false}

data: {"chunk": " uses", "done": false}

data: {"model_used": "claude-sonnet-4-5", "billing": {...}, "done": true}
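Consuming that stream from Python comes down to reading `data:` lines and decoding the JSON after the prefix. A minimal sketch, shown here against hard-coded lines; in practice you would feed it decoded lines from `requests.post(..., stream=True).iter_lines()`:

```python
import json

def parse_sse_line(line):
    """Return the decoded event dict, or None for blank/non-data lines."""
    if not line.startswith("data: "):
        return None
    return json.loads(line[len("data: "):])

# Hard-coded lines matching the stream format shown above.
chunks = []
for raw in ['data: {"chunk": "Quantum", "done": false}',
            '',
            'data: {"chunk": " computing", "done": false}',
            'data: {"model_used": "claude-sonnet-4-5", "done": true}']:
    event = parse_sse_line(raw)
    if event is None:
        continue
    if event.get("done"):
        break  # the final event carries model and billing info, no chunk
    chunks.append(event["chunk"])

print("".join(chunks))  # Quantum computing
```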

Step 5: View Your Analytics

  1. Go to withperf.pro/dashboard
  2. See real-time metrics:
    • Total requests
    • Average cost per request
    • Model distribution
    • Latency percentiles
    • Cost savings vs baseline

Language-Specific Examples

Python

import requests

url = "https://api.withperf.pro/v1/chat"
headers = {
    "Authorization": "Bearer pk_test_your_key_here",
    "Content-Type": "application/json"
}
payload = {
    "messages": [
        {"role": "user", "content": "Hello, world!"}
    ],
    "max_cost_per_call": 0.01
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()

print(f"Response: {data['output']}")
print(f"Cost: ${data['billing']['cost_usd']:.5f}")
print(f"Model: {data['model_used']}")

JavaScript/TypeScript

const response = await fetch('https://api.withperf.pro/v1/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_test_your_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'Hello, world!' }
    ],
    max_cost_per_call: 0.01
  })
});

if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log('Response:', data.output);
console.log('Cost: $', data.billing.cost_usd);
console.log('Model:', data.model_used);

Go

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type Request struct {
    Messages       []Message `json:"messages"`
    MaxCostPerCall float64   `json:"max_cost_per_call"`
}

func main() {
    req := Request{
        Messages: []Message{
            {Role: "user", Content: "Hello, world!"},
        },
        MaxCostPerCall: 0.01,
    }

    jsonData, err := json.Marshal(req)
    if err != nil {
        panic(err)
    }

    httpReq, err := http.NewRequest("POST",
        "https://api.withperf.pro/v1/chat",
        bytes.NewBuffer(jsonData))
    if err != nil {
        panic(err)
    }

    httpReq.Header.Set("Authorization", "Bearer pk_test_your_key_here")
    httpReq.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(httpReq)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        panic(err)
    }

    fmt.Println("Response:", result["output"])
    fmt.Println("Cost:", result["billing"].(map[string]interface{})["cost_usd"])
}

Common Use Cases

Task-Specific Optimization

Perf automatically detects your task type and selects the optimal model:
# Data Extraction - uses efficient models
{
  "messages": [{"role": "user", "content": "Extract email and phone from: John Doe [email protected] 555-1234"}]
}

# Complex Reasoning - uses powerful models
{
  "messages": [{"role": "user", "content": "Solve this logic puzzle: If all A are B..."}]
}

# Code Generation - uses code-specialized models
{
  "messages": [{"role": "user", "content": "Write a binary search in Python"}]
}
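Since routing happens server-side, the client code is identical for all three cases; only the prompt changes. A small convenience wrapper (our own, not an official client) makes that explicit:

```python
# chat_payload is our own helper for building the request body shown above.
def chat_payload(prompt, **options):
    return {"messages": [{"role": "user", "content": prompt}], **options}

payload = chat_payload("Write a binary search in Python",
                       max_cost_per_call=0.01)
print(payload["max_cost_per_call"])  # 0.01
```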

Multi-Turn Conversations

{
  "messages": [
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is..."},
    {"role": "user", "content": "How does it relate to cellular respiration?"}
  ]
}
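The API is stateless, so your client keeps the history: append each assistant reply before sending the next user turn. A minimal Python sketch, where `send` is a stand-in for the HTTP call from the language examples above:

```python
# send() is a placeholder; swap in
# requests.post(url, json={"messages": messages}, headers=headers).json()
def send(messages):
    return {"output": f"(model reply to: {messages[-1]['content']})"}

history = []
for user_turn in ["What is photosynthesis?",
                  "How does it relate to cellular respiration?"]:
    history.append({"role": "user", "content": user_turn})
    reply = send(history)["output"]
    history.append({"role": "assistant", "content": reply})

# history now holds the full alternating user/assistant transcript.
print(len(history))  # 4
```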

Structured Output

{
  "messages": [
    {
      "role": "user",
      "content": "Extract structured data from: 'John Smith, 35 years old, lives in NYC'\n\nReturn JSON with name, age, location"
    }
  ],
  "response_format": "json"
}
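With `"response_format": "json"`, the `output` field should itself be a JSON string, so parse it before use. A minimal sketch assuming the response shape from Step 2; the fallback to `None` on non-JSON output is our own defensive choice:

```python
import json

def parse_structured(data):
    """Parse the JSON payload in data["output"]; None if it isn't JSON."""
    try:
        return json.loads(data["output"])
    except (KeyError, json.JSONDecodeError):
        return None

sample = {"output": '{"name": "John Smith", "age": 35, "location": "NYC"}'}
record = parse_structured(sample)
print(record["age"])  # 35
```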

Rate Limits

  • Free Tier: 1,000 requests/month, 60 requests/minute
  • Pro Tier: 100,000 requests/month, 300 requests/minute
  • Enterprise: Custom limits
When you exceed rate limits, you’ll receive a 429 Too Many Requests response with a Retry-After header.
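A simple way to honor that `Retry-After` header is to wrap the request in a retry loop. A minimal sketch where `do_request` is any callable returning `(status_code, headers, body)`, so it can wrap whichever HTTP client you use; production code should also cap total wait time:

```python
import time

def with_retry(do_request, max_retries=3):
    """Retry on 429, sleeping for the Retry-After interval between tries."""
    for attempt in range(max_retries + 1):
        status, headers, body = do_request()
        if status != 429 or attempt == max_retries:
            return status, body
        time.sleep(int(headers.get("Retry-After", "1")))

# Demo with a fake request that returns 429 twice, then succeeds.
calls = {"n": 0}
def fake_request():
    calls["n"] += 1
    if calls["n"] < 3:
        return 429, {"Retry-After": "0"}, None
    return 200, {}, {"output": "ok"}

status, body = with_retry(fake_request)
print(status, calls["n"])  # 200 3
```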

Error Handling

try:
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    status = e.response.status_code
    if status == 429:
        print("Rate limit exceeded. Retry after:",
              e.response.headers.get("Retry-After"))
    elif status == 401:
        print("Invalid API key")
    elif status == 400:
        print("Invalid request:", e.response.json())
    else:
        print("Error:", e)
except requests.exceptions.RequestException as e:
    print("Network error:", e)

Next Steps

Need Help?