
Quickstart Guide

Get up and running with Perf in under 5 minutes.

Prerequisites

  • An account at withperf.pro
  • Your API key (available in the dashboard)
  • Basic knowledge of REST APIs

Step 1: Get Your API Key

  1. Sign up at dashboard.withperf.pro/sign-up
  2. Navigate to Settings → API Keys
  3. Click Generate New Key
  4. Copy your key (format: pk_live_... for production, pk_test_... for testing)
Important: Store your API key securely. Never commit it to version control.
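One common way to keep the key out of version control is to read it from an environment variable. This is a minimal sketch; the variable name PERF_API_KEY is an illustrative choice, not a Perf convention:

```python
import os

# Read the key from the environment instead of hard-coding it.
# PERF_API_KEY is an illustrative variable name, not a documented Perf convention.
api_key = os.environ.get("PERF_API_KEY", "pk_test_your_key_here")
headers = {"Authorization": f"Bearer {api_key}"}
```

Set the variable in your shell (or a .env file excluded from version control) before running your application.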

Step 2: Make Your First Request

Using cURL

curl https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What are the three primary colors?"
      }
    ]
  }'

Response

The response is OpenAI-compatible:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705312200,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The three primary colors are red, blue, and yellow."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 12,
    "total_tokens": 27
  },
  "perf": {
    "task_type": "writing",
    "latency_ms": 234,
    "fallback_used": false
  }
}
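In Python, the fields above read like any OpenAI-style response; the extra perf block carries Perf's routing metadata. A quick sketch using the sample response shown above (abridged to the fields being read):

```python
import json

# The sample /v1/chat response from above, abridged to the fields we read.
raw = """{
  "model": "gpt-4o-mini",
  "choices": [{"index": 0,
               "message": {"role": "assistant",
                           "content": "The three primary colors are red, blue, and yellow."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 15, "completion_tokens": 12, "total_tokens": 27},
  "perf": {"task_type": "writing", "latency_ms": 234, "fallback_used": false}
}"""

data = json.loads(raw)
answer = data["choices"][0]["message"]["content"]
routing = data["perf"]  # Perf-specific extension alongside the OpenAI fields
print(answer)
print(routing["task_type"], routing["latency_ms"], routing["fallback_used"])
```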

Step 3: Add Cost Controls

Control costs by setting a budget per request:
curl https://api.withperf.pro/v1/chat \
  -H "Authorization: Bearer pk_test_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Write a comprehensive analysis of climate change impacts..."
      }
    ],
    "max_cost_per_call": 0.005
  }'
Perf will automatically select a model that stays within your budget. If the optimal model exceeds your limit, we’ll use the best alternative and include a cost warning in the response.
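A sketch of checking for that behavior client-side. Note that only fallback_used appears in the response shown earlier; the cost_warning field name below is an assumption for illustration, not a documented field:

```python
def budget_notes(data):
    """Collect budget-related notes from a /v1/chat response dict."""
    notes = []
    perf = data.get("perf", {})
    if perf.get("fallback_used"):
        notes.append(f"fallback model used: {data.get('model')}")
    if "cost_warning" in perf:  # field name is an assumption, not documented above
        notes.append(perf["cost_warning"])
    return notes

# Simulated response where the budget forced a fallback.
sample = {"model": "gpt-4o-mini",
          "perf": {"fallback_used": True,
                   "cost_warning": "optimal model exceeded max_cost_per_call"}}
print(budget_notes(sample))
```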

Step 4: Use Streaming for Real-Time Responses

For chat applications, use streaming to show responses as they’re generated:
curl https://api.withperf.pro/v1/chat/stream \
  -H "Authorization: Bearer pk_test_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing"
      }
    ]
  }'
The response uses OpenAI-compatible Server-Sent Events (SSE) format:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
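The chunks above can be reassembled client-side by reading each data: line, stopping at [DONE], and concatenating the content deltas. A minimal parser sketch (shown against an in-memory copy of the stream above; in practice you would iterate over the HTTP response lines):

```python
import json

def parse_sse(lines):
    """Yield the text deltas from OpenAI-style SSE 'data:' lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# The stream from above, abridged to the fields the parser reads.
stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]
print("".join(parse_sse(stream)))  # Quantum computing
```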

Step 5: View Your Analytics

Analytics are available via the API. Dashboard features are coming soon. Use the Metrics API to:
  • View total requests and costs
  • Analyze model distribution
  • Track latency metrics
  • Export usage data
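A hypothetical sketch of a Metrics API call. The endpoint path (/v1/metrics) and query parameters (start, end) below are assumptions for illustration; check the API reference for the real names:

```python
import requests

# Hypothetical: "/v1/metrics", "start", and "end" are assumed names,
# not documented in this guide. We build (but don't send) the request
# so the URL shape is visible.
def build_metrics_request(api_key, start, end):
    req = requests.Request(
        "GET",
        "https://api.withperf.pro/v1/metrics",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"start": start, "end": end},
    )
    return req.prepare()

prepared = build_metrics_request("pk_test_your_key_here", "2025-01-01", "2025-01-31")
print(prepared.url)
```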

Language-Specific Examples

Python

import requests

url = "https://api.withperf.pro/v1/chat"
headers = {
    "Authorization": "Bearer pk_test_your_key_here",
    "Content-Type": "application/json"
}
payload = {
    "messages": [
        {"role": "user", "content": "Hello, world!"}
    ],
    "max_cost_per_call": 0.01
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()

# OpenAI-compatible response format
content = data['choices'][0]['message']['content']
model = data['model']
tokens = data['usage']['total_tokens']

print(f"Response: {content}")
print(f"Model: {model}")
print(f"Tokens: {tokens}")

JavaScript/TypeScript

const response = await fetch('https://api.withperf.pro/v1/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_test_your_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'Hello, world!' }
    ],
    max_cost_per_call: 0.01
  })
});

const data = await response.json();

// OpenAI-compatible response format
const content = data.choices[0].message.content;
const model = data.model;
const tokens = data.usage.total_tokens;

console.log('Response:', content);
console.log('Model:', model);
console.log('Tokens:', tokens);

Go

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type Request struct {
    Messages       []Message `json:"messages"`
    MaxCostPerCall float64   `json:"max_cost_per_call"`
}

// OpenAI-compatible response structure
type Response struct {
    Model   string `json:"model"`
    Choices []struct {
        Message struct {
            Content string `json:"content"`
        } `json:"message"`
    } `json:"choices"`
    Usage struct {
        TotalTokens int `json:"total_tokens"`
    } `json:"usage"`
}

func main() {
    req := Request{
        Messages: []Message{
            {Role: "user", Content: "Hello, world!"},
        },
        MaxCostPerCall: 0.01,
    }

    jsonData, err := json.Marshal(req)
    if err != nil {
        panic(err)
    }

    httpReq, err := http.NewRequest("POST",
        "https://api.withperf.pro/v1/chat",
        bytes.NewBuffer(jsonData))
    if err != nil {
        panic(err)
    }

    httpReq.Header.Set("Authorization", "Bearer pk_test_your_key_here")
    httpReq.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(httpReq)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var result Response
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        panic(err)
    }

    fmt.Println("Response:", result.Choices[0].Message.Content)
    fmt.Println("Model:", result.Model)
    fmt.Println("Tokens:", result.Usage.TotalTokens)
}

Common Use Cases

Task-Specific Optimization

Perf automatically detects your task type and selects the optimal model:
# Data Extraction - uses efficient models
{
  "messages": [{"role": "user", "content": "Extract email and phone from: John Doe john@example.com 555-1234"}]
}

# Complex Reasoning - uses powerful models
{
  "messages": [{"role": "user", "content": "Solve this logic puzzle: If all A are B..."}]
}

# Code Generation - uses code-specialized models
{
  "messages": [{"role": "user", "content": "Write a binary search in Python"}]
}

Multi-Turn Conversations

{
  "messages": [
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is..."},
    {"role": "user", "content": "How does it relate to cellular respiration?"}
  ]
}
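Because each request carries the full message list, the client keeps the history and resends it on every call. A minimal sketch of maintaining that history between turns:

```python
def add_turn(messages, role, content):
    """Append a turn; send the updated list as "messages" on the next request."""
    messages.append({"role": role, "content": content})
    return messages

history = []
add_turn(history, "user", "What is photosynthesis?")
add_turn(history, "assistant", "Photosynthesis is...")  # reply from the previous call
add_turn(history, "user", "How does it relate to cellular respiration?")
print(len(history))  # 3
```

After each response, append the assistant's reply to the history before adding the next user turn.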

Structured Output

Perf automatically detects extraction tasks and routes to models that excel at structured output. Simply ask for JSON in your prompt:
{
  "messages": [
    {
      "role": "user",
      "content": "Extract structured data from: 'John Smith, 35 years old, lives in NYC'\n\nReturn JSON with name, age, location"
    }
  ]
}
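Models sometimes wrap the JSON in a sentence of prose, so it can help to extract the object before parsing. A tolerant sketch (the surrounding text in the example reply is illustrative):

```python
import json
import re

def extract_json(text):
    """Pull the first JSON object out of a model reply, tolerating extra prose."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

# Illustrative reply with prose around the JSON payload.
reply = 'Here is the data: {"name": "John Smith", "age": 35, "location": "NYC"}'
print(extract_json(reply))
```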

Rate Limits

  • Free Tier: 1,000 requests/month, 60 requests/minute
  • Pro Tier: 100,000 requests/month, 300 requests/minute
  • Enterprise: Custom limits
When you exceed rate limits, you’ll receive a 429 Too Many Requests response with a Retry-After header.
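A retry loop that honors Retry-After can be sketched as follows. The transport is injected as a callable so the logic is testable; max_retries and the 1-second default wait are illustrative choices, not Perf requirements:

```python
import time

def post_with_retry(do_post, max_retries=3):
    """Retry a callable returning (status_code, headers, body) on 429,
    waiting for the number of seconds given by the Retry-After header.
    max_retries and the 1-second default are illustrative choices."""
    for _ in range(max_retries):
        status, headers, body = do_post()
        if status != 429:
            return status, body
        time.sleep(int(headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after retries")

# Simulated transport: one 429, then success.
responses = iter([(429, {"Retry-After": "0"}, None),
                  (200, {}, "ok")])
print(post_with_retry(lambda: next(responses)))  # (200, 'ok')
```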

Error Handling

try:
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    if response.status_code == 429:
        print("Rate limit exceeded. Retry after:",
              response.headers.get('Retry-After'))
    elif response.status_code == 401:
        print("Invalid API key")
    elif response.status_code == 400:
        print("Invalid request:", response.json())
    else:
        print("Error:", e)
