Supported Models

Perf supports the latest models from leading AI providers and automatically routes each request to the most suitable model based on task type, quality requirements, and cost constraints.

Current Model Portfolio

OpenAI Models

GPT-5.2 (Latest)

Best for: Coding, agentic tasks, complex problem solving

Model ID: gpt-5.2
Provider: OpenAI
Context Window: 400,000 tokens
Max Output: 64,000 tokens
Training Data: Up to Jul 2025

Strengths:
  - Best-in-class coding capabilities
  - Superior agentic task performance
  - Extended reasoning with configurable effort
  - Large context window (400K tokens)
  - High-quality structured output

Weaknesses:
  - Premium pricing
  - Higher latency for complex tasks
  - May be overkill for simple operations

Best Use Cases:
  • Complex code generation and debugging
  • Multi-step agentic workflows
  • Advanced reasoning tasks
  • Long-form technical content
  • System design and architecture

GPT-5 mini

Best for: Fast, cost-efficient general-purpose tasks

Model ID: gpt-5-mini
Provider: OpenAI
Context Window: 400,000 tokens
Max Output: 64,000 tokens

Strengths:
  - Excellent cost/performance ratio
  - Fast response time
  - Large context window
  - Good general capabilities
  - Suitable for most common tasks

Weaknesses:
  - Less capable than GPT-5.2 at complex reasoning
  - May require more specific prompts

Best Use Cases:
  • Data extraction
  • Classification
  • Summarization
  • Simple Q&A
  • General chat
  • Content moderation

GPT-5 nano

Best for: Maximum speed and cost efficiency

Model ID: gpt-5-nano
Provider: OpenAI
Context Window: 400,000 tokens

Strengths:
  - Fastest OpenAI model
  - Most cost-efficient
  - Still maintains GPT-5 quality baseline
  - Large context support

Weaknesses:
  - Less capable than GPT-5 mini and GPT-5.2
  - Best suited to narrow, well-defined tasks

Best Use Cases:
  • High-volume processing
  • Real-time applications
  • Simple classification
  • Quick lookups
  • Cost-sensitive workloads

GPT-4.1

Best for: Non-reasoning tasks, general intelligence

Model ID: gpt-4.1
Provider: OpenAI
Context Window: 1,047,576 tokens (1M)

Strengths:
  - Smartest non-reasoning model
  - Balanced performance
  - Proven reliability
  - Good multimodal support

Weaknesses:
  - Superseded by GPT-5 series
  - Higher cost than newer efficient models

Best Use Cases:
  • Legacy application support
  • Tasks that do not require explicit reasoning
  • Balanced general-purpose tasks

Anthropic Claude Models

Claude Opus 4.5 (Latest)

Best for: Maximum intelligence with practical performance

Model ID: claude-opus-4-5
Provider: Anthropic
Context Window: 200,000 tokens
Max Output: 64,000 tokens
Training Data: Up to Aug 2025

Pricing:
  Input:  $5 per 1M tokens
  Output: $25 per 1M tokens

Strengths:
  - Premium model with highest intelligence
  - Extended thinking capability
  - Exceptional reasoning
  - Large output capacity (64K tokens)
  - Most recent training data (Aug 2025)

Weaknesses:
  - Higher cost
  - Moderate latency
  - May be excessive for simple tasks

Best Use Cases:
  • Complex analysis and reasoning
  • Long-form content generation
  • Research and synthesis
  • Critical decision-making
  • Advanced code review

Claude Sonnet 4.5

Best for: Complex agents and coding tasks

Model ID: claude-sonnet-4-5
Provider: Anthropic
Context Window: 200,000 tokens (1M beta available)
Max Output: 64,000 tokens
Training Data: Up to Jul 2025

Pricing:
  Input:  $3 per 1M tokens
  Output: $15 per 1M tokens

Strengths:
  - Outstanding coding capabilities
  - Excellent for agentic workflows
  - Extended thinking support
  - Optional 1M context window (beta)
  - Fast performance
  - Best balance of intelligence and speed

Weaknesses:
  - Premium pricing tier
  - 1M context is in beta (long-context pricing applies to requests over 200K input tokens)

Best Use Cases:
  • Complex coding tasks
  • Agentic AI workflows
  • Long document analysis (with 1M context)
  • Technical writing
  • Advanced problem solving

Claude Haiku 4.5

Best for: Speed and cost efficiency with near-frontier intelligence

Model ID: claude-haiku-4-5
Provider: Anthropic
Context Window: 200,000 tokens
Max Output: 64,000 tokens
Training Data: Up to Jul 2025

Pricing:
  Input:  $1 per 1M tokens
  Output: $5 per 1M tokens

Strengths:
  - Fastest Claude model
  - Most cost-effective from Anthropic
  - Near-frontier intelligence
  - Extended thinking support
  - Large context window
  - Excellent for structured tasks

Weaknesses:
  - Less capable than Opus/Sonnet for complex reasoning
  - Better for focused tasks than open-ended ones

Best Use Cases:
  • Data extraction and structuring
  • High-volume processing
  • Real-time chat applications
  • Classification at scale
  • Cost-sensitive applications
  • Quick analysis tasks

Google Gemini Models

Gemini 3 Pro (Preview)

Best for: World’s best multimodal understanding

Model ID: gemini-3-pro-preview
Provider: Google
Context Window: 1,048,576 tokens (1M)
Max Output: 65,536 tokens
Inputs: Text, Image, Video, Audio, PDF

Strengths:
  - Best-in-world multimodal understanding
  - Massive 1M token context
  - Supports all input types
  - Large output capacity
  - Advanced reasoning

Weaknesses:
  - Preview status (may change)
  - Higher latency with large inputs
  - Premium pricing

Best Use Cases:
  • Complex multimodal analysis
  • Video and audio understanding
  • Processing entire codebases
  • Multi-document reasoning
  • Advanced research tasks

Gemini 3 Flash (Preview)

Best for: Speed, scale, and frontier intelligence

Model ID: gemini-3-flash-preview
Provider: Google
Context Window: 1,048,576 tokens (1M)
Max Output: 65,536 tokens
Inputs: Text, Image, Video, Audio, PDF

Strengths:
  - Balanced model for speed and intelligence
  - Multimodal support
  - 1M context window
  - Fast inference
  - Frontier-level capabilities

Weaknesses:
  - Preview status
  - Slightly lower quality than Pro

Best Use Cases:
  • Large-scale multimodal processing
  • Fast document analysis
  • Real-time video/audio tasks
  • High-volume workloads

Gemini 2.5 Flash (Stable)

Best for: Production-ready large-scale processing

Model ID: gemini-2.5-flash
Provider: Google
Context Window: 1,048,576 tokens (1M)
Max Output: 65,536 tokens

Strengths:
  - Stable production model
  - Excellent for low-latency tasks
  - High-volume processing
  - 1M context window
  - Cost-effective

Weaknesses:
  - Generation 2.5 (superseded by Gemini 3)
  - Less capable than latest models

Best Use Cases:
  • Production applications requiring stability
  • Large-scale batch processing
  • High-throughput systems
  • Cost-optimized workloads

Gemini 2.5 Pro (Stable)

Best for: State-of-the-art thinking and reasoning

Model ID: gemini-2.5-pro
Provider: Google
Context Window: 1,048,576 tokens (1M)
Max Output: 65,536 tokens

Strengths:
  - State-of-the-art reasoning model
  - Complex problem solving
  - 1M context support
  - Stable for production
  - Advanced thinking capabilities

Best Use Cases:
  • Complex reasoning tasks
  • Multi-step problem solving
  • Research and analysis
  • Long-context understanding

Gemini 2.5 Flash-Lite (Stable)

Best for: Maximum cost efficiency

Model ID: gemini-2.5-flash-lite
Provider: Google
Context Window: 1,048,576 tokens (1M)
Max Output: 65,536 tokens

Strengths:
  - Fastest Flash model
  - Optimized for cost-efficiency
  - Still has 1M context
  - Good for simple tasks

Best Use Cases:
  • Ultra-high-volume processing
  • Cost-critical applications
  • Simple classification
  • Quick lookups

Mistral AI Models

Mistral Large

Best for: European data residency, reasoning, multilingual

Model ID: mistral-large-latest
Provider: Mistral AI
Context Window: 128,000 tokens
Max Output: 4,096 tokens

Strengths:
  - Strong reasoning capabilities
  - Excellent multilingual support
  - European data centers (GDPR compliant)
  - Function calling support
  - Competitive pricing

Best Use Cases:
  • European/GDPR-compliant applications
  • Multilingual tasks
  • Reasoning and analysis
  • Function calling applications

Alibaba Qwen Models

Qwen 2.5 72B

Best for: Cost-effective reasoning, Asian language support

Model ID: qwen-2.5-72b-instruct
Provider: Alibaba Cloud
Context Window: 32,768 tokens
Max Output: 8,192 tokens

Strengths:
  - Excellent value for money
  - Strong Chinese/Asian language support
  - Good reasoning capabilities
  - Fast inference

Best Use Cases:
  • Chinese language tasks
  • Cost-sensitive applications
  • Asian market applications
  • Bilingual applications

Meta Llama Models

Llama 3.1 405B

Best for: Open source, self-hosting, customization

Model ID: llama-3.1-405b-instruct
Provider: Meta (via various hosts)
Context Window: 128,000 tokens
Max Output: 4,096 tokens

Strengths:
  - Open weights (self-hostable)
  - Strong general capabilities
  - Customizable via fine-tuning
  - Active community
  - No vendor lock-in

Best Use Cases:
  • Self-hosted deployments
  • Fine-tuning for specific domains
  • Air-gapped environments
  • Long-term cost optimization

Llama 3.1 70B

Best for: Balanced open-source performance

Model ID: llama-3.1-70b-instruct
Provider: Meta (via various hosts)
Context Window: 128,000 tokens
Max Output: 4,096 tokens

Strengths:
  - Good balance of cost and quality
  - Faster than 405B variant
  - Open weights
  - Suitable for most tasks

Best Use Cases:
  • General-purpose applications
  • Cost-effective self-hosting
  • Fine-tuning base

Model Comparison Matrix

Feature	GPT-5.2	Claude Opus 4.5	Claude Sonnet 4.5	Gemini 3 Pro	Gemini 2.5 Pro
Reasoning	Excellent	Excellent	Excellent	Excellent	Excellent
Speed	Medium	Medium	Fast	Medium	Medium
Context	400K	200K	200K/1M (beta)	1M	1M
Max Output	64K	64K	64K	65K	65K
Multimodal Input	Images	Images	Images	Images, video, audio, PDF	Images, video, audio, PDF
Training Data	Jul 2025	Aug 2025	Jul 2025	Jan 2025	Jan 2025
Best For	Coding, Agents	Max Intelligence	Balanced	Multimodal	Reasoning

Intelligent Orchestration

Perf automatically selects the optimal model based on:

1. Task Type Detection

Extraction → Claude Haiku 4.5 or Gemini 2.5 Flash-Lite
  "Extract email from: [email protected]"
  Reason: Fast, accurate, cost-effective for structured data

Classification → GPT-5 nano or Gemini 2.5 Flash-Lite
  "Is this email spam or legitimate?"
  Reason: Good accuracy, minimal cost

Summarization → GPT-5 mini or Claude Sonnet 4.5
  "Summarize this article..."
  Reason: Length-dependent (short→mini, long→Sonnet)

Complex Reasoning → Claude Opus 4.5 or GPT-5.2
  "Solve this logic puzzle..."
  Reason: Superior reasoning capabilities

Coding → GPT-5.2 or Claude Sonnet 4.5
  "Write a distributed system design"
  Reason: Best code generation and system design

Multimodal → Gemini 3 Pro (all input types) or GPT-5.2 (images)
  "Analyze this video and extract insights"
  Reason: Native multimodal support; video and audio inputs require Gemini
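The task-type rules above amount to a lookup from a detected task type to an ordered list of candidate models. The Python sketch below shows that lookup under stated assumptions: `TASK_ROUTES`, `candidates_for`, and the 10,000-token cut-off for "long" summarization inputs are hypothetical illustrations rather than Perf's API, and task detection itself is assumed to have happened already.

```python
from typing import Dict, List

# Hypothetical routing table mirroring the task-type rules above.
# Model IDs come from this page; list order is the order of preference.
TASK_ROUTES: Dict[str, List[str]] = {
    "extraction": ["claude-haiku-4-5", "gemini-2.5-flash-lite"],
    "classification": ["gpt-5-nano", "gemini-2.5-flash-lite"],
    "summarization": ["gpt-5-mini", "claude-sonnet-4-5"],
    "complex_reasoning": ["claude-opus-4-5", "gpt-5.2"],
    "coding": ["gpt-5.2", "claude-sonnet-4-5"],
    "multimodal": ["gemini-3-pro-preview", "gpt-5.2"],
}

# Assumed cut-off between "short" and "long" summarization inputs.
LONG_INPUT_TOKENS = 10_000


def candidates_for(task_type: str, input_tokens: int = 0) -> List[str]:
    """Return candidate model IDs for a detected task type, best first."""
    if task_type not in TASK_ROUTES:
        raise ValueError(f"unknown task type: {task_type!r}")
    models = list(TASK_ROUTES[task_type])  # copy so the table is never mutated
    # Summarization is length-dependent: short inputs prefer GPT-5 mini,
    # long inputs prefer Claude Sonnet 4.5.
    if task_type == "summarization" and input_tokens >= LONG_INPUT_TOKENS:
        models.reverse()
    return models
```

For example, `candidates_for("summarization", input_tokens=50_000)` returns `["claude-sonnet-4-5", "gpt-5-mini"]`, reflecting the short→mini, long→Sonnet rule.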

2. Complexity Scoring

Low Complexity (0.0-0.3) → Haiku, Flash-Lite, GPT-5 nano
  "What is 2+2?"
  "Extract phone number from text"

Medium Complexity (0.3-0.6) → GPT-5 mini, Sonnet, Gemini 2.5
  "Summarize this 500-word article"
  "Categorize customer feedback"

High Complexity (0.6-1.0) → GPT-5.2, Opus 4.5, Gemini 3
  "Design a distributed architecture"
  "Analyze complex multimodal data"

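The complexity bands above map a score to a tier of candidate models. A minimal sketch, assuming the score is already computed; `complexity_tier` is a hypothetical helper, expanding "Gemini 2.5" and "Gemini 3" to both of their listed models is an assumption, and so is treating each band as half-open, since the page does not say which band owns a boundary value such as 0.3.

```python
from typing import List, Tuple

# Hypothetical model lists for each band; the Gemini expansions are assumptions.
LOW = ["claude-haiku-4-5", "gemini-2.5-flash-lite", "gpt-5-nano"]
MEDIUM = ["gpt-5-mini", "claude-sonnet-4-5", "gemini-2.5-flash", "gemini-2.5-pro"]
HIGH = ["gpt-5.2", "claude-opus-4-5", "gemini-3-pro-preview", "gemini-3-flash-preview"]


def complexity_tier(score: float) -> Tuple[str, List[str]]:
    """Map a complexity score in [0.0, 1.0] to a tier and its candidate models.

    Bands are treated as [0.0, 0.3) low, [0.3, 0.6) medium, [0.6, 1.0] high.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"complexity score out of range: {score}")
    if score < 0.3:
        return "low", list(LOW)
    if score < 0.6:
        return "medium", list(MEDIUM)
    return "high", list(HIGH)
```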
3. Cost Constraints

Budget: $0.001
  → Gemini Flash-Lite or GPT-5 nano

Budget: $0.005
  → GPT-5 mini, Claude Haiku, or Gemini 2.5 Flash

Budget: $0.02+
  → Any model (GPT-5.2, Opus 4.5 available if optimal)

No budget specified
  → Optimal model selected regardless of cost
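The budget tiers above follow from comparing a request's estimated cost with the budget. The sketch below assumes the budget applies per request (the page does not state its unit) and uses assumed list prices: the Claude figures come from this page, while the GPT-5 and Gemini figures are provider list prices at the time of writing and may change. `PRICES`, `estimate_cost`, and `affordable_models` are hypothetical names, not part of Perf's API.

```python
from typing import Dict, List, Tuple

# Assumed list prices in USD per 1M tokens: (input, output).
PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-5-nano": (0.05, 0.40),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gpt-5-mini": (0.25, 2.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-opus-4-5": (5.00, 25.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the assumed list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def affordable_models(input_tokens: int, output_tokens: int,
                      budget_usd: float) -> List[str]:
    """Models whose estimated cost fits within the budget, cheapest first."""
    costs = {m: estimate_cost(m, input_tokens, output_tokens) for m in PRICES}
    within = (m for m, cost in costs.items() if cost <= budget_usd)
    return sorted(within, key=costs.__getitem__)
```

For a request with 2,000 input and 500 output tokens, a $0.001 budget leaves only GPT-5 nano and Gemini 2.5 Flash-Lite, and $0.005 adds GPT-5 mini, Gemini 2.5 Flash, and Claude Haiku 4.5, in line with the tiers above. Smaller requests fit more models into each tier, so the tiers depend on request size.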

Multimodal Support

Perf supports multimodal inputs for compatible models:

Vision (Image Understanding)

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": "https://example.com/image.jpg"}
      ]
    }
  ]
}

Supported Models:

Gemini 3 Pro (best quality, all input types)
GPT-5.2 (excellent vision support)
Gemini 3 Flash (fastest multimodal)
GPT-4.1 (legacy vision support)

Audio & Video

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Transcribe and summarize this video"},
        {"type": "video_url", "video_url": "https://example.com/video.mp4"}
      ]
    }
  ],
  "model": "gemini-3-pro-preview"
}

Supported Models:

Gemini 3 Pro (video, audio, PDF)
Gemini 3 Flash (video, audio, PDF)

Provider Reliability

Historical Uptime (Last 90 Days)

OpenAI: 99.8%
  Latest Models: GPT-5.2, GPT-5 mini, GPT-5 nano
  Incidents: Minimal downtime

Anthropic: 99.9%
  Latest Models: Claude Opus/Sonnet/Haiku 4.5
  Incidents: Excellent reliability

Google: 99.9%
  Latest Models: Gemini 3 Pro/Flash (preview), Gemini 2.5 series (stable)
  Incidents: Very stable

Failover Strategy

Perf automatically handles provider issues:

1. Primary model fails or times out
   ↓
2. Retry with exponential backoff (up to 3 attempts)
   ↓
3. Escalate to equivalent fallback model
   ↓
4. Try alternative provider with similar capabilities
   ↓
5. Return best available result
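The failover chain above can be sketched as a retry loop nested inside a loop over candidate models. This is a minimal illustration under assumptions: `call_with_failover` and `ProviderError` are hypothetical, "up to 3 attempts" is read as three total attempts per model, the 0.5-second base delay is arbitrary, and step 5 is modeled as raising an error once every candidate has failed, since the page does not define "best available result".

```python
import time
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a model call fails or times out."""


def call_with_failover(
    models: Sequence[str],
    call: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying each with exponential backoff.

    `models` is ordered: primary first, then equivalent fallbacks, then
    alternatives from other providers.
    """
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return call(model)
            except ProviderError as exc:
                last_error = exc
                if attempt < max_attempts - 1:
                    # Wait base_delay, 2*base_delay, ... between retries.
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all candidate models failed") from last_error
```

Passing `sleep` as a parameter keeps the backoff testable; production code would typically also add jitter so that many clients do not retry in lockstep.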

Best Practices

Choose the Right Model

Recommended:
  - Let Perf auto-orchestrate based on task
  - Set cost budgets, not specific models
  - Use task type hints for better selection
  - Test different models for your use case

Avoid:
  - Always using the most expensive model
  - Always using the cheapest model
  - Ignoring Perf recommendations
  - Overriding without valid reason

Optimize for Your Use Case

Latency-Sensitive (chat apps):
  → GPT-5 nano, Claude Haiku 4.5, or Gemini 3 Flash

Cost-Sensitive (high volume):
  → Gemini 2.5 Flash-Lite or GPT-5 nano with auto-orchestration

Quality-Critical (customer-facing):
  → Claude Opus 4.5 or GPT-5.2 with validation

Coding & Agentic:
  → GPT-5.2 or Claude Sonnet 4.5

Multimodal:
  → Gemini 3 Pro (all input types) or GPT-5.2 (images)

GDPR-Compliant:
  → Mistral Large (European data centers)

Balanced:
  → Perf auto-orchestration (learns optimal mix)

Context Window Best Practices

Optimal Context Usage

GPT-5 series (400K tokens)
  Optimal: < 100K tokens
  Good: 100K-200K tokens
  Extended: 200K-400K tokens

Claude 4.5 series (200K / 1M beta)
  Optimal: < 100K tokens
  Good: 100K-200K tokens
  Extended (1M beta): 200K-1M tokens (long context pricing)

Gemini series (1M tokens)
  Optimal: < 500K tokens
  Good: 500K-800K tokens
  Extended: 800K-1M tokens

Perf automatically optimizes context when approaching limits.
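The bands above can be expressed as per-family thresholds. A minimal sketch, assuming the prompt size is already known in tokens; the family keys, `CONTEXT_BANDS`, and `context_band` are hypothetical names, and treating "Optimal" as strictly below its bound and "Good" as inclusive of its upper bound is an assumption about the band edges.

```python
from typing import Dict, Tuple

# Hypothetical per-family thresholds taken from the bands above:
# (upper bound of "Optimal", upper bound of "Good", hard context limit).
CONTEXT_BANDS: Dict[str, Tuple[int, int, int]] = {
    "gpt-5": (100_000, 200_000, 400_000),
    "claude-4.5": (100_000, 200_000, 200_000),
    "claude-4.5-1m-beta": (100_000, 200_000, 1_000_000),
    "gemini": (500_000, 800_000, 1_048_576),
}


def context_band(family: str, prompt_tokens: int) -> str:
    """Classify a prompt size as optimal, good, extended, or over limit."""
    optimal_below, good_up_to, limit = CONTEXT_BANDS[family]
    if prompt_tokens > limit:
        return "over limit"
    if prompt_tokens < optimal_below:
        return "optimal"
    if prompt_tokens <= good_up_to:
        return "good"
    return "extended"
```

A 250K-token prompt is "over limit" for standard Claude 4.5 but "extended" with the 1M beta, which is why the beta row carries its own hard limit.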

Cost Optimization

Example: Customer Support Chatbot

Assumptions:
  - 10,000 conversations/month
  - Average 5 messages per conversation (50,000 messages/month)
  - 100 input tokens, 50 output tokens per message
  - List prices per 1M tokens (input/output): GPT-5 mini $0.25/$2.00, GPT-5 nano $0.05/$0.40, Gemini 2.5 Flash-Lite $0.10/$0.40, Claude Haiku 4.5 $1/$5, Claude Sonnet 4.5 $3/$15

Model: GPT-5 mini
  Cost per message: ~$0.000125
  Monthly cost: ~$6.25

Model: Claude Haiku 4.5
  Cost per message: ~$0.00035
  Monthly cost: ~$17.50

Model: Gemini 2.5 Flash-Lite
  Cost per message: ~$0.00003
  Monthly cost: ~$1.50

Model: Perf Auto-Orchestration
  Mix: 60% Flash-Lite, 25% GPT-5 nano, 10% Haiku, 5% Sonnet
  Monthly cost: ~$5.59
  Savings: ~$0.66/month (~11%) vs GPT-5 mini baseline
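To make the arithmetic in this example checkable, the Python sketch below recomputes it from the stated assumptions: 10,000 conversations of 5 messages each, with 100 input and 50 output tokens per message. The per-token prices are assumptions: the Claude figures come from this page, while the GPT-5 and Gemini figures are provider list prices at the time of writing and may change. `cost_per_message` and `monthly_cost` are hypothetical helpers.

```python
from typing import Dict, Tuple

# Assumed list prices in USD per 1M tokens: (input, output).
PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-5-nano": (0.05, 0.40),
    "gpt-5-mini": (0.25, 2.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-5": (3.00, 15.00),
}

MESSAGES_PER_MONTH = 10_000 * 5   # conversations x messages per conversation
IN_TOKENS, OUT_TOKENS = 100, 50   # tokens per message


def cost_per_message(model: str) -> float:
    """USD cost of one message at the assumed list prices."""
    in_price, out_price = PRICES[model]
    return (IN_TOKENS * in_price + OUT_TOKENS * out_price) / 1_000_000


def monthly_cost(mix: Dict[str, float]) -> float:
    """Monthly USD cost for a traffic mix {model: share}, shares summing to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return MESSAGES_PER_MONTH * sum(
        share * cost_per_message(model) for model, share in mix.items()
    )


baseline = monthly_cost({"gpt-5-mini": 1.0})
orchestrated = monthly_cost({
    "gemini-2.5-flash-lite": 0.60,
    "gpt-5-nano": 0.25,
    "claude-haiku-4-5": 0.10,
    "claude-sonnet-4-5": 0.05,
})
print(f"baseline ${baseline:.2f}, orchestrated ${orchestrated:.2f}, "
      f"saving {1 - orchestrated / baseline:.0%}")
```

Under these assumed prices the GPT-5 mini baseline comes to about $6.25 a month and the orchestrated mix to about $5.59, roughly 11% less; the 5% of traffic sent to Sonnet accounts for nearly half of the mix's cost.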

FAQ

Q: Which is the best model?
A: It depends on your task. For coding: GPT-5.2 or Claude Sonnet 4.5. For multimodal: Gemini 3 Pro. For cost: Gemini 2.5 Flash-Lite or GPT-5 nano. Use Perf auto-orchestration to let us choose.

Q: Can I use only one provider?
A: Yes. Configure this in Settings → Orchestration → Provider Preference.

Q: How often are new models added?
A: We add new models within days of provider release.

Q: Can I bring my own model?
A: Yes (Enterprise). Contact [email protected]

Q: Do you support fine-tuned models?
A: Yes. You can upload and deploy fine-tuned versions of supported models.

Q: What about model deprecations?
A: We handle migrations automatically when providers deprecate models.

Q: Do these models support function calling?
A: Yes. All GPT-5, Claude 4.5, and Gemini models support function/tool calling.

Next Steps

Support

Email: [email protected]
Model Requests: [email protected]
Documentation: docs.withperf.pro
