Frequently Asked Questions

General

What is Perf?

Perf is an AI runtime orchestrator that sits between your application and LLM providers. We automatically select the optimal model for each request based on your cost, quality, and reliability requirements.

How does Perf reduce costs?

Perf analyzes each request and intelligently selects the most cost-effective model that can handle it. Simple queries get sent to cheaper models like GPT-4o Mini, while complex tasks use more powerful models. This typically reduces costs by 40-60% compared to using a single premium model.

Which LLM providers does Perf support?

Perf integrates with all major LLM providers:
  • OpenAI (GPT-4o, GPT-4o Mini, o1, o1-mini)
  • Anthropic (Claude Sonnet 4.5, Claude Haiku 4.5)
  • Google (Gemini 1.5 Pro, Gemini 1.5 Flash)
  • Meta (Llama 3.1 405B, 70B)
  • Mistral AI (Mistral Large)
  • Alibaba (Qwen 2.5 72B)

Is Perf compatible with the OpenAI API?

Yes, Perf is fully OpenAI-compatible. You can replace your OpenAI base URL with Perf’s endpoint and everything will work seamlessly, with the added benefits of cost optimization and quality control.
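A minimal sketch of what "drop-in replacement" means in practice, using only the Python standard library. The base URL and key format below are placeholders, not Perf's real values; the request shape is the standard OpenAI chat completions schema.

```python
import json
import urllib.request

# Placeholder endpoint and key: substitute your real Perf values.
PERF_BASE_URL = "https://api.perf.example/v1"
PERF_API_KEY = "pf-your-key"

def chat_request(messages, **params):
    """Build an OpenAI-style chat completion request aimed at Perf."""
    body = {"messages": messages, **params}
    return urllib.request.Request(
        PERF_BASE_URL + "/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": "Bearer " + PERF_API_KEY,
            "Content-Type": "application/json",
        },
    )

req = chat_request([{"role": "user", "content": "Hello"}])
# Send with urllib.request.urlopen(req) once the endpoint and key are real.
```

If you use the official OpenAI SDK instead, the same swap applies: point its `base_url` at Perf's endpoint and use your Perf API key; the rest of your code is unchanged.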

Pricing & Billing

How does Perf pricing work?

You only pay for actual model usage. Perf charges the standard rate of whichever model we select for your request, with no markup; we make money on enterprise plans that include additional features.

Is there a free tier?

Yes, we offer a free tier with 10,000 requests per month to get started. Perfect for testing and small projects.

Can I set cost limits?

Yes, you can set cost limits at multiple levels:
  • Per-request maximum (max_cost_per_call parameter)
  • Daily/monthly account limits in the dashboard
  • Team-wide budgets for enterprise plans

Technical

What’s the latency overhead?

Perf adds minimal latency overhead (typically 20-50ms) for orchestration decisions. Our intelligent caching and model selection algorithms are optimized for speed.

Do you support streaming?

Yes, Perf fully supports streaming responses using Server-Sent Events (SSE), just like the OpenAI API.
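Since the FAQ says streaming works "just like the OpenAI API", a client can reuse the standard SSE chunk format: each event is a `data: {...}` line carrying a delta, terminated by `data: [DONE]`. A small parser under that assumption:

```python
import json

def parse_sse_chunks(lines):
    """Yield parsed JSON chunks from OpenAI-style SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)

# Sample stream as it would arrive over the wire:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(sample)
)
# text == "Hello"
```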

Can I use specific models?

Yes, you can override the automatic orchestration by specifying a model parameter in your request. This gives you full control when needed.
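The difference between orchestrated and pinned requests is just the presence of the `model` field. A sketch, assuming the standard OpenAI field name and using an example model ID from the supported list above:

```python
# Omitting "model" lets Perf pick; setting it pins one specific model.
auto_body = {"messages": [{"role": "user", "content": "Hi"}]}
pinned_body = {**auto_body, "model": "gpt-4o-mini"}  # bypass orchestration
```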

Do you support multimodal inputs?

Yes, Perf supports images, PDFs, and other multimodal inputs. The system will automatically select models that support the input format.
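Given the OpenAI compatibility above, an image input would use the standard multi-part content format: a list mixing `text` and `image_url` parts, with the image inlined as a base64 data URL. A sketch with stand-in image bytes:

```python
import base64

png_bytes = b"\x89PNG\r\n\x1a\n"  # stand-in for real image data
data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode()

# OpenAI-style multimodal message; Perf routes it to a vision-capable model.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": data_url}},
    ],
}
```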

How do you ensure data privacy?

We are SOC 2 Type II compliant and never store your prompt data. All requests are proxied directly to the provider, and only metadata (model used, token counts, cost) is logged.

Platform Features

What analytics do you provide?

The Perf dashboard provides:
  • Real-time request monitoring
  • Cost breakdown by model and task type
  • Performance metrics (latency, success rate)
  • Quality scores and user feedback
  • ROI tracking and savings reports

Can multiple team members access the same account?

Yes, our Team and Enterprise plans support multiple users with role-based access control. You can set different permissions for developers, analysts, and administrators.

Do you offer SLA guarantees?

Yes, our paid plans include 99.9% uptime SLA with automatic failover across providers to ensure reliability.

Can I export my data?

Yes, you can export all logs, metrics, and analytics data via our API or through the dashboard. We support JSON, CSV, and webhook integrations.

Getting Started

How long does integration take?

Most teams are up and running in under 30 minutes. If you’re already using OpenAI, it’s as simple as changing your base URL and adding your Perf API key.

Do you provide migration support?

Yes, our team provides migration support for all paid plans. We’ll help you migrate from your existing LLM setup and optimize your configuration.

Can I test Perf before committing?

Absolutely. Sign up for our free tier and test with your actual use cases. No credit card required.

Where can I get help?

See the Troubleshooting answers below for common issues; for anything else, contact options are listed at the end of this page.

Troubleshooting

What if a request fails?

Perf includes automatic retry logic with intelligent fallback. If a request fails with one provider, we automatically retry with an alternative provider and model.
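Perf performs this fallback server-side, so your client code normally needn't do anything. Purely to illustrate the pattern the answer describes, here is a client-side sketch with hypothetical provider names and a simulated failure:

```python
def call_with_fallback(providers, send):
    """Try each provider in order; return the first successful response."""
    last_error = None
    for provider in providers:
        try:
            return send(provider)
        except RuntimeError as exc:
            last_error = exc  # remember failure, fall through to next
    raise last_error or RuntimeError("no providers configured")

# Simulated transport: the first provider fails, the second succeeds.
calls = []
def flaky_send(provider):
    calls.append(provider)
    if provider == "openai":
        raise RuntimeError("provider outage")
    return "ok from " + provider

result = call_with_fallback(["openai", "anthropic"], flaky_send)
# result == "ok from anthropic"
```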

How do I monitor quality?

The dashboard provides quality scoring based on:
  • Response completeness
  • Format validation
  • User feedback (when provided)
  • Latency and error rates
You can also set quality thresholds to automatically retry low-quality responses.
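The retry-on-low-quality behavior can be pictured as a loop like the one below. This is an illustrative sketch of the idea, not Perf's actual implementation; the scoring function here is a toy stand-in.

```python
def ensure_quality(generate, score, threshold=0.7, max_attempts=3):
    """Regenerate until a response scores at or above threshold.

    Falls back to the best-scoring attempt if none clears the bar.
    """
    best, best_score = None, -1.0
    for _ in range(max_attempts):
        response = generate()
        s = score(response)
        if s >= threshold:
            return response
        if s > best_score:
            best, best_score = response, s
    return best

# Toy demo: the first draft scores low, the second clears the threshold.
drafts = iter(["meh", "a complete, well-formatted answer"])
result = ensure_quality(
    lambda: next(drafts),
    lambda r: 0.9 if "complete" in r else 0.2,
)
```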

Can I provide feedback on responses?

Yes, you can submit feedback via our API or dashboard. This helps Perf learn your preferences and improve orchestration decisions over time.
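A sketch of what a feedback submission might carry. The field names and the request ID below are hypothetical; consult the Perf API reference for the real schema.

```python
# Hypothetical feedback payload: field names are illustrative, not the
# documented Perf schema.
feedback = {
    "request_id": "req_123",  # hypothetical ID of the original call
    "rating": "thumbs_up",
    "comment": "Accurate and well formatted.",
}
```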

Still have questions?

Contact our team at [email protected] or join our community.