Frequently Asked Questions
General
What is Perf?
Perf is an AI runtime orchestrator that sits between your application and LLM providers. We automatically select the optimal model for each request based on your cost, quality, and reliability requirements.
How does Perf reduce costs?
Perf analyzes each request and intelligently selects the most cost-effective model that can handle it. Simple queries get sent to cheaper models like GPT-4o Mini, while complex tasks use more powerful models. This typically reduces costs by 40-60% compared to using a single premium model.
Which LLM providers does Perf support?
Perf integrates with all major LLM providers:
- OpenAI (GPT-4o, GPT-4o Mini, o1, o1-mini)
- Anthropic (Claude Sonnet 4.5, Claude Haiku 4.5)
- Google (Gemini 1.5 Pro, Gemini 1.5 Flash)
- Meta (Llama 3.1 405B, 70B)
- Mistral AI (Mistral Large)
- Alibaba (Qwen 2.5 72B)
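The 40-60% figure above is a blended-cost effect: routing most requests to a cheaper model pulls the average down. A toy calculation with made-up prices and routing shares (not Perf's actual numbers) shows the mechanism:

```python
# Illustrative only: hypothetical per-1M-token prices and routing mix,
# showing how sending simple requests to a cheaper model lowers blended cost.
PRICES = {"premium": 10.00, "mini": 0.60}  # $ per 1M tokens (made-up figures)
MIX = {"premium": 0.40, "mini": 0.60}      # share of requests routed to each

blended = sum(PRICES[m] * MIX[m] for m in PRICES)  # average cost with routing
savings = 1 - blended / PRICES["premium"]          # vs. premium-only baseline
print(f"Blended savings: {savings:.0%}")  # -> Blended savings: 56%
```

With these assumed numbers the blended rate is $4.36 per 1M tokens, a 56% saving over using the premium model alone; the real figure depends on your traffic mix.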
Is Perf compatible with the OpenAI API?
Yes, Perf is fully OpenAI-compatible. You can replace your OpenAI base URL with Perf’s endpoint and everything will work seamlessly, with the added benefits of cost optimization and quality control.
Pricing & Billing
How does Perf pricing work?
You only pay for the actual model usage. Perf charges the standard rate of whichever model we select for your request, with no markup. We make money through our enterprise plans with additional features.
Is there a free tier?
Yes, we offer a free tier with 10,000 requests per month to get started. Perfect for testing and small projects.
Can I set cost limits?
Yes, you can set cost limits at multiple levels:
- Per-request maximum (max_cost_per_call parameter)
- Daily/monthly account limits in the dashboard
- Team-wide budgets for enterprise plans
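A per-request cap might look like the sketch below. The `max_cost_per_call` parameter name comes from the list above, but where exactly it sits in the request body is an assumption; check the Perf API reference before relying on this shape.

```python
import json

# Hypothetical OpenAI-style request payload with a per-request cost cap.
payload = {
    "model": "auto",             # assumed sentinel: let Perf pick the model
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "max_cost_per_call": 0.01,   # decline any route costing over $0.01
}
print(json.dumps(payload, indent=2))
```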
Technical
What’s the latency overhead?
Perf adds minimal latency overhead (typically 20-50ms) for orchestration decisions. Our intelligent caching and model selection algorithms are optimized for speed.
Do you support streaming?
Yes, Perf fully supports streaming responses using Server-Sent Events (SSE), just like the OpenAI API.
Can I use specific models?
Yes, you can override the automatic orchestration by specifying a model parameter in your request. This gives you full control when needed.
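Because Perf mirrors the OpenAI API, a streamed response arrives as OpenAI-style SSE events whether orchestration is automatic or a model is pinned. A minimal, self-contained parser sketch over simulated chunks (no live connection; the chunk shape follows the OpenAI streaming format):

```python
import json

def parse_sse(lines):
    """Yield the JSON payload of each OpenAI-style SSE `data:` event,
    stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        yield json.loads(data)

# Simulated chunks, in the shape an OpenAI-compatible stream sends them:
chunks = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
text = "".join(e["choices"][0]["delta"]["content"] for e in parse_sse(chunks))
print(text)  # -> Hello
```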
Do you support multimodal inputs?
Yes, Perf supports images, PDFs, and other multimodal inputs. The system will automatically select models that support the input format.
How do you ensure data privacy?
We are SOC 2 Type II compliant and never store your prompt data. All requests are proxied directly to the provider and only metadata (model used, tokens, cost) is logged.
Platform Features
What analytics do you provide?
The Perf dashboard provides:
- Real-time request monitoring
- Cost breakdown by model and task type
- Performance metrics (latency, success rate)
- Quality scores and user feedback
- ROI tracking and savings reports
Can multiple team members access the same account?
Yes, our Team and Enterprise plans support multiple users with role-based access control. You can set different permissions for developers, analysts, and administrators.
Do you offer SLA guarantees?
Yes, our paid plans include a 99.9% uptime SLA with automatic failover across providers to ensure reliability.
Can I export my data?
Yes, you can export all logs, metrics, and analytics data via our API or through the dashboard. We support JSON, CSV, and webhook integrations.
Getting Started
How long does integration take?
Most teams are up and running in under 30 minutes. If you’re already using OpenAI, it’s as simple as changing your base URL and adding your Perf API key.
Do you provide migration support?
Yes, our team provides migration support for all paid plans. We’ll help you migrate from your existing LLM setup and optimize your configuration.
Can I test Perf before committing?
Absolutely. Sign up for our free tier and test with your actual use cases. No credit card required.
Where can I get help?
- Documentation: docs.withperf.pro
- Email Support: [email protected]
- Enterprise Support: Available 24/7 for enterprise customers
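The base-URL swap described under "How long does integration take?" can be sketched as a tiny config helper. The endpoint URL below is illustrative, not confirmed; see docs.withperf.pro for the real one.

```python
def perf_client_config(api_key: str) -> dict:
    """Return kwargs for an OpenAI-compatible client pointed at Perf.

    The base URL here is a placeholder; check docs.withperf.pro for the
    actual endpoint.
    """
    return {
        "base_url": "https://api.withperf.pro/v1",  # assumed endpoint
        "api_key": api_key,
    }

# With the official `openai` package, usage would be roughly:
#   client = OpenAI(**perf_client_config(os.environ["PERF_API_KEY"]))
# Existing chat.completions calls then route through Perf unchanged.
cfg = perf_client_config("perf-test-key")
print(cfg["base_url"])
```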
Troubleshooting
What if a request fails?
Perf includes automatic retry logic with intelligent fallback. If a request fails with one provider, we automatically retry with an alternative provider and model.
How do I monitor quality?
The dashboard provides quality scoring based on:
- Response completeness
- Format validation
- User feedback (when provided)
- Latency and error rates
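Perf performs the retry-with-fallback described above server-side, so clients normally need nothing extra. For readers who want a belt-and-braces client-side version of the same idea, a sketch with a simulated transport (provider and model names are just examples from the list above):

```python
def call_with_fallback(send, candidates):
    """Try each (provider, model) pair in order; return the first success.

    Perf does this failover server-side; this is only a client-side sketch
    using a caller-supplied `send` function.
    """
    last_error = None
    for provider, model in candidates:
        try:
            return send(provider, model)
        except Exception as err:  # real code should catch specific errors
            last_error = err
    raise last_error

# Simulated transport: the first provider is down, the second responds.
def fake_send(provider, model):
    if provider == "openai":
        raise ConnectionError("provider unavailable")
    return f"{provider}:{model}:ok"

result = call_with_fallback(
    fake_send,
    [("openai", "gpt-4o-mini"), ("anthropic", "claude-haiku-4.5")],
)
print(result)  # -> anthropic:claude-haiku-4.5:ok
```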