
Analytics & Insights

Deep-dive analytics to optimize your LLM usage, reduce costs, and improve quality.

Overview

The Analytics page provides advanced metrics beyond the basic dashboard, including:
  • Model performance comparisons
  • Task-specific optimization opportunities
  • Cost breakdown and projections
  • Quality and reliability metrics
  • Provider health monitoring
  • Custom queries and reports
Access at: withperf.pro/analytics

Performance Analytics

Model Performance Matrix

Compare all models across key dimensions:
Model               Calls    Avg Cost   Avg Latency   Success Rate   Quality Score
GPT-4o Mini         23,456   $0.00234   834ms         98.7%          0.92
Claude Sonnet 4.5   18,234   $0.00876   1,456ms       99.1%          0.95
Claude Haiku 4.5     6,789   $0.00098   423ms         99.2%          0.91
GPT-4o               3,988   $0.01234   1,876ms       98.3%          0.94
Insights:
  • Color-coded cells (green = best, yellow = good, red = attention needed)
  • Click column headers to sort
  • Filterable by date range and task type

Task Type Deep-Dive

Analyze performance by specific task types:

Extraction Tasks

Total Calls: 15,678
Average Cost: $0.00123
Success Rate: 99.1%

Top Models:
1. Claude Haiku 4.5 - Best cost/quality (8,234 calls)
2. GPT-4o Mini - Good alternative (6,123 calls)
3. Claude Sonnet 4.5 - Overkill (1,321 calls) ⚠️

Recommendation: Route 90% to Claude Haiku
Potential Savings: $23.45/month

Reasoning Tasks

Total Calls: 4,567
Average Cost: $0.00987
Success Rate: 95.6%

Top Models:
1. Claude Sonnet 4.5 - Best quality (3,234 calls)
2. GPT-4o - Expensive alternative (987 calls) ⚠️
3. GPT-4o Mini - Insufficient (346 calls, 87.2% success) ❌

Recommendation: Route all to Claude Sonnet
Quality Improvement: +4.2%
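
Both recommendations reduce to weighted routing rules per task type. The sketch below is illustrative only: the rule table and route() helper are not Perf's actual configuration API, just one way to express "90% of extraction to Claude Haiku, all reasoning to Claude Sonnet" in code.

import random

# Hypothetical routing table reflecting the two recommendations above.
ROUTING_RULES = {
    "extraction": [("claude-haiku-4.5", 0.90), ("gpt-4o-mini", 0.10)],
    "reasoning":  [("claude-sonnet-4.5", 1.00)],
}

def route(task_type: str, default: str = "gpt-4o-mini") -> str:
    """Pick a model for a task type using the weighted split above."""
    rules = ROUTING_RULES.get(task_type)
    if not rules:
        return default
    models, weights = zip(*rules)
    return random.choices(models, weights=weights, k=1)[0]

print(route("extraction"))  # "claude-haiku-4.5" ~90% of the time
print(route("reasoning"))   # always "claude-sonnet-4.5"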

Latency Analysis

Percentile Distribution

P50 (Median):  1,234ms
P90:           2,345ms
P95:           3,456ms
P99:           5,678ms
Chart: Histogram showing latency distribution

By Model

Claude Haiku 4.5:    423ms  (fastest) ⚡
GPT-4o Mini:         834ms
Claude Sonnet 4.5:  1,456ms
GPT-4o:             1,876ms (slowest)

By Task Type

Classification:    567ms  ⚡
Extraction:        834ms
Summarization:   1,234ms
Writing:         1,567ms
Reasoning:       1,876ms
Code:            2,123ms

Cost Analytics

Cost Breakdown

By Model (Last 30 Days)

Total Spend: $234.56

GPT-4o Mini:       $87.23  (37.2%) ████████████
Claude Sonnet 4.5: $123.45 (52.6%) █████████████████
GPT-4o:            $19.87  (8.5%)  ███
Claude Haiku 4.5:  $4.01   (1.7%)  █

By Task Type

Extraction:      $45.67  (19.5%)
Classification:  $38.92  (16.6%)
Summarization:   $52.34  (22.3%)
Reasoning:       $67.89  (28.9%)
Code:            $18.45  (7.9%)
Writing:         $11.29  (4.8%)
Line chart showing daily costs over time:
  • Trend line: Showing upward/downward trajectory
  • Annotations: Mark deployments, feature launches
  • Forecasting: Projected month-end cost

Budget Tracking

Monthly Budget: $500.00
Current Spend:  $234.56 (46.9%)
Days Elapsed:   15/30 (50%)

Status: ✅ On track
Projected Month End: $469.12
Buffer: $30.88 (6.2%)
Alert Thresholds:
  • 🟡 Warning at 80% ($400)
  • 🔴 Critical at 95% ($475)
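
The projection is a simple proration of spend-to-date across the month, checked against the warning and critical thresholds. A small worked sketch of that arithmetic (standalone Python, not a Perf API call; it assumes the thresholds apply to actual spend so far):

monthly_budget = 500.00
current_spend = 234.56
days_elapsed, days_in_month = 15, 30

# Linear projection: assume the daily run rate so far continues.
projected = current_spend / days_elapsed * days_in_month
buffer = monthly_budget - projected

print(f"Projected month end: ${projected:.2f}")                   # $469.12
print(f"Buffer: ${buffer:.2f} ({buffer / monthly_budget:.1%})")   # $30.88 (6.2%)

# Thresholds compare actual spend; the projection flags pace (assumption).
if current_spend >= 0.95 * monthly_budget:
    print("🔴 Critical: spend at or above 95% of budget")
elif current_spend >= 0.80 * monthly_budget:
    print("🟡 Warning: spend at or above 80% of budget")
elif projected <= monthly_budget:
    print("✅ On track")
else:
    print("⚠️ Projected to exceed budget at current pace")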

Cost Optimization Opportunities

AI-powered recommendations to reduce spend:
Opportunity #1: Switch Extraction to Claude Haiku
  Current: Using GPT-4o Mini (45% of extraction calls)
  Impact: Save $23.45/month (10.1%)
  Risk: None (equal or better quality)

Opportunity #2: Reduce Summarization Context
  Current: Average 2,345 input tokens
  Impact: Save $12.34/month (5.3%)
  Action: Truncate to 1,500 tokens
  Risk: Low (minimal quality impact)

Opportunity #3: Batch Similar Requests
  Current: Processing individually
  Impact: Save $8.90/month (3.8%)
  Action: Batch extraction calls
  Risk: Low (adds ~50ms latency)
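
Opportunity #3 amounts to grouping similar requests and sending each group through one call instead of one call per item. A minimal sketch of that pattern (the fake_extract stub stands in for whatever single-call helper you use today; it is not a Perf or provider API):

from typing import Callable

def batch_calls(items: list[str],
                call: Callable[[list[str]], list[str]],
                batch_size: int = 10) -> list[str]:
    """Process items in groups, one request per group instead of one per item."""
    results: list[str] = []
    for i in range(0, len(items), batch_size):
        results.extend(call(items[i:i + batch_size]))
    return results

# Stub standing in for the real extraction call: 25 docs -> 3 requests, not 25.
request_log: list[int] = []
def fake_extract(batch: list[str]) -> list[str]:
    request_log.append(len(batch))
    return [f"extracted:{doc}" for doc in batch]

docs = [f"doc-{i}" for i in range(25)]
print(len(batch_calls(docs, fake_extract)), request_log)  # 25 [10, 10, 5]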

Quality Analytics

Validation Metrics

Total Calls: 45,678
Validation Passed: 44,567 (97.6%)
Retry Required: 1,045 (2.3%)
Fallback Used: 556 (1.2%)
Failed: 66 (0.1%)
Chart: Funnel showing validation flow

Failure Mode Analysis

Format Violations:      489 (44.1%) ████████████████
Refusals:              312 (28.1%) ██████████
Hallucinations:        189 (17.0%) ██████
Incomplete Outputs:    145 (13.1%) █████
Reasoning Errors:      112 (10.1%) ████
Drill-down:
  • Click each failure mode to see examples
  • Identify common patterns
  • View which models fail most often

Quality Scores

Distribution of quality scores across calls:
0.90 - 1.00:  38,234 (83.7%) ████████████████████████
0.80 - 0.89:   6,123 (13.4%) █████
0.70 - 0.79:     987 (2.2%)  █
0.60 - 0.69:     234 (0.5%)
< 0.60:          100 (0.2%)

User Feedback Analysis

If you’ve integrated user feedback:
Total Feedback: 2,345
Positive: 2,012 (85.8%) 👍
Negative: 333 (14.2%) 👎

Average Rating: 4.2/5.0 ⭐⭐⭐⭐

Common Issues (from negative feedback):
1. Incomplete responses (34%)
2. Incorrect data extraction (28%)
3. Slow response time (22%)
4. Refused valid requests (16%)
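
Feeding user feedback into these metrics means attaching a rating and thumbs up/down to the call being rated. The snippet below only illustrates the shape of that data; the endpoint URL and field names are assumptions, not Perf's documented feedback API, so check your integration docs for the real schema.

import json, time, urllib.request

# Illustrative payload only -- endpoint and field names are assumptions.
feedback = {
    "call_id": "call_abc123",        # ID of the LLM call being rated
    "rating": 4,                     # 1-5 star rating
    "thumbs": "up",                  # or "down"
    "comment": "Extraction missed the invoice date",
    "timestamp": int(time.time()),
}

req = urllib.request.Request(
    "https://withperf.pro/api/feedback",   # hypothetical endpoint
    data=json.dumps(feedback).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment once pointed at the real endpoint
print(json.dumps(feedback, indent=2))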

Provider Health

Real-Time Status

OpenAI
  Status: ✅ Operational
  Uptime: 99.8% (last 30 days)
  Avg Latency: 1,234ms
  Error Rate: 0.2%

Anthropic
  Status: ✅ Operational
  Uptime: 99.9% (last 30 days)
  Avg Latency: 1,567ms
  Error Rate: 0.1%

Historical Reliability

Chart showing uptime over the past 90 days:
        Week 1  Week 2  Week 3  Week 4
OpenAI    99.9%   99.7%   99.8%   99.9%
Anthropic 99.9%   99.9%   99.8%   99.9%

Incident Timeline

Jan 28, 2024 14:32-14:47 UTC
  Provider: OpenAI
  Issue: Elevated latency (3x normal)
  Impact: 234 calls affected
  Mitigation: Auto-failover to Anthropic
  Resolution: Provider resolved

Jan 15, 2024 09:15-09:23 UTC
  Provider: Anthropic
  Issue: Rate limit errors
  Impact: 45 calls affected
  Mitigation: Auto-retry with backoff
  Resolution: Provider increased limits

Custom Reports

Report Builder

Create custom analytics views (a configuration sketch follows these steps):
  1. Select Metrics
    • Choose from 50+ available metrics
    • Combine multiple metrics
  2. Add Dimensions
    • Group by: model, task type, date, user_id, etc.
    • Multiple grouping levels
  3. Apply Filters
    • Date range
    • Model
    • Task type
    • Cost range
    • Success/failure
    • Custom metadata
  4. Choose Visualization
    • Line chart
    • Bar chart
    • Pie chart
    • Table
    • Heatmap
  5. Save & Schedule
    • Save report configuration
    • Schedule email delivery
    • Export to CSV/PDF
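
The five steps above map naturally onto a single declarative report definition. The sketch below shows what a saved configuration might contain; the field names are examples, not Perf's exact schema.

# Illustrative report definition mirroring steps 1-5 above.
report = {
    "name": "Cost by Feature",
    "metrics": ["total_cost_usd", "call_count"],       # 1. Select Metrics
    "dimensions": ["custom.feature", "date"],           # 2. Add Dimensions
    "filters": {                                        # 3. Apply Filters
        "date_range": "last_30_days",
        "task_type": ["extraction", "summarization"],
    },
    "visualization": "line_chart",                      # 4. Choose Visualization
    "schedule": {                                       # 5. Save & Schedule
        "email": "team@example.com",
        "frequency": "weekly",
        "export_format": "csv",
    },
}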

Example Custom Reports

Cost by Feature
  Dimensions: custom.feature, date
  Metrics: total_cost_usd, call_count
  Filters: last 30 days

User Behavior Analysis
  Dimensions: user_id, task_type
  Metrics: avg_latency, success_rate, total_calls
  Filters: success_only = true

Model Performance by Complexity
  Dimensions: model, complexity_score (bucketed)
  Metrics: avg_quality_score, success_rate

Insights & Recommendations

AI-Powered Insights

Perf automatically analyzes your usage and surfaces insights:

🎯 Optimization Opportunities

"Your extraction tasks are using GPT-4o Mini, but Claude Haiku
would be 48% cheaper with equal quality. Switch to save $23/month."

Action: [Apply Recommendation] [Dismiss]

📊 Usage Patterns

"Your API usage spikes between 2-4pm UTC. Consider implementing
request batching during peak hours to reduce costs by ~12%."

Action: [Learn More] [Dismiss]

⚠️ Quality Degradation

"Success rate for 'reasoning' tasks has dropped from 97% to 93%
over the past week. Consider upgrading to Claude Sonnet."

Action: [View Details] [Dismiss]

💰 Budget Alerts

"You're on track to spend $487 this month, exceeding your $500
budget. Reduce costs by routing extraction to Claude Haiku."

Action: [View Optimization] [Adjust Budget]

Weekly Summary

Delivered via email every Monday:
Subject: Your Perf Weekly Summary

Key Metrics (vs. last week):
- Total Calls: 8,234 (↑ 12.3%)
- Total Cost: $42.34 (↓ 5.7%)
- Avg Latency: 1,234ms (↓ 8.2%)
- Success Rate: 98.7% (↑ 1.2%)

Top Insight:
Switching extraction tasks to Claude Haiku could save $23/month.

[View Full Report]

Cohort Analysis

Analyze usage patterns by user cohorts:

By User Segment

Enterprise Customers:
  Avg calls/day: 2,345
  Preferred models: Claude Sonnet, GPT-4o
  Cost sensitivity: Low
  Quality preference: High

Startup Customers:
  Avg calls/day: 234
  Preferred models: GPT-4o Mini, Claude Haiku
  Cost sensitivity: High
  Quality preference: Medium

By Feature

Track which features drive the most usage:
Feature          Calls    Cost    Avg Latency
Chat Support    12,345   $67.89   1,234ms
Data Extract     8,234   $23.45     834ms
Summarizer       5,678   $45.67   1,567ms
Code Helper      3,456   $34.56   2,123ms

A/B Testing Results

View results of routing experiments:
Experiment: Claude Haiku vs GPT-4o Mini for Extraction

Control (GPT-4o Mini):
  Calls: 5,000
  Avg Cost: $0.00234
  Success Rate: 98.5%
  Avg Quality: 0.92

Treatment (Claude Haiku):
  Calls: 5,000
  Avg Cost: $0.00098 (↓ 58.1%) ✅
  Success Rate: 99.1% (↑ 0.6%) ✅
  Avg Quality: 0.94 (↑ 2.2%) ✅

Result: Statistically significant (p < 0.01)
Recommendation: Roll out to 100% of traffic

[Apply to All] [Run Longer] [Dismiss]
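
The "statistically significant (p < 0.01)" call on success rate can be reproduced from the numbers above with a standard two-proportion z-test. A minimal sketch in plain Python (no Perf API involved):

import math

# Successful calls out of 5,000 in each arm, from the experiment above.
control_ok, control_n = round(0.985 * 5000), 5000      # 4,925
treatment_ok, treatment_n = round(0.991 * 5000), 5000  # 4,955

# Two-proportion z-test with a pooled standard error.
p1, p2 = control_ok / control_n, treatment_ok / treatment_n
p_pool = (control_ok + treatment_ok) / (control_n + treatment_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treatment_n))
z = (p2 - p1) / se

# Two-sided p-value from the normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, p = {p_value:.4f}")  # roughly z = 2.76, p = 0.006 < 0.01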

Data Export

Bulk Export Options

Export analytics data for external analysis:
  1. CSV Export
    • All metrics for selected time range
    • Filterable by dimensions
    • Max 1M rows (Enterprise)
  2. JSON Export
    • Structured format for programmatic access
    • Includes metadata and nested objects
  3. BigQuery/Snowflake Sync (Enterprise)
    • Automatic daily sync
    • Query in your data warehouse
    • Join with your business data
  4. API Access
    • Use Metrics API
    • Real-time programmatic access
    • Build custom dashboards
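
For option 4, programmatic access comes down to authenticated HTTP calls against the Metrics API. The sketch below shows the general pattern only; the endpoint path, query parameters, and response shape are assumptions, so confirm them against the Metrics API reference.

import json, os, urllib.parse, urllib.request

# Hypothetical endpoint and query parameters -- confirm against the Metrics API docs.
BASE_URL = "https://withperf.pro/api/v1/metrics"
query = urllib.parse.urlencode({
    "metric": "total_cost_usd",
    "group_by": "model",
    "date_range": "last_30_days",
})

req = urllib.request.Request(
    f"{BASE_URL}?{query}",
    headers={"Authorization": f"Bearer {os.environ.get('PERF_API_KEY', '')}"},
)
with urllib.request.urlopen(req) as resp:   # real HTTP call; needs a valid key
    print(json.dumps(json.load(resp), indent=2))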

Best Practices

Daily Review

  • Check Insights for new recommendations
  • Verify no quality degradation alerts
  • Monitor budget tracking status

Weekly Analysis

  • Review Cost Breakdown by task type
  • Analyze Performance Matrix for optimization
  • Check Provider Health trends

Monthly Planning

  • Export Custom Report for stakeholders
  • Review A/B Test results
  • Adjust Routing Rules based on data

Keyboard Shortcuts

Shortcut   Action
C          Open Cost Analytics
P          Open Performance Analytics
Q          Open Quality Analytics
I          View Insights
E          Export current view
T          Change time range
F          Open filter menu
