Streaming API Reference

The Streaming API provides real-time, token-by-token responses for building responsive chat interfaces and interactive applications.

Endpoint

POST https://api.withperf.pro/v1/chat/stream

Authentication

Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY

How It Works

The Streaming API returns responses using Server-Sent Events (SSE), sending text chunks as they’re generated rather than waiting for the complete response.

Benefits

  • Lower perceived latency: Users see responses immediately
  • Better UX: Progressive rendering feels more responsive
  • Real-time feedback: Stop generation early if needed
  • Streaming UI: Perfect for chat interfaces

Request Body

Same as the Chat API, but responses stream incrementally.

Example Request

curl -X POST https://api.withperf.pro/v1/chat/stream \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "max_cost_per_call": 0.01
  }'

Response Format

The response uses Server-Sent Events (SSE) with the following format. The final event below is pretty-printed for readability; on the wire each event's JSON arrives on a single data: line, which is what the client parsers below assume:
data: {"chunk": "Quantum", "done": false}

data: {"chunk": " computing", "done": false}

data: {"chunk": " uses", "done": false}

data: {"chunk": " quantum", "done": false}

data: {"chunk": " mechanics...", "done": false}

data: {
  "chunk": "",
  "done": true,
  "model_used": "claude-sonnet-4-5",
  "billing": {
    "cost_usd": 0.00234
  },
  "tokens": {
    "input": 15,
    "output": 156,
    "total": 171
  },
  "metadata": {
    "call_id": "call_xyz789",
    "task_type": "writing",
    "latency_ms": 1823,
    "timestamp": "2024-01-15T10:30:00Z"
  }
}

Event Types

Content Chunk (done: false)

Sent for each token or group of tokens:
{
  "chunk": "text fragment",
  "done": false
}

Final Event (done: true)

Sent when generation is complete with full metadata:
{
  "chunk": "",
  "done": true,
  "model_used": "gpt-4o",
  "billing": { "cost_usd": 0.00123 },
  "tokens": { "input": 20, "output": 100, "total": 120 },
  "metadata": { ... }
}

Client Implementation

JavaScript/TypeScript

async function streamChat(messages: Message[]) {
  const response = await fetch('https://api.withperf.pro/v1/chat/stream', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed: HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullText = '';
  let pending = ''; // carries a partial SSE line between reads

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunks
    pending += decoder.decode(value, { stream: true });
    const lines = pending.split('\n');
    pending = lines.pop() ?? ''; // keep the incomplete tail for the next read

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));

        if (!data.done) {
          fullText += data.chunk;
          console.log('Partial:', fullText);
        } else {
          console.log('Complete!');
          console.log('Model:', data.model_used);
          console.log('Cost:', data.billing.cost_usd);
        }
      }
    }
  }

  return fullText;
}

React Hook

import { useState, useCallback } from 'react';

export function useStreamingChat() {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [metadata, setMetadata] = useState<any>(null);

  const streamMessage = useCallback(async (messages: Message[]) => {
    setContent('');
    setIsStreaming(true);

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages }),
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let pending = ''; // carries a partial SSE line between reads

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        pending += decoder.decode(value, { stream: true });
        const lines = pending.split('\n');
        pending = lines.pop() ?? ''; // keep the incomplete tail for the next read

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));

            if (!data.done) {
              setContent(prev => prev + data.chunk);
            } else {
              setMetadata({
                model: data.model_used,
                cost: data.billing.cost_usd,
                tokens: data.tokens,
              });
            }
          }
        }
      }
    } finally {
      setIsStreaming(false);
    }
  }, []);

  return { content, isStreaming, metadata, streamMessage };
}

Python

import requests
import json

def stream_chat(messages):
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    with requests.post(url, json=payload, headers=headers, stream=True) as response:
        full_text = ""

        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = json.loads(line[6:])

                    if not data.get('done'):
                        chunk = data.get('chunk', '')
                        full_text += chunk
                        print(chunk, end='', flush=True)
                    else:
                        print(f"\n\nModel: {data['model_used']}")
                        print(f"Cost: ${data['billing']['cost_usd']:.5f}")
                        print(f"Tokens: {data['tokens']['total']}")

        return full_text

# Usage
messages = [{"role": "user", "content": "Tell me a story"}]
result = stream_chat(messages)

Python Async

import aiohttp
import asyncio
import json

async def stream_chat_async(messages):
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload, headers=headers) as response:
            full_text = ""

            async for line in response.content:
                line = line.decode('utf-8').strip()
                if line.startswith('data: '):
                    data = json.loads(line[6:])

                    if not data.get('done'):
                        chunk = data.get('chunk', '')
                        full_text += chunk
                        yield chunk
                    else:
                        yield {
                            'done': True,
                            'metadata': data
                        }

# Usage
async def main():
    messages = [{"role": "user", "content": "Explain AI"}]
    async for chunk in stream_chat_async(messages):
        if isinstance(chunk, str):
            print(chunk, end='', flush=True)
        else:
            print(f"\n\nDone! Cost: ${chunk['metadata']['billing']['cost_usd']:.5f}")

asyncio.run(main())

Go

package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)

// Message matches the request schema; StreamChunk matches each SSE event.
type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type StreamChunk struct {
    Chunk string `json:"chunk"`
    Done  bool   `json:"done"`
}

func streamChat(messages []Message) error {
    payload, _ := json.Marshal(map[string]interface{}{
        "messages": messages,
    })

    req, _ := http.NewRequest("POST",
        "https://api.withperf.pro/v1/chat/stream",
        bytes.NewBuffer(payload))

    req.Header.Set("Authorization", "Bearer "+API_KEY)
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("stream request failed: %s", resp.Status)
    }

    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, "data: ") {
            var chunk StreamChunk
            if err := json.Unmarshal([]byte(line[6:]), &chunk); err != nil {
                continue // skip malformed events
            }

            if !chunk.Done {
                fmt.Print(chunk.Chunk)
            } else {
                fmt.Println("\nDone!")
            }
        }
    }

    return scanner.Err()
}

React Component Example

'use client';

import { useState } from 'react';

export default function StreamingChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [streamingContent, setStreamingContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async () => {
    if (!input.trim()) return;

    const userMessage = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setIsStreaming(true);
    setStreamingContent('');

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: [...messages, userMessage],
        }),
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let fullContent = '';
      let pending = ''; // carries a partial SSE line between reads

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        pending += decoder.decode(value, { stream: true });
        const lines = pending.split('\n');
        pending = lines.pop() ?? '';

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));

            if (!data.done) {
              fullContent += data.chunk;
              setStreamingContent(fullContent);
            } else {
              // Streaming complete
              setMessages(prev => [
                ...prev,
                { role: 'assistant', content: fullContent }
              ]);
              setStreamingContent('');
            }
          }
        }
      }
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsStreaming(false);
    }
  };

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, idx) => (
          <div key={idx} className="mb-4">
            <div className="font-bold">{msg.role}:</div>
            <div>{msg.content}</div>
          </div>
        ))}

        {streamingContent && (
          <div className="mb-4">
            <div className="font-bold">assistant:</div>
            <div className="animate-pulse">{streamingContent}</div>
          </div>
        )}
      </div>

      <div className="border-t p-4">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          disabled={isStreaming}
          className="w-full border rounded px-3 py-2"
          placeholder="Type a message..."
        />
      </div>
    </div>
  );
}

Error Handling

Connection Errors

try {
  const response = await fetch(url, { ... });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  if (!response.body) {
    throw new Error('No response body');
  }

  // Stream processing...
} catch (error) {
  console.error('Streaming failed:', error);
  // Fallback to non-streaming API
  const fallback = await fetch('/v1/chat', { ... });
}

Timeout Handling

const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30000); // 30s timeout

try {
  const response = await fetch(url, {
    signal: controller.signal,
    ...
  });
  // Process stream...
} catch (error) {
  if (error.name === 'AbortError') {
    console.error('Request timed out');
  }
} finally {
  clearTimeout(timeout);
}
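
A fixed 30-second budget can cut off a perfectly healthy long generation. For streaming, an idle timeout that resets whenever a chunk arrives is often a better fit. A minimal sketch (the makeIdleAbort helper and the 10s idle window are illustrative choices, not part of the API):

// Hypothetical helper: aborts when the stream goes quiet, not on total duration.
function makeIdleAbort(idleMs = 10_000) {
  const controller = new AbortController();
  let timer = setTimeout(() => controller.abort(), idleMs);

  return {
    signal: controller.signal,
    // Call after every successful reader.read() to push the deadline back.
    touch() {
      clearTimeout(timer);
      timer = setTimeout(() => controller.abort(), idleMs);
    },
    // Call once the final done: true event arrives.
    clear() {
      clearTimeout(timer);
    },
  };
}

Pass signal to fetch as above, call touch() inside the read loop, and clear() when the stream completes.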

Performance Optimization

Chunking Strategy

Perf optimizes chunk size to balance latency and throughput:
  • Small prompts: Sends tokens individually for fastest perceived speed
  • Large generations: Batches tokens for network efficiency
  • Adaptive: Adjusts based on connection quality

Buffering

For smoother UI updates, buffer chunks:
let buffer = '';
let lastUpdate = Date.now();

// In your streaming loop:
buffer += data.chunk;

const now = Date.now();
if (now - lastUpdate > 50) { // Flush at most every 50ms
  setContent(prev => prev + buffer);
  buffer = '';
  lastUpdate = now;
}

// After the stream ends, flush anything still buffered:
if (buffer) {
  setContent(prev => prev + buffer);
}

Rate Limits

Same limits as the Chat API:
Tier         Requests/Minute   Concurrent Streams
Free         60                3
Pro          300               10
Enterprise   Custom            Custom
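
A request over the limit is presumably rejected before any SSE data is sent, so rate-limit handling belongs next to the initial fetch. A minimal retry sketch, assuming the API signals this with HTTP 429 and an optional Retry-After header (both the status code and header are assumptions; this reference does not document the exact rate-limit response):

async function fetchStreamWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, init);
    if (response.status !== 429) return response; // assumed rate-limit status

    // Honor Retry-After when present; otherwise back off exponentially.
    const retryAfter = Number(response.headers.get('Retry-After'));
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 1000;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('Rate limited: retries exhausted');
}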

Best Practices

1. Show Loading State

{isStreaming && (
  <div className="flex items-center gap-2">
    <Spinner />
    <span>Generating response...</span>
  </div>
)}

2. Handle Stream Interruption

Allow users to stop generation:
const abortController = new AbortController();

// Cancel button handler
const handleCancel = () => {
  abortController.abort();
  setIsStreaming(false);
};

// Pass to fetch
fetch(url, { signal: abortController.signal, ... });

3. Graceful Degradation

Fall back to non-streaming if not supported:
const supportsStreaming = 'ReadableStream' in window;

if (supportsStreaming) {
  // Use streaming API
} else {
  // Use regular Chat API
}
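
A sketch of that fallback path, reusing the streamChat helper from Client Implementation and the non-streaming /v1/chat endpoint shown under Error Handling. The shape of the non-streaming response is an assumption here (the content field in particular); check the Chat API reference for the actual fields:

async function chatWithFallback(messages: Message[]): Promise<string> {
  if ('ReadableStream' in window) {
    return streamChat(messages); // streaming path
  }

  // Non-streaming fallback
  const response = await fetch('https://api.withperf.pro/v1/chat', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages }),
  });
  const data = await response.json();
  return data.content; // assumed response field
}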

4. Optimize for Mobile

Consider connection quality:
// Detect slow connections
const connection = (navigator as any).connection;
const isSlowConnection = connection?.effectiveType === '2g' ||
                         connection?.effectiveType === 'slow-2g';

if (isSlowConnection) {
  // Use regular API or increase buffer size
}

Comparison: Streaming vs Non-Streaming

Feature                     Streaming                  Non-Streaming
First token latency         ~200ms                     ~2-5s
Perceived speed             Immediate                  Delayed
Implementation complexity   Medium                     Low
Network efficiency          Same                       Same
Error recovery              More complex               Simple
Best for                    Chat UIs, long responses   Batch processing, short responses

Support