Streaming API Reference

The Streaming API provides real-time, token-by-token responses for building responsive chat interfaces and interactive applications. The streaming format is OpenAI-compatible, using Server-Sent Events (SSE).

Endpoint

POST https://api.withperf.pro/v1/chat/stream

Authentication

Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY

How It Works

The Streaming API returns responses using Server-Sent Events (SSE), sending text chunks as they’re generated rather than waiting for the complete response.

Benefits

  • Lower perceived latency: Users see responses immediately
  • Better UX: Progressive rendering feels more responsive
  • Real-time feedback: Stop generation early if needed
  • Streaming UI: Perfect for chat interfaces

Request Body

Same as the Chat API, but responses stream incrementally. This includes full support for multimodal content (images, audio, video, documents) in messages.

Example Request

curl -X POST https://api.withperf.pro/v1/chat/stream \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "max_cost_per_call": 0.01
  }'
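Since the request body is shared with the Chat API, multimodal messages stream the same way. Assuming the OpenAI-compatible content-parts shape (an array of typed parts; the `image_url` part below is an illustrative assumption, not confirmed by this page), an image request might look like:

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } }
      ]
    }
  ],
  "max_cost_per_call": 0.01
}
```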

Response Format

The response uses OpenAI-compatible Server-Sent Events (SSE) format:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Event Types

Content Chunk

Sent for each token or group of tokens. The first chunk includes the role; subsequent chunks contain only content:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1705312200,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "text fragment"
      },
      "finish_reason": null
    }
  ]
}

Final Chunk

Sent when generation is complete with finish_reason: "stop":
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1705312200,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}

Done Signal

After the final chunk, a [DONE] message indicates the stream is complete:
data: [DONE]
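Putting the three event types together, a small line classifier keeps client code tidy. This is an illustrative sketch (`parseSSELine` and `StreamEvent` are not part of any SDK); it assumes each complete `data:` line is passed in:

```typescript
type StreamEvent =
  | { type: 'content'; content: string }
  | { type: 'finish'; finishReason: string; model?: string }
  | { type: 'done' };

// Returns null for non-data lines and empty keep-alive deltas.
function parseSSELine(line: string): StreamEvent | null {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice(6);

  if (payload === '[DONE]') return { type: 'done' };

  const data = JSON.parse(payload);
  const choice = data.choices?.[0];

  if (choice?.finish_reason) {
    return { type: 'finish', finishReason: choice.finish_reason, model: data.model };
  }

  const content = choice?.delta?.content;
  return content ? { type: 'content', content } : null;
}
```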

Client Implementation

JavaScript/TypeScript

async function streamChat(messages: Message[]) {
  const response = await fetch('https://api.withperf.pro/v1/chat/stream', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`Stream failed: HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = ''; // carries a partial SSE line across reads
  let fullText = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across reads
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split('\n');
    buffered = lines.pop() ?? ''; // last element may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice(6);

      // [DONE] signals the end of the stream
      if (payload === '[DONE]') {
        console.log('Stream complete!');
        return fullText;
      }

      const data = JSON.parse(payload);
      const content = data.choices?.[0]?.delta?.content || '';

      if (content) {
        fullText += content;
        console.log('Partial:', fullText);
      }

      // Check for finish_reason
      if (data.choices?.[0]?.finish_reason === 'stop') {
        console.log('Model:', data.model);
      }
    }
  }

  return fullText;
}

React Hook

import { useState, useCallback } from 'react';

export function useStreamingChat() {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [model, setModel] = useState<string | null>(null);

  const streamMessage = useCallback(async (messages: Message[]) => {
    setContent('');
    setIsStreaming(true);
    setModel(null);

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages }),
      });

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();
      let buffered = ''; // carries a partial SSE line across reads

      while (true) {
        const { done, value } = await reader!.read();
        if (done) break;

        // stream: true keeps multi-byte characters intact across reads
        buffered += decoder.decode(value, { stream: true });
        const lines = buffered.split('\n');
        buffered = lines.pop() ?? ''; // last element may be an incomplete line

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const payload = line.slice(6);
          if (payload === '[DONE]') return; // the finally block still runs

          const data = JSON.parse(payload);
          const deltaContent = data.choices?.[0]?.delta?.content || '';

          if (deltaContent) {
            setContent(prev => prev + deltaContent);
          }

          if (data.choices?.[0]?.finish_reason === 'stop') {
            setModel(data.model);
          }
        }
      }
    } finally {
      setIsStreaming(false);
    }
  }, []);

  return { content, isStreaming, model, streamMessage };
}

Python

import requests
import json

def stream_chat(messages):
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    with requests.post(url, json=payload, headers=headers, stream=True) as response:
        full_text = ""

        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    event = line[6:]  # avoid shadowing the request payload

                    # Check for [DONE] signal
                    if event == '[DONE]':
                        print("\n\nStream complete!")
                        break

                    data = json.loads(event)
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content', '')

                    if content:
                        full_text += content
                        print(content, end='', flush=True)

                    # Check for finish_reason
                    if data.get('choices', [{}])[0].get('finish_reason') == 'stop':
                        print(f"\n\nModel: {data.get('model')}")

        return full_text

# Usage
messages = [{"role": "user", "content": "Tell me a story"}]
result = stream_chat(messages)

Python Async

import aiohttp
import asyncio
import json

async def stream_chat_async(messages):
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload, headers=headers) as response:
            async for line in response.content:
                line = line.decode('utf-8').strip()
                if line.startswith('data: '):
                    event = line[6:]  # avoid shadowing the request payload

                    if event == '[DONE]':
                        yield {'done': True}
                        break

                    data = json.loads(event)
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content', '')

                    if content:
                        yield content

# Usage
async def main():
    messages = [{"role": "user", "content": "Explain AI"}]
    async for chunk in stream_chat_async(messages):
        if isinstance(chunk, str):
            print(chunk, end='', flush=True)
        else:
            print("\n\nDone!")

asyncio.run(main())

Go

package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)

type StreamChunk struct {
    Choices []struct {
        Delta struct {
            Content string `json:"content"`
        } `json:"delta"`
        FinishReason *string `json:"finish_reason"`
    } `json:"choices"`
    Model string `json:"model"`
}

func streamChat(messages []Message) error {
    payload, err := json.Marshal(map[string]interface{}{
        "messages": messages,
    })
    if err != nil {
        return err
    }

    req, err := http.NewRequest("POST",
        "https://api.withperf.pro/v1/chat/stream",
        bytes.NewBuffer(payload))
    if err != nil {
        return err
    }

    req.Header.Set("Authorization", "Bearer "+API_KEY)
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    scanner := bufio.NewScanner(resp.Body)
    // SSE data lines can exceed Scanner's 64 KiB default; raise the cap
    scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, "data: ") {
            event := strings.TrimPrefix(line, "data: ")

            // Check for [DONE] signal
            if event == "[DONE]" {
                fmt.Println("\nDone!")
                break
            }

            var chunk StreamChunk
            if err := json.Unmarshal([]byte(event), &chunk); err != nil {
                continue // skip malformed lines
            }

            if len(chunk.Choices) > 0 {
                content := chunk.Choices[0].Delta.Content
                if content != "" {
                    fmt.Print(content)
                }
            }
        }
    }

    return scanner.Err()
}

React Component Example

'use client';

import { useState } from 'react';

export default function StreamingChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [streamingContent, setStreamingContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async () => {
    if (!input.trim()) return;

    const userMessage = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setIsStreaming(true);
    setStreamingContent('');

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: [...messages, userMessage],
        }),
      });

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();
      let buffered = ''; // carries a partial SSE line across reads
      let fullContent = '';

      while (true) {
        const { done, value } = await reader!.read();
        if (done) break;

        // stream: true keeps multi-byte characters intact across reads
        buffered += decoder.decode(value, { stream: true });
        const lines = buffered.split('\n');
        buffered = lines.pop() ?? ''; // last element may be an incomplete line

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const payload = line.slice(6);

            // Check for [DONE] signal
            if (payload === '[DONE]') {
              setMessages(prev => [
                ...prev,
                { role: 'assistant', content: fullContent }
              ]);
              setStreamingContent('');
              break;
            }

            const data = JSON.parse(payload);
            const deltaContent = data.choices?.[0]?.delta?.content || '';

            if (deltaContent) {
              fullContent += deltaContent;
              setStreamingContent(fullContent);
            }
          }
        }
      }
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsStreaming(false);
    }
  };

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, idx) => (
          <div key={idx} className="mb-4">
            <div className="font-bold">{msg.role}:</div>
            <div>{msg.content}</div>
          </div>
        ))}

        {streamingContent && (
          <div className="mb-4">
            <div className="font-bold">assistant:</div>
            <div className="animate-pulse">{streamingContent}</div>
          </div>
        )}
      </div>

      <div className="border-t p-4">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          disabled={isStreaming}
          className="w-full border rounded px-3 py-2"
          placeholder="Type a message..."
        />
      </div>
    </div>
  );
}

Error Handling

Connection Errors

try {
  const response = await fetch(url, { ... });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  if (!response.body) {
    throw new Error('No response body');
  }

  // Stream processing...
} catch (error) {
  console.error('Streaming failed:', error);
  // Fallback to non-streaming API
  const fallback = await fetch('/v1/chat', { ... });
}

Timeout Handling

const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30000); // 30s timeout

try {
  const response = await fetch(url, {
    signal: controller.signal,
    ...
  });
  // Process stream...
} catch (error) {
  if (error.name === 'AbortError') {
    console.error('Request timed out');
  }
} finally {
  clearTimeout(timeout);
}
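When an app needs both a user cancel button and a timeout, the two signals can be merged. `AbortSignal.any` and `AbortSignal.timeout` are standard but relatively recent (Node 20+ and modern browsers), so check runtime support; `cancelOrTimeout` is an illustrative name, not part of any SDK:

```typescript
// Returns a signal that aborts when the user cancels OR after
// timeoutMs elapses, whichever happens first.
function cancelOrTimeout(
  userCancel: AbortController,
  timeoutMs = 30_000,
): AbortSignal {
  return AbortSignal.any([userCancel.signal, AbortSignal.timeout(timeoutMs)]);
}
```

Pass the combined signal to fetch: `fetch(url, { signal: cancelOrTimeout(controller) })`.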

Performance Optimization

Chunking Strategy

Perf tunes chunk size to balance latency and throughput:
  • Small prompts: Sends tokens individually for fastest perceived speed
  • Large generations: Batches tokens for network efficiency
  • Adaptive: Adjusts based on connection quality

Buffering

For smoother UI updates, buffer chunks:
let buffer = '';
let lastUpdate = Date.now();

// In your streaming loop, after extracting the delta content:
buffer += deltaContent;

const now = Date.now();
if (now - lastUpdate > 50) { // Flush to the UI at most every 50ms
  setContent(prev => prev + buffer);
  buffer = '';
  lastUpdate = now;
}

// When the stream ends, flush whatever remains in the buffer.
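The same idea can be packaged as a small self-contained helper with an explicit `end()` flush so the tail of the response is never dropped. This is an illustrative sketch (`createThrottledSink` is not part of any SDK); the injectable clock parameter exists only to make it easy to test:

```typescript
// Accumulates delta text and flushes it via the callback at most
// once per intervalMs.
function createThrottledSink(
  flush: (text: string) => void,
  intervalMs = 50,
  now: () => number = Date.now,
) {
  let buffer = '';
  let lastFlush = now();

  return {
    // Call for every delta; flushes once the interval has elapsed.
    push(chunk: string) {
      buffer += chunk;
      if (now() - lastFlush >= intervalMs) {
        flush(buffer);
        buffer = '';
        lastFlush = now();
      }
    },
    // Call once when the stream ends so the tail is not dropped.
    end() {
      if (buffer) {
        flush(buffer);
        buffer = '';
      }
    },
  };
}
```

In a React client, `push` would replace the direct state update in the streaming loop, and `end` would run after the `[DONE]` signal.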

Rate Limits

Same limits as the Chat API:
Tier        Requests/Minute   Concurrent Streams
Free        60                3
Pro         300               10
Enterprise  Custom            Custom
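Client-side, a small semaphore can keep an app under the concurrent-stream cap (3 on the Free tier). A minimal sketch, not tied to any SDK:

```typescript
// Concurrency gate: at most `available` holders at a time; extra
// callers queue until a slot is released.
class Semaphore {
  private waiters: (() => void)[] = [];

  constructor(private available: number) {}

  // Resolves immediately if a slot is free, otherwise queues the caller.
  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Hands the slot to the next waiter, or returns it to the pool.
  release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.available++;
  }
}
```

Usage: create `new Semaphore(3)` once, `await sem.acquire()` before opening a stream, and call `release()` in a `finally` block when it ends.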

Best Practices

1. Show Loading State

{isStreaming && (
  <div className="flex items-center gap-2">
    <Spinner />
    <span>Generating response...</span>
  </div>
)}

2. Handle Stream Interruption

Allow users to stop generation:
const abortController = new AbortController();

// Cancel button handler
const handleCancel = () => {
  abortController.abort();
  setIsStreaming(false);
};

// Pass to fetch
fetch(url, { signal: abortController.signal, ... });

3. Graceful Degradation

Fall back to non-streaming if not supported:
const supportsStreaming = 'ReadableStream' in window;

if (supportsStreaming) {
  // Use streaming API
} else {
  // Use regular Chat API
}

4. Optimize for Mobile

Consider connection quality:
// Detect slow connections
const connection = (navigator as any).connection;
const isSlowConnection = connection?.effectiveType === '2g' ||
                         connection?.effectiveType === 'slow-2g';

if (isSlowConnection) {
  // Use regular API or increase buffer size
}

Comparison: Streaming vs Non-Streaming

Feature                     Streaming                   Non-Streaming
First token latency         ~200ms                      ~2-5s
Perceived speed             Immediate                   Delayed
Implementation complexity   Medium                      Low
Network efficiency          Same                        Same
Error recovery              More complex                Simple
Best for                    Chat UIs, long responses    Batch processing, short responses

Support