> ## Documentation Index
> Fetch the complete documentation index at: https://docs.withperf.pro/llms.txt
> Use this file to discover all available pages before exploring further.

# Streaming API

> Real-time streaming API documentation

# Streaming API Reference

The Streaming API provides real-time, token-by-token responses for building responsive chat interfaces and interactive applications.

The streaming format is **OpenAI-compatible**, using Server-Sent Events (SSE).

## Endpoint

```
POST https://api.withperf.pro/v1/chat/stream
```

## Authentication

Include your API key in the Authorization header:

```
Authorization: Bearer YOUR_API_KEY
```

## How It Works

The Streaming API returns responses using Server-Sent Events (SSE), sending text chunks as they're generated rather than waiting for the complete response.

### Benefits

* **Lower perceived latency**: Users see responses immediately
* **Better UX**: Progressive rendering feels more responsive
* **Real-time feedback**: Stop generation early if needed
* **Streaming UI**: Perfect for chat interfaces

## Request Body

Same as the [Chat API](./chat), but responses stream incrementally. This includes full support for multimodal content (images, audio, video, documents) in messages.

### Example Request

```bash theme={null}
curl -X POST https://api.withperf.pro/v1/chat/stream \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "max_cost_per_call": 0.01
  }'
```

## Response Format

The response uses **OpenAI-compatible Server-Sent Events (SSE)** format:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705312200,"model":"claude-sonnet-4-5-20250929","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

## Event Types

### Content Chunk

Sent for each token or group of tokens. The first chunk includes the `role`, subsequent chunks only have `content`:

```json theme={null}
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1705312200,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "text fragment"
      },
      "finish_reason": null
    }
  ]
}
```

### Final Chunk

Sent when generation is complete with `finish_reason: "stop"`:

```json theme={null}
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1705312200,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
```

### Done Signal

After the final chunk, a `[DONE]` message indicates the stream is complete:

```
data: [DONE]
```

## Client Implementation

### JavaScript/TypeScript

```typescript theme={null}
async function streamChat(messages: Message[]) {
  const response = await fetch('https://api.withperf.pro/v1/chat/stream', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages }),
  });

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();
  let fullText = '';

  while (true) {
    const { done, value } = await reader!.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const payload = line.slice(6);

        // Check for [DONE] signal
        if (payload === '[DONE]') {
          console.log('Stream complete!');
          break;
        }

        const data = JSON.parse(payload);
        const content = data.choices?.[0]?.delta?.content || '';

        if (content) {
          fullText += content;
          console.log('Partial:', fullText);
        }

        // Check for finish_reason
        if (data.choices?.[0]?.finish_reason === 'stop') {
          console.log('Model:', data.model);
        }
      }
    }
  }

  return fullText;
}
```

### React Hook

```typescript theme={null}
import { useState, useCallback } from 'react';

export function useStreamingChat() {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [model, setModel] = useState<string | null>(null);

  const streamMessage = useCallback(async (messages: Message[]) => {
    setContent('');
    setIsStreaming(true);
    setModel(null);

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages }),
      });

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader!.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n');

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const payload = line.slice(6);
            if (payload === '[DONE]') break;

            const data = JSON.parse(payload);
            const deltaContent = data.choices?.[0]?.delta?.content || '';

            if (deltaContent) {
              setContent(prev => prev + deltaContent);
            }

            if (data.choices?.[0]?.finish_reason === 'stop') {
              setModel(data.model);
            }
          }
        }
      }
    } finally {
      setIsStreaming(false);
    }
  }, []);

  return { content, isStreaming, model, streamMessage };
}
```

### Python

```python theme={null}
import requests
import json

def stream_chat(messages):
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    with requests.post(url, json=payload, headers=headers, stream=True) as response:
        full_text = ""

        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    payload = line[6:]

                    # Check for [DONE] signal
                    if payload == '[DONE]':
                        print("\n\nStream complete!")
                        break

                    data = json.loads(payload)
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content', '')

                    if content:
                        full_text += content
                        print(content, end='', flush=True)

                    # Check for finish_reason
                    if data.get('choices', [{}])[0].get('finish_reason') == 'stop':
                        print(f"\n\nModel: {data.get('model')}")

        return full_text

# Usage
messages = [{"role": "user", "content": "Tell me a story"}]
result = stream_chat(messages)
```

### Python Async

```python theme={null}
import aiohttp
import asyncio
import json

async def stream_chat_async(messages):
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload, headers=headers) as response:
            async for line in response.content:
                line = line.decode('utf-8').strip()
                if line.startswith('data: '):
                    payload = line[6:]

                    if payload == '[DONE]':
                        yield {'done': True}
                        break

                    data = json.loads(payload)
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content', '')

                    if content:
                        yield content

# Usage
async def main():
    messages = [{"role": "user", "content": "Explain AI"}]
    async for chunk in stream_chat_async(messages):
        if isinstance(chunk, str):
            print(chunk, end='', flush=True)
        else:
            print("\n\nDone!")

asyncio.run(main())
```

### Go

```go theme={null}
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)

type StreamChunk struct {
    Choices []struct {
        Delta struct {
            Content string `json:"content"`
        } `json:"delta"`
        FinishReason *string `json:"finish_reason"`
    } `json:"choices"`
    Model string `json:"model"`
}

func streamChat(messages []Message) error {
    payload, _ := json.Marshal(map[string]interface{}{
        "messages": messages,
    })

    req, _ := http.NewRequest("POST",
        "https://api.withperf.pro/v1/chat/stream",
        bytes.NewBuffer(payload))

    req.Header.Set("Authorization", "Bearer "+API_KEY)
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, "data: ") {
            payload := line[6:]

            // Check for [DONE] signal
            if payload == "[DONE]" {
                fmt.Println("\nDone!")
                break
            }

            var chunk StreamChunk
            json.Unmarshal([]byte(payload), &chunk)

            if len(chunk.Choices) > 0 {
                content := chunk.Choices[0].Delta.Content
                if content != "" {
                    fmt.Print(content)
                }
            }
        }
    }

    return scanner.Err()
}
```

## React Component Example

```typescript theme={null}
'use client';

import { useState } from 'react';

export default function StreamingChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [streamingContent, setStreamingContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async () => {
    if (!input.trim()) return;

    const userMessage = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setIsStreaming(true);
    setStreamingContent('');

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: [...messages, userMessage],
        }),
      });

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();
      let fullContent = '';

      while (true) {
        const { done, value } = await reader!.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n');

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const payload = line.slice(6);

            // Check for [DONE] signal
            if (payload === '[DONE]') {
              setMessages(prev => [
                ...prev,
                { role: 'assistant', content: fullContent }
              ]);
              setStreamingContent('');
              break;
            }

            const data = JSON.parse(payload);
            const deltaContent = data.choices?.[0]?.delta?.content || '';

            if (deltaContent) {
              fullContent += deltaContent;
              setStreamingContent(fullContent);
            }
          }
        }
      }
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsStreaming(false);
    }
  };

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, idx) => (
          <div key={idx} className="mb-4">
            <div className="font-bold">{msg.role}:</div>
            <div>{msg.content}</div>
          </div>
        ))}

        {streamingContent && (
          <div className="mb-4">
            <div className="font-bold">assistant:</div>
            <div className="animate-pulse">{streamingContent}</div>
          </div>
        )}
      </div>

      <div className="border-t p-4">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyPress={(e) => e.key === 'Enter' && sendMessage()}
          disabled={isStreaming}
          className="w-full border rounded px-3 py-2"
          placeholder="Type a message..."
        />
      </div>
    </div>
  );
}
```

## Error Handling

### Connection Errors

```typescript theme={null}
try {
  const response = await fetch(url, { ... });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  if (!response.body) {
    throw new Error('No response body');
  }

  // Stream processing...
} catch (error) {
  console.error('Streaming failed:', error);
  // Fallback to non-streaming API
  const fallback = await fetch('/v1/chat', { ... });
}
```

### Timeout Handling

```typescript theme={null}
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30000); // 30s timeout

try {
  const response = await fetch(url, {
    signal: controller.signal,
    ...
  });
  // Process stream...
} catch (error) {
  if (error.name === 'AbortError') {
    console.error('Request timed out');
  }
} finally {
  clearTimeout(timeout);
}
```

## Performance Optimization

### Chunking Strategy

Perf optimizes chunk size for balance between latency and throughput:

* **Small prompts**: Sends tokens individually for fastest perceived speed
* **Large generations**: Batches tokens for network efficiency
* **Adaptive**: Adjusts based on connection quality

### Buffering

For smoother UI updates, buffer chunks:

```typescript theme={null}
let buffer = '';
let lastUpdate = Date.now();

// In your streaming loop:
buffer += data.chunk;

const now = Date.now();
if (now - lastUpdate > 50) { // Update every 50ms
  setContent(prev => prev + buffer);
  buffer = '';
  lastUpdate = now;
}
```

## Rate Limits

Same limits as the Chat API:

| Tier       | Requests/Minute | Concurrent Streams |
| ---------- | --------------- | ------------------ |
| Free       | 60              | 3                  |
| Pro        | 300             | 10                 |
| Enterprise | Custom          | Custom             |

## Best Practices

### 1. Show Loading State

```typescript theme={null}
{isStreaming && (
  <div className="flex items-center gap-2">
    <Spinner />
    <span>Generating response...</span>
  </div>
)}
```

### 2. Handle Stream Interruption

Allow users to stop generation:

```typescript theme={null}
const abortController = new AbortController();

// Cancel button handler
const handleCancel = () => {
  abortController.abort();
  setIsStreaming(false);
};

// Pass to fetch
fetch(url, { signal: abortController.signal, ... });
```

### 3. Graceful Degradation

Fall back to non-streaming if not supported:

```typescript theme={null}
const supportsStreaming = 'ReadableStream' in window;

if (supportsStreaming) {
  // Use streaming API
} else {
  // Use regular Chat API
}
```

### 4. Optimize for Mobile

Consider connection quality:

```typescript theme={null}
// Detect slow connections
const connection = (navigator as any).connection;
const isSlowConnection = connection?.effectiveType === '2g' ||
                         connection?.effectiveType === 'slow-2g';

if (isSlowConnection) {
  // Use regular API or increase buffer size
}
```

## Comparison: Streaming vs Non-Streaming

| Feature                       | Streaming                | Non-Streaming                     |
| ----------------------------- | ------------------------ | --------------------------------- |
| **First token latency**       | \~200ms                  | \~2-5s                            |
| **Perceived speed**           | Immediate                | Delayed                           |
| **Implementation complexity** | Medium                   | Low                               |
| **Network efficiency**        | Same                     | Same                              |
| **Error recovery**            | More complex             | Simple                            |
| **Best for**                  | Chat UIs, long responses | Batch processing, short responses |

## Related Endpoints

* [Chat API](./chat) - Non-streaming version
* [Metrics API](./metrics) - Analytics and monitoring
* [Logs API](./logs) - Debugging and audit trails

## Support

* **Documentation**: [docs.withperf.pro](https://docs.withperf.pro)
* **Email**: [support@withperf.pro](mailto:support@withperf.pro)
* **Examples**: [github.com/perf/examples](https://github.com/perf/examples)
