# Streaming API

> Real-time streaming API documentation

# Streaming API Reference

The Streaming API provides real-time, token-by-token responses for building responsive chat interfaces and interactive applications.

## Endpoint

```
POST https://api.withperf.pro/v1/chat/stream
```

## Authentication

Include your API key in the Authorization header:

```
Authorization: Bearer YOUR_API_KEY
```

## How It Works

The Streaming API returns responses using Server-Sent Events (SSE), sending text chunks as they're generated rather than waiting for the complete response.

### Benefits

* **Lower perceived latency**: Users see the first tokens as soon as they are generated
* **Better UX**: Progressive rendering feels more responsive
* **Real-time feedback**: Clients can stop generation early if the output is not needed
* **Streaming UI**: Well suited to chat interfaces

## Request Body

The request body is the same as for the [Chat API](./chat); only the response differs, arriving incrementally as a stream.

### Example Request

```bash
curl -X POST https://api.withperf.pro/v1/chat/stream \
  -H "Authorization: Bearer pk_test_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "max_cost_per_call": 0.01
  }'
```

## Response Format

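The response is a Server-Sent Events (SSE) stream. Each event is a `data:` field carrying a JSON object, and events are separated by a blank line. The final event is shown expanded across several lines here; the client examples in this document expect each event on a single `data:` line: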

```
data: {"chunk": "Quantum", "done": false}

data: {"chunk": " computing", "done": false}

data: {"chunk": " uses", "done": false}

data: {"chunk": " quantum", "done": false}

data: {"chunk": " mechanics...", "done": false}

data: {
  "chunk": "",
  "done": true,
  "model_used": "claude-sonnet-4-5",
  "billing": {
    "cost_usd": 0.00234
  },
  "tokens": {
    "input": 15,
    "output": 156,
    "total": 171
  },
  "metadata": {
    "call_id": "call_xyz789",
    "task_type": "writing",
    "latency_ms": 1823,
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
```
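A single network read can end in the middle of an event, so a client needs to hold back any trailing partial line until the rest arrives. The sketch below shows one way to do that. It assumes each event arrives as one `data: ` line of JSON, as the client examples in this document do; `SseParser` and `StreamEvent` are illustrative names, not part of the API.

```typescript
// Fields present on every event in the format above; the final event carries more.
interface StreamEvent {
  chunk: string;
  done: boolean;
  [key: string]: unknown;
}

// Accumulates decoded text and returns one parsed event per complete `data: ` line.
class SseParser {
  private pending = '';

  // Feed the next decoded fragment; returns the events it completed.
  push(fragment: string): StreamEvent[] {
    this.pending += fragment;
    const lines = this.pending.split('\n');
    // The last element is an unfinished line ('' if the fragment ended on a newline).
    this.pending = lines.pop() ?? '';

    const events: StreamEvent[] = [];
    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, '');
      if (!line.startsWith('data: ')) continue; // blank separators and other fields
      events.push(JSON.parse(line.slice(6)) as StreamEvent);
    }
    return events;
  }
}

// Usage: the second event is split across two fragments.
const parser = new SseParser();
const first = parser.push('data: {"chunk": "Quantum", "done": false}\n\ndata: {"chunk": " comp');
const second = parser.push('uting", "done": false}\n\n');
console.log(first.map(e => e.chunk), second.map(e => e.chunk));
```

The first call returns only the complete `Quantum` event; the split event is returned by the second call, once its closing newline has arrived.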

## Event Types

### Content Chunk (done: false)

Sent for each token or group of tokens:

```json
{
  "chunk": "text fragment",
  "done": false
}
```

### Final Event (done: true)

Sent once, when generation is complete. It carries the model used, billing, token counts, and call metadata:

```json
{
  "chunk": "",
  "done": true,
  "model_used": "gpt-4o",
  "billing": { "cost_usd": 0.00123 },
  "tokens": { "input": 20, "output": 100, "total": 120 },
  "metadata": { ... }
}
```
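The two event shapes can be modelled as a discriminated union on `done`, so the compiler only allows billing and token fields to be read from the final event. This is a minimal sketch: the type names and `summarize` are illustrative, the fields are taken from the examples above, and the contents of `metadata` are left untyped.

```typescript
interface ContentChunk {
  chunk: string;
  done: false;
}

interface FinalEvent {
  chunk: string;
  done: true;
  model_used: string;
  billing: { cost_usd: number };
  tokens: { input: number; output: number; total: number };
  metadata?: Record<string, unknown>;
}

type ChatStreamEvent = ContentChunk | FinalEvent;

// Concatenates content chunks and keeps the final event, if one was received.
function summarize(events: ChatStreamEvent[]): { text: string; final: FinalEvent | null } {
  let text = '';
  let final: FinalEvent | null = null;
  for (const event of events) {
    if (event.done) {
      final = event; // narrowed to FinalEvent by the `done` discriminant
    } else {
      text += event.chunk;
    }
  }
  return { text, final };
}

// Usage with hand-written events in the documented shape
const events: ChatStreamEvent[] = [
  { chunk: 'Quantum', done: false },
  { chunk: ' computing', done: false },
  {
    chunk: '',
    done: true,
    model_used: 'gpt-4o',
    billing: { cost_usd: 0.00123 },
    tokens: { input: 20, output: 100, total: 120 },
  },
];
const result = summarize(events);
console.log(result.text, result.final?.tokens.total);
```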

## Client Implementation

### JavaScript/TypeScript

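```typescript
// Minimal message shape assumed for this example
type Message = { role: string; content: string };

// Server-side (Node 18+). PERF_API_KEY is a placeholder variable name.
const API_KEY = process.env.PERF_API_KEY ?? '';

async function streamChat(messages: Message[]): Promise<string> {
  const response = await fetch('https://api.withperf.pro/v1/chat/stream', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let pending = ''; // partial line carried over between reads
  let fullText = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across reads
    pending += decoder.decode(value, { stream: true });
    const lines = pending.split('\n');
    pending = lines.pop() ?? ''; // last element may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = JSON.parse(line.slice(6));

      if (!data.done) {
        fullText += data.chunk;
        console.log('Partial:', fullText);
      } else {
        console.log('Complete!');
        console.log('Model:', data.model_used);
        console.log('Cost:', data.billing.cost_usd);
      }
    }
  }

  return fullText;
}
```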

### React Hook

```typescript
import { useState, useCallback } from 'react';

// Minimal message shape assumed for this example
type Message = { role: string; content: string };

export function useStreamingChat() {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [metadata, setMetadata] = useState<any>(null);

  const streamMessage = useCallback(async (messages: Message[]) => {
    setContent('');
    setMetadata(null);
    setIsStreaming(true);

    try {
      // Assumes '/api/chat/stream' is a route on your own server that adds
      // the API key and proxies the request to the Streaming API.
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages }),
      });

      if (!response.ok || !response.body) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let pending = ''; // partial line carried over between reads

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true keeps multi-byte characters intact across reads
        pending += decoder.decode(value, { stream: true });
        const lines = pending.split('\n');
        pending = lines.pop() ?? '';

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = JSON.parse(line.slice(6));

          if (!data.done) {
            setContent(prev => prev + data.chunk);
          } else {
            setMetadata({
              model: data.model_used,
              cost: data.billing.cost_usd,
              tokens: data.tokens,
            });
          }
        }
      }
    } finally {
      setIsStreaming(false);
    }
  }, []);

  return { content, isStreaming, metadata, streamMessage };
}
```

### Python

```python
import json
import os

import requests

# PERF_API_KEY is a placeholder variable name; load the key however your app manages secrets.
API_KEY = os.environ["PERF_API_KEY"]

def stream_chat(messages):
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    with requests.post(url, json=payload, headers=headers, stream=True) as response:
        response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
        full_text = ""

        # iter_lines() yields complete lines, so events are never split mid-JSON
        for line in response.iter_lines():
            if not line:
                continue  # blank separator between events
            line = line.decode('utf-8')
            if not line.startswith('data: '):
                continue
            data = json.loads(line[6:])

            if not data.get('done'):
                chunk = data.get('chunk', '')
                full_text += chunk
                print(chunk, end='', flush=True)
            else:
                print(f"\n\nModel: {data['model_used']}")
                print(f"Cost: ${data['billing']['cost_usd']:.5f}")
                print(f"Tokens: {data['tokens']['total']}")

        return full_text

# Usage
messages = [{"role": "user", "content": "Tell me a story"}]
result = stream_chat(messages)
```

### Python Async

```python
import asyncio
import json
import os

import aiohttp

# PERF_API_KEY is a placeholder variable name; load the key however your app manages secrets.
API_KEY = os.environ["PERF_API_KEY"]

async def stream_chat_async(messages):
    """Yield each text chunk as a str, then one dict wrapping the final event."""
    url = "https://api.withperf.pro/v1/chat/stream"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"messages": messages}

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload, headers=headers) as response:
            response.raise_for_status()  # fail early on HTTP errors

            # Iterating response.content yields one line at a time
            async for line in response.content:
                line = line.decode('utf-8').strip()
                if not line.startswith('data: '):
                    continue
                data = json.loads(line[6:])

                if not data.get('done'):
                    yield data.get('chunk', '')
                else:
                    yield {
                        'done': True,
                        'metadata': data
                    }

# Usage
async def main():
    messages = [{"role": "user", "content": "Explain AI"}]
    async for chunk in stream_chat_async(messages):
        if isinstance(chunk, str):
            print(chunk, end='', flush=True)
        else:
            print(f"\n\nDone! Cost: ${chunk['metadata']['billing']['cost_usd']:.5f}")

asyncio.run(main())
```

### Go

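```go
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "strings"
)

// Message is the minimal message shape assumed for this example.
type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type StreamChunk struct {
    Chunk string `json:"chunk"`
    Done  bool   `json:"done"`
}

func streamChat(messages []Message) error {
    payload, err := json.Marshal(map[string]interface{}{
        "messages": messages,
    })
    if err != nil {
        return err
    }

    req, err := http.NewRequest("POST",
        "https://api.withperf.pro/v1/chat/stream",
        bytes.NewBuffer(payload))
    if err != nil {
        return err
    }

    // PERF_API_KEY is a placeholder variable name.
    req.Header.Set("Authorization", "Bearer "+os.Getenv("PERF_API_KEY"))
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("unexpected status: %s", resp.Status)
    }

    // Scanner reads complete lines, so each event is parsed whole.
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "data: ") {
            continue
        }

        var chunk StreamChunk
        if err := json.Unmarshal([]byte(line[6:]), &chunk); err != nil {
            return err
        }

        if !chunk.Done {
            fmt.Print(chunk.Chunk)
        } else {
            fmt.Println("\nDone!")
        }
    }

    return scanner.Err()
}

func main() {
    messages := []Message{{Role: "user", Content: "Tell me a story"}}
    if err := streamChat(messages); err != nil {
        fmt.Fprintln(os.Stderr, "stream failed:", err)
        os.Exit(1)
    }
}
```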

## React Component Example

```typescript
'use client';

import { useState } from 'react';

// Minimal message shape assumed for this example
type Message = { role: string; content: string };

export default function StreamingChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [streamingContent, setStreamingContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return;

    const userMessage: Message = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setIsStreaming(true);
    setStreamingContent('');

    try {
      // Assumes '/api/chat/stream' is your own server route that proxies to the Streaming API
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: [...messages, userMessage],
        }),
      });

      if (!response.ok || !response.body) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let pending = ''; // partial line carried over between reads
      let fullContent = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true keeps multi-byte characters intact across reads
        pending += decoder.decode(value, { stream: true });
        const lines = pending.split('\n');
        pending = lines.pop() ?? '';

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = JSON.parse(line.slice(6));

          if (!data.done) {
            fullContent += data.chunk;
            setStreamingContent(fullContent);
          } else {
            // Streaming complete: move the text into the message list
            setMessages(prev => [
              ...prev,
              { role: 'assistant', content: fullContent }
            ]);
            setStreamingContent('');
          }
        }
      }
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsStreaming(false);
    }
  };

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, idx) => (
          <div key={idx} className="mb-4">
            <div className="font-bold">{msg.role}:</div>
            <div>{msg.content}</div>
          </div>
        ))}

        {streamingContent && (
          <div className="mb-4">
            <div className="font-bold">assistant:</div>
            <div className="animate-pulse">{streamingContent}</div>
          </div>
        )}
      </div>

      <div className="border-t p-4">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          disabled={isStreaming}
          className="w-full border rounded px-3 py-2"
          placeholder="Type a message..."
        />
      </div>
    </div>
  );
}
```

## Error Handling

### Connection Errors

```typescript
try {
  const response = await fetch(url, { ... });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  if (!response.body) {
    throw new Error('No response body');
  }

  // Stream processing...
} catch (error) {
  console.error('Streaming failed:', error);
  // Fallback to non-streaming API
  const fallback = await fetch('/v1/chat', { ... });
}
```

### Timeout Handling

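```typescript
const controller = new AbortController();
// The timer stays armed while the body is read, so it caps the whole stream,
// not just the time to the first byte.
const timeout = setTimeout(() => controller.abort(), 30000); // 30s timeout

try {
  const response = await fetch(url, {
    signal: controller.signal,
    ...
  });
  // Process stream...
} catch (error) {
  // `error` is `unknown` in strict TypeScript, so narrow it before reading `name`
  if (error instanceof Error && error.name === 'AbortError') {
    console.error('Request timed out');
  } else {
    throw error; // do not swallow unrelated failures
  }
} finally {
  clearTimeout(timeout);
}
```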

## Performance Optimization

### Chunking Strategy

Perf tunes chunk size to balance latency against throughput:

* **Small prompts**: Sends tokens individually for fastest perceived speed
* **Large generations**: Batches tokens for network efficiency
* **Adaptive**: Adjusts based on connection quality

### Buffering

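For smoother UI updates, buffer chunks and apply them to the UI at a fixed interval instead of on every event:

```typescript
let buffer = '';
let lastUpdate = Date.now();

// In your streaming loop, for each content chunk:
buffer += data.chunk;

const now = Date.now();
if (now - lastUpdate > 50) { // Update at most every 50ms
  // Copy first: React may run the updater after `buffer` has been cleared
  const text = buffer;
  setContent(prev => prev + text);
  buffer = '';
  lastUpdate = now;
}

// After the loop ends, flush whatever is still buffered:
if (buffer) {
  const rest = buffer;
  setContent(prev => prev + rest);
  buffer = '';
}
```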
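The same idea can be packaged as a small helper that is independent of React. The sketch below is one possible shape, not part of any Perf SDK: `ChunkBuffer` is a hypothetical name, the 50 ms default mirrors the interval above, and the clock is injectable so the behaviour can be tested without waiting.

```typescript
// Collects chunks and hands them to `onFlush` at most once per interval.
class ChunkBuffer {
  private buffer = '';
  private lastFlush: number;
  private readonly onFlush: (text: string) => void;
  private readonly intervalMs: number;
  private readonly now: () => number;

  constructor(
    onFlush: (text: string) => void,
    intervalMs = 50,
    now: () => number = Date.now,
  ) {
    this.onFlush = onFlush;
    this.intervalMs = intervalMs;
    this.now = now;
    this.lastFlush = now();
  }

  // Add a chunk; flushes if the interval has elapsed since the last flush.
  add(chunk: string): void {
    this.buffer += chunk;
    if (this.now() - this.lastFlush >= this.intervalMs) {
      this.flush();
    }
  }

  // Emit any buffered text now. Call this once when the stream ends.
  flush(): void {
    if (this.buffer === '') return;
    this.onFlush(this.buffer);
    this.buffer = '';
    this.lastFlush = this.now();
  }
}

// Usage: collect chunks, then flush at end of stream so no text is lost.
let rendered = '';
const chunkBuffer = new ChunkBuffer(text => { rendered += text; });
for (const chunk of ['Quantum', ' computing', ' uses']) {
  chunkBuffer.add(chunk);
}
chunkBuffer.flush();
console.log(rendered);
```

In a React component, `onFlush` would typically be `text => setContent(prev => prev + text)`; because the text is passed as an argument, the updater never reads a buffer that has already been cleared.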

## Rate Limits

Same limits as the Chat API:

| Tier       | Requests/Minute | Concurrent Streams |
| ---------- | --------------- | ------------------ |
| Free       | 60              | 3                  |
| Pro        | 300             | 10                 |
| Enterprise | Custom          | Custom             |
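
Since each tier caps concurrent streams, a client that may start many streams at once can queue the excess locally rather than exceed the cap. The following is a minimal sketch of such a queue. `StreamLimiter` is a hypothetical helper, not part of the API, and the limit of 3 is the Free tier value from the table above.

```typescript
// Runs at most `limit` tasks at a time; extra tasks wait in FIFO order.
class StreamLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get activeCount(): number {
    return this.active;
  }

  get queuedCount(): number {
    return this.waiting.length;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>(resolve => {
        this.waiting.push(() => resolve());
      });
    } else {
      this.active++;
    }

    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // slot passes directly to the next waiter; `active` is unchanged
      } else {
        this.active--;
      }
    }
  }
}

// Usage: a stand-in task; in practice this would open and read one stream.
const limiter = new StreamLimiter(3);
const fakeStream = (id: number): Promise<number> => Promise.resolve(id);

Promise.all([1, 2, 3, 4, 5].map(id => limiter.run(() => fakeStream(id))))
  .then(ids => console.log('finished streams:', ids));
```

A local queue only bounds streams opened by one process; it does not account for other clients sharing the same API key.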

## Best Practices

### 1. Show Loading State

```typescript
{isStreaming && (
  <div className="flex items-center gap-2">
    <Spinner />
    <span>Generating response...</span>
  </div>
)}
```

### 2. Handle Stream Interruption

Allow users to stop generation:

```typescript
const abortController = new AbortController();

// Cancel button handler
const handleCancel = () => {
  abortController.abort();
  setIsStreaming(false);
};

// Pass to fetch
fetch(url, { signal: abortController.signal, ... });
```

### 3. Graceful Degradation

Fall back to the non-streaming Chat API when the browser does not support streaming responses:

```typescript
const supportsStreaming = 'ReadableStream' in window;

if (supportsStreaming) {
  // Use streaming API
} else {
  // Use regular Chat API
}
```

### 4. Optimize for Mobile

Consider connection quality:

```typescript
// Detect slow connections
const connection = (navigator as any).connection;
const isSlowConnection = connection?.effectiveType === '2g' ||
                         connection?.effectiveType === 'slow-2g';

if (isSlowConnection) {
  // Use regular API or increase buffer size
}
```
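
One way to "increase buffer size" is to lengthen the UI flush interval on slower connections. The mapping below is only an illustration: the 50 ms default matches the Buffering section, while the 100 ms and 250 ms values are assumptions to tune for your app. The `effectiveType` values come from the Network Information API, which is not available in every browser, so `undefined` falls back to the default.

```typescript
type EffectiveType = 'slow-2g' | '2g' | '3g' | '4g';

// Picks a UI flush interval (ms) from the reported connection type.
function flushIntervalMs(effectiveType: EffectiveType | undefined): number {
  switch (effectiveType) {
    case 'slow-2g':
    case '2g':
      return 250; // assumed value: fewer, larger UI updates
    case '3g':
      return 100; // assumed value
    default:
      return 50; // default from the Buffering section; also used when the type is unknown
  }
}

console.log(flushIntervalMs('2g'), flushIntervalMs(undefined));
```

In the browser, the argument would come from `(navigator as any).connection?.effectiveType`, as in the detection snippet above.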

## Comparison: Streaming vs Non-Streaming

| Feature                       | Streaming                | Non-Streaming                     |
| ----------------------------- | ------------------------ | --------------------------------- |
| **First token latency**       | \~200ms                  | \~2-5s                            |
| **Perceived speed**           | Immediate                | Delayed                           |
| **Implementation complexity** | Medium                   | Low                               |
| **Network efficiency**        | Same                     | Same                              |
| **Error recovery**            | More complex             | Simple                            |
| **Best for**                  | Chat UIs, long responses | Batch processing, short responses |

## Related Endpoints

* [Chat API](./chat) - Non-streaming version
* [Metrics API](./metrics) - Analytics and monitoring
* [Logs API](./logs) - Debugging and audit trails

## Support

* **Documentation**: [docs.withperf.pro](https://docs.withperf.pro)
* **Email**: [support@withperf.pro](mailto:support@withperf.pro)
* **Examples**: [github.com/perf/examples](https://github.com/perf/examples)

