🦞
llmrouter

Intelligent LLM proxy that routes requests to appropriate models
SKILL.md
---
name: llmrouter
description: Intelligent LLM proxy that routes requests to appropriate models based on complexity. Save money by using cheaper models for simple tasks. Tested with Anthropic, OpenAI, Gemini, Kimi/Moonshot, and Ollama.
homepage: https://github.com/alexrudloff/llmrouter
metadata: {"openclaw":{"emoji":"🔀","homepage":"https://github.com/alexrudloff/llmrouter","os":["darwin","linux"],"requires":{"bins":["python3"],"anyBins":["pip","pip3"]},"primaryEnv":"ANTHROPIC_API_KEY"}}
---

# LLM Router

An intelligent proxy that classifies incoming requests by complexity and routes them to appropriate LLM models. Use cheaper/faster models for simple tasks and reserve expensive models for complex ones.

**Works with [OpenClaw](https://github.com/openclaw/openclaw)** to reduce token usage and API costs by routing simple requests to smaller models.

**Status:** Tested with Anthropic, OpenAI, Google Gemini, Kimi/Moonshot, and Ollama.

## Quick Start

### Prerequisites

1. **Python 3.10+** with pip
2. **Ollama** (optional - only if using local classification)
3. **Anthropic API key** or Claude Code OAuth token (or other provider key)

### Setup

```bash
# Clone if not already present
git clone https://github.com/alexrudloff/llmrouter.git
cd llmrouter

# Create virtual environment (required on modern Python)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Pull classifier model (if using local classification)
ollama pull qwen2.5:3b

# Copy and customize config
cp config.yaml.example config.yaml
# Edit config.yaml with your API key and model preferences
```

### Verify Installation

```bash
# Start the server
source venv/bin/activate
python server.py

# In another terminal, test health endpoint
curl http://localhost:4001/health
# Should return: {"status": "ok", ...}
```

### Start the Server

```bash
python server.py
```

Options:
- `--port PORT` - Port to listen on (default: 4001)
- `--host HOST` - Host to bind (default: 127.0.0.1)
- `--config PATH` - Config file path (default: config.yaml)
- `--log` - Enable verbose logging
- `--openclaw` - Enable OpenClaw compatibility (rewrites model name in system prompt)

## Configuration

Edit `config.yaml` to customize:

### Model Routing

```yaml
# Anthropic routing
models:
  super_easy: "anthropic:claude-haiku-4-5-20251001"
  easy: "anthropic:claude-haiku-4-5-20251001"
  medium: "anthropic:claude-sonnet-4-20250514"
  hard: "anthropic:claude-opus-4-20250514"
  super_hard: "anthropic:claude-opus-4-20250514"

# OpenAI routing
models:
  super_easy: "openai:gpt-4o-mini"
  easy: "openai:gpt-4o-mini"
  medium: "openai:gpt-4o"
  hard: "openai:o3-mini"
  super_hard: "openai:o3"

# Google Gemini routing
models:
  super_easy: "google:gemini-2.0-flash"
  easy: "google:gemini-2.0-flash"
  medium: "google:gemini-2.0-flash"
  hard: "google:gemini-2.0-flash"
  super_hard: "google:gemini-2.0-flash"
```

**Note:** Reasoning models are auto-detected and use correct API params.

### Classifier

Three options for classifying request complexity:

**Local (default)** - Free, requires Ollama:
```yaml
classifier:
  provider: "local"
  model: "qwen2.5:3b"
```

**Anthropic** - Uses Haiku, fast and cheap:
```yaml
classifier:
  provider: "anthropic"
  model: "claude-haiku-4-5-20251001"
```

**OpenAI** - Uses GPT-4o-mini:
```yaml
classifier:
  provider: "openai"
  model: "gpt-4o-mini"
```

**Google** - Uses Gemini:
```yaml
classifier:
  provider: "google"
  model: "gemini-2.0-flash"
```

**Kimi** - Uses Moonshot:
```yaml
classifier:
  provider: "kimi"
  model: "moonshot-v1-8k"
```

Use remote (anthropic/openai/google/kimi) if your machine can't run local models.

### Supported Providers

- `anthropic:claude-*` - Anthropic Claude models (tested)
- `openai:gpt-*`, `openai:o1-*`, `openai:o3-*` - OpenAI models (tested)
- `google:gemini-*` - Google Gemini models (tested)
- `kimi:kimi-k2.5`, `kimi:moonshot-*` - Kimi/Moonshot models (tested)
- `local:model-name` - Local Ollama models (tested)

## Complexity Levels

| Level | Use Case | Default Model |
|-------|----------|---------------|
| super_easy | Greetings, acknowledgments | Haiku |
| easy | Simple Q&A, reminders | Haiku |
| medium | Coding, emails, research | Sonnet |
| hard | Complex reasoning, debugging | Opus |
| super_hard | System architecture, proofs | Opus |

## Customizing Classification

Edit `ROUTES.md` to tune how messages are classified. The classifier reads the table in this file to determine complexity levels.

## API Usage

The router exposes an OpenAI-compatible API:

```bash
curl http://localhost:4001/v1/chat/completions \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm-router",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Testing Classification

```bash
python classifier.py "Write a Python sort function"
# Output: medium

python classifier.py --test
# Runs test suite
```

## Running as macOS Service

Create `~/Library/LaunchAgents/com.llmrouter.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.llmrouter</string>
    <key>ProgramArguments</key>
    <array>
        <string>/path/to/llmrouter/venv/bin/python</string>
        <string>/path/to/llmrouter/server.py</string>
        <string>--openclaw</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>WorkingDirectory</key>
    <string>/path/to/llmrouter</string>
    <key>StandardOutPath</key>
    <string>/path/to/llmrouter/logs/stdout.log</string>
    <key>StandardErrorPath</key>
    <string>/path/to/llmrouter/logs/stderr.log</string>
</dict>
</plist>
```

**Important:** Replace `/path/to/llmrouter` with your actual install path. Must use the venv python, not system python.

```bash
# Create logs directory
mkdir -p ~/path/to/llmrouter/logs

# Load the service
launchctl load ~/Library/LaunchAgents/com.llmrouter.plist

# Verify it's running
curl http://localhost:4001/health

# To stop/restart
launchctl unload ~/Library/LaunchAgents/com.llmrouter.plist
launchctl load ~/Library/LaunchAgents/com.llmrouter.plist
```

## OpenClaw Configuration

Add the router as a provider in `~/.openclaw/openclaw.json`:

```json
{
  "models": {
    "providers": {
      "localrouter": {
        "baseUrl": "http://localhost:4001/v1",
        "apiKey": "via-router",
        "api": "openai-completions",
        "models": [
          {
            "id": "llm-router",
            "name": "LLM Router (Auto-routes by complexity)",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 200000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
```

**Note:** Cost is set to 0 because actual costs depend on which model the router selects. The router logs which model handled each request.

### Set as Default Model (Optional)

To use the router for all agents by default, add:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "localrouter/llm-router"
      }
    }
  }
}
```

### Using with OAuth Tokens

If your `config.yaml` uses an Anthropic OAuth token from OpenClaw's `~/.openclaw/auth-profiles.json`, the router automatically handles Claude Code identity headers.

### OpenClaw Compatibility Mode (Required)

**If using with OpenClaw, you MUST start the server with `--openclaw`:**

```bash
python server.py --openclaw
```

This flag enables compatibility features required for OpenClaw:
- Rewrites model names in responses so OpenClaw shows the actual model being used
- Handles tool name and ID remapping for proper tool call routing

Without this flag, you may encounter errors when using the router with OpenClaw.

## Common Tasks

- **Check server status**: `curl http://localhost:4001/health`
- **View current config**: `cat config.yaml`
- **Test a classification**: `python classifier.py "your message"`
- **Run classification tests**: `python classifier.py --test`
- **Restart server**: Stop and run `python server.py` again
- **View logs** (if running as service): `tail -f logs/stdout.log`

## Troubleshooting

### "externally-managed-environment" error
Python 3.11+ requires virtual environments. Create one:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### "Connection refused" on port 4001
Server isn't running. Start it:
```bash
source venv/bin/activate && python server.py
```

### Classification returns wrong complexity
Edit `ROUTES.md` to tune classification rules. The classifier reads this file to determine complexity levels.

### Ollama errors / "model not found"
Ensure Ollama is running and the model is pulled:
```bash
ollama serve  # Start Ollama if not running
ollama pull qwen2.5:3b
```

### OAuth token not working
Ensure your token in `config.yaml` starts with `sk-ant-oat`. The router auto-detects OAuth tokens and adds required identity headers.

### LaunchAgent not starting
Check logs and ensure paths are absolute:
```bash
cat ~/Library/LaunchAgents/com.llmrouter.plist  # Verify paths
cat /path/to/llmrouter/logs/stderr.log  # Check for errors
```