Back to Skills
    🦞

    llmrouter

    Intelligent LLM proxy that routes requests to appropriate models

    By @alexrudloff
    View on GitHub
    SKILL.md
    ---
    name: llmrouter
    description: Intelligent LLM proxy that routes requests to appropriate models based on complexity. Save money by using cheaper models for simple tasks. Tested with Anthropic, OpenAI, Gemini, Kimi/Moonshot, and Ollama.
    homepage: https://github.com/alexrudloff/llmrouter
    metadata: {"openclaw":{"emoji":"🔀","homepage":"https://github.com/alexrudloff/llmrouter","os":["darwin","linux"],"requires":{"bins":["python3"],"anyBins":["pip","pip3"]},"primaryEnv":"ANTHROPIC_API_KEY"}}
    ---
    
    # LLM Router
    
    An intelligent proxy that classifies incoming requests by complexity and routes them to appropriate LLM models. Use cheaper/faster models for simple tasks and reserve expensive models for complex ones.
    
    **Works with [OpenClaw](https://github.com/openclaw/openclaw)** to reduce token usage and API costs by routing simple requests to smaller models.
    
    **Status:** Tested with Anthropic, OpenAI, Google Gemini, Kimi/Moonshot, and Ollama.
    
    ## Quick Start
    
    ### Prerequisites
    
    1. **Python 3.10+** with pip
    2. **Ollama** (optional - only if using local classification)
    3. **Anthropic API key** or Claude Code OAuth token (or other provider key)
    
    ### Setup
    
    ```bash
    # Clone if not already present
    git clone https://github.com/alexrudloff/llmrouter.git
    cd llmrouter
    
    # Create virtual environment (required on modern Python)
    python3 -m venv venv
    source venv/bin/activate
    
    # Install dependencies
    pip install -r requirements.txt
    
    # Pull classifier model (if using local classification)
    ollama pull qwen2.5:3b
    
    # Copy and customize config
    cp config.yaml.example config.yaml
    # Edit config.yaml with your API key and model preferences
    ```
    
    ### Verify Installation
    
    ```bash
    # Start the server
    source venv/bin/activate
    python server.py
    
    # In another terminal, test health endpoint
    curl http://localhost:4001/health
    # Should return: {"status": "ok", ...}
    ```
    
    ### Start the Server
    
    ```bash
    python server.py
    ```
    
    Options:
    - `--port PORT` - Port to listen on (default: 4001)
    - `--host HOST` - Host to bind (default: 127.0.0.1)
    - `--config PATH` - Config file path (default: config.yaml)
    - `--log` - Enable verbose logging
    - `--openclaw` - Enable OpenClaw compatibility (rewrites model name in system prompt)
    
    ## Configuration
    
    Edit `config.yaml` to customize:
    
    ### Model Routing
    
    ```yaml
    # Anthropic routing
    models:
      super_easy: "anthropic:claude-haiku-4-5-20251001"
      easy: "anthropic:claude-haiku-4-5-20251001"
      medium: "anthropic:claude-sonnet-4-20250514"
      hard: "anthropic:claude-opus-4-20250514"
      super_hard: "anthropic:claude-opus-4-20250514"
    
    # OpenAI routing
    models:
      super_easy: "openai:gpt-4o-mini"
      easy: "openai:gpt-4o-mini"
      medium: "openai:gpt-4o"
      hard: "openai:o3-mini"
      super_hard: "openai:o3"
    
    # Google Gemini routing
    models:
      super_easy: "google:gemini-2.0-flash"
      easy: "google:gemini-2.0-flash"
      medium: "google:gemini-2.0-flash"
      hard: "google:gemini-2.0-flash"
      super_hard: "google:gemini-2.0-flash"
    ```
    
    **Note:** Reasoning models are auto-detected and use correct API params.
    
    ### Classifier
    
    Three options for classifying request complexity:
    
    **Local (default)** - Free, requires Ollama:
    ```yaml
    classifier:
      provider: "local"
      model: "qwen2.5:3b"
    ```
    
    **Anthropic** - Uses Haiku, fast and cheap:
    ```yaml
    classifier:
      provider: "anthropic"
      model: "claude-haiku-4-5-20251001"
    ```
    
    **OpenAI** - Uses GPT-4o-mini:
    ```yaml
    classifier:
      provider: "openai"
      model: "gpt-4o-mini"
    ```
    
    **Google** - Uses Gemini:
    ```yaml
    classifier:
      provider: "google"
      model: "gemini-2.0-flash"
    ```
    
    **Kimi** - Uses Moonshot:
    ```yaml
    classifier:
      provider: "kimi"
      model: "moonshot-v1-8k"
    ```
    
    Use remote (anthropic/openai/google/kimi) if your machine can't run local models.
    
    ### Supported Providers
    
    - `anthropic:claude-*` - Anthropic Claude models (tested)
    - `openai:gpt-*`, `openai:o1-*`, `openai:o3-*` - OpenAI models (tested)
    - `google:gemini-*` - Google Gemini models (tested)
    - `kimi:kimi-k2.5`, `kimi:moonshot-*` - Kimi/Moonshot models (tested)
    - `local:model-name` - Local Ollama models (tested)
    
    ## Complexity Levels
    
    | Level | Use Case | Default Model |
    |-------|----------|---------------|
    | super_easy | Greetings, acknowledgments | Haiku |
    | easy | Simple Q&A, reminders | Haiku |
    | medium | Coding, emails, research | Sonnet |
    | hard | Complex reasoning, debugging | Opus |
    | super_hard | System architecture, proofs | Opus |
    
    ## Customizing Classification
    
    Edit `ROUTES.md` to tune how messages are classified. The classifier reads the table in this file to determine complexity levels.
    
    ## API Usage
    
    The router exposes an OpenAI-compatible API:
    
    ```bash
    curl http://localhost:4001/v1/chat/completions \
      -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llm-router",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
    ```
    
    ## Testing Classification
    
    ```bash
    python classifier.py "Write a Python sort function"
    # Output: medium
    
    python classifier.py --test
    # Runs test suite
    ```
    
    ## Running as macOS Service
    
    Create `~/Library/LaunchAgents/com.llmrouter.plist`:
    
    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>Label</key>
        <string>com.llmrouter</string>
        <key>ProgramArguments</key>
        <array>
            <string>/path/to/llmrouter/venv/bin/python</string>
            <string>/path/to/llmrouter/server.py</string>
            <string>--openclaw</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>KeepAlive</key>
        <true/>
        <key>WorkingDirectory</key>
        <string>/path/to/llmrouter</string>
        <key>StandardOutPath</key>
        <string>/path/to/llmrouter/logs/stdout.log</string>
        <key>StandardErrorPath</key>
        <string>/path/to/llmrouter/logs/stderr.log</string>
    </dict>
    </plist>
    ```
    
    **Important:** Replace `/path/to/llmrouter` with your actual install path. Must use the venv python, not system python.
    
    ```bash
    # Create logs directory
    mkdir -p ~/path/to/llmrouter/logs
    
    # Load the service
    launchctl load ~/Library/LaunchAgents/com.llmrouter.plist
    
    # Verify it's running
    curl http://localhost:4001/health
    
    # To stop/restart
    launchctl unload ~/Library/LaunchAgents/com.llmrouter.plist
    launchctl load ~/Library/LaunchAgents/com.llmrouter.plist
    ```
    
    ## OpenClaw Configuration
    
    Add the router as a provider in `~/.openclaw/openclaw.json`:
    
    ```json
    {
      "models": {
        "providers": {
          "localrouter": {
            "baseUrl": "http://localhost:4001/v1",
            "apiKey": "via-router",
            "api": "openai-completions",
            "models": [
              {
                "id": "llm-router",
                "name": "LLM Router (Auto-routes by complexity)",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": {
                  "input": 0,
                  "output": 0,
                  "cacheRead": 0,
                  "cacheWrite": 0
                },
                "contextWindow": 200000,
                "maxTokens": 8192
              }
            ]
          }
        }
      }
    }
    ```
    
    **Note:** Cost is set to 0 because actual costs depend on which model the router selects. The router logs which model handled each request.
    
    ### Set as Default Model (Optional)
    
    To use the router for all agents by default, add:
    
    ```json
    {
      "agents": {
        "defaults": {
          "model": {
            "primary": "localrouter/llm-router"
          }
        }
      }
    }
    ```
    
    ### Using with OAuth Tokens
    
    If your `config.yaml` uses an Anthropic OAuth token from OpenClaw's `~/.openclaw/auth-profiles.json`, the router automatically handles Claude Code identity headers.
    
    ### OpenClaw Compatibility Mode (Required)
    
    **If using with OpenClaw, you MUST start the server with `--openclaw`:**
    
    ```bash
    python server.py --openclaw
    ```
    
    This flag enables compatibility features required for OpenClaw:
    - Rewrites model names in responses so OpenClaw shows the actual model being used
    - Handles tool name and ID remapping for proper tool call routing
    
    Without this flag, you may encounter errors when using the router with OpenClaw.
    
    ## Common Tasks
    
    - **Check server status**: `curl http://localhost:4001/health`
    - **View current config**: `cat config.yaml`
    - **Test a classification**: `python classifier.py "your message"`
    - **Run classification tests**: `python classifier.py --test`
    - **Restart server**: Stop and run `python server.py` again
    - **View logs** (if running as service): `tail -f logs/stdout.log`
    
    ## Troubleshooting
    
    ### "externally-managed-environment" error
    Python 3.11+ requires virtual environments. Create one:
    ```bash
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    ```
    
    ### "Connection refused" on port 4001
    Server isn't running. Start it:
    ```bash
    source venv/bin/activate && python server.py
    ```
    
    ### Classification returns wrong complexity
    Edit `ROUTES.md` to tune classification rules. The classifier reads this file to determine complexity levels.
    
    ### Ollama errors / "model not found"
    Ensure Ollama is running and the model is pulled:
    ```bash
    ollama serve  # Start Ollama if not running
    ollama pull qwen2.5:3b
    ```
    
    ### OAuth token not working
    Ensure your token in `config.yaml` starts with `sk-ant-oat`. The router auto-detects OAuth tokens and adds required identity headers.
    
    ### LaunchAgent not starting
    Check logs and ensure paths are absolute:
    ```bash
    cat ~/Library/LaunchAgents/com.llmrouter.plist  # Verify paths
    cat /path/to/llmrouter/logs/stderr.log  # Check for errors
    ```