Back to Skills
    🦞

    llm-supervisor

    Graceful rate limit handling with Ollama fallback.

    By @dhardie
    View on GitHub
    SKILL.md
    ---
    name: llm-supervisor
    description: Graceful rate limit handling with Ollama fallback. Notifies on rate limits, offers local model switch with confirmation for code tasks.
    ---
    
    # LLM Supervisor 🔮
    
    Handles rate limits and model fallbacks gracefully.
    
    ## Behavior
    
    ### On Rate Limit / Overload Errors
    
    When I encounter rate limits or overload errors from cloud providers (Anthropic, OpenAI):
    
    1. **Tell the user immediately** — Don't silently fail or retry endlessly
    2. **Offer local fallback** — Ask if they want to switch to Ollama
    3. **Wait for confirmation** — Never auto-switch for code generation tasks
    
    ### Confirmation Required
    
    Before using local models for code generation, ask:
    > "Cloud is rate-limited. Switch to local Ollama (`qwen2.5:7b`)? Reply 'yes' to confirm."
    
    For simple queries (chat, summaries), can switch without confirmation if user previously approved.
    
    ## Commands
    
    ### `/llm status`
    Report current state:
    - Which provider is active (cloud/local)
    - Ollama availability and models
    - Recent rate limit events
    
    ### `/llm switch local`
    Manually switch to Ollama for the session.
    
    ### `/llm switch cloud`
    Switch back to cloud provider.
    
    ## Using Ollama
    
    ```bash
    # Check available models
    ollama list
    
    # Run a query
    ollama run qwen2.5:7b "your prompt here"
    
    # For longer prompts, use stdin
    echo "your prompt" | ollama run qwen2.5:7b
    ```
    
    ## Installed Models
    
    Check with `ollama list`. Configured default: `qwen2.5:7b`
    
    ## State Tracking
    
    Track in memory during session:
    - `currentProvider`: "cloud" | "local"  
    - `lastRateLimitAt`: timestamp or null
    - `localConfirmedForCode`: boolean
    
    Reset to cloud at session start.