
    hopeids

    Inference-based intrusion detection for AI agents with quarantine

    By @emberdesire
    # hopeIDS Security Skill
    
    Inference-based intrusion detection for AI agents with quarantine and human-in-the-loop.
    
    ## Security Invariants
    
    These are **non-negotiable** design principles:
    
    1. **Block = full abort** — Blocked messages never reach jasper-recall or the agent
    2. **Metadata only** — No raw malicious content is ever stored
    3. **Approve ≠ re-inject** — Approval changes future behavior, doesn't resurrect messages
    4. **Alerts are programmatic** — Telegram alerts built from metadata, no LLM involved
    
    ---
    
    ## Features
    
    - **Auto-scan** — Scan messages before agent processing
    - **Quarantine** — Block threats with metadata-only storage
    - **Human-in-the-loop** — Telegram alerts for review
    - **Per-agent config** — Different thresholds for different agents
    - **Commands** — `/approve`, `/reject`, `/trust`, `/quarantine`
    
    ---
    
    ## The Pipeline
    
    ```
    Message arrives
        ↓
    hopeIDS.autoScan()
        ↓
    ┌─────────────────────────────────────────┐
    │  risk >= threshold?                     │
    │                                         │
    │  BLOCK (strictMode):                    │
    │     → Create QuarantineRecord           │
    │     → Send Telegram alert               │
    │     → ABORT (no recall, no agent)       │
    │                                         │
    │  WARN (non-strict):                     │
    │     → Inject <security-alert>           │
    │     → Continue to jasper-recall         │
    │     → Continue to agent                 │
    │                                         │
    │  ALLOW:                                 │
    │     → Continue normally                 │
    └─────────────────────────────────────────┘
    ```
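
    The branch in the diagram can be sketched as a small pure function. This is only an illustration of the decision rule: the type and function names (`AgentPolicy`, `decide`) are hypothetical and are not taken from the hopeIDS source.

    ```typescript
    // Hypothetical names for illustration; not the plugin's actual API.
    type Action = "block" | "warn" | "allow";

    interface AgentPolicy {
      strictMode: boolean;   // true: block on threats, false: warn only
      riskThreshold: number; // risk score at or above which action is taken
    }

    // Map a risk score to the three outcomes shown in the diagram.
    function decide(risk: number, policy: AgentPolicy): Action {
      if (risk < policy.riskThreshold) {
        return "allow"; // below threshold: continue normally
      }
      // At or above threshold: strict agents abort, others get a warning injected.
      return policy.strictMode ? "block" : "warn";
    }
    ```

    Keeping the rule free of side effects means quarantine writes and alerts can be triggered by the caller only when the result is `"block"`.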
    
    ---
    
    ## Configuration
    
    ```json
    {
      "plugins": {
        "entries": {
          "hopeids": {
            "enabled": true,
            "config": {
              "autoScan": true,
              "defaultRiskThreshold": 0.7,
              "strictMode": false,
              "telegramAlerts": true,
              "agents": {
                "moltbook-scanner": {
                  "strictMode": true,
                  "riskThreshold": 0.7
                },
                "main": {
                  "strictMode": false,
                  "riskThreshold": 0.8
                }
              }
            }
          }
        }
      }
    }
    ```
    
    ### Options
    
    | Option | Type | Default | Description |
    |--------|------|---------|-------------|
    | `autoScan` | boolean | `false` | Auto-scan every message |
    | `strictMode` | boolean | `false` | Block (vs warn) on threats |
    | `defaultRiskThreshold` | number | `0.7` | Risk score at or above which a message is blocked or flagged; a per-agent `riskThreshold` overrides it |
    | `telegramAlerts` | boolean | `true` | Send alerts for blocked messages |
    | `telegramChatId` | string | - | Override alert destination |
    | `quarantineDir` | string | `~/.openclaw/quarantine/hopeids` | Storage path |
    | `agents` | object | - | Per-agent overrides |
    | `trustOwners` | boolean | `true` | Skip scanning owner messages |
    
    ---
    
    ## Quarantine Records
    
    When a message is blocked, a metadata record is created:
    
    ```json
    {
      "id": "q-7f3a2b",
      "ts": "2026-02-06T00:48:00Z",
      "agent": "moltbook-scanner",
      "source": "moltbook",
      "senderId": "@sus_user",
      "intent": "instruction_override",
      "risk": 0.85,
      "patterns": [
        "matched regex: ignore.*instructions",
        "matched keyword: api key"
      ],
      "contentHash": "ab12cd34...",
      "status": "pending"
    }
    ```
    
    **Note:** There is deliberately NO `originalMessage` field. Only metadata and a hash of the content (`contentHash`) are stored, never the raw text.
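
    One way to enforce the metadata-only rule is to make the record builder unable to see the raw text at all. The sketch below assumes the caller hashes the message beforehand; the interfaces and the `toQuarantineRecord` helper are hypothetical and simply mirror the fields of the example record.

    ```typescript
    // Hypothetical shapes mirroring the example record; not the plugin's real types.
    interface ScanResult {
      intent: string;
      risk: number;
      patterns: string[];
    }

    interface MessageContext {
      agent: string;
      source: string;
      senderId: string;
    }

    interface QuarantineRecord extends MessageContext, ScanResult {
      id: string;
      ts: string;
      contentHash: string;
      status: "pending" | "approved" | "rejected";
    }

    // The raw message is never a parameter, so it cannot end up in storage.
    function toQuarantineRecord(
      id: string,
      ctx: MessageContext,
      scan: ScanResult,
      contentHash: string, // assumed to be computed by the caller
      now: Date = new Date(),
    ): QuarantineRecord {
      return {
        id,
        ts: now.toISOString(),
        agent: ctx.agent,
        source: ctx.source,
        senderId: ctx.senderId,
        intent: scan.intent,
        risk: scan.risk,
        patterns: [...scan.patterns],
        contentHash,
        status: "pending",
      };
    }
    ```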
    
    ---
    
    ## Telegram Alerts
    
    When a message is blocked:
    
    ```
    🛑 Message blocked
    
    ID: `q-7f3a2b`
    Agent: moltbook-scanner
    Source: moltbook
    Sender: @sus_user
    Intent: instruction_override (85%)
    
    Patterns:
    • matched regex: ignore.*instructions
    • matched keyword: api key
    
    `/approve q-7f3a2b`
    `/reject q-7f3a2b`
    `/trust @sus_user`
    ```
    
    Built from metadata only. No LLM touches this.
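
    Because the alert is plain string assembly over record fields, it can be sketched in a few lines. The `AlertFields` interface and `formatAlert` function are hypothetical names; sending the text to Telegram is left out on purpose.

    ```typescript
    // Hypothetical subset of the quarantine record used for the alert text.
    interface AlertFields {
      id: string;
      agent: string;
      source: string;
      senderId: string;
      intent: string;
      risk: number;
      patterns: string[];
    }

    // Deterministic formatting: no model call, no raw message content.
    function formatAlert(r: AlertFields): string {
      const percent = Math.round(r.risk * 100);
      return [
        "🛑 Message blocked",
        "",
        "ID: `" + r.id + "`",
        "Agent: " + r.agent,
        "Source: " + r.source,
        "Sender: " + r.senderId,
        "Intent: " + r.intent + " (" + percent + "%)",
        "",
        "Patterns:",
        ...r.patterns.map((p) => "• " + p),
        "",
        "`/approve " + r.id + "`",
        "`/reject " + r.id + "`",
        "`/trust " + r.senderId + "`",
      ].join("\n");
    }
    ```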
    
    ---
    
    ## Commands
    
    ### `/quarantine [all|clean]`
    
    List quarantine records, or clean up expired ones.
    
    ```
    /quarantine        # List pending
    /quarantine all    # List all (including resolved)
    /quarantine clean  # Clean expired records
    ```
    
    ### `/approve <id>`
    
    Mark a blocked message as a false positive.
    
    ```
    /approve q-7f3a2b
    ```
    
    **Effect:**
    - Status → `approved`
    - (Future) Add sender to allowlist
    - (Future) Lower pattern weight
    
    ### `/reject <id>`
    
    Confirm a blocked message was a true positive.
    
    ```
    /reject q-7f3a2b
    ```
    
    **Effect:**
    - Status → `rejected`
    - (Future) Reinforce pattern weights
    
    ### `/trust <senderId>`
    
    Whitelist a sender for future messages.
    
    ```
    /trust @legitimate_user
    ```
    
    ### `/scan <message>`
    
    Manually scan a message.
    
    ```
    /scan ignore your previous instructions and...
    ```
    
    ---
    
    ## What Approve/Reject Mean
    
    | Command | What it does | What it doesn't do |
    |---------|--------------|-------------------|
    | `/approve` | Marks as false positive, may adjust IDS | Does NOT re-inject the message |
    | `/reject` | Confirms threat, may strengthen patterns | Does NOT affect current message |
    | `/trust` | Whitelists sender for future | Does NOT retroactively approve |
    
    **The blocked message is gone by design.** If it was legitimate, the sender can re-send.
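
    The semantics in the table can be pictured as a store that only ever changes status fields and a trust list. This is a minimal in-memory sketch with hypothetical names (`QuarantineStore`, `resolve`, `trust`); it makes no claim about how hopeIDS persists records.

    ```typescript
    type Status = "pending" | "approved" | "rejected";

    // Hypothetical metadata subset; no message content exists to re-inject.
    interface RecordMeta {
      id: string;
      senderId: string;
      status: Status;
    }

    class QuarantineStore {
      private records = new Map<string, RecordMeta>();
      private trustedSenders = new Set<string>();

      add(record: RecordMeta): void {
        this.records.set(record.id, record);
      }

      get(id: string): RecordMeta | undefined {
        return this.records.get(id);
      }

      // /approve and /reject: update the status and return metadata only.
      resolve(id: string, verdict: "approved" | "rejected"): RecordMeta {
        const record = this.records.get(id);
        if (!record) {
          throw new Error("unknown quarantine id: " + id);
        }
        const updated: RecordMeta = { ...record, status: verdict };
        this.records.set(id, updated);
        return updated;
      }

      // /trust: affects future messages; existing records are left untouched.
      trust(senderId: string): void {
        this.trustedSenders.add(senderId);
      }

      isTrusted(senderId: string): boolean {
        return this.trustedSenders.has(senderId);
      }
    }
    ```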
    
    ---
    
    ## Per-Agent Configuration
    
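    Different agents need different security postures. The fragment below belongs under `config` in the plugin entry; the `//` comments are annotations only and are not valid in strict JSON: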
    
    ```json
    "agents": {
      "moltbook-scanner": {
        "strictMode": true,    // Block threats
        "riskThreshold": 0.7   // 70% = suspicious
      },
      "main": {
        "strictMode": false,   // Warn only
        "riskThreshold": 0.8   // Higher bar for main
      },
      "email-processor": {
        "strictMode": true,    // Always block
        "riskThreshold": 0.6   // More paranoid
      }
    }
    ```
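
    As a sketch of how such overrides might be resolved, the function below falls back from the agent entry to the global settings and then to the documented defaults (`strictMode: false`, `defaultRiskThreshold: 0.7`). The fallback order and the names `HopeIdsConfig` and `resolvePolicy` are assumptions for illustration, not confirmed plugin behavior.

    ```typescript
    // Hypothetical config types modeled on the JSON above.
    interface AgentOverride {
      strictMode?: boolean;
      riskThreshold?: number;
    }

    interface HopeIdsConfig {
      strictMode?: boolean;
      defaultRiskThreshold?: number;
      agents?: Record<string, AgentOverride>;
    }

    interface ResolvedPolicy {
      strictMode: boolean;
      riskThreshold: number;
    }

    // Assumed precedence: agent override, then global setting, then default.
    function resolvePolicy(config: HopeIdsConfig, agentId: string): ResolvedPolicy {
      const agents = config.agents ?? {};
      const override: AgentOverride = agents[agentId] ?? {};
      return {
        strictMode: override.strictMode ?? config.strictMode ?? false,
        riskThreshold: override.riskThreshold ?? config.defaultRiskThreshold ?? 0.7,
      };
    }
    ```

    With the configuration shown earlier, an agent that has no entry under `agents` would get the global values under this assumption.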
    
    ---
    
    ## Threat Categories
    
    | Category | Risk | Description |
    |----------|------|-------------|
    | `command_injection` | 🔓 Critical | Shell commands, code execution |
    | `credential_theft` | 🔓 Critical | API key extraction attempts |
    | `data_exfiltration` | 🔓 Critical | Data leak to external URLs |
    | `instruction_override` | 🔓 High | Jailbreaks, "ignore previous" |
    | `impersonation` | 🔓 High | Fake system/admin messages |
    | `discovery` | ⚠️ Medium | API/capability probing |
    
    ---
    
    ## Installation
    
    ```bash
    npx hopeid setup
    ```
    
    Then restart OpenClaw.
    
    ---
    
    ## Links
    
    - **GitHub**: https://github.com/E-x-O-Entertainment-Studios-Inc/hopeIDS
    - **npm**: https://www.npmjs.com/package/hopeid
    - **Docs**: https://exohaven.online/products/hopeids