Back to Skills
    🦞

    agent-observability-dashboard

    Unified observability

    By @orosha-ai
    View on GitHub
    SKILL.md
    # Agent Observability Dashboard πŸ“Š
    
    Unified observability for OpenClaw agents β€” metrics, traces, and performance insights.
    
    ## What It Does
    
    OpenClaw agents need production-grade visibility. Multiple platforms exist (Langfuse, Langsmith, AgentOps) but no unified view.
    
    **Agent Observability Dashboard** provides:
    - **Metrics tracking** β€” Latency, success rate, token usage, error counts
    - **Trace visualization** β€” Tool chains, decision flows, session timelines
    - **Cross-agent aggregation** β€” Compare performance across multiple agents/sessions
    - **Exportable reports** β€” JSON, CSV, markdown for human review
    - **Alert thresholds** β€” Notify when metrics exceed limits
    
    ## Problem It Solves
    
    - No centralized view of OpenClaw agent performance
    - Hard to debug across multiple tool calls
    - No way to compare agents or track regressions
    - Production monitoring is enterprise-grade; agents need the same
    
    ## Usage
    
    ```bash
    # Start dashboard server
    python3 scripts/observability.py --dashboard
    
    # Record metrics from a session
    python3 scripts/observability.py --record --session agent:main --latency 1.5 --success true
    
    # View session trace
    python3 scripts/observability.py --trace --session agent:main:12345
    
    # Get performance report
    python3 scripts/observability.py --report --period 24h
    
    # Export to CSV
    python3 scripts/observability.py --export metrics.csv
    
    # Set alert thresholds
    python3 scripts/observability.py --alert --metric latency --threshold 5.0
    ```
    
    ## Metrics Tracked
    
    | Category | Metric | Description |
    |-----------|---------|-------------|
    | **Performance** | Latency | Tool call latency (ms) |
    | | Throughput | Calls per second |
    | **Success** | Success Rate | % of successful tool calls |
    | | Error Count | Failed operations |
    | **Cost** | Token Usage | Input + output tokens |
    | | API Cost | Estimated cost in USD |
    | **Quality** | Hallucinations | Detected false outputs |
    | | Corrections Needed | User corrections |
    
    ## Trace Format
    
    Each tool call is logged with:
    - Timestamp
    - Agent session ID
    - Tool name + parameters
    - Latency
    - Success/failure
    - Token usage
    - Error details (if failed)
    
    Example trace:
    ```json
    {
      "session_id": "agent:main:12345",
      "trace": [
        {
          "timestamp": "2026-01-31T14:00:00Z",
          "tool": "web_search",
          "params": {"query": "agent observability"},
          "latency_ms": 1234,
          "success": true,
          "tokens_used": 150
        },
        {
          "timestamp": "2026-01-31T14:00:02Z",
          "tool": "memory_write",
          "params": {"content": "..."},
          "latency_ms": 45,
          "success": true,
          "tokens_used": 0
        }
      ]
    }
    ```
    
    ## Architecture
    
    ```
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Instrumentationβ”‚  ← Auto-capture from OpenClaw logs
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
             β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Metrics Store  β”‚  ← SQLite/InfluxDB for time-series
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
             β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Analytics      β”‚  ← Aggregations, trends, anomalies
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
             β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Dashboard UI  β”‚  ← Web interface (Flask/FastAPI)
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    ```
    
    ## Requirements
    
    - Python 3.9+
    - flask (for dashboard web UI)
    - pandas (for analytics)
    - influxdb-client (optional, for production storage)
    
    ## Installation
    
    ```bash
    # Clone repo
    git clone https://github.com/orosha-ai/agent-observability-dashboard
    
    # Install dependencies
    pip install flask pandas influxdb-client
    
    # Run dashboard
    python3 scripts/observability.py --dashboard
    # Open http://localhost:5000
    ```
    
    ## Inspiration
    
    - **Dynatrace AI Observability App** β€” Enterprise-grade unified observability
    - **Langfuse vs AgentOps benchmarks** β€” Comparison of platforms
    - **Microsoft .NET tracing guide** β€” Practical implementation patterns
    - **OpenLLMetry** β€” OpenTelemetry integration for LLMs
    
    ## Local-Only Promise
    
    - Metrics stored locally (SQLite/InfluxDB)
    - Dashboard runs locally
    - No data sent to external services
    
    ## Version History
    
    - **v0.1** β€” MVP: Metrics tracking, trace visualization, dashboard UI
    - Roadmap: InfluxDB integration, anomaly detection, multi-agent comparison