Back to Skills
    šŸ¦ž

    skill-reviewer

    Review and audit agent skills (SKILL.md files)

    By @gitgoodordietrying
    View on GitHub
    SKILL.md
    ---
    name: skill-reviewer
    description: Review and audit agent skills (SKILL.md files) for quality, correctness, and effectiveness. Use when evaluating a skill before publishing, reviewing someone else's skill, scoring skill quality, identifying defects in skill content, or improving an existing skill.
    metadata: {"clawdbot":{"emoji":"šŸ”","requires":{"anyBins":["npx"]},"os":["linux","darwin","win32"]}}
    ---
    
    # Skill Reviewer
    
    Audit agent skills (SKILL.md files) for quality, correctness, and completeness. Provides a structured review framework with scoring rubric, defect checklists, and improvement recommendations.
    
    ## When to Use
    
    - Reviewing a skill before publishing to the registry
    - Evaluating a skill you downloaded from the registry
    - Auditing your own skills for quality improvements
    - Comparing skills in the same category
    - Deciding whether a skill is worth installing
    
    ## Review Process
    
    ### Step 1: Structural Check
    
    Verify the skill has the required structure. Read the file and check each item:
    
    ```
    STRUCTURAL CHECKLIST:
    [ ] Valid YAML frontmatter (opens and closes with ---)
    [ ] `name` field present and is a valid slug (lowercase, hyphenated)
    [ ] `description` field present and non-empty
    [ ] `metadata` field present with valid JSON
    [ ] `metadata.clawdbot.emoji` is a single emoji
    [ ] `metadata.clawdbot.requires.anyBins` lists real CLI tools
    [ ] Title heading (# Title) immediately after frontmatter
    [ ] Summary paragraph after title
    [ ] "When to Use" section present
    [ ] At least 3 main content sections
    [ ] "Tips" section present at the end
    ```
    
    ### Step 2: Frontmatter Quality
    
    #### Description field audit
    
    The description is the most impactful field. Evaluate it against these criteria:
    
    ```
    DESCRIPTION SCORING:
    
    [2] Starts with what the skill does (active verb)
        GOOD: "Write Makefiles for any project type."
        BAD:  "This skill covers Makefiles."
        BAD:  "A comprehensive guide to Make."
    
    [2] Includes trigger phrases ("Use when...")
        GOOD: "Use when setting up build automation, defining multi-target builds"
        BAD:  No trigger phrases at all
    
    [2] Specific scope (mentions concrete tools, languages, or operations)
        GOOD: "SQLite/PostgreSQL/MySQL — schema design, queries, CTEs, window functions"
        BAD:  "Database stuff"
    
    [1] Reasonable length (50-200 characters)
        TOO SHORT: "Make things" (no search surface)
        TOO LONG:  300+ characters (gets truncated)
    
    [1] Contains searchable keywords naturally
        GOOD: "cron jobs, systemd timers, scheduling"
        BAD:  Keywords stuffed unnaturally
    
    Score: __/8
    ```
    
    #### Metadata audit
    
    ```
    METADATA SCORING:
    
    [1] emoji is relevant to the skill topic
    [1] requires.anyBins lists tools the skill actually uses (not generic tools like "bash")
    [1] os array is accurate (don't claim win32 if commands are Linux-only)
    [1] JSON is valid (test with a JSON parser)
    
    Score: __/4
    ```
    
    ### Step 3: Content Quality
    
    #### Example density
    
    Count code blocks and total lines:
    
    ```
    EXAMPLE DENSITY:
    
    Lines:       ___
    Code blocks: ___
    Ratio:       1 code block per ___ lines
    
    TARGET: 1 code block per 8-15 lines
    < 8  lines per block: possibly over-fragmented
    > 20 lines per block: needs more examples
    ```
    
    #### Example quality
    
    For each code block, check:
    
    ```
    EXAMPLE QUALITY CHECKLIST:
    
    [ ] Language tag specified (```bash, ```python, etc.)
    [ ] Command is syntactically correct
    [ ] Output shown in comments where helpful
    [ ] Uses realistic values (not foo/bar/baz)
    [ ] No placeholder values left (TODO, FIXME, xxx)
    [ ] Self-contained (doesn't depend on undefined variables)
        OR setup is shown/referenced
    [ ] Covers the common case (not just edge cases)
    ```
    
    Score each example 0-3:
    - **0**: Broken or misleading
    - **1**: Works but minimal (no output, no context)
    - **2**: Good (correct, has output or explanation)
    - **3**: Excellent (copy-pasteable, realistic, covers edge case)
    
    #### Section organization
    
    ```
    ORGANIZATION SCORING:
    
    [2] Organized by task/scenario (not by abstract concept)
        GOOD: "## Encode and Decode" → "## Inspect Characters" → "## Convert Formats"
        BAD:  "## Theory" → "## Types" → "## Advanced"
    
    [2] Most common operations come first
        GOOD: Basic usage → Variations → Advanced → Edge cases
        BAD:  Configuration → Theory → Finally the basic usage
    
    [1] Sections are self-contained (can be used independently)
    
    [1] Consistent depth (not mixing h2 with h4 randomly)
    
    Score: __/6
    ```
    
    #### Cross-platform accuracy
    
    ```
    PLATFORM CHECKLIST:
    
    [ ] macOS differences noted where relevant
        (sed -i '' vs sed -i, brew vs apt, BSD vs GNU flags)
    [ ] Linux distro variations noted (apt vs yum vs pacman)
    [ ] Windows compatibility addressed if os includes "win32"
    [ ] Tool version assumptions stated (Docker v2 syntax, Python 3.x)
    ```
    
    ### Step 4: Actionability Assessment
    
    The core question: can an agent follow these instructions to produce correct results?
    
    ```
    ACTIONABILITY SCORING:
    
    [3] Instructions are imperative ("Run X", "Create Y")
        NOT: "You might consider..." or "It's recommended to..."
    
    [3] Steps are ordered logically (prerequisites before actions)
    
    [2] Error cases addressed (what to do when something fails)
    
    [2] Output/result described (how to verify it worked)
    
    Score: __/10
    ```
    
    ### Step 5: Tips Section Quality
    
    ```
    TIPS SCORING:
    
    [2] 5-10 tips present
    
    [2] Tips are non-obvious (not "read the documentation")
        GOOD: "The number one Makefile bug: spaces instead of tabs"
        BAD:  "Make sure to test your code"
    
    [2] Tips are specific and actionable
        GOOD: "Use flock to prevent overlapping cron runs"
        BAD:  "Be careful with concurrent execution"
    
    [1] No tips contradict the main content
    
    [1] Tips cover gotchas/footguns specific to this topic
    
    Score: __/8
    ```
    
    ## Scoring Summary
    
    ```
    SKILL REVIEW SCORECARD
    ═══════════════════════════════════════
    Skill: [name]
    Reviewer: [agent/human]
    Date: [date]
    
    Category              Score    Max
    ─────────────────────────────────────
    Structure             __       11
    Description           __        8
    Metadata              __        4
    Example density       __        3*
    Example quality       __        3*
    Organization          __        6
    Actionability         __       10
    Tips                  __        8
    ─────────────────────────────────────
    TOTAL                 __       53+
    
    * Example density and quality are per-sample,
      not summed. Use the average across all examples.
    
    RATING:
      45+  Excellent — publish-ready
      35-44 Good — minor improvements needed
      25-34 Fair — significant gaps to address
      < 25  Poor — needs major rework
    
    VERDICT: [PUBLISH / REVISE / REWORK]
    ```
    
    ## Common Defects
    
    ### Critical (blocks publishing)
    
    ```
    DEFECT: Invalid frontmatter
    DETECT: YAML parse error, missing required fields
    FIX:    Validate YAML, ensure name/description/metadata all present
    
    DEFECT: Broken code examples
    DETECT: Syntax errors, undefined variables, wrong flags
    FIX:    Test every command in a clean environment
    
    DEFECT: Wrong tool requirements
    DETECT: metadata.requires lists tools not used in content, or omits tools that are used
    FIX:    Grep content for command names, update requires to match
    
    DEFECT: Misleading description
    DETECT: Description promises coverage the content doesn't deliver
    FIX:    Align description with actual content, or add missing content
    ```
    
    ### Major (should fix before publishing)
    
    ```
    DEFECT: No "When to Use" section
    IMPACT: Agent doesn't know when to activate the skill
    FIX:    Add 4-8 bullet points describing trigger scenarios
    
    DEFECT: Text walls without examples
    DETECT: Any section > 10 lines with no code block
    FIX:    Add concrete examples for every concept described
    
    DEFECT: Examples missing language tags
    DETECT: ``` without language identifier
    FIX:    Add bash, python, javascript, yaml, etc. to every code fence
    
    DEFECT: No Tips section
    IMPACT: Missing the distilled expertise that makes a skill valuable
    FIX:    Add 5-10 non-obvious, actionable tips
    
    DEFECT: Abstract organization
    DETECT: Sections named "Theory", "Overview", "Background", "Introduction"
    FIX:    Reorganize by task/operation: what the user is trying to DO
    ```
    
    ### Minor (nice to fix)
    
    ```
    DEFECT: Placeholder values
    DETECT: foo, bar, baz, example.com, 1.2.3.4, TODO, FIXME
    FIX:    Replace with realistic values (myapp, api.example.com, 192.168.1.100)
    
    DEFECT: Inconsistent formatting
    DETECT: Mixed heading levels, inconsistent code block style
    FIX:    Standardize heading hierarchy and formatting
    
    DEFECT: Missing cross-references
    DETECT: Mentions tools/concepts covered by other skills without referencing them
    FIX:    Add "See the X skill for more on Y" notes
    
    DEFECT: Outdated commands
    DETECT: docker-compose (v1), python (not python3), npm -g without npx alternative
    FIX:    Update to current tool versions and syntax
    ```
    
    ## Comparative Review
    
    When comparing skills in the same category:
    
    ```
    COMPARATIVE CRITERIA:
    
    1. Coverage breadth
       Which skill covers more use cases?
    
    2. Example quality
       Which has more runnable, realistic examples?
    
    3. Depth on common operations
       Which handles the 80% case better?
    
    4. Edge case coverage
       Which addresses more gotchas and failure modes?
    
    5. Cross-platform support
       Which works across more environments?
    
    6. Freshness
       Which uses current tool versions and syntax?
    
    WINNER: [skill A / skill B / tie]
    REASON: [1-2 sentence justification]
    ```
    
    ## Quick Review Template
    
    For a fast review when you don't need full scoring:
    
    ```markdown
    ## Quick Review: [skill-name]
    
    **Structure**: [OK / Issues: ...]
    **Description**: [Strong / Weak: reason]
    **Examples**: [X code blocks across Y lines — density OK/low/high]
    **Actionability**: [Agent can/cannot follow these instructions because...]
    **Top defect**: [The single most impactful thing to fix]
    **Verdict**: [PUBLISH / REVISE / REWORK]
    ```
    
    ## Review Workflow
    
    ### Reviewing your own skill before publishing
    
    ```bash
    # 1. Validate frontmatter
    head -20 skills/my-skill/SKILL.md
    # Visually confirm YAML is valid
    
    # 2. Count code blocks
    grep -c '```' skills/my-skill/SKILL.md
    # Divide total lines by thi
    
    ... (truncated)