    agent-browser

    By @murphykobe
    SKILL.md
    ---
    name: agent-browser
    description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
    allowed-tools: Bash(agent-browser:*)
    ---
    
    # Browser Automation with agent-browser
    
    ## Quick start
    
    ```bash
    agent-browser open <url>        # Navigate to page
    agent-browser snapshot -i       # Get interactive elements with refs
    agent-browser click @e1         # Click element by ref
    agent-browser fill @e2 "text"   # Fill input by ref
    agent-browser close             # Close browser
    ```
    
    ## Core workflow
    
    1. Navigate: `agent-browser open <url>`
    2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
    3. Interact using refs from the snapshot
    4. Re-snapshot after navigation or significant DOM changes
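
The four steps above can be sketched as one shell function. This is a minimal sketch, not part of the tool: the `run_workflow` name and the `AB` override variable are hypothetical, and it assumes each command exits non-zero on failure.

```bash
#!/usr/bin/env bash
# AB is a hypothetical override so the binary can be swapped out (for example, for a stub).
AB="${AB:-agent-browser}"

# run_workflow URL REF: open URL, snapshot, click REF, then snapshot again.
run_workflow() {
  local url="$1" ref="$2"
  "$AB" open "$url" || return 1
  "$AB" snapshot -i || return 1   # refs such as @e1 are read from this output
  "$AB" click "$ref" || return 1
  "$AB" snapshot -i               # the click may have changed the DOM, so read refs again
}
```

A call might look like `run_workflow https://example.com/form @e3`. In practice the ref is chosen after reading the first snapshot, so an interactive run would pause between the second and third command.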
    
    ## Commands
    
    ### Navigation
    
    ```bash
    agent-browser open <url>      # Navigate to URL (aliases: goto, navigate)
                                  # Supports: https://, http://, file://, about:, data://
                                  # Auto-prepends https:// if no protocol given
    agent-browser back            # Go back
    agent-browser forward         # Go forward
    agent-browser reload          # Reload page
    agent-browser close           # Close browser (aliases: quit, exit)
    agent-browser connect 9222    # Connect to browser via CDP port
    ```
    
    ### Snapshot (page analysis)
    
    ```bash
    agent-browser snapshot            # Full accessibility tree
    agent-browser snapshot -i         # Interactive elements only (recommended)
    agent-browser snapshot -c         # Compact output
    agent-browser snapshot -d 3       # Limit depth to 3
    agent-browser snapshot -s "#main" # Scope to CSS selector
    ```
    
    ### Interactions (use @refs from snapshot)
    
    ```bash
    agent-browser click @e1           # Click
    agent-browser dblclick @e1        # Double-click
    agent-browser focus @e1           # Focus element
    agent-browser fill @e2 "text"     # Clear and type
    agent-browser type @e2 "text"     # Type without clearing
    agent-browser press Enter         # Press key (alias: key)
    agent-browser press Control+a     # Key combination
    agent-browser keydown Shift       # Hold key down
    agent-browser keyup Shift         # Release key
    agent-browser hover @e1           # Hover
    agent-browser check @e1           # Check checkbox
    agent-browser uncheck @e1         # Uncheck checkbox
    agent-browser select @e1 "value"  # Select dropdown option
    agent-browser select @e1 "a" "b"  # Select multiple options
    agent-browser scroll down 500     # Scroll page (default: down 300px)
    agent-browser scrollintoview @e1  # Scroll element into view (alias: scrollinto)
    agent-browser drag @e1 @e2        # Drag and drop
    agent-browser upload @e1 file.pdf # Upload files
    ```
    
    ### Get information
    
    ```bash
    agent-browser get text @e1        # Get element text
    agent-browser get html @e1        # Get innerHTML
    agent-browser get value @e1       # Get input value
    agent-browser get attr @e1 href   # Get attribute
    agent-browser get title           # Get page title
    agent-browser get url             # Get current URL
    agent-browser get count ".item"   # Count matching elements
    agent-browser get box @e1         # Get bounding box
    agent-browser get styles @e1      # Get computed styles (font, color, bg, etc.)
    ```
    
    ### Check state
    
    ```bash
    agent-browser is visible @e1      # Check if visible
    agent-browser is enabled @e1      # Check if enabled
    agent-browser is checked @e1      # Check if checked
    ```
    
    ### Screenshots & PDF
    
    ```bash
    agent-browser screenshot          # Screenshot to stdout
    agent-browser screenshot path.png # Save to file
    agent-browser screenshot --full   # Full page
    agent-browser pdf output.pdf      # Save as PDF
    ```
    
    ### Video recording
    
    ```bash
    agent-browser record start ./demo.webm    # Start recording (uses current URL + state)
    agent-browser click @e1                   # Perform actions
    agent-browser record stop                 # Stop and save video
    agent-browser record restart ./take2.webm # Stop current + start new recording
    ```
    
    Recording starts in a fresh browser context but keeps the cookies and storage from your session. If no URL is
    provided, the recording automatically returns to your current page. For smooth demos, explore first, then start recording.
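
To make sure a recording is always stopped, even when a step fails, the start and stop calls can be wrapped around the actions. This is a sketch under assumptions: `record_steps` and the `AB` override are hypothetical names, and only the `record start` and `record stop` commands shown above are used.

```bash
#!/usr/bin/env bash
# AB is a hypothetical override so the binary can be swapped out (for example, for a stub).
AB="${AB:-agent-browser}"

# record_steps OUT CMD...: record to OUT while CMD runs, and stop the recording afterwards.
record_steps() {
  local out="$1"; shift
  "$AB" record start "$out" || return 1
  local status=0
  "$@" || status=$?      # keep the exit status of the recorded step
  "$AB" record stop      # runs whether or not the step succeeded
  return "$status"
}
```

For example, `record_steps ./demo.webm agent-browser click @e1` records a single click. Exploring the page before calling the wrapper keeps the video free of trial and error.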
    
    ### Wait
    
    ```bash
    agent-browser wait @e1                     # Wait for element
    agent-browser wait 2000                    # Wait milliseconds
    agent-browser wait --text "Success"        # Wait for text (or -t)
    agent-browser wait --url "**/dashboard"    # Wait for URL pattern (or -u)
    agent-browser wait --load networkidle      # Wait for network idle (or -l)
    agent-browser wait --fn "window.ready"     # Wait for JS condition (or -f)
    ```
    
    ### Mouse control
    
    ```bash
    agent-browser mouse move 100 200      # Move mouse
    agent-browser mouse down left         # Press button
    agent-browser mouse up left           # Release button
    agent-browser mouse wheel 100         # Scroll wheel
    ```
    
    ### Semantic locators (alternative to refs)
    
    ```bash
    agent-browser find role button click --name "Submit"
    agent-browser find text "Sign In" click
    agent-browser find text "Sign In" click --exact      # Exact match only
    agent-browser find label "Email" fill "user@test.com"
    agent-browser find placeholder "Search" type "query"
    agent-browser find alt "Logo" click
    agent-browser find title "Close" click
    agent-browser find testid "submit-btn" click
    agent-browser find first ".item" click
    agent-browser find last ".item" click
    agent-browser find nth 2 "a" hover
    ```
    
    ### Browser settings
    
    ```bash
    agent-browser set viewport 1920 1080          # Set viewport size
    agent-browser set device "iPhone 14"          # Emulate device
    agent-browser set geo 37.7749 -122.4194       # Set geolocation (alias: geolocation)
    agent-browser set offline on                  # Toggle offline mode
    agent-browser set headers '{"X-Key":"v"}'     # Extra HTTP headers
    agent-browser set credentials user pass       # HTTP basic auth (alias: auth)
    agent-browser set media dark                  # Emulate color scheme
    agent-browser set media light reduced-motion  # Light mode + reduced motion
    ```
    
    ### Cookies & Storage
    
    ```bash
    agent-browser cookies                     # Get all cookies
    agent-browser cookies set name value      # Set cookie
    agent-browser cookies clear               # Clear cookies
    agent-browser storage local               # Get all localStorage
    agent-browser storage local key           # Get specific key
    agent-browser storage local set k v       # Set value
    agent-browser storage local clear         # Clear all
    ```
    
    ### Network
    
    ```bash
    agent-browser network route <url>              # Intercept requests
    agent-browser network route <url> --abort      # Block requests
    agent-browser network route <url> --body '{}'  # Mock response
    agent-browser network unroute [url]            # Remove routes
    agent-browser network requests                 # View tracked requests
    agent-browser network requests --filter api    # Filter requests
    ```
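
A mocked route is usually only wanted for the duration of one check. The sketch below pairs `network route ... --body` with `network unroute` so the mock is removed afterwards. The `with_mock` name and the `AB` override are hypothetical, and the URL is passed through unchanged because the accepted pattern syntax is not described here.

```bash
#!/usr/bin/env bash
# AB is a hypothetical override so the binary can be swapped out (for example, for a stub).
AB="${AB:-agent-browser}"

# with_mock URL BODY CMD...: answer requests to URL with BODY while CMD runs, then remove the route.
with_mock() {
  local url="$1" body="$2"; shift 2
  "$AB" network route "$url" --body "$body" || return 1
  local status=0
  "$@" || status=$?             # keep the exit status of the wrapped command
  "$AB" network unroute "$url"  # remove the mock even if the command failed
  return "$status"
}
```

For example, `with_mock "https://example.com/api/user" '{}' agent-browser reload` reloads the page while that endpoint returns an empty JSON object.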
    
    ### Tabs & Windows
    
    ```bash
    agent-browser tab                 # List tabs
    agent-browser tab new [url]       # New tab
    agent-browser tab 2               # Switch to tab by index
    agent-browser tab close           # Close current tab
    agent-browser tab close 2         # Close tab by index
    agent-browser window new          # New window
    ```
    
    ### Frames
    
    ```bash
    agent-browser frame "#iframe"     # Switch to iframe
    agent-browser frame main          # Back to main frame
    ```
    
    ### Dialogs
    
    ```bash
    agent-browser dialog accept [text]  # Accept dialog
    agent-browser dialog dismiss        # Dismiss dialog
    ```
    
    ### JavaScript
    
    ```bash
    agent-browser eval "document.title"   # Run JavaScript
    ```
    
    ## Global options
    
    ```bash
    agent-browser --session <name> ...    # Isolated browser session
    agent-browser --json ...              # JSON output for parsing
    agent-browser --headed ...            # Show browser window (not headless)
    agent-browser --full ...              # Full page screenshot (-f)
    agent-browser --cdp <port> ...        # Connect via Chrome DevTools Protocol
    agent-browser -p <provider> ...       # Cloud browser provider (--provider)
    agent-browser --proxy <url> ...       # Use proxy server
    agent-browser --headers <json> ...    # HTTP headers scoped to URL's origin
    agent-browser --executable-path <p>   # Custom browser executable
    agent-browser --extension <path> ...  # Load browser extension (repeatable)
    agent-browser --help                  # Show help (-h)
    agent-browser --version               # Show version (-V)
    agent-browser <command> --help        # Show detailed help for a command
    ```
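
Because `--session` is a global option, it goes before the command. A small wrapper keeps the session name in one place when several isolated sessions are used side by side. The `in_session` name and the `AB` override are hypothetical.

```bash
#!/usr/bin/env bash
# AB is a hypothetical override so the binary can be swapped out (for example, for a stub).
AB="${AB:-agent-browser}"

# in_session NAME ARGS...: run one agent-browser command inside the named session.
in_session() {
  local name="$1"; shift
  "$AB" --session "$name" "$@"
}
```

For example, `in_session admin open https://app.example.com` and `in_session guest open https://app.example.com` would drive two separate sessions against the same site.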
    
    ### Proxy support
    
    ```bash
    agent-browser --proxy http://proxy.com:8080 open example.com
    agent-browser --proxy http://user:pass@proxy.com:8080 open example.com
    agent-browser --proxy socks5://proxy.com:1080 open example.com
    ```
    
    ## Environment variables
    
    ```bash
    AGENT_BROWSER_SESSION="mysession"            # Default session name
    AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
    AGENT_BROWSER_EXTENSIONS="/ext1,/ext2"       # Comma-separated extension paths
    AGENT_BROWSER_PROVIDER="your-cloud-browser-provider"  # Cloud browser provider (select browseruse or browserbase)
    AGENT_BROWSER_STREAM_PORT="9223"             # WebSocket streaming port
    AGENT_BROWSER_HOME="/path/to/agent-browser"  # Custom install location (for daemon.js)
    ```
    
    ## Example: Form submission
    
    ```bash
    agent-browser open https://example.com/form
    agent-browser snapshot -i
    # Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
    
    agent-browser fill @e1 "user@example.com"
    agent-browser fill @e2 "password123"
    agent-browser click @e3
    agent-browser wait --load networkidle
    agent-browser snapshot -i  # Check result
    ```
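
In a script, refs can be looked up from the snapshot text instead of being hard-coded. The helper below is a sketch that assumes each element appears as `role "name" [ref=eN]`, as in the sample output in the comment above. The real snapshot layout may differ, and `ref_for` is a hypothetical name.

```bash
#!/usr/bin/env bash
# ref_for ROLE NAME: read snapshot text on stdin and print the first matching ref as @eN.
# Assumes entries look like: textbox "Email" [ref=e1]
ref_for() {
  awk -v needle="$1 \"$2\" [ref=" '
    {
      i = index($0, needle)
      if (i == 0) next                         # this line does not mention the element
      rest = substr($0, i + length(needle))    # text that follows "[ref="
      j = index(rest, "]")
      if (j > 1) { print "@" substr(rest, 1, j - 1); exit }
    }'
}
```

Assuming the snapshot is written to stdout, usage could look like `ref="$(agent-browser snapshot -i | ref_for button Submit)"` followed by `agent-browser click "$ref"`. The helper prints nothing when no element matches, so an empty `$ref` should be treated as "not found".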
    
    ## Example: Authentication with saved state
    
    ```bash
    # Login once
    agent-browser open https://app.example.com/login
    agent-browser snapshot -i
    agent-browser fill @e1 "username"
    agent-browser fill @e2 "passwor
    
    ... (truncated)