---
name: atl-browser
description: Mobile browser and native app automation via ATL (iOS Simulator). Navigate, click, screenshot, and automate web and native app tasks on iPhone/iPad simulators.
metadata:
  openclaw:
    emoji: "📱"
    requires:
      bins: ["xcrun", "xcodebuild", "curl"]
    install:
      - id: "atl-clone"
        kind: "shell"
        command: "git clone https://github.com/JordanCoin/Atl ~/Atl"
        label: "Clone ATL repository"
      - id: "atl-setup"
        kind: "shell"
        command: "~/.openclaw/skills/atl-browser/scripts/setup.sh"
        label: "Build and install ATL to simulator"
---
# ATL: Agent Touch Layer
> The automation layer between AI agents and iOS
ATL provides HTTP-based automation for the iOS Simulator, covering both **browser** (mobile Safari) and **native apps**. Think Playwright, but for mobile.
## Two Servers: Browser & Native
ATL uses **two separate servers** for browser and native app automation:
| Server | Port | Use Case | Key Commands |
|--------|------|----------|--------------|
| **Browser** | `9222` | Web automation in mobile Safari | `goto`, `markElements`, `clickMark`, `evaluate` |
| **Native** | `9223` | iOS app automation (Settings, Contacts, any app) | `openApp`, `snapshot`, `tapRef`, `find` |
```
┌──────────────────────────────┬──────────────────────────────┐
│  BROWSER SERVER (9222)       │  NATIVE SERVER (9223)        │
│  (mobile Safari/WebView)     │  (iOS apps via XCTest)       │
│                              │                              │
│  markElements + clickMark    │  snapshot + tapRef           │
│  CSS selectors               │  accessibility tree          │
│  DOM evaluation              │  element references          │
│  tap, swipe, screenshot      │  tap, swipe, screenshot      │
└──────────────────────────────┴──────────────────────────────┘
```
**Why two ports?** Native app automation requires XCTest APIs (`XCUIApplication`, `XCUIElement`), which are only available in UI Test bundles. The native server therefore runs as a UI Test that exposes an HTTP API.
### Starting the Servers
```bash
# Browser server (starts automatically with AtlBrowser app)
xcrun simctl launch booted com.atl.browser
curl http://localhost:9222/ping  # → {"status":"ok"}
# Native server (run as UI Test)
cd ~/Atl/core/AtlBrowser
xcodebuild test -workspace AtlBrowser.xcworkspace \
-scheme AtlBrowser \
-destination 'id=<SIMULATOR_UDID>' \
-only-testing:AtlBrowserUITests/NativeServer/testNativeServer &
# Wait for it to start, then:
curl http://localhost:9223/ping  # → {"status":"ok","mode":"native"}
```
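The "wait for it to start" step can be made explicit by polling `/ping` until the server answers. The sketch below is one way to do that; `atl_wait_ready` is a hypothetical helper name, not something ATL ships, and the timeout values are arbitrary.

```bash
# Hypothetical helper (not part of ATL): poll a /ping URL until it responds
# or until roughly <timeout> attempts, one second apart, have failed.
atl_wait_ready() {
  local url="$1" timeout="${2:-60}" waited=0
  until curl -fs --max-time 2 -o /dev/null "$url"; do
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
  done
}

# Usage (commented out so the block has no side effects when sourced):
# atl_wait_ready http://localhost:9223/ping 120 && echo "native server is up"
```

Since `xcodebuild test` has to build before the UI Test starts, a generous timeout for the native server is probably the safer choice.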
### Quick Port Reference
| Task | Port | Example |
|------|------|---------|
| Browse websites | 9222 | `curl localhost:9222/command -d '{"method":"goto",...}'` |
| Open native app | 9223 | `curl localhost:9223/command -d '{"method":"openApp",...}'` |
| Screenshot (browser) | 9222 | `curl localhost:9222/command -d '{"method":"screenshot"}'` |
| Screenshot (native) | 9223 | `curl localhost:9223/command -d '{"method":"screenshot"}'` |
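
The port choice in the table can be wrapped in a small function so that scripts name the target instead of hard-coding a port. This is a minimal sketch; `atl_port` and `atl_send` are hypothetical names chosen for illustration and are not the functions from the skill's own helper script.

```bash
# Hypothetical helpers (not part of ATL).
# Map a target name to the port listed in the table above.
atl_port() {
  case "$1" in
    browser) echo 9222 ;;
    native)  echo 9223 ;;
    *) echo "unknown target: $1 (expected browser or native)" >&2; return 1 ;;
  esac
}

# POST a JSON command body to the chosen server's /command endpoint.
atl_send() {
  local port
  port="$(atl_port "$1")" || return 1
  curl -s -X POST "http://localhost:${port}/command" -d "$2"
}

# Usage (commented out so the block has no side effects when sourced):
# atl_send native '{"method":"screenshot"}'
```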
---
## 📱 Native App Automation (Port 9223)
Native automation uses **port 9223** and automates **any iOS app** through the accessibility tree: no DOM, no JavaScript, just direct element interaction.
### Opening & Closing Apps
```bash
# Open an app by bundle ID
curl -s -X POST http://localhost:9223/command \
-d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'
# → {"success":true,"result":{"bundleId":"com.apple.Preferences","mode":"native","state":"running"}}
# Check current app state
curl -s -X POST http://localhost:9223/command \
-d '{"method":"appState"}'
# → {"success":true,"result":{"mode":"native","bundleId":"com.apple.Preferences","state":"running"}}
# Close current app
curl -s -X POST http://localhost:9223/command \
-d '{"method":"closeApp"}'
# → {"success":true,"result":{"closed":true}}
```
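Every response shown above carries a `success` flag, so a script can check it before using `result`. The following sketch assumes `jq` is installed and that a failed command reports `"success": false` (the failure shape is not documented here, so that part is an assumption); `atl_result` is a hypothetical name.

```bash
# Hypothetical helper (not part of ATL): read a response from stdin,
# print its .result as compact JSON on success, or fail with the raw response.
# Assumes failures are signalled by "success" being anything other than true.
atl_result() {
  local response
  response="$(cat)"
  if printf '%s' "$response" | jq -e '.success == true' >/dev/null 2>&1; then
    printf '%s' "$response" | jq -c '.result'
  else
    echo "ATL command failed: $response" >&2
    return 1
  fi
}

# Usage (commented out so the block has no side effects when sourced):
# curl -s -X POST http://localhost:9223/command -d '{"method":"appState"}' | atl_result
```

For responses without a `result` field, the function prints `null`.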
### Common Bundle IDs
| App | Bundle ID |
|-----|-----------|
| Settings | `com.apple.Preferences` |
| Contacts | `com.apple.MobileAddressBook` |
| Calculator | `com.apple.calculator` |
| Calendar | `com.apple.mobilecal` |
| Photos | `com.apple.mobileslideshow` |
| Notes | `com.apple.mobilenotes` |
| Reminders | `com.apple.reminders` |
| Clock | `com.apple.mobiletimer` |
| Maps | `com.apple.Maps` |
| Safari | `com.apple.mobilesafari` |
### The `snapshot` Command
`snapshot` returns the accessibility tree: all visible elements with their properties and tappable references.
```bash
curl -s -X POST http://localhost:9223/command \
-d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result'
```
**Example output:**
```json
{
"count": 12,
"elements": [
{
"ref": "e0",
"type": "cell",
"label": "Wi-Fi",
"value": "MyNetwork",
"identifier": "",
"x": 0,
"y": 142,
"width": 393,
"height": 44,
"isHittable": true,
"isEnabled": true
},
{
"ref": "e1",
"type": "cell",
"label": "Bluetooth",
"value": "On",
"identifier": "",
"x": 0,
"y": 186,
"width": 393,
"height": 44,
"isHittable": true,
"isEnabled": true
},
{
"ref": "e2",
"type": "button",
"label": "Back",
"value": null,
"identifier": "Back",
"x": 0,
"y": 44,
"width": 80,
"height": 44,
"isHittable": true,
"isEnabled": true
}
]
}
```
**Parameters:**
- `interactiveOnly` (bool, default: `false`): only return hittable elements
- `maxDepth` (int, optional): limit tree traversal depth
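
A full snapshot can be long, so it often helps to reduce it to one line per element before deciding what to tap. The sketch below assumes `jq` is available and that the response has the shape shown in the example output; `snapshot_table` is a hypothetical name.

```bash
# Hypothetical helper (not part of ATL): turn a snapshot response on stdin
# into tab-separated lines of ref, type, label, value (null value -> empty).
snapshot_table() {
  jq -r '.result.elements[] | [.ref, .type, .label, (.value // "")] | @tsv'
}

# Usage (commented out so the block has no side effects when sourced):
# curl -s -X POST http://localhost:9223/command \
#   -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | snapshot_table
```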
### The `tapRef` Command
Tap an element by its reference from the last `snapshot`:
```bash
# Take snapshot first
curl -s -X POST http://localhost:9223/command \
-d '{"method":"snapshot","params":{"interactiveOnly":true}}'
# Tap element e0 (Wi-Fi cell from example above)
curl -s -X POST http://localhost:9223/command \
-d '{"method":"tapRef","params":{"ref":"e0"}}'
# → {"success":true}
```
### The `find` Command
Find and interact with elements by text, with no need to parse the snapshot manually:
```bash
# Find and tap "Wi-Fi"
curl -s -X POST http://localhost:9223/command \
-d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'
# → {"success":true,"result":{"found":true,"ref":"e0"}}
# Check if an element exists
curl -s -X POST http://localhost:9223/command \
-d '{"method":"find","params":{"text":"Bluetooth","action":"exists"}}'
# → {"success":true,"result":{"found":true,"ref":"e1"}}
# Find and fill a text field
curl -s -X POST http://localhost:9223/command \
-d '{"method":"find","params":{"text":"First name","action":"fill","value":"John"}}'
# Get element info without interacting
curl -s -X POST http://localhost:9223/command \
-d '{"method":"find","params":{"text":"Cancel","action":"get"}}'
# → {"success":true,"result":{"found":true,"ref":"e5","element":{...}}}
```
**Parameters:**
- `text` (string): text to search for (matches label, value, or identifier)
- `action` (string): one of `tap`, `fill`, `exists`, `get`
- `value` (string, optional): text to fill (required for `action:"fill"`)
- `by` (string, optional): narrow the search to `label`, `value`, `identifier`, `type`, or `any` (default)
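
Hand-written JSON breaks as soon as the search text contains a quote or backslash, so it can be safer to let `jq` build the request body from the parameters listed above. This is a sketch that assumes `jq` is installed; `find_payload` is a hypothetical name, and it treats an empty `value` or `by` as "not provided".

```bash
# Hypothetical helper (not part of ATL): build a JSON body for the find command.
# Arguments: text, action, optional value, optional by.
# Empty value/by arguments are omitted from the payload.
find_payload() {
  local text="$1" action="$2" value="${3:-}" by="${4:-}"
  jq -cn --arg text "$text" --arg action "$action" \
         --arg value "$value" --arg by "$by" '
    {method: "find", params: (
      {text: $text, action: $action}
      + (if $value != "" then {value: $value} else {} end)
      + (if $by != "" then {by: $by} else {} end)
    )}'
}

# Usage (commented out so the block has no side effects when sourced):
# curl -s -X POST http://localhost:9223/command \
#   -d "$(find_payload 'First name' fill 'John')"
```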
---
## Native App Workflow Example
Here's a complete flow that opens Settings, navigates to Wi-Fi, takes a screenshot, and closes the app:
```bash
# 1. Open Settings app
curl -s -X POST http://localhost:9223/command \
-d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'
# 2. Wait for app to launch
sleep 1
# 3. Take snapshot to see available elements
curl -s -X POST http://localhost:9223/command \
-d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result.elements[:5]'
# 4. Find and tap Wi-Fi
curl -s -X POST http://localhost:9223/command \
-d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'
# 5. Wait for navigation
sleep 0.5
# 6. Take screenshot of Wi-Fi settings
curl -s -X POST http://localhost:9223/command \
-d '{"method":"screenshot"}' | jq -r '.result.data' | base64 --decode > /tmp/wifi-settings.png
# 7. Navigate back (swipe right from left edge)
curl -s -X POST http://localhost:9223/command \
-d '{"method":"swipe","params":{"direction":"right"}}'
# 8. Close the app
curl -s -X POST http://localhost:9223/command \
-d '{"method":"closeApp"}'
```
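The fixed `sleep` calls in the flow above can be replaced by polling `find` with `action:"exists"` until the expected text appears. The sketch below assumes `jq` is installed and that a miss is reported as anything other than `"found": true` (the miss response is not documented here); `wait_for_text` is a hypothetical name and the retry count and interval are arbitrary.

```bash
# Hypothetical helper (not part of ATL): poll the native server until an
# element matching the given text exists, or give up after <attempts> tries.
wait_for_text() {
  local text="$1" attempts="${2:-10}" payload i
  payload="$(jq -cn --arg text "$text" \
    '{method: "find", params: {text: $text, action: "exists"}}')"
  for ((i = 0; i < attempts; i++)); do
    if curl -s -X POST http://localhost:9223/command -d "$payload" \
        | jq -e '.result.found == true' >/dev/null 2>&1; then
      return 0
    fi
    sleep 0.5
  done
  return 1
}

# Usage (commented out so the block has no side effects when sourced):
# wait_for_text "Wi-Fi" 20 || echo "Settings did not load in time" >&2
```

Polling tends to be both faster on quick transitions and more tolerant of slow app launches than a fixed delay.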
### Helper Script Version
```bash
source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh
atl_openapp "com.apple.Preferences"
sleep 1
atl_find "Wi-Fi" tap
sleep 0.5
atl_screenshot /tmp/wifi-settings.png
atl_swipe right
atl_closeapp
```
---
## Core Insight: Vision-Free Automation
ATL's killer feature is **spatial understanding without vision models**:
```
┌─────────────────────────────────────────────────────────────┐
│  markElements + captureForVision = COMPLETE PAGE KNOWLEDGE  │
└─────────────────────────────────────────────────────────────┘
1. markElements → Numbers every interactive element [1] [2] [3]
2. captureForVision → PDF with text layer + element coordinates
3. tap x=234 y=567 → Pixel-perfect touch at exact position
```
**Why this matters:**
- **No vision API calls**: zero token cost for "seeing" the page
- **Faster**: no round-trip to GPT-4V/Claude Vision
- **Deterministic**: same page = same coordinates, every time
- **Reliable**: pixel-perfect coordinates vs. vision interpretation
### The Vision-Free Workflow
```bash
# 1. Mark elements (adds numbered labels + stores coordinates)
curl -s -X POST http://localhost:9222/command \
-d '{"id":"1","method":"markElements","params":{}}'
# 2. Capture PDF with text layer (machine-readable, has coordinates)
curl -s -X POST http://localhost:9222/command \
-d '{"id":"2","method":"captureForVision","params":{"savePath":"/tmp","name":"page"}}' \
| jq -r '.result.path'
# → /tmp/page.pdf (text-selectable, contains element positions)
# 3. Get specific element's position by mark label
curl -s -X POST http://localhost:9222/command \
-d '{"id":"3","method":"getMarkInfo","params":{"label":5}}' | jq '
... (truncated)