🦞
avatar-video-messages

Generate and send video messages
SKILL.md
---
name: video-message
description: Generate and send video messages with a lip-syncing VRM avatar. Use when user asks for video message, avatar video, video reply, or when TTS should be delivered as video instead of audio.
metadata:
  {
    "openclaw":
      {
        "emoji": "🎥",
        "requires": { "bins": ["ffmpeg", "avatarcam"] },
        "install":
          [
            {
              "id": "npm",
              "kind": "npm",
              "package": "@thewulf7/openclaw-avatarcam",
              "global": true,
              "bins": ["avatarcam"],
              "label": "Install avatarcam (npm)",
            },
            {
              "id": "brew",
              "kind": "brew",
              "formula": "ffmpeg",
              "bins": ["ffmpeg"],
              "label": "Install ffmpeg (brew)",
            },
            {
              "id": "apt",
              "kind": "apt",
              "packages": ["xvfb", "xauth"],
              "label": "Install headless X dependencies (Linux only)",
            },
          ],
      },
  }
---

# Video Message

Generate avatar video messages from text or audio. Outputs as Telegram video notes (circular format).

## Installation

```bash
npm install -g openclaw-avatarcam
```

## Configuration

Configure in `TOOLS.md`:

```markdown
### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00
```

### Settings Reference

| Setting | Default | Description |
|---------|---------|-------------|
| `avatar` | `default.vrm` | VRM avatar file path |
| `background` | `#00FF00` | Color (hex) or image path |

## Prerequisites

### System Dependencies

| Platform | Command |
|----------|---------|
| **macOS** | `brew install ffmpeg` |
| **Linux** | `sudo apt-get install -y xvfb xauth ffmpeg` |
| **Windows** | Install ffmpeg and add to PATH |
| **Docker** | See Docker section below |

> **Note:** macOS and Windows don't need xvfb — they have native display support.

### Docker Users
Add to `OPENCLAW_DOCKER_APT_PACKAGES`:
```
build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg
```

## Usage

```bash
# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"

# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"

# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"
```

## Sending as Video Note

Use OpenClaw's `message` tool with `asVideoNote`:

```
message action=send filePath=/tmp/video.mp4 asVideoNote=true
```

## Workflow

1. **Read config** from TOOLS.md (avatar, background)
2. **Generate TTS** if given text: `tts text="..."` → audio path
3. **Run avatarcam** with audio + settings → MP4 output
4. **Send as video note** via `message action=send filePath=... asVideoNote=true`
5. **Return NO_REPLY** after sending

## Example Flow

User: "Send me a video message saying hello"

```bash
# 1. TTS
tts text="Hello! How are you today?" → /tmp/voice.mp3

# 2. Generate video
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"

# 3. Send as video note
message action=send filePath=/tmp/video.mp4 asVideoNote=true

# 4. Reply
NO_REPLY
```

## Technical Details

| Setting | Value |
|---------|-------|
| Resolution | 384x384 (square) |
| Frame rate | 30fps constant |
| Max duration | 60 seconds |
| Video codec | H.264 (libx264) |
| Audio codec | AAC |
| Quality | CRF 18 (high quality) |
| Container | MP4 |

### Processing Pipeline
1. Electron renders VRM avatar with lip sync at 1280x720
2. WebM captured via `canvas.captureStream(30)`
3. FFmpeg processes: crop → fps normalize → scale → encode
4. Message tool sends via Telegram `sendVideoNote` API

## Platform Support

| Platform | Display | Notes |
|----------|---------|-------|
| macOS | Native Quartz | No extra deps |
| Linux | xvfb (headless) | `apt install xvfb` |
| Windows | Native | No extra deps |

## Headless Rendering

Avatarcam auto-detects headless environments:
- Uses `xvfb-run` when `$DISPLAY` is not set (Linux only)
- macOS/Windows use native display
- GPU stall warnings are safe to ignore
- Generation time: ~1.5x realtime (20s audio ≈ 30s processing)

## Notes

- Config is read from TOOLS.md
- Clean up temp files after sending: `rm /tmp/video*.mp4`
- For regular video (not circular), omit `asVideoNote=true`