Back to Skills
    🦞

    meta-video-ad-analyzer

    Extract and analyze content from video ads

    By @fortytwode
    View on GitHub
    SKILL.md
    ---
    name: video-ad-analyzer
    version: 1.0.0
    description: Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.
    ---
    
    # Video Ad Analyzer
    
    AI-powered video content extraction using Google Gemini Vision.
    
    ## What This Skill Does
    
    - **Frame Extraction**: Smart sampling with scene change detection
    - **OCR Text Detection**: Extract text overlays using EasyOCR
    - **Audio Transcription**: Convert speech to text with Google Cloud Speech
    - **AI Scene Analysis**: Describe each scene using Gemini Vision
    - **Native Video Analysis**: Direct video understanding for longer content
    - **Thumbnail Generation**: Auto-generate thumbnails from first frame
    
    ## Setup
    
    ### 1. Environment Variables
    
    ```bash
    # Required for Gemini Vision
    GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
    
    # Required for audio transcription
    # (same service account needs Speech-to-Text API enabled)
    ```
    
    ### 2. Dependencies
    
    ```bash
    pip install opencv-python pillow easyocr ffmpeg-python google-cloud-speech vertexai google-api-python-client
    ```
    
    Also requires `ffmpeg` and `ffprobe` installed on system.
    
    ## Usage
    
    ### Basic Video Analysis
    
    ```python
    from scripts.video_extractor import VideoExtractor
    from scripts.models import ExtractedVideoContent
    import vertexai
    from vertexai.generative_models import GenerativeModel
    
    # Initialize Vertex AI
    vertexai.init(project="your-project-id", location="us-central1")
    gemini_model = GenerativeModel("gemini-1.5-flash")
    
    # Create extractor
    extractor = VideoExtractor(gemini_model=gemini_model)
    
    # Analyze video
    result = extractor.extract_content("/path/to/video.mp4")
    
    print(f"Duration: {result.duration}s")
    print(f"Scenes: {len(result.scene_timeline)}")
    print(f"Text overlays: {len(result.text_timeline)}")
    print(f"Transcript: {result.transcript[:200]}...")
    ```
    
    ### Extract Only Frames
    
    ```python
    frames, timestamps, text_timeline, scene_timeline, thumbnail = extractor.extract_smart_frames(
        "/path/to/video.mp4",
        scene_interval=2,    # Check for scene changes every 2s
        text_interval=0.5    # Check for text every 0.5s
    )
    ```
    
    ### Analyze Images
    
    ```python
    # Works with images too
    result = extractor.extract_content("/path/to/image.jpg")
    print(result.scene_timeline[0]['description'])
    ```
    
    ## Output Structure
    
    ```python
    ExtractedVideoContent(
        video_path="/path/to/video.mp4",
        duration=30.5,
        transcript="Here's what we found...",
        text_timeline=[
            {"at": 0.0, "text": ["Download Now"]},
            {"at": 5.5, "text": ["50% Off Today"]}
        ],
        scene_timeline=[
            {"timestamp": 0.0, "description": "Woman using phone app..."},
            {"timestamp": 2.0, "description": "Product showcase with features..."}
        ],
        thumbnail_url="/static/thumbnails/video_thumb.jpg",
        extraction_complete=True
    )
    ```
    
    ## Key Features
    
    | Feature | Description |
    |---------|-------------|
    | Scene Detection | Histogram-based change detection (threshold=65) |
    | OCR Confidence | Tiered thresholds (0.5 high, 0.3 low) |
    | AI Proofreading | Gemini cleans up OCR errors |
    | Source Reconciliation | Merges OCR + Vision text intelligently |
    | Native Video | Direct Gemini analysis for <20MB files |
    
    ## Prompts
    
    Customize AI behavior by editing prompts in the `prompts/` folder:
    
    - `scene_analysis.md` - Frame analysis prompts
    - `scene_reconciliation.md` - Scene enrichment prompts
    
    ## Common Questions This Answers
    
    - "What text appears in this video ad?"
    - "Describe each scene in this creative"
    - "What does the narrator say?"
    - "Extract the call-to-action from this ad"