Back to Skills
    🊞

    clonev

    Clone any voice and generate speech using Coqui XTTS v2.

    By @instant-picture
    View on GitHub
    SKILL.md
    ---
    name: clonev
    description: Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice).
    ---
    
    # CloneV Skill - Voice Cloning Made Simple
    
    ## ⚠ CRITICAL INSTRUCTIONS FOR AI MODELS
    
    **DO NOT** try to use Docker containers directly.
    **DO NOT** try to interact with `coqui-xtts` container - it is broken and restarting.
    **DO NOT** try to use APIs or servers.
    
    **ONLY USE THE SCRIPT:** `scripts/clonev.sh`
    
    The script handles everything automatically. Just call it with text, voice sample, and language.
    
    ---
    
    ## What This Skill Does
    
    Clones any voice from a short audio sample and generates new speech in that voice.
    
    **Input**: 
    - Text to speak
    - Voice sample (WAV file, 6-30 seconds)
    - Language code
    
    **Output**: OGG voice file (cloned voice speaking the text)
    
    **Works with**: Any voice! Yours, a celebrity, a character, etc.
    
    ---
    
    ## The ONE Command You Need
    
    ```bash
    $(scripts/clonev.sh "Your text here" /path/to/voice_sample.wav language)
    ```
    
    That's it! Nothing else needed.
    
    ---
    
    ## Step-by-Step Usage (FOR AI MODELS)
    
    ### Step 1: Get the required inputs
    - Text to speak (from user)
    - Path to voice sample WAV file (from user)
    - Language code (from user or default to `en`)
    
    ### Step 2: Run the script
    ```bash
    VOICE_FILE=$(scripts/clonev.sh "TEXT_HERE" "/path/to/sample.wav" LANGUAGE)
    ```
    
    ### Step 3: Use the output
    The variable `$VOICE_FILE` now contains the path to the generated OGG file.
    
    ---
    
    ## Complete Working Examples
    
    ### Example 1: Clone voice and send to Telegram
    ```bash
    # Generate cloned voice
    VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)
    
    # Send to Telegram (as voice message)
    message action=send channel=telegram asVoice=true filePath="$VOICE"
    ```
    
    ### Example 2: Clone voice in Czech
    ```bash
    # Generate Czech voice
    VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)
    
    # Send
    message action=send channel=telegram asVoice=true filePath="$VOICE"
    ```
    
    ### Example 3: Full workflow with check
    ```bash
    #!/bin/bash
    
    # Generate voice
    VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)
    
    # Verify file was created
    if [ -f "$VOICE" ]; then
        echo "Success! Voice file: $VOICE"
        ls -lh "$VOICE"
    else
        echo "Error: Voice file not created"
    fi
    ```
    
    ---
    
    ## Common Language Codes
    
    | Code | Language | Example Usage |
    |------|----------|---------------|
    | `en` | English | `scripts/clonev.sh "Hello" sample.wav en` |
    | `cs` | Czech | `scripts/clonev.sh "Ahoj" sample.wav cs` |
    | `de` | German | `scripts/clonev.sh "Hallo" sample.wav de` |
    | `fr` | French | `scripts/clonev.sh "Bonjour" sample.wav fr` |
    | `es` | Spanish | `scripts/clonev.sh "Hola" sample.wav es` |
    
    Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
    
    ---
    
    ## Voice Sample Requirements
    
    - **Format**: WAV file
    - **Length**: 6-30 seconds (optimal: 10-15 seconds)
    - **Quality**: Clear audio, no background noise
    - **Content**: Any speech (the actual words don't matter)
    
    **Good samples**:
    - ✅ Recording of someone speaking clearly
    - ✅ No music or noise in background
    - ✅ Consistent volume
    
    **Bad samples**:
    - ❌ Music or songs
    - ❌ Heavy background noise
    - ❌ Very short (< 6 seconds)
    - ❌ Very long (> 30 seconds)
    
    ---
    
    ## ⚠ Important Notes
    
    ### Model Download
    - First use downloads ~1.87GB model (one-time)
    - Model is stored at: `/mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/`
    - Status: ✅ Already downloaded
    
    ### Processing Time
    - Takes 20-40 seconds depending on text length
    - This is normal - voice cloning is computationally intensive
    
    ---
    
    ## Troubleshooting
    
    ### "Command not found"
    Make sure you're in the skill directory or use full path:
    ```bash
    /home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en
    ```
    
    ### "Voice sample not found"
    - Check the path to the WAV file
    - Use absolute paths (starting with `/`)
    - Ensure file exists: `ls -la /path/to/sample.wav`
    
    ### "Model not found"
    The model should auto-download. If not:
    ```bash
    cd /mnt/c/TEMP/Docker-containers/coqui-tts
    docker run --rm --entrypoint "" \
      -v $(pwd)/models-xtts:/root/.local/share/tts \
      ghcr.io/coqui-ai/tts:latest \
      python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
    ```
    
    ### Poor voice quality
    - Use clearer voice sample
    - Ensure no background noise
    - Try different sample (some voices clone better)
    
    ---
    
    ## Quick Reference Card (FOR AI MODELS)
    
    ```
    USER: "Clone my voice and say 'hello'"
    → Get: sample path, text="hello", language="en"
    → Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
    → Result: $VOICE contains path to OGG file
    → Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
    ```
    
    ```
    USER: "Make me speak Czech"
    → Get: sample path, text="Ahoj", language="cs"  
    → Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
    → Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
    ```
    
    ---
    
    ## Output Location
    
    Generated files are saved to:
    ```
    /mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg
    ```
    
    The script returns this path, so you can use it directly.
    
    ---
    
    ## Summary
    
    1. **ONLY use the script**: `scripts/clonev.sh`
    2. **NEVER** try to use Docker containers directly
    3. **NEVER** try to interact with the `coqui-xtts` container
    4. Script handles everything automatically
    5. Returns path to OGG file ready to send
    
    **Simple. Just use the script.**
    
    ---
    
    *Clone any voice. Speak any language. Just use the script.*