    audiopod

    By @rakesh1002
    SKILL.md
    ---
    name: audiopod
    description: Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumentals, samples, vocals), stem separation, text-to-speech, noise reduction, speech-to-text transcription, speaker separation, and media extraction. Use when the user needs to generate music/songs/rap from text, split a song into stems/vocals/instruments, generate speech from text, clean up noisy audio, transcribe audio/video, or extract audio from YouTube/URLs. Requires AUDIOPOD_API_KEY env var or pass api_key directly.
    ---
    
    # AudioPod AI
    
    Full audio processing API: music generation, stem separation, TTS, noise reduction, transcription, speaker separation, wallet management.
    
    ## Setup
    
    ```bash
    pip install audiopod  # Python
    npm install audiopod  # Node.js
    ```
    
    Auth: set the `AUDIOPOD_API_KEY` environment variable, or pass `api_key` to the client constructor.
    
    ### Getting an API Key
    1. Sign up at https://audiopod.ai/auth/signup (free, no credit card required)
    2. Go to https://www.audiopod.ai/dashboard/account/api-keys
    3. Click "Create API Key" and copy the key (starts with `ap_`)
    4. Add funds to your wallet at https://www.audiopod.ai/dashboard/account/wallet (pay-as-you-go, no subscription)
    
    ```python
    from audiopod import AudioPod
    client = AudioPod()  # uses AUDIOPOD_API_KEY env var
    # or: client = AudioPod(api_key="ap_...")
    ```
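When a script should fail early on a missing or malformed key instead of at the first API call, a small check can run before the client is constructed. The sketch below is not part of the AudioPod SDK: `resolve_api_key` is a hypothetical helper, and the only facts it relies on are the `AUDIOPOD_API_KEY` variable name and the `ap_` prefix mentioned above.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the API key to use, preferring an explicit argument over the environment.

    Hypothetical helper, not an AudioPod SDK function.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("AUDIOPOD_API_KEY")
    if not key:
        raise RuntimeError("Set AUDIOPOD_API_KEY or pass api_key explicitly")
    # The setup steps above say keys start with "ap_"; treat anything else as a typo.
    if not key.startswith("ap_"):
        raise ValueError("AudioPod API keys are expected to start with 'ap_'")
    return key
```

The returned value can then be handed to the constructor, for example `AudioPod(api_key=resolve_api_key())`.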
    
    ---
    
    ## AI Music Generation
    
    Generate songs, rap, instrumentals, samples, and vocals from text prompts.
    
    **Tasks:** `text2music` (song with vocals), `text2rap` (rap), `prompt2instrumental` (instrumental), `lyric2vocals` (vocals only), `text2samples` (loops/samples), `audio2audio` (style transfer), `songbloom`
    
    ### Python SDK
    
    ```python
    # Generate a full song with lyrics
    result = client.music.song(
        prompt="Upbeat pop, synth, drums, 120 bpm, female vocals, radio-ready",
        lyrics="Verse 1:\nWalking down the street on a sunny day\n\nChorus:\nWe're on fire tonight!",
        duration=60
    )
    print(result["output_url"])
    
    # Generate rap
    result = client.music.rap(
        prompt="Lo-Fi Hip Hop, 100 BPM, male rap, melancholy, keyboard chords",
        lyrics="Verse 1:\nStarted from the bottom, now we climbing...",
        duration=60
    )
    
    # Generate instrumental (no lyrics needed)
    result = client.music.instrumental(
        prompt="Atmospheric ambient soundscape, uplifting, driving mood",
        duration=30
    )
    
    # Generic generate with explicit task
    result = client.music.generate(
        prompt="Electronic dance music, high energy",
        task="text2samples",  # any task type
        duration=30
    )
    
    # Async: submit then poll
    job = client.music.create(
        prompt="Chill lofi beat",
        duration=30,
        task="prompt2instrumental"
    )
    result = client.music.wait_for_completion(job["id"], timeout=600)
    
    # Get available genre presets
    presets = client.music.get_presets()
    
    # List/manage jobs
    jobs = client.music.list(skip=0, limit=50)
    job = client.music.get(job_id=123)
    client.music.delete(job_id=123)
    ```
    
    ### cURL
    
    ```bash
    # Song with lyrics
    curl -X POST "https://api.audiopod.ai/api/v1/music/text2music" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"upbeat pop, synth, 120bpm, female vocals", "lyrics":"Walking down the street...", "audio_duration":60}'
    
    # Rap
    curl -X POST "https://api.audiopod.ai/api/v1/music/text2rap" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"Lo-Fi Hip Hop, male rap, 100 BPM", "lyrics":"Started from the bottom...", "audio_duration":60}'
    
    # Instrumental
    curl -X POST "https://api.audiopod.ai/api/v1/music/prompt2instrumental" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"ambient soundscape, uplifting", "audio_duration":30}'
    
    # Samples/loops
    curl -X POST "https://api.audiopod.ai/api/v1/music/text2samples" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"drum loop, sad mood", "audio_duration":15}'
    
    # Vocals only
    curl -X POST "https://api.audiopod.ai/api/v1/music/lyric2vocals" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"clean vocals, happy", "lyrics":"Eternal chorus of unity...", "audio_duration":30}'
    
    # Check job status / get result
    curl "https://api.audiopod.ai/api/v1/music/jobs/JOB_ID" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Get genre presets
    curl "https://api.audiopod.ai/api/v1/music/presets" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List jobs
    curl "https://api.audiopod.ai/api/v1/music/jobs?skip=0&limit=50" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Delete job
    curl -X DELETE "https://api.audiopod.ai/api/v1/music/jobs/JOB_ID" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    ```
    
    ### Parameters
    
    | Field | Required | Description |
    |-------|----------|-------------|
    | prompt | yes | Style/genre description |
    | lyrics | for song/rap/vocals | Song lyrics with verse/chorus structure |
    | audio_duration | no | Duration in seconds (default: 30); the Python SDK examples above pass this as `duration` |
    | genre_preset | no | Genre preset name (from presets endpoint) |
    | display_name | no | Track display name |
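The table and the cURL calls above can be combined into a small request builder that validates the payload before anything is sent. This is a sketch under stated assumptions: `build_music_request` is a hypothetical helper, it covers only the five tasks whose endpoints appear in the cURL examples, and it reads "for song/rap/vocals" as `text2music`, `text2rap`, and `lyric2vocals`.

```python
import json

BASE_URL = "https://api.audiopod.ai/api/v1/music"

# Tasks whose endpoints appear in the cURL examples above.
TASKS_REQUIRING_LYRICS = {"text2music", "text2rap", "lyric2vocals"}
KNOWN_TASKS = TASKS_REQUIRING_LYRICS | {"prompt2instrumental", "text2samples"}


def build_music_request(task, prompt, lyrics=None, audio_duration=30,
                        genre_preset=None, display_name=None):
    """Return (url, json_body) for a music generation request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    if task not in KNOWN_TASKS:
        raise ValueError(f"Unknown task: {task!r}")
    if not prompt:
        raise ValueError("prompt is required")
    if task in TASKS_REQUIRING_LYRICS and not lyrics:
        raise ValueError(f"lyrics are required for task {task!r}")

    payload = {"prompt": prompt, "audio_duration": audio_duration}
    # Optional fields are only included when the caller supplies them.
    optional = {"lyrics": lyrics, "genre_preset": genre_preset,
                "display_name": display_name}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return f"{BASE_URL}/{task}", json.dumps(payload)
```

The resulting body would be posted with the `X-API-Key` and `Content-Type: application/json` headers shown in the cURL examples.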
    
    ---
    
    ## Stem Separation
    
    Split audio into individual instrument/vocal tracks.
    
    ### Modes
    
    | Mode | Stems | Output | Use Case |
    |------|-------|--------|----------|
    | single | 1 | Specified stem only | Vocal isolation, drum extraction |
    | two | 2 | vocals + instrumental | Karaoke tracks |
    | four | 4 | vocals, drums, bass, other | Standard remixing (default) |
    | six | 6 | + guitar, piano | Full instrument separation |
    | producer | 8 | + kick, snare, hihat | Beat production |
    | studio | 12 | + cymbals, sub_bass, synth | Professional mixing |
    | mastering | 16 | Maximum detail | Forensic analysis |
    
    **Single stem options:** vocals, drums, bass, guitar, piano, other
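The mode table can be mirrored in code to catch invalid combinations before a request is submitted. The following is a minimal sketch: `build_stem_fields` is a hypothetical helper, and rejecting `stem` outside `single` mode is a client-side choice made here, not documented API behavior.

```python
# Stem counts per mode, copied from the table above.
STEM_COUNTS = {
    "single": 1, "two": 2, "four": 4, "six": 6,
    "producer": 8, "studio": 12, "mastering": 16,
}
SINGLE_STEM_OPTIONS = {"vocals", "drums", "bass", "guitar", "piano", "other"}


def build_stem_fields(mode="four", stem=None):
    """Return the form fields for a stem extraction request (hypothetical helper)."""
    if mode not in STEM_COUNTS:
        raise ValueError(f"Unknown mode: {mode!r}")
    if mode == "single":
        if stem not in SINGLE_STEM_OPTIONS:
            raise ValueError("mode='single' needs one of: "
                             + ", ".join(sorted(SINGLE_STEM_OPTIONS)))
        return {"mode": mode, "stem": stem}
    if stem is not None:
        # Assumption of this sketch: a stem name only makes sense in single mode.
        raise ValueError("stem is only used with mode='single'")
    return {"mode": mode}
```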
    
    ### Python SDK
    
    ```python
    # Sync: extract and wait for result
    result = client.stems.separate(
        url="https://youtube.com/watch?v=VIDEO_ID",
        mode="six",
        timeout=600
    )
    for stem, url in result["download_urls"].items():
        print(f"{stem}: {url}")
    
    # From local file
    result = client.stems.separate(file="/path/to/song.mp3", mode="four")
    
    # Single stem extraction
    result = client.stems.separate(
        url="https://youtube.com/watch?v=ID",
        mode="single",
        stem="vocals"
    )
    
    # Async: submit then poll
    job = client.stems.extract(url="https://youtube.com/watch?v=ID", mode="six")
    print(f"Job ID: {job['id']}")
    status = client.stems.status(job["id"])
    # or wait:
    result = client.stems.wait_for_completion(job["id"], timeout=600)
    
    # List available modes
    modes = client.stems.modes()
    
    # Job management
    jobs = client.stems.list(skip=0, limit=50, status="COMPLETED")
    job = client.stems.get(job_id=1234)
    client.stems.delete(job_id=1234)
    ```
    
    ### cURL
    
    ```bash
    # Extract from URL
    curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "url=https://youtube.com/watch?v=VIDEO_ID" \
      -F "mode=six"
    
    # Extract from file
    curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "file=@/path/to/song.mp3" \
      -F "mode=four"
    
    # Single stem
    curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "url=URL" \
      -F "mode=single" \
      -F "stem=vocals"
    
    # Check job status
    curl "https://api.audiopod.ai/api/v1/stem-extraction/status/JOB_ID" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List available modes
    curl "https://api.audiopod.ai/api/v1/stem-extraction/modes" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List jobs (filter by status: PENDING, PROCESSING, COMPLETED, FAILED)
    curl "https://api.audiopod.ai/api/v1/stem-extraction/jobs?skip=0&limit=50&status=COMPLETED" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Get specific job
    curl "https://api.audiopod.ai/api/v1/stem-extraction/jobs/JOB_ID" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Delete job
    curl -X DELETE "https://api.audiopod.ai/api/v1/stem-extraction/jobs/JOB_ID" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    ```
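When working with the raw endpoints instead of the SDK's `wait_for_completion`, the status call has to be polled by hand. The sketch below shows one way to do that, assuming the status response is a JSON object with a `status` field that takes the values listed above (`PENDING`, `PROCESSING`, `COMPLETED`, `FAILED`). `wait_for_job` and its `fetch_job` callback are hypothetical; the callback stands in for whatever HTTP call retrieves the job.

```python
import time


def wait_for_job(fetch_job, job_id, timeout=600.0, interval=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until the job completes, fails, or times out.

    fetch_job must return a dict with a "status" key. sleep and clock are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            raise RuntimeError(f"Job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status} after {timeout}s")
        sleep(interval)
```

A fixed interval keeps the sketch short; a longer interval or a backoff may be more appropriate for jobs that run for several minutes.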
    
    ### Response Format
    
    ```json
    {
      "id": 1234,
      "status": "COMPLETED",
      "download_urls": {
        "vocals": "https://...",
        "drums": "https://...",
        "bass": "https://...",
        "other": "https://..."
      },
      "quality_scores": {
        "vocals": 0.95,
        "drums": 0.88
      }
    }
    ```
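A response of this shape can be flattened into one row per stem, which makes it easy to spot stems worth re-checking by ear. This is an illustrative sketch only: `summarize_stems` is a hypothetical helper, the 0.9 threshold is arbitrary, and it assumes that `quality_scores` are on a 0 to 1 scale where higher is better and that a stem may have no score.

```python
import json


def summarize_stems(response_text, min_quality=0.9):
    """Return one dict per stem with its URL, score, and a below-threshold flag."""
    job = json.loads(response_text)
    if job.get("status") != "COMPLETED":
        raise ValueError(f"Job is not complete: {job.get('status')!r}")
    scores = job.get("quality_scores", {})
    rows = []
    for stem, url in job.get("download_urls", {}).items():
        score = scores.get(stem)  # None when no score was reported for this stem
        rows.append({
            "stem": stem,
            "url": url,
            "score": score,
            "below_threshold": score is not None and score < min_quality,
        })
    return rows
```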
    
    ---
    
    ## Text to Speech
    
    Generate speech from text with 50+ voices in 60+ languages. Supports voice cloning.
    
    ### Voice Types
    
    - **50+ production-ready voices:** multilingual, covering 60+ languages with automatic language detection
    - **Custom clones:** clone a voice from an audio sample of about 5 seconds
    
    ### Python SDK
    
    ```python
    # Generate speech and wait for result
    result = client.voice.generate(
        text="Hello, world! This is a test.",
        voice_id=123,
        speed=1.0
    )
    print(result["output_url"])
    
    # Async: submit then poll
    job = client.voice.speak(
        text="Hello world",
        voice_id=123,
        speed=1.0
    )
    status = client.voice.get_job(job["id"])
    result = client.voice.wait_for_completion(job["id"], timeout=300)
    
    # List all available voices
    voices = client.voice.list()
    for v in voices:
        print(f"{v['id']}: {v['name']}")
    
    # Clone a voice (needs ~5 sec audio sample)
    new_voice = client.voice.create(
        name="My Voice Clone",
        audio_file="./sample.mp3",
        description="Cloned from recording"
    )
    
    # Get/delete voice
    voice = client.voice.get(voice_id=123)
    client.voice.delete(voice_id=123)
    ```
    
    ### cURL (Raw HTTP — most reliable)
    
    ```bash
    # List all voices
    curl "https://api.audiopod.ai/api/v1/voice/voice-profiles" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Generate speech (FORM DATA, not JSON!)
    curl -X POST "https://api.audiopod.ai/api/v1/voice/voices/{VOICE_UUID}/generate" \
      -H "Authorization: Bearer $AUDIOPOD_API_KEY" \
      -d "input_text=Hello world, this is a test" \
      -d "audio_format=mp3" \
      -d "speed=1.0"
    
    # Poll job status
    curl "https://api.audiopod.ai/api/v1/voice/tts-jobs/{JOB_ID}/status" \
      -H "Authorization: Bearer $AUDIOPOD_API_KEY"
    
    # SDK-style endpoints (alternative)
    # Generate via SDK endpoint
    curl -X POST 
    
    ... (truncated)