    doubao-api-open-tts

    Text-to-Speech service using Doubao (Volcano Engine)

    By @xdrshjr
    SKILL.md
    ---
    name: doubao-open-tts
    description: Text-to-Speech service using Doubao (Volcano Engine) API with 200+ voices, interactive voice selection, and multilingual support
    license: MIT
    compatibility: opencode
    metadata:
      category: audio
      language: python
      provider: volcano-engine
    ---
    
    # Doubao Open TTS
    
    Text-to-Speech (TTS) service that uses the Doubao (Volcano Engine) API V1 interface to convert text into natural-sounding speech.
    
    ## Features
    
    - 🎙️ **200+ Voice Options** - Default: Shiny (灿灿) for general scenarios
    - 🔊 **Multiple Audio Formats** - Supports mp3, pcm, wav
    - ⚡ **Adjustable Parameters** - Speed and volume control
    - 📦 **Dual Interface** - Command-line tool + Python API
    - 🎯 **Voice Categorization** - Browse voices by category
    - 🔐 **Mandatory API Configuration** - Requires API credentials; stops and prompts user if not configured (default voice only available with explicit user confirmation)
    
    ## Quick Start for Agents
    
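    **Important**: When no API configuration is detected, the Agent must **stop immediately** and ask the user whether to configure the API. It is **not allowed** to generate audio with the default voice unless the user explicitly insists on using it.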
    
    When using this SKILL, the Agent should follow this workflow:
    
    ### Step 1: Check API Configuration
    
    First, check if API credentials are configured:
    
    ```python
    from scripts.tts import check_api_config, setup_api_config
    
    # Check if API is configured
    config = check_api_config()
    if not config:
        # API not configured, MUST stop and ask user
        # DO NOT proceed with default voice
        print("⚠️ API credentials not found. Cannot proceed without API configuration.")
        # ... ask user for credentials or if they want to use default
    ```
    
    ### Step 2: Handle Missing API Configuration
    
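    **If API credentials are missing, the Agent must stop and ask the user**: the user can either configure the API or explicitly insist on the default voice (allowed only when the user explicitly requests it).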
    
    ```
    Agent: I need API credentials to use the Doubao TTS service. 
           Please provide the following information:
    
    1. App ID (APPID)
    2. Access Token
    3. Secret Key
    
    You can get these from: https://console.volcengine.com/
    
    Please enter your App ID:
    
    User: <your_app_id>
    
    Agent: Please enter your Access Token:
    
    User: <your_access_token>
    
    Agent: Please enter your Secret Key:
    
    User: <your_secret_key>
    
    Agent: [Saves credentials to .env file]
           ✅ API configuration saved successfully!
    ```
    
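    **User choice**: If the user does not want to configure the API but insists on the default voice:
    ```
    User: I don't want to configure the API, just use the default voice
    
    Agent: ⚠️ Are you sure you want to use the default voice? Audio will be generated with the built-in default voice.
           Enter 'yes' to confirm, or provide API credentials for a better experience.
    
    User: yes
    
    Agent: [Proceeds with the default voice]
    ```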
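
The stop-and-ask rule from Steps 1 and 2 can be written as a small decision function. This is a minimal sketch, not part of the skill: `next_action` and its return labels are hypothetical names, and `config` stands for whatever `check_api_config()` returns (a dict or `None`).

```python
from typing import Optional


def next_action(config: Optional[dict], user_confirmed_default: bool = False) -> str:
    """Decide what the Agent may do next (hypothetical helper, for illustration)."""
    if config:
        # Credentials are present: synthesis may proceed normally.
        return "synthesize"
    if user_confirmed_default:
        # No credentials, but the user explicitly insisted on the default voice.
        return "synthesize_with_default_voice"
    # No credentials and no explicit confirmation: stop and ask the user.
    return "stop_and_ask_user"


print(next_action(None))                               # stop_and_ask_user
print(next_action(None, user_confirmed_default=True))  # synthesize_with_default_voice
print(next_action({"app_id": "your_app_id"}))          # synthesize
```

Keeping the rule in one pure function makes it harder to fall through to the default voice by accident.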
    
    ### Step 3: Use the Service
    
    Once the API is configured, or the user has explicitly confirmed the default voice:
    
    ```python
    from scripts.tts import VolcanoTTS
    
    tts = VolcanoTTS()
    output = tts.synthesize("Hello world", output_file="output.mp3")
    ```
    
    ## API Configuration Detection
    
    ### Function: `check_api_config()`
    
    Checks if API credentials are available. Returns config dict or None.
    
    ```python
    from scripts.tts import check_api_config
    
    config = check_api_config()
    if config:
        print(f"App ID: {config['app_id']}")
        print(f"Access Token: {config['access_token'][:10]}...")
        print(f"Secret Key: {config['secret_key'][:10]}...")
    else:
        print("API not configured")
    ```
    
    ### Function: `setup_api_config(app_id, access_token, secret_key, voice_type=None)`
    
    Saves API credentials to the .env file in the SKILL directory.
    
    ```python
    from scripts.tts import setup_api_config
    
    # Save credentials
    setup_api_config(
        app_id="your_app_id",
        access_token="your_access_token",
        secret_key="your_secret_key",
        voice_type="zh_female_cancan_mars_bigtts"  # optional
    )
    
    print("✅ Configuration saved to .env file")
    ```
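
For readers curious what saving credentials to a `.env` file can involve, here is a rough standard-library sketch. It is not the skill's actual implementation: `write_env_file` is a hypothetical helper, and reusing the `VOLCANO_TTS_*` variable names as `.env` keys is an assumption.

```python
import os
import tempfile
from pathlib import Path


def write_env_file(path: Path, values: dict) -> None:
    """Write KEY=value lines to a .env-style file (hypothetical helper)."""
    # Skip entries with no value, such as an unset optional voice type.
    lines = [f"{key}={value}" for key, value in values.items() if value]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # The file holds secrets, so restrict it to the current user
    # where the operating system supports POSIX permissions.
    os.chmod(path, 0o600)


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    write_env_file(env_path, {
        "VOLCANO_TTS_APPID": "your_app_id",
        "VOLCANO_TTS_ACCESS_TOKEN": "your_access_token",
        "VOLCANO_TTS_SECRET_KEY": "your_secret_key",
        "VOLCANO_TTS_VOICE_TYPE": None,  # optional, omitted from the file
    })
    saved = env_path.read_text(encoding="utf-8")

print(saved)
```

Whichever way the file is written, it should stay out of version control because it contains the Access Token and Secret Key.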
    
    ### Complete Agent Workflow Example
    
    ```python
    from scripts.tts import check_api_config, setup_api_config, VolcanoTTS
    
    def synthesize_with_auto_config(text, output_file="output.mp3", use_default_voice=False):
        """
        Synthesize speech with automatic API configuration.
        
        IMPORTANT: If API is not configured, this function will STOP and ask user.
        It will NOT automatically use default voice unless user explicitly confirms.
        """
        # Step 1: Check if API is configured
        config = check_api_config()
        
        if not config:
            # Step 2: STOP and ask user - DO NOT proceed automatically
            print("🔐 API Configuration Required")
            print("=" * 50)
            print("\n⚠️ No API credentials found. You have two options:")
            print("\nOption 1: Configure API (Recommended)")
            print("  Please visit https://console.volcengine.com/ to get your credentials")
            print("\nOption 2: Use Default Voice")
            print("  ⚠️ Only available if you explicitly confirm")
            
            # Ask user what they want to do
            choice = input("\nEnter '1' to configure API, or '2' to use default voice: ").strip()
            
            if choice == '1':
                # Configure API
                print("\nRequired information:")
                app_id = input("1. Enter your App ID: ").strip()
                access_token = input("2. Enter your Access Token: ").strip()
                secret_key = input("3. Enter your Secret Key: ").strip()
                
                # Optional: ask for preferred voice
                print("\n🎙️ Optional: Select a default voice (press Enter to use Shiny)")
                voice_type = input("Voice type (or voice name): ").strip()
                
                # Save configuration
                setup_api_config(app_id, access_token, secret_key, voice_type or None)
                print("\n✅ Configuration saved!")
                
            elif choice == '2':
                # User explicitly chose to use default voice
                confirm = input("\n⚠️ Are you sure you want to use the default voice? (yes/no): ").strip().lower()
                if confirm != 'yes':
                    print("❌ Cancelled. Please configure API to proceed.")
                    return None
                use_default_voice = True
                print("\n⚠️ Using default voice as requested...")
            else:
                print("❌ Invalid choice. Please configure API to proceed.")
                return None
        
        # Step 3: Use the service
        if use_default_voice:
            # Use default voice (only when user explicitly confirmed)
            tts = VolcanoTTS(use_default=True)
        else:
            tts = VolcanoTTS()
        
        output_path = tts.synthesize(text, output_file=output_file)
        return output_path
    
    # Use it
    output = synthesize_with_auto_config("Hello, this is a test")
    if output:
        print(f"Audio saved to: {output}")
    else:
        print("Operation cancelled - API configuration required")
    ```
    
    ## Installation
    
    ```bash
    cd skills/volcano-tts
    pip install -r requirements.txt
    ```
    
    ## Configuration
    
    ### Method 1: Environment Variables
    
    ```bash
    export VOLCANO_TTS_APPID="your_app_id"
    export VOLCANO_TTS_ACCESS_TOKEN="your_access_token"
    export VOLCANO_TTS_SECRET_KEY="your_secret_key"
    export VOLCANO_TTS_VOICE_TYPE="zh_female_cancan_mars_bigtts"  # Optional: set default voice
    ```
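
As an illustration of how these variables might be read in Python, the sketch below collects them into a dict and returns `None` when a required one is missing. `load_credentials_from_env` is a hypothetical helper, and the dict keys mirror the ones used with `check_api_config()` above. How the skill actually reads its environment is not shown in this document.

```python
import os
from typing import Mapping, Optional

# Environment variable names as listed above; the dict keys are illustrative.
REQUIRED_VARS = {
    "app_id": "VOLCANO_TTS_APPID",
    "access_token": "VOLCANO_TTS_ACCESS_TOKEN",
    "secret_key": "VOLCANO_TTS_SECRET_KEY",
}


def load_credentials_from_env(environ: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return the credentials as a dict, or None if any required one is missing."""
    env = os.environ if environ is None else environ
    config = {name: env.get(var, "").strip() for name, var in REQUIRED_VARS.items()}
    if not all(config.values()):
        return None
    # The voice type is optional, so include it only when it is set.
    voice = env.get("VOLCANO_TTS_VOICE_TYPE", "").strip()
    if voice:
        config["voice_type"] = voice
    return config


example_env = {
    "VOLCANO_TTS_APPID": "your_app_id",
    "VOLCANO_TTS_ACCESS_TOKEN": "your_access_token",
    "VOLCANO_TTS_SECRET_KEY": "your_secret_key",
}
print(load_credentials_from_env(example_env))
print(load_credentials_from_env({}))  # None
```

Passing the mapping as an argument lets the check run without touching the real process environment.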
    
    ### Method 2: .env File
    
    Copy `.env.example` to `.env` and fill in your credentials:
    
    ```bash
    cp .env.example .env
    # Edit the .env file with your credentials
    ```
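
If you want to sanity-check a `.env` file before running the skill, a tiny parser like the one below may help. It is a simplified sketch with a hypothetical `parse_env_text` helper: it handles blank lines, `#` comment lines, an optional `export ` prefix, and one pair of surrounding quotes, but not inline comments after a value. The skill itself may load the file differently, for example through a dotenv library.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=value lines from .env-style text (simplified, hypothetical helper)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = """
# Doubao TTS credentials
VOLCANO_TTS_APPID="your_app_id"
export VOLCANO_TTS_ACCESS_TOKEN=your_access_token
VOLCANO_TTS_SECRET_KEY='your_secret_key'
"""
parsed = parse_env_text(sample)
missing = [k for k in ("VOLCANO_TTS_APPID", "VOLCANO_TTS_ACCESS_TOKEN", "VOLCANO_TTS_SECRET_KEY")
           if not parsed.get(k)]
print("Missing keys:", missing)
```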
    
    ## Usage
    
    ### Command Line
    
    ```bash
    # Basic usage (uses default voice: Shiny)
    python scripts/tts.py "Hello, this is a test of Doubao text-to-speech service"
    
    # Specify output file and format
    python scripts/tts.py "Welcome to use TTS" -o output.mp3 -e mp3
    
    # Read text from file
    python scripts/tts.py -f input.txt -o output.mp3
    
    # Adjust parameters
    python scripts/tts.py "Custom voice" --speed 1.2 --volume 0.8 -v zh_female_cancan_mars_bigtts
    
    # List all available voices
    python scripts/tts.py --list-voices
    
    # List voices by category
    python scripts/tts.py --list-voices --category "General-Multilingual"
    
    # Use different cluster
    python scripts/tts.py "Hello" --cluster volcano_tts
    
    # Enable debug mode
    python scripts/tts.py "Test" --debug
    ```
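
When an Agent drives the command-line tool from Python, building the argument list explicitly avoids shell-quoting problems with arbitrary text. The sketch below only assembles the command, using the flags shown above. `build_tts_command` is a hypothetical helper, and the script path assumes the working directory is the skill folder.

```python
import sys
from typing import List, Optional


def build_tts_command(
    text: str,
    output: Optional[str] = None,
    encoding: Optional[str] = None,
    speed: Optional[float] = None,
    volume: Optional[float] = None,
    voice: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for scripts/tts.py (hypothetical helper)."""
    cmd = [sys.executable, "scripts/tts.py", text]
    if output:
        cmd += ["-o", output]
    if encoding:
        cmd += ["-e", encoding]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if volume is not None:
        cmd += ["--volume", str(volume)]
    if voice:
        cmd += ["-v", voice]
    return cmd


cmd = build_tts_command("Custom voice", output="output.mp3", speed=1.2, volume=0.8)
print(cmd[1:])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because no shell is involved, quotes and spaces in the text need no escaping.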
    
    ### Python API
    
    ```python
    from scripts.tts import VolcanoTTS, VOICE_TYPES, VOICE_CATEGORIES
    
    # Initialize client
    tts = VolcanoTTS(
        app_id="your_app_id",
        access_token="your_access_token",
        secret_key="your_secret_key",
        voice_type="zh_female_cancan_mars_bigtts"  # Optional: set default voice
    )
    
    # List available voices
    print("All voices:", tts.list_voices())
    print("General voices:", tts.list_voices("General-Normal"))
    
    # Change voice
    tts.set_voice("zh_male_xudong_conversation_wvae_bigtts")  # Set to "Happy Xiaodong"
    
    # Synthesize speech
    output_path = tts.synthesize(
        text="Hello, this is Doubao text-to-speech",
        voice_type="zh_female_cancan_mars_bigtts",  # Optional: override default
        encoding="mp3",
        cluster="volcano_tts",
        speed=1.0,
        volume=1.0,
        output_file="output.mp3"
    )
    
    print(f"Audio saved to: {output_path}")
    ```
    
    ## Interactive Voice Selection
    
    The skill supports an interactive voice selection workflow for Agent-User collaboration:
    
    ### Workflow
    
    1. **Agent Prompts User** - Agent asks user to select a voice
    2. **Display Voice Options** - Show recommended voices by category
    3. **User Selection** - User tells Agent their preferred voice
    4. **Agent Calls Skill** - Agent uses the selected voice to generate audio
    
    ### Python API for Interactive Selection
    
    **Important**: Check the API configuration before using the code below. If it is not configured, stop and ask the user.
    
    ```python
    from scripts.tts import (
        get_voice_selection_prompt,
        find_voice_by_name,
        get_voice_info,
        check_api_config,
        VolcanoTTS
    )
    
    # Step 0: Check API configuration FIRST
    config = check_api_config()
    if not config:
        print("⚠️ API credentials not found. Please configure API first.")
        print("Visit: https://console.volcengine.com/")
        # STOP here and ask user to configure API.
        # DO NOT proceed with voice selection until API is configured
        # OR user explicitly confirms to use default voice.
        raise SystemExit(1)  # without this, the script would fall through to Step 1
    
    # Step 1: Get the selection prompt to show user
    prompt = get_voice_selection_prompt()
    print(prompt)
    # Agent displays this to user and waits for response
    
    # Step 2: User responds with their choice (e.g., "Shiny" or "灿灿")
    user_input = "Shiny"  # This comes from user
    
    # Step 3: Find the voice_type from user input
    voice_type, voice_name = find_voice_by_name(user_input)
    if voice_
    
    ... (truncated)