🦞
doubao-api-open-tts

Text-to-Speech service using Doubao (Volcano Engine)
SKILL.md
---
name: doubao-open-tts
description: Text-to-Speech service using Doubao (Volcano Engine) API with 200+ voices, interactive voice selection, and multilingual support
license: MIT
compatibility: opencode
metadata:
  category: audio
  language: python
  provider: volcano-engine
---

# Doubao Open TTS

Text-to-Speech (TTS) service using Doubao (Volcano Engine) API V1 interface to convert text into natural-sounding speech.

## Features

- 🎙️ **200+ Voice Options** - Default: Shiny (灿灿) for general scenarios
- 🔊 **Multiple Audio Formats** - Supports mp3, pcm, wav
- ⚡ **Adjustable Parameters** - Speed and volume control
- 📦 **Dual Interface** - Command-line tool + Python API
- 🎯 **Voice Categorization** - Browse voices by category
- 🔐 **Mandatory API Configuration** - Requires API credentials; stops and prompts user if not configured (default voice only available with explicit user confirmation)

## Quick Start for Agents

**重要提示**: 当检测到没有API配置时，Agent必须**立即停止**并询问用户是否配置API，**不允许**直接使用默认音色生成音频，除非用户明确坚持使用默认音色。

When using this SKILL, the Agent should follow this workflow:

### Step 1: Check API Configuration

First, check if API credentials are configured:

```python
from scripts.tts import check_api_config, setup_api_config

# Check if API is configured
config = check_api_config()
if not config:
    # API not configured, MUST stop and ask user
    # DO NOT proceed with default voice
    print("⚠️ API credentials not found. Cannot proceed without API configuration.")
    # ... ask user for credentials or if they want to use default
```

### Step 2: Handle Missing API Configuration

**如果API凭证缺失，Agent必须停止并询问用户**: 用户可以选择配置API，或者明确坚持使用默认音色（仅当用户明确要求时才允许）。

```
Agent: I need API credentials to use the Doubao TTS service. 
       Please provide the following information:

1. App ID (APPID)
2. Access Token
3. Secret Key

You can get these from: https://console.volcengine.com/

Please enter your App ID:

User: 4269953601

Agent: Please enter your Access Token:

User: uAGE3iP8nJf3ewu-d6U1P6Jthv7i1DH7

Agent: Please enter your Secret Key:

User: Dfta36aYgcdqY-ylHoh9FQyqiERS5iTv

Agent: [Saves credentials to .env file]
       ✅ API configuration saved successfully!
```

**用户选择**: 如果用户不想配置API但坚持使用默认音色:
```
User: 我不想配置API，就用默认音色吧

Agent: ⚠️ 确认使用默认音色吗？这将使用内置的默认音色生成音频。
       输入 'yes' 确认使用默认音色，或提供API凭证以获得更好的体验。

User: yes

Agent: [继续执行，使用默认音色]
```

### Step 3: Use the Service

After API is configured OR user explicitly confirmed to use default voice:

```python
from scripts.tts import VolcanoTTS

tts = VolcanoTTS()
output = tts.synthesize("Hello world", output_file="output.mp3")
```

## API Configuration Detection

### Function: `check_api_config()`

Checks if API credentials are available. Returns config dict or None.

```python
from scripts.tts import check_api_config

config = check_api_config()
if config:
    print(f"App ID: {config['app_id']}")
    print(f"Access Token: {config['access_token'][:10]}...")
    print(f"Secret Key: {config['secret_key'][:10]}...")
else:
    print("API not configured")
```

### Function: `setup_api_config(app_id, access_token, secret_key, voice_type=None)`

Saves API credentials to the .env file in the SKILL directory.

```python
from scripts.tts import setup_api_config

# Save credentials
setup_api_config(
    app_id="4269953601",
    access_token="uAGE3iP8nJf3ewu-d6U1P6Jthv7i1DH7",
    secret_key="Dfta36aYgcdqY-ylHoh9FQyqiERS5iTv",
    voice_type="zh_female_cancan_mars_bigtts"  # optional
)

print("✅ Configuration saved to .env file")
```

### Complete Agent Workflow Example

```python
from scripts.tts import check_api_config, setup_api_config, VolcanoTTS

def synthesize_with_auto_config(text, output_file="output.mp3", use_default_voice=False):
    """
    Synthesize speech with automatic API configuration.
    
    IMPORTANT: If API is not configured, this function will STOP and ask user.
    It will NOT automatically use default voice unless user explicitly confirms.
    """
    # Step 1: Check if API is configured
    config = check_api_config()
    
    if not config:
        # Step 2: STOP and ask user - DO NOT proceed automatically
        print("🔐 API Configuration Required")
        print("=" * 50)
        print("\n⚠️ No API credentials found. You have two options:")
        print("\nOption 1: Configure API (Recommended)")
        print("  Please visit https://console.volcengine.com/ to get your credentials")
        print("\nOption 2: Use Default Voice")
        print("  ⚠️ Only available if you explicitly confirm")
        
        # Ask user what they want to do
        choice = input("\nEnter '1' to configure API, or '2' to use default voice: ").strip()
        
        if choice == '1':
            # Configure API
            print("\nRequired information:")
            app_id = input("1. Enter your App ID: ").strip()
            access_token = input("2. Enter your Access Token: ").strip()
            secret_key = input("3. Enter your Secret Key: ").strip()
            
            # Optional: ask for preferred voice
            print("\n🎙️ Optional: Select a default voice (press Enter to use Shiny)")
            voice_type = input("Voice type (or voice name): ").strip()
            
            # Save configuration
            setup_api_config(app_id, access_token, secret_key, voice_type or None)
            print("\n✅ Configuration saved!")
            
        elif choice == '2':
            # User explicitly chose to use default voice
            confirm = input("\n⚠️ Are you sure you want to use the default voice? (yes/no): ").strip().lower()
            if confirm != 'yes':
                print("❌ Cancelled. Please configure API to proceed.")
                return None
            use_default_voice = True
            print("\n⚠️ Using default voice as requested...")
        else:
            print("❌ Invalid choice. Please configure API to proceed.")
            return None
    
    # Step 3: Use the service
    if use_default_voice:
        # Use default voice (only when user explicitly confirmed)
        tts = VolcanoTTS(use_default=True)
    else:
        tts = VolcanoTTS()
    
    output_path = tts.synthesize(text, output_file=output_file)
    return output_path

# Use it
output = synthesize_with_auto_config("Hello, this is a test")
if output:
    print(f"Audio saved to: {output}")
else:
    print("Operation cancelled - API configuration required")
```

## Configuration Methods

## Installation

```bash
cd skills/volcano-tts
pip install -r requirements.txt
```

## Configuration

### Method 1: Environment Variables

```bash
export VOLCANO_TTS_APPID="your_app_id"
export VOLCANO_TTS_ACCESS_TOKEN="your_access_token"
export VOLCANO_TTS_SECRET_KEY="your_secret_key"
export VOLCANO_TTS_VOICE_TYPE="zh_female_cancan_mars_bigtts"  # Optional: set default voice
```

### Method 2: .env File

Copy `.env.example` to `.env` and fill in your credentials:

```bash
cp .env.example .env
# Edit the .env file with your credentials
```

## Usage

### Command Line

```bash
# Basic usage (uses default voice: Shiny)
python scripts/tts.py "Hello, this is a test of Doubao text-to-speech service"

# Specify output file and format
python scripts/tts.py "Welcome to use TTS" -o output.mp3 -e mp3

# Read text from file
python scripts/tts.py -f input.txt -o output.mp3

# Adjust parameters
python scripts/tts.py "Custom voice" --speed 1.2 --volume 0.8 -v zh_female_cancan_mars_bigtts

# List all available voices
python scripts/tts.py --list-voices

# List voices by category
python scripts/tts.py --list-voices --category "General-Multilingual"

# Use different cluster
python scripts/tts.py "Hello" --cluster volcano_tts

# Enable debug mode
python scripts/tts.py "Test" --debug
```

### Python API

```python
from scripts.tts import VolcanoTTS, VOICE_TYPES, VOICE_CATEGORIES

# Initialize client
tts = VolcanoTTS(
    app_id="your_app_id",
    access_token="your_access_token",
    secret_key="your_secret_key",
    voice_type="zh_female_cancan_mars_bigtts"  # Optional: set default voice
)

# List available voices
print("All voices:", tts.list_voices())
print("General voices:", tts.list_voices("General-Normal"))

# Change voice
tts.set_voice("zh_male_xudong_conversation_wvae_bigtts")  # Set to "Happy Xiaodong"

# Synthesize speech
output_path = tts.synthesize(
    text="Hello, this is Doubao text-to-speech",
    voice_type="zh_female_cancan_mars_bigtts",  # Optional: override default
    encoding="mp3",
    cluster="volcano_tts",
    speed=1.0,
    volume=1.0,
    output_file="output.mp3"
)

print(f"Audio saved to: {output_path}")
```

## Interactive Voice Selection

The SKILL supports interactive voice selection workflow for Agent-User collaboration:

### Workflow

1. **Agent Prompts User** - Agent asks user to select a voice
2. **Display Voice Options** - Show recommended voices by category
3. **User Selection** - User tells Agent their preferred voice
4. **Agent Calls Skill** - Agent uses the selected voice to generate audio

### Python API for Interactive Selection

**重要**: 在使用以下代码之前，必须先检查API配置。如果没有配置，必须停止并询问用户。

```python
from scripts.tts import (
    get_voice_selection_prompt,
    find_voice_by_name,
    get_voice_info,
    check_api_config,
    VolcanoTTS
)

# Step 0: Check API configuration FIRST
config = check_api_config()
if not config:
    print("⚠️ API credentials not found. Please configure API first.")
    print("Visit: https://console.volcengine.com/")
    # STOP here and ask user to configure API
    # DO NOT proceed with voice selection until API is configured
    # OR user explicitly confirms to use default voice
    
# Step 1: Get the selection prompt to show user
prompt = get_voice_selection_prompt()
print(prompt)
# Agent displays this to user and waits for response

# Step 2: User responds with their choice (e.g., "Shiny" or "灿灿")
user_input = "Shiny"  # This comes from user

# Step 3: Find the voice_type from user input
voice_type, voice_name = find_voice_by_name(user_input)
if voice_

... (truncated)