Back to Skills
    🦞

    chichi-speech

    A RESTful service for high-quality text-to-speech using Qwen3

    By @hudeven
    View on GitHub
    SKILL.md
    ---
    name: chichi-speech
    description: A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. Optimized for reusing a specific voice prompt to avoid re-computation.
    ---
    
    # Chichi Speech Service
    
    This skill provides a FastAPI-based REST service for Qwen3 TTS, specifically configured for reusing a high-quality reference audio prompt for efficient and consistent voice cloning. This service is packaged as an installable CLI.
    
    ## Installation
    
    Prerequisites: `python >= 3.10`.
    
    ```bash
    pip install -e .
    ```
    
    ## Usage
    
    ### 1. Start the Service
    
    The service runs on port **9090** by default.
    
    ```bash
    # Start the server (runs in foreground, use & for background or a separate terminal)
    # Optional: Uudate to your own reference audio and text for voice cloning
    chichi-speech --port 9090 --host 127.0.0.1 --ref-audio "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav" --ref-text "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."
    ```
    
    ### 2. Verify Service is Running
    Check the health/docs:
    ```bash
    curl http://localhost:9090/docs
    ```
    
    ### 3. Generate Speech
    
    Use cURL:
    ```bash
    curl -X POST "http://localhost:9090/synthesize" \
         -H "Content-Type: application/json" \
         -d '{
               "text": "Nice to meet you",
               "language": "English"
             }' \
         --output output/nice_to_meet.wav
    ```
    
    ## Functionality
    
    -   **Endpoint**: `POST /synthesize`
    -   **Default Port**: 9090
    -   **Voice Cloning**: Uses a pre-computed voice prompt from reference files to ensure the cloned voice is consistent and generation is fast.
    
    ## Requirements
    
    -   Python 3.10+
    -   `qwen-tts` (Qwen3 model library)
    -   Access to a reference audio file for voice cloning.
        -   By default, it uses public sample audio from Qwen3.
        -   **CRITICAL**: You can provide your own reference audio using the `--ref-audio` and `--ref-text` flags.