🦞
doubleword

Create and manage batch inference jobs using the Doubleword API.
SKILL.md
---
name: doubleword-batches
description: Create and manage batch inference jobs using the Doubleword API (api.doubleword.ai). Use when users want to: (1) Process multiple AI requests in batch mode, (2) Submit JSONL batch files for async inference, (3) Monitor batch job progress and retrieve results, (4) Work with OpenAI-compatible batch endpoints, (5) Handle large-scale inference workloads that don't require immediate responses, (6) Use tool calling or structured outputs in batches, (7) Automatically batch API calls with autobatcher.
---

# Doubleword Batch Inference

Process multiple AI inference requests asynchronously using the Doubleword batch API with high throughput and low cost.

## Prerequisites

Before submitting batches, you need:
1. **Doubleword Account** - Sign up at https://app.doubleword.ai/
2. **API Key** - Create one in the API Keys section of your dashboard
3. **Account Credits** - Add credits to process requests (see pricing below)

## When to Use Batches

Batches are ideal for:
- Multiple independent requests that can run simultaneously
- Workloads that don't require immediate responses
- Large volumes that would exceed rate limits if sent individually
- Cost-sensitive workloads (24h window = 50-60% cheaper than realtime)
- Tool calling and structured output generation at scale

## Available Models & Pricing

Pricing is per 1 million tokens (input / output):

**Qwen3-VL-30B-A3B-Instruct-FP8** (mid-size):
- Realtime SLA: $0.16 / $0.80
- 1-hour SLA: $0.07 / $0.30 (56% cheaper)
- 24-hour SLA: $0.05 / $0.20 (69% cheaper)

**Qwen3-VL-235B-A22B-Instruct-FP8** (flagship):
- Realtime SLA: $0.60 / $1.20
- 1-hour SLA: $0.15 / $0.55 (75% cheaper)
- 24-hour SLA: $0.10 / $0.40 (83% cheaper)
- Supports up to 262K total tokens, 16K new tokens per request

**Cost estimation:** Upload files to the Doubleword Console to preview expenses before submitting.

## Quick Start

Two ways to submit batches:

**Via API:**
1. Create JSONL file with requests
2. Upload file to get file ID
3. Create batch using file ID
4. Poll status until complete
5. Download results from output_file_id

**Via Web Console:**
1. Navigate to Batches section at https://app.doubleword.ai/
2. Upload JSONL file
3. Configure batch settings (model, completion window)
4. Monitor progress in real-time dashboard
5. Download results when ready

## Workflow

### Step 1: Create Batch Request File

Create a `.jsonl` file where each line contains a complete, valid JSON object with no line breaks within the object:

```json
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic/claude-3-5-sonnet", "messages": [{"role": "user", "content": "What is 2+2?"}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic/claude-3-5-sonnet", "messages": [{"role": "user", "content": "What is the capital of France?"}]}}
```

**Required fields per line:**
- `custom_id`: Unique identifier (max 64 chars) - use descriptive IDs like `"user-123-question-5"` for easier result mapping
- `method`: Always `"POST"`
- `url`: API endpoint - `"/v1/chat/completions"` or `"/v1/embeddings"`
- `body`: Standard API request with `model` and `messages`

**Optional body parameters:**
- `temperature`: 0-2 (default: 1.0)
- `max_tokens`: Maximum response tokens
- `top_p`: Nucleus sampling parameter
- `stop`: Stop sequences
- `tools`: Tool definitions for tool calling (see Tool Calling section)
- `response_format`: JSON schema for structured outputs (see Structured Outputs section)

**File requirements:**
- Max size: 200MB
- Format: JSONL only (JSON Lines - newline-delimited JSON)
- Each line must be valid JSON with no internal line breaks
- No duplicate `custom_id` values
- Split large batches into multiple files if needed

**Common pitfalls:**
- Line breaks within JSON objects (will cause parsing errors)
- Invalid JSON syntax
- Duplicate `custom_id` values

**Helper script:**
Use `scripts/create_batch_file.py` to generate JSONL files programmatically:

```bash
python scripts/create_batch_file.py output.jsonl
```

Modify the script's `requests` list to generate your specific batch requests.

### Step 2: Upload File

**Via API:**
```bash
curl https://api.doubleword.ai/v1/files \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  -F purpose="batch" \
  -F file="@batch_requests.jsonl"
```

**Via Console:**
Upload through the Batches section at https://app.doubleword.ai/

Response contains `id` field - save this file ID for next step.

### Step 3: Create Batch

**Via API:**
```bash
curl https://api.doubleword.ai/v1/batches \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-abc123",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'
```

**Via Console:**
Configure batch settings in the web interface.

**Parameters:**
- `input_file_id`: File ID from upload step
- `endpoint`: API endpoint (`"/v1/chat/completions"` or `"/v1/embeddings"`)
- `completion_window`: Choose based on urgency and budget:
  - `"24h"`: Best pricing, results within 24 hours (typically faster)
  - `"1h"`: 50% price premium, results within 1 hour (typically faster)
  - Realtime: Limited capacity, highest cost (batch service optimized for async)

Response contains batch `id` - save this for status polling.

**Before submitting, verify:**
- You have access to the specified model
- Your API key is active
- You have sufficient account credits

### Step 4: Poll Status

**Via API:**
```bash
curl https://api.doubleword.ai/v1/batches/batch-xyz789 \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY"
```

**Via Console:**
Monitor real-time progress in the Batches dashboard.

**Status progression:**
1. `validating` - Checking input file format
2. `in_progress` - Processing requests
3. `completed` - All requests finished

**Other statuses:**
- `failed` - Batch failed (check `error_file_id`)
- `expired` - Batch timed out
- `cancelling`/`cancelled` - Batch cancelled

**Response includes:**
- `output_file_id` - Download results here
- `error_file_id` - Failed requests (if any)
- `request_counts` - Total/completed/failed counts

**Polling frequency:** Check every 30-60 seconds during processing.

**Early access:** Results available via `output_file_id` before batch fully completes - check `X-Incomplete` header.

### Step 5: Download Results

**Via API:**
```bash
curl https://api.doubleword.ai/v1/files/file-output123/content \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  > results.jsonl
```

**Via Console:**
Download results directly from the Batches dashboard.

**Response headers:**
- `X-Incomplete: true` - Batch still processing, more results coming
- `X-Last-Line: 45` - Resume point for partial downloads

**Output format (each line):**
```json
{
  "id": "batch-req-abc",
  "custom_id": "request-1",
  "response": {
    "status_code": 200,
    "body": {
      "id": "chatcmpl-xyz",
      "choices": [{
        "message": {
          "role": "assistant",
          "content": "The answer is 4."
        }
      }]
    }
  }
}
```

**Download errors (if any):**
```bash
curl https://api.doubleword.ai/v1/files/file-error123/content \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  > errors.jsonl
```

**Error format (each line):**
```json
{
  "id": "batch-req-def",
  "custom_id": "request-2",
  "error": {
    "code": "invalid_request",
    "message": "Missing required parameter"
  }
}
```

## Tool Calling in Batches

Tool calling (function calling) enables models to intelligently select and use external tools. Doubleword maintains full OpenAI compatibility.

**Example batch request with tools:**
```json
{
  "custom_id": "tool-req-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }]
  }
}
```

**Use cases:**
- Agents that interact with APIs at scale
- Fetching real-time information for multiple queries
- Executing actions through standardized tool definitions

## Structured Outputs in Batches

Structured outputs guarantee that model responses conform to your JSON Schema, eliminating issues with missing fields or invalid enum values.

**Example batch request with structured output:**
```json
{
  "custom_id": "structured-req-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Extract key info from: John Doe, 30 years old, lives in NYC"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person_info",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "city": {"type": "string"}
          },
          "required": ["name", "age", "city"]
        }
      }
    }
  }
}
```

**Benefits:**
- Guaranteed schema compliance
- No missing required keys
- No hallucinated enum values
- Seamless OpenAI compatibility

## autobatcher: Automatic Batching

autobatcher is a Python client that automatically converts individual API calls into batched requests, reducing costs without code changes.

**Installation:**
```bash
pip install autobatcher
```

**How it works:**
1. **Collection Phase**: Requests accumulate during a time window (default: 1 second) or until batch size threshold
2. **Batch Submission**: Collected requests are submitted together
3. **Result Polling**: System monitors for completed responses
4. **Transparent Response**

... (truncated)