Back to Skills
    🦞

    docstrange

    Document extraction API by Nanonets.

    By @shhdwi
    View on GitHub
    SKILL.md
    ---
    name: docstrange
    description: Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.
    ---
    
    # DocStrange by Nanonets
    
    Document extraction API — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.
    
    > **Get your API key:** https://docstrange.nanonets.com/app
    
    ## Quick Start
    
    ```bash
    curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
      -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
      -F "file=@document.pdf" \
      -F "output_format=markdown"
    ```
    
    Response:
    ```json
    {
      "success": true,
      "record_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "result": {
        "markdown": {
          "content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
        }
      }
    }
    ```
    
    ## Setup
    
    ### 1. Get Your API Key
    
    ```bash
    # Visit the dashboard
    https://docstrange.nanonets.com/app
    ```
    
    Save your API key:
    ```bash
    export DOCSTRANGE_API_KEY="your_api_key_here"
    ```
    
    ### 2. OpenClaw Configuration (Optional)
    
    Add to your `~/.openclaw/openclaw.json`:
    ```json5
    {
      skills: {
        entries: {
          "docstrange": {
            enabled: true,
            apiKey: "your_api_key_here",
            env: {
              DOCSTRANGE_API_KEY: "your_api_key_here",
            },
          },
        },
      },
    }
    ```
    
    ## Common Tasks
    
    ### Extract to Markdown
    
    ```bash
    curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
      -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
      -F "file=@document.pdf" \
      -F "output_format=markdown"
    ```
    
    Access content: `response["result"]["markdown"]["content"]`
    
    ### Extract JSON Fields
    
    **Simple field list:**
    ```bash
    curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
      -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
      -F "file=@invoice.pdf" \
      -F "output_format=json" \
      -F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
      -F "include_metadata=confidence_score"
    ```
    
    **With JSON schema:**
    ```bash
    curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
      -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
      -F "file=@invoice.pdf" \
      -F "output_format=json" \
      -F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
    ```
    
    Response with confidence scores:
    ```json
    {
      "result": {
        "json": {
          "content": {
            "invoice_number": "INV-2024-001",
            "total_amount": 500.00
          },
          "metadata": {
            "confidence_score": {
              "invoice_number": 98,
              "total_amount": 99
            }
          }
        }
      }
    }
    ```
    
    ### Extract Tables to CSV
    
    ```bash
    curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
      -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
      -F "file=@table.pdf" \
      -F "output_format=csv" \
      -F "csv_options=table"
    ```
    
    ### Async Extraction (Large Documents)
    
    For documents >5 pages, use async and poll:
    
    **Queue the document:**
    ```bash
    curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
      -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
      -F "file=@large-document.pdf" \
      -F "output_format=markdown"
    
    # Returns: {"record_id": "12345", "status": "processing"}
    ```
    
    **Poll for results:**
    ```bash
    curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
      -H "Authorization: Bearer $DOCSTRANGE_API_KEY"
    
    # Returns: {"status": "completed", "result": {...}}
    ```
    
    ## Advanced Features
    
    ### Bounding Boxes
    Get element coordinates for layout analysis:
    ```bash
    -F "include_metadata=bounding_boxes"
    ```
    
    ### Hierarchy Output
    Extract document structure (sections, tables, key-value pairs):
    ```bash
    -F "json_options=hierarchy_output"
    ```
    
    ### Financial Documents Mode
    Enhanced table and number formatting:
    ```bash
    -F "markdown_options=financial-docs"
    ```
    
    ### Custom Instructions
    Guide extraction with prompts:
    ```bash
    -F "custom_instructions=Focus on financial data. Ignore headers."
    -F "prompt_mode=append"
    ```
    
    ### Multiple Formats
    Request multiple formats in one call:
    ```bash
    -F "output_format=markdown,json"
    ```
    
    ## When to Use
    
    ### Use DocStrange For:
    - Invoice and receipt processing
    - Contract text extraction
    - Bank statement parsing
    - Form digitization
    - Image OCR (scanned documents)
    
    ### Don't Use For:
    - Documents >5 pages with sync (use async)
    - Video/audio transcription
    - Non-document images
    
    ## Best Practices
    
    | Document Size | Endpoint | Notes |
    |---------------|----------|-------|
    | <=5 pages | `/extract/sync` | Immediate response |
    | >5 pages | `/extract/async` | Poll for results |
    
    **JSON Extraction:**
    - Field list: `["field1", "field2"]` — quick extractions
    - JSON schema: `{"type": "object", ...}` — strict typing, nested data
    
    **Confidence Scores:**
    - Add `include_metadata=confidence_score`
    - Scores are 0-100 per field
    - Review fields <80 manually
    
    ## Schema Templates
    
    ### Invoice
    ```json
    {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "quantity": {"type": "number"},
              "price": {"type": "number"}
            }
          }
        }
      }
    }
    ```
    
    ### Receipt
    ```json
    {
      "type": "object",
      "properties": {
        "merchant": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
        "items": {
          "type": "array",
          "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
        }
      }
    }
    ```
    
    ## Troubleshooting
    
    **400 Bad Request:**
    - Provide exactly one input: `file`, `file_url`, or `file_base64`
    - Verify API key is valid
    
    **Sync Timeout:**
    - Use async for documents >5 pages
    - Poll `/extract/results/{record_id}`
    
    **Missing Confidence Scores:**
    - Requires `json_options` (field list or schema)
    - Add `include_metadata=confidence_score`
    
    ## References
    
    - **API Docs:** https://docstrange.nanonets.com/docs
    - **Get API Key:** https://docstrange.nanonets.com/app