    perf-profiler

    Profile and optimize application performance.

    By @gitgoodordietrying
    SKILL.md
    ---
    name: perf-profiler
    description: Profile and optimize application performance. Use when diagnosing slow code, measuring CPU/memory usage, generating flame graphs, benchmarking functions, load testing APIs, finding memory leaks, or optimizing database queries.
    metadata: {"clawdbot":{"emoji":"⚡","requires":{"anyBins":["node","python3","go","curl","ab"]},"os":["linux","darwin","win32"]}}
    ---
    
    # Performance Profiler
    
    Measure, profile, and optimize application performance. Covers CPU profiling, memory analysis, flame graphs, benchmarking, load testing, and language-specific optimization patterns.
    
    ## When to Use
    
    - Diagnosing why an application or function is slow
    - Measuring CPU and memory usage
    - Generating flame graphs to visualize hot paths
    - Benchmarking functions or endpoints
    - Load testing APIs before deployment
    - Finding and fixing memory leaks
    - Optimizing database query performance
    - Comparing performance before and after changes
    
    ## Quick Timing
    
    ### Command-line timing
    
    ```bash
    # Time any command
    time my-command --flag
    
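    # More precise: multiple runs with stats
    # Requires GNU time (-f is not supported by the BSD/macOS /usr/bin/time).
    # The command's stdout is discarded so only the timing line reaches awk.
    for i in $(seq 1 10); do
      /usr/bin/time -f "%e" my-command 2>&1 >/dev/null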
    done | awk '{sum+=$1; sumsq+=$1*$1; count++} END {
      avg=sum/count;
      stddev=sqrt(sumsq/count - avg*avg);
      printf "runs=%d avg=%.3fs stddev=%.3fs\n", count, avg, stddev
    }'
    
    # Hyperfine (better benchmarking tool)
    # Install: https://github.com/sharkdp/hyperfine
    hyperfine 'command-a' 'command-b'
    hyperfine --warmup 3 --runs 20 'my-command'
    hyperfine --export-json results.json 'old-version' 'new-version'
    ```
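
Where neither GNU `time` nor hyperfine is available, the same repeated-run statistics can be gathered from a short script. The following is a minimal sketch, not a replacement for hyperfine: the `time_command` helper and its parameters are hypothetical names introduced here, and it measures wall-clock time including process startup.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10, warmup=1):
    """Run cmd repeatedly and return wall-clock timing statistics in seconds."""
    # Warmup runs are executed but not measured (fills caches, loads binaries).
    for _ in range(warmup):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)

    return {
        "runs": runs,
        "mean": statistics.mean(samples),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
        "min": min(samples),
    }


if __name__ == "__main__":
    # Example workload: start the Python interpreter and exit immediately.
    stats = time_command([sys.executable, "-c", "pass"], runs=5)
    print("runs={runs} mean={mean:.3f}s stdev={stdev:.3f}s min={min:.3f}s".format(**stats))
```

The minimum is reported alongside the mean because it is usually the least affected by background noise on a shared machine.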
    
    ### Inline timing (any language)
    
    ```javascript
    // Node.js
    console.time('operation');
    await doExpensiveThing();
    console.timeEnd('operation'); // "operation: 142.3ms"
    
    // High-resolution
    const start = performance.now();
    await doExpensiveThing();
    const elapsed = performance.now() - start;
    console.log(`Elapsed: ${elapsed.toFixed(2)}ms`);
    ```
    
    ```python
    # Python
    import time
    
    start = time.perf_counter()
    do_expensive_thing()
    elapsed = time.perf_counter() - start
    print(f"Elapsed: {elapsed:.4f}s")
    
    # Context manager
    from contextlib import contextmanager
    
    @contextmanager
    def timer(label=""):
        start = time.perf_counter()
        yield
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f}s")
    
    with timer("data processing"):
        process_data()
    ```
    
    ```go
    // Go
    start := time.Now()
    doExpensiveThing()
    fmt.Printf("Elapsed: %v\n", time.Since(start))
    ```
    
    ## Node.js Profiling
    
    ### CPU profiling with V8 inspector
    
    ```bash
    # Generate CPU profile (writes .cpuprofile file)
    node --cpu-prof app.js
    # Open the .cpuprofile in Chrome DevTools > Performance tab
    
    # Adjust the sampling interval (in microseconds; default is 1000)
    node --cpu-prof --cpu-prof-interval=100 app.js
    
    # Start a process with the inspector enabled
    node --inspect app.js
    # Open chrome://inspect in Chrome, click "inspect"
    # Go to Performance tab, click Record
    ```
    
    ### Heap snapshots (memory)
    
    ```bash
    # Generate a sampling heap profile (writes a .heapprofile file)
    node --heap-prof app.js
    
    # Take a heap snapshot programmatically
    node -e "
    const v8 = require('v8');
    
    // writeHeapSnapshot() blocks while writing and returns the generated filename
    const snapshotFile = v8.writeHeapSnapshot();
    console.log('Heap snapshot written to:', snapshotFile);
    "
    
    # Compare heap snapshots to find leaks:
    # 1. Take snapshot A (baseline)
    # 2. Run operations that might leak
    # 3. Take snapshot B
    # 4. In Chrome DevTools > Memory, load both and use "Comparison" view
    ```
    
    ### Memory usage monitoring
    
    ```javascript
    // Print memory usage periodically
    setInterval(() => {
      const usage = process.memoryUsage();
      console.log({
        rss: `${(usage.rss / 1024 / 1024).toFixed(1)}MB`,
        heapUsed: `${(usage.heapUsed / 1024 / 1024).toFixed(1)}MB`,
        heapTotal: `${(usage.heapTotal / 1024 / 1024).toFixed(1)}MB`,
        external: `${(usage.external / 1024 / 1024).toFixed(1)}MB`,
      });
    }, 5000);
    
    // Detect memory growth (start from the current heap size so the first check is meaningful)
    let lastHeap = process.memoryUsage().heapUsed;
    setInterval(() => {
      const heap = process.memoryUsage().heapUsed;
      const delta = heap - lastHeap;
      if (delta > 1024 * 1024) { // > 1MB growth
        console.warn(`Heap grew by ${(delta / 1024 / 1024).toFixed(1)}MB`);
      }
      lastHeap = heap;
    }, 10000);
    ```
    
    ### Node.js benchmarking
    
    ```javascript
    // Simple benchmark function
    function benchmark(name, fn, iterations = 10000) {
      // Warmup
      for (let i = 0; i < 100; i++) fn();
    
      const start = performance.now();
      for (let i = 0; i < iterations; i++) fn();
      const elapsed = performance.now() - start;
    
      console.log(`${name}: ${(elapsed / iterations).toFixed(4)}ms/op (${iterations} iterations in ${elapsed.toFixed(1)}ms)`);
    }
    
    benchmark('JSON.parse', () => JSON.parse('{"key":"value","num":42}'));
    benchmark('regex match', () => /^\d{4}-\d{2}-\d{2}$/.test('2026-02-03'));
    ```
    
    ## Python Profiling
    
    ### cProfile (built-in CPU profiler)
    
    ```bash
    # Profile a script
    python3 -m cProfile -s cumulative my_script.py
    
    # Save to file for analysis
    python3 -m cProfile -o profile.prof my_script.py
    
    # Analyze saved profile
    python3 -c "
    import pstats
    stats = pstats.Stats('profile.prof')
    stats.sort_stats('cumulative')
    stats.print_stats(20)
    "
    
    # Profile a specific function
    python3 -c "
    import cProfile
    from my_module import expensive_function
    
    cProfile.run('expensive_function()', sort='cumulative')
    "
    ```
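
To profile a single call from inside a program rather than the whole script, `cProfile.Profile` can be enabled around just that call and the report captured as text. This is a sketch under stated assumptions: `slow_sum` is a hypothetical workload and `profile_call` is a helper name invented for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=10, **kwargs):
    """Profile one call and return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()

    # Write the report into a string buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 200_000)
    print(report)
```

Returning the report as a string makes it easy to log or attach to a bug report instead of reading it off the terminal.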
    
    ### line_profiler (line-by-line)
    
    ```bash
    # Install
    pip install line_profiler
    
    # Add @profile decorator to functions of interest, then:
    kernprof -l -v my_script.py
    ```
    
    ```python
    # Programmatic usage
    from line_profiler import LineProfiler
    
    def process_data(data):
        result = []
        for item in data:           # Is this loop the bottleneck?
            transformed = transform(item)
            if validate(transformed):
                result.append(transformed)
        return result
    
    profiler = LineProfiler()
    profiler.add_function(process_data)
    profiler.enable()
    process_data(large_dataset)
    profiler.disable()
    profiler.print_stats()
    ```
    
    ### Memory profiling (Python)
    
    ```bash
    # memory_profiler
    pip install memory_profiler
    
    # Profile memory line-by-line
    python3 -m memory_profiler my_script.py
    ```
    
    ```python
    from memory_profiler import profile
    
    @profile
    def load_data():
        data = []
        for i in range(1000000):
            data.append({'id': i, 'value': f'item_{i}'})
        return data
    
    # Find which source lines allocated the most memory (built-in, no install needed)
    import tracemalloc
    
    tracemalloc.start()
    
    # ... run code ...
    
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    for stat in top_stats[:10]:
        print(stat)
    ```
    
    ### Python benchmarking
    
    ```python
    import timeit
    
    # Time a statement
    result = timeit.timeit('sorted(range(1000))', number=10000)
    print(f"sorted: {result:.4f}s for 10000 iterations")
    
    # Compare two approaches
    setup = "data = list(range(10000))"
    t1 = timeit.timeit('list(filter(lambda x: x % 2 == 0, data))', setup=setup, number=1000)
    t2 = timeit.timeit('[x for x in data if x % 2 == 0]', setup=setup, number=1000)
    print(f"filter: {t1:.4f}s  |  listcomp: {t2:.4f}s  |  speedup: {t1/t2:.2f}x")
    
    # pytest-benchmark
    # pip install pytest-benchmark
    # def test_sort(benchmark):
    #     benchmark(sorted, list(range(1000)))
    ```
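
A single `timeit.timeit` call can be skewed by whatever else the machine is doing at that moment. A common remedy, sketched here, is `timeit.repeat` with the minimum taken across repeats; `best_time` is a hypothetical helper name, and the repeat and iteration counts are arbitrary choices.

```python
import timeit


def best_time(stmt, setup="pass", repeat=5, number=1000):
    """Return the best per-call time in seconds across several repeats."""
    # Each entry in totals is the time for `number` executions of stmt.
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    setup = "data = list(range(10000))"
    t_filter = best_time("list(filter(lambda x: x % 2 == 0, data))", setup=setup, number=200)
    t_listcomp = best_time("[x for x in data if x % 2 == 0]", setup=setup, number=200)
    print(f"filter: {t_filter * 1e6:.1f}us/op  |  listcomp: {t_listcomp * 1e6:.1f}us/op")
```

The minimum is used rather than the mean because slower repeats mostly reflect interference from other processes, not the code under test.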
    
    ## Go Profiling
    
    ### Built-in pprof
    
    ```go
    // Add to main.go for HTTP-accessible profiling
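    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    )
    
    func main() {
        go func() {
            // ListenAndServe only returns on failure, so log the error
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()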
        // ... rest of app
    }
    ```
    
    ```bash
    # CPU profile (30 seconds); quote the URL so the shell does not treat "?" as a glob
    go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
    
    # Memory profile
    go tool pprof http://localhost:6060/debug/pprof/heap
    
    # Goroutine profile
    go tool pprof http://localhost:6060/debug/pprof/goroutine
    
    # Inside pprof interactive mode:
    # top 20          - top functions by CPU/memory
    # list funcName   - source code with annotations
    # web             - open call graph (SVG) in browser (requires Graphviz)
    # png > out.png   - save call graph as image
    ```
    
    ### Go benchmarks
    
    ```go
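    // math_test.go
    // Assumes Add is defined in the same package; replace the package name with your own.
    package mathutil
    
    import (
        "math/rand"
        "sort"
        "testing"
    )
    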
    func BenchmarkAdd(b *testing.B) {
        for i := 0; i < b.N; i++ {
            Add(42, 58)
        }
    }
    
    func BenchmarkSort1000(b *testing.B) {
        data := make([]int, 1000)
        for i := range data {
            data[i] = rand.Intn(1000)
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            sort.Ints(append([]int{}, data...))
        }
    }
    ```
    
    ```bash
    # Run benchmarks
    go test -bench=. -benchmem ./...
    
    # Compare before/after
    go test -bench=. -count=5 ./... > old.txt
    # ... make changes ...
    go test -bench=. -count=5 ./... > new.txt
    go install golang.org/x/perf/cmd/benchstat@latest
    benchstat old.txt new.txt
    ```
    
    ## Flame Graphs
    
    ### Generate flame graphs
    
    ```bash
    # Node.js: 0x (easiest)
    npx 0x app.js
    # Opens interactive flame graph in browser
    
    # Node.js: clinic.js (comprehensive)
    npx clinic flame -- node app.js
    npx clinic doctor -- node app.js
    npx clinic bubbleprof -- node app.js
    
    # Python: py-spy (sampling profiler, no code changes needed)
    pip install py-spy
    py-spy record -o flame.svg -- python3 my_script.py
    
    # Profile running Python process
    py-spy record -o flame.svg --pid 12345
    
    # Go: built-in
    go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"
    # Navigate to "Flame Graph" view
    
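    # Linux (any process): perf + FlameGraph scripts
    # (stackcollapse-perf.pl and flamegraph.pl come from https://github.com/brendangregg/FlameGraph)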
    perf record -g -p PID -- sleep 30
    perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
    ```
    
    ### Reading flame graphs
    
    ```
    Key concepts:
    - X-axis: NOT time. Frames are sorted alphabetically so identical stacks merge. Width = % of samples.
    - Y-axis: Stack depth. The topmost frame of each stack is the leaf function (where CPU time is spent).
    - Wide bars at the top = hot functions (optimize these first).
    - Narrow tall stacks = deep call chains (may indicate excessive abstraction).
    
    What to look for:
    1. Wide plateaus at the top → function that dominates CPU time
    2. Multiple paths converging to one function → shared bottleneck
    3. GC/runtime frames taking significant width → memory pressure
    4. Unexpected functions appearing wide → performance bug
    ```
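
The "width = % of samples" rule can be made concrete by computing it from folded stacks, the `frame;frame;frame count` text format that `stackcollapse-perf.pl` emits. The sketch below is illustrative only: `frame_widths` and the sample data are hypothetical, and it merges frames by function name, whereas a real flame graph keeps each call path separate.

```python
from collections import Counter

# Hypothetical folded stacks: semicolon-separated frames, then a sample count.
FOLDED = """\
main;handle;parse 60
main;handle;render 30
main;gc 10
"""


def frame_widths(folded_lines):
    """Return {function: (total_share, self_share)} from folded-stack lines."""
    total = Counter()      # samples where the function appears anywhere in the stack
    self_time = Counter()  # samples where the function is the leaf (on-CPU) frame
    samples = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        count = int(count)
        frames = stack.split(";")
        samples += count
        self_time[frames[-1]] += count
        for name in set(frames):  # count a recursive function once per sample
            total[name] += count
    return {name: (total[name] / samples, self_time[name] / samples) for name in total}


if __name__ == "__main__":
    widths = frame_widths(FOLDED.splitlines())
    for name, (total_share, self_share) in sorted(widths.items(), key=lambda kv: -kv[1][1]):
        print(f"{name:8s} total={total_share:6.1%} self={self_share:6.1%}")
```

A function with a large self share corresponds to a wide plateau at the top of the graph. A function with a large total share but little self share is wide lower down and is mostly waiting on its callees.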
    
    ## Load Testing
    
    ### curl-based quick test
    
    ```bash
    # Single request timing
    curl -o /dev/null -s -w "HTTP %{http_code} | Total: %{time_total}s | TTFB: %{time_starttransfer}s | Connect: %{time_connect}s\n" https://api.example.com/endpoint
    
    # Multiple requests in s
    
    ... (truncated)