
    senior-data-scientist

    World-class data science skill

    By @alirezarezvani
    SKILL.md
    ---
    name: senior-data-scientist
    description: World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication. Use when designing experiments, building predictive models, performing causal analysis, or driving data-driven decisions.
    ---
    
    # Senior Data Scientist
    
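    Senior data scientist skill for production-grade AI/ML and data systems, covering experiment design, statistical modeling, causal inference, feature engineering, and model evaluation.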
    
    ## Quick Start
    
    ### Main Capabilities
    
    ```bash
    # Experiment design
    python scripts/experiment_designer.py --input data/ --output results/
    
    # Feature engineering pipeline
    python scripts/feature_engineering_pipeline.py --target project/ --analyze
    
    # Model evaluation suite
    python scripts/model_evaluation_suite.py --config config.yaml --deploy
    ```
    
    ## Core Expertise
    
    This skill covers:
    
    - Advanced production patterns and architectures
    - Scalable system design and implementation
    - Performance optimization at scale
    - MLOps and DataOps best practices
    - Real-time processing and inference
    - Distributed computing frameworks
    - Model deployment and monitoring
    - Security and compliance
    - Cost optimization
    - Team leadership and mentoring
    
    ## Tech Stack
    
    - **Languages:** Python, SQL, R, Scala, Go
    - **ML Frameworks:** PyTorch, TensorFlow, Scikit-learn, XGBoost
    - **Data Tools:** Spark, Airflow, dbt, Kafka, Databricks
    - **LLM Frameworks:** LangChain, LlamaIndex, DSPy
    - **Deployment:** Docker, Kubernetes, AWS/GCP/Azure
    - **Monitoring:** MLflow, Weights & Biases, Prometheus
    - **Databases:** PostgreSQL, BigQuery, Snowflake, Pinecone
    
    ## Reference Documentation
    
    ### 1. Statistical Methods Advanced
    
    Comprehensive guide available in `references/statistical_methods_advanced.md` covering:
    
    - Advanced patterns and best practices
    - Production implementation strategies
    - Performance optimization techniques
    - Scalability considerations
    - Security and compliance
    - Real-world case studies
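The reference file's own examples are not reproduced here. To illustrate one method of this kind, the sketch below computes a percentile bootstrap confidence interval for a difference in means using only the standard library. The function name `bootstrap_diff_ci` and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_diff_ci(a, b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a) (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_resamples):
        ra = rng.choices(a, k=len(a))  # resample group A with replacement
        rb = rng.choices(b, k=len(b))  # resample group B with replacement
        diffs.append(statistics.fmean(rb) - statistics.fmean(ra))
    diffs.sort()
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return diffs[lo_idx], diffs[hi_idx]


control = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7, 12.4]  # illustrative data
treatment = [12.9, 13.4, 13.1, 12.8, 13.6, 13.2, 13.0, 13.5]
low, high = bootstrap_diff_ci(control, treatment)
print(f"95% CI for uplift: [{low:.3f}, {high:.3f}]")
```

The bootstrap avoids a normality assumption on the metric. For very small samples, however, the percentile interval can be too narrow.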
    
    ### 2. Experiment Design Frameworks
    
    Complete workflow documentation in `references/experiment_design_frameworks.md` including:
    
    - Step-by-step processes
    - Architecture design patterns
    - Tool integration guides
    - Performance tuning strategies
    - Troubleshooting procedures
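A typical first step in A/B test design is sizing the experiment. The sketch below uses the standard normal-approximation formula for a two-sided, two-proportion z-test. The function name and its defaults (alpha 0.05, power 0.8) are assumptions, not taken from the reference file.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, mde_abs, alpha=0.05, power=0.8):
    """Users per arm to detect an absolute lift `mde_abs` (hypothetical helper)."""
    p_treat = p_control + mde_abs
    p_bar = (p_control + p_treat) / 2           # pooled rate under H0
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))
    ) ** 2
    return math.ceil(numerator / mde_abs ** 2)


# Baseline conversion 10%, minimum detectable effect of +1 percentage point
print(sample_size_per_arm(0.10, 0.01))
```

Sample size scales with the inverse square of the effect size. Halving the minimum detectable effect therefore roughly quadruples the required traffic, which is usually what constrains experiment duration.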
    
    ### 3. Feature Engineering Patterns
    
    Technical reference guide in `references/feature_engineering_patterns.md` with:
    
    - System design principles
    - Implementation examples
    - Configuration best practices
    - Deployment strategies
    - Monitoring and observability
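One widely used feature engineering rule is to learn transform statistics on the training split only, so that no information leaks in from validation or test data. The sketch below shows the rule with a hypothetical `Standardizer` class. It mirrors, but is not, scikit-learn's `StandardScaler`.

```python
import statistics


class Standardizer:
    """Hypothetical z-score scaler; statistics are learned from training data only."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Population std (ddof=0), as scikit-learn's StandardScaler uses;
        # fall back to 1.0 for constant columns to avoid division by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]
scaler = Standardizer().fit(train)  # fit on the training split only
print(scaler.transform(test))       # test data reuses the training mean/std
```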
    
    ## Production Patterns
    
    ### Pattern 1: Scalable Data Processing
    
    Enterprise-scale data processing with distributed computing:
    
    - Horizontal scaling architecture
    - Fault-tolerant design
    - Real-time and batch processing
    - Data quality validation
    - Performance monitoring
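As a minimal sketch of the data quality validation step, the code below checks rows against declarative column rules before they enter a pipeline. The `Rule` class, `validate` function and column names are hypothetical. At scale, the same checks would run inside the distributed framework rather than in a Python loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical column-level check: non-null plus optional numeric bounds."""
    column: str
    required: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate(rows, rules):
    """Return (row_index, column, message) for every rule violation."""
    issues = []
    for i, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule.column)
            if value is None:
                if rule.required:
                    issues.append((i, rule.column, "missing value"))
                continue  # nothing further to check on a null value
            if rule.min_value is not None and value < rule.min_value:
                issues.append((i, rule.column, f"{value} < {rule.min_value}"))
            if rule.max_value is not None and value > rule.max_value:
                issues.append((i, rule.column, f"{value} > {rule.max_value}"))
    return issues


rules = [Rule("user_id"), Rule("age", min_value=0, max_value=120), Rule("coupon", required=False)]
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 29},
    {"user_id": 3, "age": -5},
]
for issue in validate(rows, rules):
    print(issue)
```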
    
    ### Pattern 2: ML Model Deployment
    
    Production ML system with high availability:
    
    - Model serving with low latency
    - A/B testing infrastructure
    - Feature store integration
    - Model monitoring and drift detection
    - Automated retraining pipelines
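Drift detection can be implemented in several ways. One common choice, substituted here for illustration, is the Population Stability Index (PSI), which compares a feature's binned distribution in live traffic against a training baseline. The bin edges, the epsilon, and the often-quoted rule of thumb (PSI above roughly 0.25 signals significant drift) are assumptions to tune per feature.

```python
import bisect
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a baseline and a live sample."""
    def proportions(values):
        counts = [0] * (len(edges) + 1)  # len(edges) cut points -> len(edges)+1 bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]  # eps avoids log(0) on empty bins

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i % 10 for i in range(1000)]          # illustrative training values
shifted = [min(v + 3, 9) for v in baseline]       # illustrative drifted values
edges = [2, 4, 6, 8]
print(psi(baseline, baseline, edges), psi(baseline, shifted, edges))
```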
    
    ### Pattern 3: Real-Time Inference
    
    High-throughput inference system:
    
    - Batching and caching strategies
    - Load balancing
    - Auto-scaling
    - Latency optimization
    - Cost optimization
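The batching and caching strategies above can be sketched in a few lines. Requests are grouped into micro-batches so the model is called once per batch, and predictions for repeated inputs are memoised. `predict_batch` is a hypothetical stand-in for a real vectorised model call. On Python 3.12+ `itertools.batched` could replace the helper.

```python
from functools import lru_cache
from itertools import islice


def batched(items, batch_size):
    """Yield lists of up to batch_size items (micro-batching)."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def predict_batch(batch):
    """Hypothetical vectorised model: one call per batch, not per request."""
    return [2.0 * x + 1.0 for x in batch]


@lru_cache(maxsize=10_000)
def cached_predict(x):
    """Memoise predictions for repeated inputs (safe only for deterministic models)."""
    return predict_batch([x])[0]


requests = [1.0, 2.0, 3.0, 4.0, 5.0]
results = [y for batch in batched(requests, batch_size=2) for y in predict_batch(batch)]
print(results)  # [3.0, 5.0, 7.0, 9.0, 11.0]
```

In a live server, batches are usually also bounded by a maximum wait time, so that latency stays within target when traffic is light.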
    
    ## Best Practices
    
    ### Development
    
    - Test-driven development
    - Code reviews and pair programming
    - Documentation as code
    - Version control everything
    - Continuous integration
    
    ### Production
    
    - Monitor everything critical
    - Automate deployments
    - Feature flags for releases
    - Canary deployments
    - Comprehensive logging
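Canary deployments and feature flags both need a stable way to send a fixed share of users to the new version. The sketch below assumes that routing is done by hashing the user ID with a per-release salt. The function name and salt value are hypothetical.

```python
import hashlib


def in_canary(user_id: str, canary_percent: float, salt: str = "release-v2") -> bool:
    """Deterministically route about canary_percent% of users to the canary."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # stable bucket in [0, 9999]
    return bucket < canary_percent * 100   # e.g. 5.0% -> buckets 0..499


print(in_canary("user-42", 5.0))
```

Because a user's bucket never changes, raising the percentage only adds users to the canary. Users already routed there are never moved back, which keeps their experience consistent during a staged rollout.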
    
    ### Team Leadership
    
    - Mentor junior engineers
    - Drive technical decisions
    - Establish coding standards
    - Foster learning culture
    - Cross-functional collaboration
    
    ## Performance Targets
    
    **Latency:**
    - P50: < 50ms
    - P95: < 100ms
    - P99: < 200ms
    
    **Throughput:**
    - Requests/second: > 1,000
    - Concurrent users: > 10,000
    
    **Availability:**
    - Uptime: ≥ 99.9%
    - Error rate: < 0.1%
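The sketch below shows how these targets might be checked. It computes nearest-rank percentiles from latency samples and compares them with the thresholds above, then converts the 99.9% uptime target into a downtime budget. The latency samples are made up. Note that NumPy's default percentile interpolates, so its results can differ slightly from nearest-rank.

```python
import math

TARGETS_MS = {50: 50, 95: 100, 99: 200}  # P50/P95/P99 latency targets listed above


def percentile(samples, q):
    """Nearest-rank percentile (q in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(samples_ms):
    """Return {q: (observed_ms, meets_target)} for each latency target."""
    report = {}
    for q, limit in TARGETS_MS.items():
        observed = percentile(samples_ms, q)
        report[q] = (observed, observed < limit)
    return report


samples = [10] * 90 + [80] * 8 + [150] * 2  # 100 illustrative latencies in ms
print(check_latency(samples))  # {50: (10, True), 95: (80, True), 99: (150, True)}

# A 99.9% uptime target leaves a 0.1% downtime budget.
minutes_per_30_days = 30 * 24 * 60
budget = (1 - 0.999) * minutes_per_30_days
print(f"Allowed downtime per 30 days: {budget:.1f} min")  # 43.2 min
```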
    
    ## Security & Compliance
    
    - Authentication & authorization
    - Data encryption (at rest & in transit)
    - PII handling and anonymization
    - GDPR/CCPA compliance
    - Regular security audits
    - Vulnerability management
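One common PII-handling technique is keyed hashing. It replaces identifiers with stable tokens, so joins across tables still work while raw values stay hidden. The sketch below assumes HMAC-SHA256 with a secret key held outside the code, and assumes lowercase normalisation suits the field (as with emails). This is pseudonymization rather than anonymization, and GDPR still treats pseudonymized data as personal data.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Map a PII value to a stable token with HMAC-SHA256.

    Normalising (strip + lowercase) is an assumption suited to emails; the
    same input and key always give the same token, so joins still work.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


SECRET_KEY = b"load-me-from-a-secret-manager"  # placeholder; never hard-code real keys
print(pseudonymize("Alice@Example.com", SECRET_KEY))
```

Rotating the key breaks linkage with previously issued tokens. That can be useful for retention limits but must be planned for downstream joins.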
    
    ## Common Commands
    
    ```bash
    # Development
    python -m pytest tests/ -v --cov
    python -m black src/
    python -m pylint src/
    
    # Training
    python scripts/train.py --config prod.yaml
    python scripts/evaluate.py --model best.pth
    
    # Deployment
    docker build -t service:v1 .
    kubectl apply -f k8s/
    helm upgrade service ./charts/
    
    # Monitoring
    kubectl logs -f deployment/service
    python scripts/health_check.py
    ```
    
    ## Resources
    
    - Advanced Patterns: `references/statistical_methods_advanced.md`
    - Implementation Guide: `references/experiment_design_frameworks.md`
    - Technical Reference: `references/feature_engineering_patterns.md`
    - Automation Scripts: `scripts/` directory
    
    ## Senior-Level Responsibilities
    
    At the senior level, responsibilities include:
    
    1. **Technical Leadership**
       - Drive architectural decisions
       - Mentor team members
       - Establish best practices
       - Ensure code quality
    
    2. **Strategic Thinking**
       - Align with business goals
       - Evaluate trade-offs
       - Plan for scale
       - Manage technical debt
    
    3. **Collaboration**
       - Work across teams
       - Communicate effectively
       - Build consensus
       - Share knowledge
    
    4. **Innovation**
       - Stay current with research
       - Experiment with new approaches
       - Contribute to community
       - Drive continuous improvement
    
    5. **Production Excellence**
       - Ensure high availability
       - Monitor proactively
       - Optimize performance
       - Respond to incidents