sarak@portfolio:~

$ whoami

Sarak Dahal

$ cat role.txt

Data Architect

$ ls skills/

python/ spark/ aws/ ml_ai/ snowflake/ airflow/ data_architecture/ docker/

$ echo $STATUS

Processing 10M+ records daily | 40% pipeline performance gains | Enterprise data architecture | ML system design

> ABOUT_ME

$ cat about.md

> MS Data Science @ Regis University (GPA: 3.9) - Completed: Dec 2025

> BS Computer Science - Completed 2021

> 6+ Years Python, SQL & Data Architecture Experience

> Location: Denver, CO

Data Architect with proven expertise in designing enterprise-scale data infrastructure, distributed ETL pipelines processing 10M+ records daily, and production ML systems. Specializing in Apache Spark, AWS services (S3, Lambda, EC2, CloudFormation, Glue, Kinesis, Redshift), and modern data architectures built to high accuracy and uptime standards.

Experienced in pipeline performance optimization, cloud infrastructure automation, query tuning, and proactive monitoring system design. Published researcher in predictive ML models for Scope 3 emissions data accuracy.

> current_focus

  • $ Enterprise data architecture & governance
  • $ ML/AI system design & deployment
  • $ AWS cloud infrastructure (S3, Lambda, EC2, Glue, Kinesis)
  • $ Distributed data pipeline architecture

> system_metrics

10M+  Records Processed Daily
40%   Pipeline Performance Gain
15%   Data Quality Improvement
6+    Years Experience

> TECHNICAL_STACK

> languages.py

Python (6+ years) 95%
SQL/Spark SQL 90%
PySpark 85%

> aws_services.conf

S3 | Lambda | EC2 | CloudFormation | Glue | Athena | Kinesis | Redshift

> big_data.sh

Apache Spark | Hadoop | Apache Airflow | ETL/ELT | Batch Processing | Streaming | Data Modeling

> databases.sql

PostgreSQL | MySQL | MongoDB | Snowflake | Oracle | SQL Server

> devops.yaml

Docker | Git | Linux/Unix | CI/CD | IaC | FastAPI | Pandas | NumPy

> ml_ai.model

PyTorch | TensorFlow | LightGBM | XGBoost | LSTM | Transformers | NLP | Reinforcement Learning

> certifications.json

AWS Data Analytics IN_PROGRESS
Databricks Data Engineer IN_PROGRESS

> WORK_HISTORY

$ vim experience/research_role.py
class GraduateResearchAssistant:
    def __init__(self):
        self.role = "Graduate Research Assistant"
        self.company = "Regis University"
        self.duration = "Sep 2024 - Apr 2025"
        self.location = "Denver, CO"

    def achievements(self):
        return [
            "Designed end-to-end ETL pipeline for Scope 3 emissions (10M+ records)",
            "Reduced processing time by 40% using Spark distributed computation",
            "Developed optimized PySpark jobs for Snowflake data loading",
            "Integrated scalable KNN imputation improving data quality by 15%",
            "Co-authored published research on ML for emissions data accuracy"
        ]

    def tech_stack(self):
        return ["Apache Spark", "Airflow", "PySpark", "Snowflake", "Python"]
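The "scalable KNN imputation" line can be sketched in plain Python; the production version ran as a distributed PySpark job, and `impute_knn`, `k`, and the toy emissions-style rows below are illustrative assumptions, not the actual research code:

```python
import math

def impute_knn(rows, target_idx, k=2):
    """Fill missing values (None) in column `target_idx` with the mean
    of that column over the k nearest complete rows, using Euclidean
    distance on the remaining columns. Single-machine sketch only."""
    complete = [r for r in rows if r[target_idx] is not None]
    filled = []
    for row in rows:
        if row[target_idx] is not None:
            filled.append(list(row))
            continue
        # distance computed over the non-target columns only
        def dist(other):
            return math.sqrt(sum(
                (a - b) ** 2
                for i, (a, b) in enumerate(zip(row, other))
                if i != target_idx))
        neighbors = sorted(complete, key=dist)[:k]
        estimate = sum(n[target_idx] for n in neighbors) / len(neighbors)
        new_row = list(row)
        new_row[target_idx] = estimate
        filled.append(new_row)
    return filled

# Toy rows: (revenue, employees, scope3_tonnes); last row is missing a value
rows = [
    (10.0, 100, 50.0),
    (12.0, 120, 60.0),
    (50.0, 500, 250.0),
    (11.0, 110, None),
]
print(impute_knn(rows, target_idx=2, k=2)[-1][2])  # 55.0 (mean of two nearest)
```

The same neighbor search distributes naturally in Spark by broadcasting the complete rows and mapping the imputation over partitions of incomplete ones.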
$ vim experience/previous_role.py
class DataEngineer:
    def __init__(self):
        self.role = "Data Engineer"
        self.company = "Appharu"
        self.duration = "Jan 2020 - Jul 2023"
        self.location = "Remote"

    def impact(self):
        return {
            "focus": "ETL pipeline architecture",
            "strengths": "Query optimization & monitoring",
            "databases": "PostgreSQL, MySQL, MongoDB",
            "scope": "Production data systems",
            "automation": "Data quality & validation frameworks"
        }
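The "data quality & validation frameworks" item can be illustrated with a minimal, stdlib-only check runner; the rule names and sample records here are assumptions for the sketch, not the production framework:

```python
def run_checks(record, checks):
    """Run named validation checks against a record; return the names
    of the checks that failed."""
    return [name for name, check in checks.items() if not check(record)]

# Example rules: each check is a predicate over one record
checks = {
    "id_present": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: r.get("amount", 0) >= 0,
    "currency_known": lambda r: r.get("currency") in {"USD", "EUR", "NPR"},
}

good = {"id": 1, "amount": 9.5, "currency": "USD"}
bad = {"id": None, "amount": -3, "currency": "XXX"}
print(run_checks(good, checks))  # []
print(run_checks(bad, checks))   # ['id_present', 'amount_non_negative', 'currency_known']
```

Keeping each rule as an independent named predicate makes failures reportable per-rule, which is what turns ad-hoc checks into a monitoring-friendly framework.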

> PROJECTS

> healthcare_etl/

Enterprise-grade ETL pipeline for processing large-scale healthcare records with distributed computation. Implemented parallel processing with Apache Spark, achieving a 40% performance improvement.

data_throughput:
10M+ daily
Apache Spark | AWS S3 | Snowflake | PySpark
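The parallel-processing idea behind the pipeline (partition the records, transform partitions concurrently, recombine) can be sketched with the stdlib; the real pipeline ran on Spark executors rather than local threads, and `clean_record` is a stand-in transform:

```python
from concurrent.futures import ThreadPoolExecutor

def clean_record(n):
    # stand-in for a per-record transform (parse, validate, normalize)
    return n * 2

def process_partition(partition):
    return [clean_record(r) for r in partition]

def run_pipeline(records, n_partitions=4):
    # round-robin split into roughly equal partitions, processed concurrently
    partitions = [records[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(process_partition, partitions)
    return [r for part in results for r in part]

print(sum(run_pipeline(list(range(1000)))))  # 999000
```

In Spark the same shape falls out of `rdd.mapPartitions`: the partitioning, scheduling, and recombination are handled by the cluster instead of a local pool.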

> aws_infra_automation/

Infrastructure as Code solution using AWS CloudFormation, automating provisioning of S3, Lambda, EC2, and RDS. Designed for rapid, repeatable environment deployment.

automation_level:
Fully automated
CloudFormation | Lambda | EC2 | IaC
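A minimal CloudFormation template of the kind this project provisions, built as a Python dict and emitted as JSON; the bucket name and logical resource IDs are illustrative, not the project's actual stack:

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal IaC sketch: one S3 bucket with an exported ARN",
    "Resources": {
        "DataLakeBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "example-data-lake-bucket"},
        }
    },
    "Outputs": {
        "BucketArn": {
            # Fn::GetAtt resolves the bucket's ARN at deploy time
            "Value": {"Fn::GetAtt": ["DataLakeBucket", "Arn"]}
        }
    },
}

print(json.dumps(template, indent=2))
```

Generating templates programmatically like this keeps the full stack (S3, Lambda, EC2, RDS) reviewable in version control, which is what makes redeployment repeatable.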

> realtime_analytics/

Real-time analytics dashboard integrated with Apache Airflow for automated reporting and Snowflake for analytics queries. Designed for fast query response and automated data refresh.

stack_depth:
Full-stack pipeline
Airflow | Snowflake | FastAPI | Python
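The automated-refresh loop behind the dashboard can be sketched with the stdlib `sched` module; in production the schedule lives in an Airflow DAG, and the interval and `refresh` callable below are assumptions for the sketch:

```python
import sched
import time

refreshed = []

def refresh(run_id):
    # stand-in for: query Snowflake, recompute aggregates, push to dashboard
    refreshed.append(run_id)

def schedule_refreshes(scheduler, interval_s, runs):
    """Queue `runs` refresh jobs, one every `interval_s` seconds."""
    for i in range(runs):
        scheduler.enter(i * interval_s, 1, refresh, argument=(i,))

scheduler = sched.scheduler(time.monotonic, time.sleep)
schedule_refreshes(scheduler, interval_s=0.01, runs=3)
scheduler.run()  # blocks until all queued refreshes have fired
print(refreshed)  # [0, 1, 2]
```

An Airflow DAG replaces this loop with declarative scheduling plus retries, backfills, and per-task monitoring, which is why it's the production choice.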

> agentproof/

Open-source AI agent testing framework published on PyPI (pip install agentproof). Architected test harness, assertion engine, and reporting pipeline for production AI agent validation.

test_coverage:
95%
Python | pytest | AI Agents | PyPI
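A toy version of the assertion-engine idea: run an agent's output through named assertions and collect a pass/fail report. Everything here (`run_assertions`, the sample assertions and output) is illustrative, not the actual agentproof API:

```python
def run_assertions(output, assertions):
    """Evaluate each named assertion against an agent's output
    and return a simple report dict."""
    results = {name: bool(fn(output)) for name, fn in assertions}
    passed = sum(results.values())
    return {
        "passed": passed,
        "failed": len(results) - passed,
        "results": results,
    }

# Each assertion is a (name, predicate) pair over the raw output string
assertions = [
    ("mentions_total", lambda out: "total" in out.lower()),
    ("no_apology", lambda out: "sorry" not in out.lower()),
    ("non_empty", lambda out: len(out.strip()) > 0),
]

report = run_assertions("Total: 42 records processed.", assertions)
print(report["passed"], report["failed"])  # 3 0
```

Named, composable predicates are what make a reporting pipeline possible: each failure is attributable to a specific assertion rather than a monolithic pass/fail.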
$ cat research.bib

@inproceedings{dahal2024emissions,
  title       = {Predictive Models for Scope 3 Emissions: Improving Accuracy with Machine Learning and Financial Data},
  author      = {Dahal, S. and Pochampally, A. and Soraf, K.},
  booktitle   = {Marketing and Data Sciences},
  institution = {Regis University},
  year        = {2024}
}

> CONNECT

$ ./send_message.sh

$ ping sarak-dahal --resolve

Connection established. Ready to receive transmission...

$ compose_message --interactive