CDC: Centers for Disease Control and Prevention

Software Engineering Undergraduate Researcher

September 2024 – Present | San Diego, CA

Built metaWEPP, a Snakemake/Python metagenomics pipeline for detecting and analyzing novel viruses, which was submitted to top research venues including RECOMB and NAR Genomics and Bioinformatics. Tuned hyperparameters to scale the pipeline to over 16 million reads while achieving 99% classification accuracy. Automated CI/CD with GitHub Actions, built Docker images, and parallelized workflows for a 2x speedup. Automated dataset discovery with Google BigQuery and Python, making validation across 50+ datasets 3x faster.