About Me
Epidemiologist, data scientist, developer. Master’s in Epidemiology from McGill University. Master’s in Computer and Information Technology from the University of Pennsylvania. Co-author of many peer-reviewed publications. Experienced in epidemiologic and biostatistical methods, developing autonomous AI agent systems, and building inferential and predictive models for clinical applications. Expert in diverse real-world data sources (EMR, claims, next-gen sequencing), clinical trial data and standards, and multi-language development (R, Python, SQL, Stan, C++).
Contact
Technical Skills
Domain Expertise: Epidemiology, real-world evidence (RWE), observational study design, oncology drug development, biostatistics, causal inference
Generative AI:
- Frameworks: LangGraph, LangChain
- Context: retrieval-augmented generation (RAG), LLM fine-tuning, Model Context Protocol (MCP) deployment
- Inference: API integration (Portkey, Azure, AWS, OpenAI), local model deployment (Ollama, Jan, LM Studio)
- Preferred LLMs and tools: Claude Opus 4.6 + GitHub Copilot or Claude Code (mostly in-line edits, moderate use of agent mode), Gemini 3 Pro (deep research tools)
Programming and Modeling:
- Data science / Statistics: R (expert; base R, tidyverse, data.table, R6), Python (expert; pandas, scikit-learn, PyTorch)
- Bayesian modeling: Stan, JAGS
- Database: SQL
- Compiled / low-level: Java, C++
Interactive Applications & Visualization:
- Dashboards: Quarto, R Shiny, Python Streamlit, Python Chainlit
- Visualization: ggplot2, plotly, matplotlib
MLOps & Scientific Reproducibility:
- Environments: Docker/Podman, renv/virtualenv, uv
- Version Control: Git, GitHub / GitLab
Education
Master of Computer and Information Technology | University of Pennsylvania | 2026
Master of Science, Epidemiology | McGill University | 2016
Bachelor of Science, Chemistry | Wake Forest University | 2010
Experience
Principal Data Innovation Specialist | Genentech | 10/2025 - Present
Design and scale innovative solutions to support real-world and clinical data strategy. Lead the global real-world data analytics community (knowledge sharing, networking). Prototype autonomous agentic systems (RAG agents) to automate complex epidemiological workflows, and drive adoption of agentic frameworks (Claude Code, GitHub Copilot + MCP agents, Aider).
Principal Data Scientist | Genentech | 8/2024 - 10/2025
Designed and executed real-world/observational epidemiologic studies using EMR, claims, and NGS data to support product development in oncology. Developed dashboards, R packages, and subject-matter expert Docker images to support drug development and improve research reproducibility.
Senior Data Scientist | Genentech | 10/2021 - 8/2024
Data Scientist | Genentech | 3/2020 - 10/2021
Clinical Data Scientist | Verana Health | 11/2019 - 3/2020
Designed and executed real-world data epidemiologic studies in ophthalmology and neurology using proprietary EMR data. Mapped unstructured EMR data to structured clinical features.
Fellow | The Data Incubator | 9/2019 - 11/2019
Participated in an advanced, 8-week data science program designed to transition academic researchers to industry research. Created a Heroku app to predict the likelihood of drug approval from a clinical trial abstract using natural language processing.
Consultant | IQVIA | 8/2017 - 9/2019
Designed and managed studies of drug safety and effectiveness in secondary datasets for market access (single arm, historical comparator), label expansion, and post-marketing surveillance. Researched rare disease prevalence through literature reviews and steady-state disease modeling. Evaluated risk evaluation and mitigation strategy effectiveness.
Research Assistant | Lady Davis Institute | 7/2016 - 7/2017
Developed a method for a unique missing data problem in distributed data and evaluated its effectiveness in >1,000 GB of simulated patient-level data on a supercomputer and in real EMR data from 59,957 patients in the UK. Designed a study advocating for increased study population restriction to reduce bias.
Research Assistant | Jill Baumgartner’s Group | 8/2014 - 6/2016
Conducted household air pollution measurements in China, creating an R program at the field site to convert raw sensor data to climate modeling inputs. Analyzed the composition of 20 air pollution samples using factor analysis to identify pollution sources that generate high oxidative potential.
Senior Media Analyst | iCrossing | 11/2012 - 7/2014
Analyzed marketing data using proprietary software.
Research Fellow | New York University | 8/2011 - 8/2012
Developed methods for chemical synthesis and instructed classes and laboratories.
Fulbright Fellow | US Department of State | 9/2010 - 5/2011
Taught English to French high schoolers and conducted sociological research informed by primary data collection.