About Me
Hello, I’m Shihui (Shirley) Zhu, 祝诗慧!
Welcome to my personal website!
I’m a data scientist in the AstraZeneca R&D graduate program, currently based in Gaithersburg, MD, U.S., with two years of experience driving innovation at the intersection of data science and healthcare. I have a strong passion for harnessing data-driven solutions to advance medical research and ultimately improve patient outcomes.
My expertise spans statistical analysis, machine learning, and artificial intelligence applied to real-world, biological, imaging, and clinical datasets. I have led and delivered innovative projects, including the deployment of GenAI models across therapeutic areas such as respiratory disease and oncology. I am skilled in Python, R, SQL, and AWS, and excel at collaborating within cross-functional teams to translate complex methodologies into actionable, impactful insights.
Originally from Beijing, China, I pursued my higher education in the U.S., making me fluent in both Mandarin and English. Outside of work, I enjoy reading detective novels, swimming, and spending quality time with friends. I am actively exploring career opportunities in Beijing, Shanghai, or the U.S. If my background interests you, please feel free to get in touch :)
Education
-
Columbia University in the City of New York, New York, NY
September 2021 – May 2023
Master of Science: Biostatistics | GPA: 3.9
Relevant Coursework
Probability, Biostatistics, Regression Analysis, Data Mining, Statistical Inference, Data Science
-
University of California-San Diego, San Diego, CA
September 2017 – June 2021
Bachelor of Science: Mathematics-Computer Science, and Bioengineering: Bioinformatics | GPA: 3.7
Relevant Coursework
Linear Algebra, Numerical Analysis, Data Structures & Algorithms, OOP, Modeling and Computation in Bioengineering, Statistics, Bioinformatics
Skills
-
Programming Languages
- Proficient: Python, R, SQL, Bash/zsh scripting
- Experienced: Java, SAS, C/C++
-
Techniques
- Statistical analysis (regression, GLM, mixed-effects models, survival analysis)
- ML/DL algorithms (classification, clustering, predictions)
- Bioinformatics data analysis (STAR, BLAST, etc.)
- Real-World Data analysis (claims and EHR data)
- Generative AI (chatbots, LLM agents, LLM fine-tuning, prompt engineering, etc.)
-
Tools & Technologies
- AWS (SageMaker, Bedrock, Redshift)
- Data analysis (pandas, matplotlib, plotly, statsmodels), Machine Learning & Deep Learning (scikit-learn, PyTorch, TensorFlow)
- LLM (LangChain, LangGraph, LlamaIndex), NLP (NLTK, spaCy)
- R (caret, ggplot2, glm, etc.)
- Project management (Agile, Jira, Confluence)
Recent Posts
Pharmacoepidemiology analysis of respiratory diseases (asthma, COPD, and bronchiectasis)
A proof-of-concept (PofC) chatbot using agentic AI and AWS to accelerate clinical trial protocol document workflows
A PofC computer vision project that automated the pre-screening pipeline for HRD in ovarian cancer patients using commercial imaging data
Implemented a novel ML pipeline using unsupervised learning and DL models to address automated gating of single-cell flow cytometry data for leukemia diagn...