About Me

Hello, I’m Shihui (Shirley) Zhu, 祝诗慧!

Welcome to my personal website :stuck_out_tongue:

I’m a data scientist in the AstraZeneca R&D graduate program, currently based in Gaithersburg, MD, U.S., with two years of experience driving innovation at the intersection of data science and healthcare. I have a strong passion for harnessing data-driven solutions to advance medical research and ultimately improve patient outcomes.

My expertise spans statistical analysis, machine learning, and artificial intelligence applied to real-world, biological, imaging, and clinical datasets. I have led and delivered innovative projects, including the deployment of GenAI models across therapeutic areas such as respiratory disease and oncology. I am skilled in Python, R, SQL, and AWS, and excel at collaborating within cross-functional teams to translate complex methodologies into actionable, impactful insights.

Originally from Beijing, China, I pursued my higher education in the U.S., and I am fluent in both Mandarin and English. Outside of work, I enjoy reading detective novels, swimming, and spending quality time with friends. I am actively exploring career opportunities in Beijing, Shanghai, or the U.S. If my background interests you, please feel free to get in touch :)

Education

  • Columbia University in the City of New York, New York, NY
    September 2021 – May 2023
    Master of Science: Biostatistics  |  GPA: 3.9
    Relevant Coursework: Probability, Biostatistics, Regression Analysis, Data Mining, Statistical Inference, Data Science
  • University of California-San Diego, San Diego, CA
    September 2017 – June 2021
    Bachelor of Science: Mathematics-Computer Science, and Bioengineering: Bioinformatics  |  GPA: 3.7
    Relevant Coursework: Linear Algebra, Numerical Analysis, Data Structures & Algorithms, OOP, Modeling and Computation in Bioengineering, Statistics, Bioinformatics

Skills

  • Programming Languages
    • Proficient: Python, R, SQL, Bash/zsh scripting
    • Experienced: Java, SAS, C/C++
  • Techniques
    • Statistical analysis (regression, GLM, mixed-effects models, survival analysis)
    • ML/DL algorithms (classification, clustering, prediction)
    • Bioinformatics data analysis (STAR, BLAST, etc.)
    • Real-World Data analysis (claims and EHR data)
    • Generative AI (chatbots, LLM agents, LLM fine-tuning, prompt engineering, etc.)
  • Tools & Technologies
    • AWS (SageMaker, Bedrock, Redshift)
    • Data analysis (pandas, matplotlib, plotly, statsmodels), machine learning & deep learning (scikit-learn, PyTorch, TensorFlow)
    • LLM (LangChain, LangGraph, LlamaIndex), NLP (NLTK, spaCy)
    • R (caret, ggplot2, glm, etc.)
    • Project management (Agile, Jira, Confluence)

Recent Posts