Senior Design Project: Machine Learning Classification for Diagnostics of Leukemia Based on Single Cell Cytometry Data Analysis
Implemented a novel ML pipeline combining unsupervised learning and deep learning models for automated gating of single-cell flow cytometry data in leukemia diagnosis.
Background
Leukemia is a group of malignant disorders affecting the blood and blood-forming tissues in the bone marrow, lymphatic system, and spleen. There are four primary types of leukemia, each with a number of subtypes: acute lymphocytic/lymphoblastic leukemia (ALL), acute myelogenous leukemia (AML), also called acute non-lymphoblastic leukemia (ANLL), chronic myelogenous leukemia (CML), and chronic lymphocytic leukemia (CLL) (Reference).
Summary
This project aimed to develop an improved automated method for diagnosing leukemia, specifically chronic lymphocytic leukemia (CLL) and acute lymphocytic leukemia (ALL), by addressing the limitations of current diagnostic techniques. Traditional diagnosis relies on flow cytometry with a sequential manual gating process, which is prone to inconsistencies, errors inherited from earlier gates, and missed classifications because it depends on physician expertise and rigid gate boundaries. A newer automated gating technique, DAFi, has been proposed to mitigate some of these issues, but it still requires manual intervention and user-provided examples.
We explored a more accurate and consistent way to cluster flow cytometry data by combining dimensionality reduction (Density-Biased UMAP) with deep learning on a small dataset of 60 patients. We showed that, with a limited number of samples, supplying multiple UMAP outputs generated from different templates increased the accuracy (~77%) of the CNN model's predictions for CLL diagnosis. For ALL diagnosis, predicting with multiple templates and combining the results by single-positive voting also improved model performance, achieving a lower false-negative rate, i.e. a greater chance of diagnosing hard cases.
This approach could enable effective diagnosis for large pools of patients, accommodate high-dimensional data, and ultimately lead to more precise and reliable diagnoses for blood-borne malignancies.
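To illustrate why single-positive voting tends to lower the false-negative rate, here is a minimal sketch under one plausible reading of the term: a patient is flagged positive if the prediction from any single template crosses the decision threshold. The function names, the 0.5 threshold, and the probabilities are hypothetical and chosen only for illustration; they are not the project's actual code or results. A majority vote is included for contrast.

```python
def single_positive_vote(probs, threshold=0.5):
    """Flag a patient as positive if ANY template's prediction is positive.

    probs: list of rows, one per patient; each row holds the positive-class
    probability produced from each template's UMAP embedding (assumed layout).
    """
    return [any(p >= threshold for p in row) for row in probs]


def majority_vote(probs, threshold=0.5):
    """Flag a patient as positive only if more than half of the templates agree."""
    return [sum(p >= threshold for p in row) > len(row) / 2 for row in probs]


# Hypothetical per-template probabilities for three patients and three templates.
example_probs = [
    [0.2, 0.7, 0.1],  # only one template votes positive
    [0.1, 0.3, 0.4],  # no template votes positive
    [0.9, 0.8, 0.6],  # all templates vote positive
]

single = single_positive_vote(example_probs)
majority = majority_vote(example_probs)
print(single)
print(majority)
```

In this toy input the first patient is caught by single-positive voting but missed by the majority vote. That is the trade-off such a rule makes: fewer false negatives at the likely cost of more false positives.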
Notice: Results are confidential and therefore not shown here.
Link to senior project page link