Machine Learning Methods in Clinical Flow Cytometry
Figure 2. A diagrammatic overview of the model development workflow for supervised machine learning, from data annotation through training and validation to production-ready inference.
Figure 3. Unsupervised learning algorithms. Labelled cells with unknown identities are analyzed by flow cytometry and partitioned in the feature space based on marker expression. Clustering methods group data points based on their marker profiles, while dimensionality reduction can be applied to high-parameter datasets to simplify their representation and enhance biological interpretability. These tools output cell annotations that identify distinct populations, enabling the characterization of disease states and facilitating novel discoveries.
Figure 4. Twin networks: a single neural network is trained on paired examples. Pairs are weakly supervised, requiring only a label indicating whether the two examples are the “same” or not. N.b., this supervised signal could be derived via an unsupervised method.
Figure 5. AI model lifecycle from initial data analysis to model development and operationalization.
1. Introduction
2. A Primer on Machine Learning for the Flow Cytometrist
3. Supervised Machine Learning Methods
3.1. Survey of Commonly Used Supervised ML Methods in Flow Cytometry
3.2. Support Vector Machines (SVMs)
3.3. Ensemble Techniques: Decision Trees, Random Forests, and Gradient-Boosted Trees
3.4. Neural Networks (NNs)
3.5. Multiple Instance Learning (MIL)
4. Unsupervised Machine Learning Methods
4.1. Clustering Algorithms
4.2. Linear Dimensionality Reduction Techniques
4.3. Non-Linear Dimensionality Reduction Techniques
5. Weakly Supervised Methods
5.1. Common Methods Used in Weakly Supervised Models
5.2. Representation Learning
5.3. Generative Models
5.4. Foundation Models and Transfer Learning
6. Tools and Infrastructure
6.1. Programming Languages
6.2. Models, Products, and Lifecycles
6.3. Infrastructure
7. Clinical Implementation
8. Potential Uses for Discovery
9. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Lansdorp, P.M.; Smith, C.; Safford, M.; Terstappen, L.W.; Thomas, T.E. Single laser three color immunofluorescence staining procedures based on energy transfer between phycoerythrin and cyanine 5. Cytometry 1991, 12, 723–730.
- Perfetto, S.P.; Chattopadhyay, P.K.; Roederer, M. Seventeen-colour flow cytometry: Unravelling the immune system. Nat. Rev. Immunol. 2004, 4, 648–655.
- Nolan, J.P.; Condello, D. Spectral flow cytometry. Curr. Protoc. Cytom. 2013, 1, 1.27.1–1.27.13.
- Sahir, F.; Mateo, J.M.; Steinhoff, M.; Siveen, K.S. Development of a 43 color panel for the characterization of conventional and unconventional T-cell subsets, B cells, NK cells, monocytes, dendritic cells, and innate lymphoid cells using spectral flow cytometry. Cytom. A 2020, 105, 404–410.
- Flores-Montero, J.; Sanoja-Flores, L.; Paiva, B.; Puig, N.; García-Sánchez, O.; Böttcher, S.; van der Velden, V.H.J.; Pérez-Morán, J.-J.; Vidriales, M.-B.; García-Sanz, R.; et al. Next Generation Flow for highly sensitive and standardized detection of minimal residual disease in multiple myeloma. Leukemia 2017, 31, 2094–2103.
- Wolniak, K.; Goolsby, C.; Choi, S.; Ali, A.; Serdy, N.; Stetler-Stevenson, M. Report of the results of the International Clinical Cytometry Society and American Society for Clinical Pathology workload survey of clinical flow cytometry laboratories. Cytom. B Clin. Cytom. 2017, 92, 525–533.
- Ding, M.; Edwards, B.S. High-Throughput Flow Cytometry in Drug Discovery. SLAS Discov. 2018, 23, 599–602.
- Aghaeepour, N.; Finak, G.; The FlowCAP Consortium; The DREAM Consortium; Hoos, H.; Mosmann, T.R.; Brinkman, R.; Gottardo, R.; Scheuermann, R.H. Critical assessment of automated flow cytometry data analysis techniques. Nat. Methods 2013, 10, 228–238.
- Aghaeepour, N.; Chattopadhyay, P.; Chikina, M.; Dhaene, T.; Van Gassen, S.; Kursa, M.; Lambrecht, B.N.; Malek, M.; McLachlan, G.J.; Qian, Y.; et al. A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes. Cytom. A 2016, 89, 16–21.
- Dean, P.N.; Bagwell, C.B.; Lindmo, T.; Murphy, R.F.; Salzman, G.C. Introduction to flow cytometry data file standard. Cytometry 1990, 11, 321–322.
- Spidlen, J.; Moore, W.; Parks, D.; Goldberg, M.; Blenman, K.; Cavenaugh, J.S.; ISAC Data Standards Task Force; Brinkman, R. Data File Standard for Flow Cytometry, Version FCS 3.2. Cytom. A 2021, 99, 100–102.
- White, S.; Quinn, J.; Enzor, J.; Staats, J.; Mosier, S.M.; Almarode, J.; Denny, T.N.; Weinhold, K.J.; Ferrari, G.; Chan, C. FlowKit: A Python Toolkit for Integrated Manual and Automated Cytometry Analysis Workflows. Front. Immunol. 2021, 12, 768541.
- Ellis, B.; Haaland, P.; Hahne, F.; Le Meur, N.; Gopalakrishnan, N.; Spidlen, J.; Jiang, M.; Finak, G. flowCore: Basic Structures for Flow Cytometry Data, R Package Version 2.14.1. 2024. Available online: https://bioconductor.org/packages/release/bioc/html/flowCore.html (accessed on 27 January 2025).
- Monaco, G.; Chen, H.; Poidinger, M.; Chen, J.; de Magalhães, J.P.; Larbi, A. flowAI: Automatic and interactive anomaly discerning tools for flow cytometry data. Bioinformatics 2016, 32, 2473–2480.
- Fletez-Brant, K.; Špidlen, J.; Brinkman, R.R.; Roederer, M.; Chattopadhyay, P.K. flowClean: Automated Identification and Removal of Fluorescence Anomalies in Flow Cytometry Data. Cytom. A 2016, 89, 461–471.
- Van Gassen, S.; Callebaut, B.; Van Helden, M.J.; Lambrecht, B.N.; Demeester, P.; Dhaene, T.; Saeys, Y. FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytom. A 2015, 87, 636–645.
- McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 2020, arXiv:1802.03426.
- Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442–451.
- Guido, R.; Ferrisi, S.; Lofaro, D.; Conforti, D. An Overview on the Advancements of Support Vector Machine Models in Healthcare Applications: A Review. Information 2024, 15, 235.
- Ng, D.P.; Wu, D.; Wood, B.L.; Fromm, J.R. Computer-aided detection of rare tumor populations in flow cytometry: An example with classic Hodgkin lymphoma. Am. J. Clin. Pathol. 2015, 144, 517–524.
- Monaghan, S.A.; Li, J.-L.; Liu, Y.-C.; Ko, M.-Y.; Boyiadzis, M.; Chang, T.-Y.; Wang, Y.-F.; Lee, C.-C.; Swerdlow, S.H.; Ko, B.-S. A Machine Learning Approach to the Classification of Acute Leukemias and Distinction from Nonneoplastic Cytopenias Using Flow Cytometry Data. Am. J. Clin. Pathol. 2022, 157, 546–553.
- Mienye, I.D.; Jere, N. A Survey of Decision Trees: Concepts, Algorithms, and Applications. IEEE Access 2024, 12, 86716–86727.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
- Ng, D.P.; Zuromski, L.M. Augmented Human Intelligence and Automated Diagnosis in Flow Cytometry for Hematologic Malignancies. Am. J. Clin. Pathol. 2021, 155, 597–605.
- Zuromski, L.M.; Durtschi, J.; Aziz, A.; Chumley, J.; Dewey, M.; English, P.; Morrison, M.; Simmon, K.; Whipple, B.; O’Fallon, B.; et al. Clinical Validation of a Real-Time Machine Learning-based System for the Detection of Acute Myeloid Leukemia by Flow Cytometry. arXiv 2024, arXiv:2409.11350.
- McElfresh, D.; Khandagale, S.; Valverde, J.; Prasad, C.V.; Feuer, B.; Hegde, C.; Ramakrishnan, G.; Goldblum, M.; White, C. When Do Neural Nets Outperform Boosted Trees on Tabular Data? arXiv 2024, arXiv:2305.
- Simonson, P.D.; Wu, Y.; Wu, D.; Fromm, J.R.; Lee, A.Y. De Novo Identification and Visualization of Important Cell Populations for Classic Hodgkin Lymphoma Using Flow Cytometry and Machine Learning. Am. J. Clin. Pathol. 2021, 156, 1092–1102.
- Zhao, M.; Mallesh, N.; Höllein, A.; Schabath, R.; Haferlach, C.; Haferlach, T.; Elsner, F.; Lüling, H.; Krawitz, P.; Kern, W. Hematologist-Level Classification of Mature B-Cell Neoplasm Using Deep Learning on Multiparameter Flow Cytometry Data. Cytom. A 2020, 97, 1073–1080.
- Mallesh, N.; Zhao, M.; Meintker, L.; Höllein, A.; Elsner, F.; Lüling, H.; Haferlach, T.; Kern, W.; Westermann, J.; Brossart, P.; et al. Knowledge transfer to enhance the performance of deep learning models for automated classification of B cell neoplasms. Patterns 2021, 2, 100351.
- Hu, Z.; Tang, A.; Singh, J.; Bhattacharya, S.; Butte, A.J. A robust and interpretable end-to-end deep learning model for cytometry data. Proc. Natl. Acad. Sci. USA 2020, 117, 21373–21380.
- Salama, M.E.; Otteson, G.E.; Camp, J.J.; Seheult, J.N.; Jevremovic, D.; Holmes, D.R.; Olteanu, H.; Shi, M. Artificial Intelligence Enhances Diagnostic Flow Cytometry Workflow in the Detection of Minimal Residual Disease of Chronic Lymphocytic Leukemia. Cancers 2022, 14, 2537.
- Ilse, M.; Tomczak, J.; Welling, M. Attention-based Deep Multiple Instance Learning. In Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2127–2136. Available online: https://proceedings.mlr.press/v80/ilse18a.html (accessed on 13 November 2024).
- Cheplygina, V.; Tax, D.M.J.; Loog, M. On classification with bags, groups and sets. Pattern Recognit. Lett. 2015, 59, 11–17.
- Lewis, J.E.; Cooper, L.A.D.; Jaye, D.L.; Pozdnyakova, O. Automated Deep Learning-Based Diagnosis and Molecular Characterization of Acute Myeloid Leukemia Using Flow Cytometry. Mod. Pathol. 2024, 37, 100373.
- Ge, Y.; Sealfon, S.C. flowPeaks: A fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics 2012, 28, 2052–2058.
- Wong, N.; Kim, D.; Robinson, Z.; Huang, C.; Conboy, I.M. K-means quantization for a web-based open-source flow cytometry analysis platform. Sci. Rep. 2021, 11, 6735.
- Ye, X.; Ho, J.W.K. Ultrafast clustering of single-cell flow cytometry data using FlowGrid. BMC Syst. Biol. 2019, 13 (Suppl. S2), 35.
- Rubbens, P.; Props, R.; Kerckhof, F.-M.; Boon, N.; Waegeman, W. PhenoGMM: Gaussian Mixture Modeling of Cytometry Data Quantifies Changes in Microbial Community Structure. mSphere 2021, 6, e00530-20.
- Ionita, M.; Schretzenmair, R.; Jones, D.; Moore, J.; Wang, L.; Rogers, W. Tailor: Targeting heavy tails in flow cytometry data with fast, interpretable mixture modeling. Cytom. A 2021, 99, 133–144.
- Ko, B.-S.; Wang, Y.-F.; Li, J.-L.; Li, C.-C.; Weng, P.-F.; Hsu, S.-C.; Hou, H.-A.; Huang, H.-H.; Yao, M.; Lin, C.-T.; et al. Clinically validated machine learning algorithm for detecting residual diseases with multicolor flow cytometry analysis in acute myeloid leukemia and myelodysplastic syndrome. EBioMedicine 2018, 37, 91–100.
- Saeys, Y.; Van Gassen, S.; Lambrecht, B.N. Computational flow cytometry: Helping to make sense of high-dimensional immunology data. Nat. Rev. Immunol. 2016, 16, 449–462.
- May, M.; Hewitt, T.; Mashford, B.; Hammill, D.; Davies, A.; Andrews, T.D. Benchmark of Wide Range of Pairwise Distance Metrics for Automated Classification of Mouse Mutant Phenotypes from Flow Cytometry Data. bioRxiv 2025, preprint.
- Orlova, D.Y.; Herzenberg, L.A.; Walther, G. Science not art: Statistically sound methods for identifying subsets in multi-dimensional flow and mass cytometry data sets. Nat. Rev. Immunol. 2018, 18, 77.
- Gachon, E.; Bigot, J.; Cazelles, E.; Bidet, A.; Vial, J.P.; Dumas, P.Y.; Mimoun, A. Low dimensional representation of multi-patient flow cytometry datasets using optimal transport for minimal residual disease detection in leukemia. arXiv 2024, arXiv:2407.17329.
- Kalina, T.; Flores-Montero, J.; van der Velden, V.H.J.; Martin-Ayuso, M.; Böttcher, S.; Ritgen, M.; Almeida, J.; Lhermitte, L.; Asnafi, V.; Mendonça, A.; et al. EuroFlow standardization of flow cytometer instrument settings and immunophenotyping protocols. Leukemia 2012, 26, 1986–2010.
- Jiménez-Sánchez, D.; Ariz, M.; Morgado, J.M.; Cortés-Domínguez, I.; Ortiz-De-Solórzano, C. NMF-RI: Blind spectral unmixing of highly mixed multispectral flow and image cytometry data. Bioinformatics 2020, 36, 1590–1598.
- Becht, E.; McInnes, L.; Healy, J.; Dutertre, C.-A.; Kwok, I.W.H.; Ng, L.G.; Ginhoux, F.; Newell, E.W. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 2018, 37, 38–44.
- van den Akker, T.; Patel, S.; Simonson, P. Development of a generalizable UMAP-based approach for comparing clinical flow cytometry data with application to NPM1-mutated AML cohorts. Am. J. Clin. Pathol. 2022, 158 (Suppl. 1), S26–S27.
- Belkina, A.C.; Ciccolella, C.O.; Anno, R.; Halpert, R.; Spidlen, J.; Snyder-Cappione, J.E. Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nat. Commun. 2019, 10, 5415.
- Linderman, G.C.; Rachh, M.; Hoskins, J.G.; Steinerberger, S.; Kluger, Y. Fast Interpolation-based t-SNE for Improved Visualization of Single-Cell RNA-Seq Data. Nat. Methods 2019, 16, 243–245.
- Zhu, X.; Ghahramani, Z. Learning from Labeled and Unlabeled Data with Label Propagation. 2002. Available online: https://www.semanticscholar.org/paper/Learning-from-labeled-and-unlabeled-data-with-label-Zhu-Ghahramani/2a4ca461fa847e8433bab67e7bfe4620371c1f77 (accessed on 13 November 2024).
- Iscen, A.; Tolias, G.; Avrithis, Y.; Chum, O. Label Propagation for Deep Semi-Supervised Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5070–5079.
- Zhang, Z.; Luo, D.; Zhong, X.; Choi, J.H.; Ma, Y.; Wang, S.; Mahrt, E.; Guo, W.; Stawiski, E.W.; Modrusan, Z.; et al. SCINA: A Semi-Supervised Subtyping Algorithm of Single Cells and Bulk Samples. Genes 2019, 10, 531.
- Krishnan, R.; Rajpurkar, P.; Topol, E.J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 2022, 6, 1346–1352.
- Zhai, X.; Oliver, A.; Kolesnikov, A.; Beyer, L. S4L: Self-Supervised Semi-Supervised Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1476–1485.
- Maron, O.; Lozano-Pérez, T. A Framework for Multiple-Instance Learning. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1997; Volume 10. Available online: https://proceedings.neurips.cc/paper_files/paper/1997/hash/82965d4ed8150294d4330ace00821d77-Abstract.html (accessed on 13 November 2024).
- Koch, G.R. Siamese Neural Networks for One-Shot Image Recognition. 2015. Available online: https://www.semanticscholar.org/paper/Siamese-Neural-Networks-for-One-Shot-Image-Koch/f216444d4f2959b4520c61d20003fa30a199670a (accessed on 13 November 2024).
- Chen, C.J.; Yi, H.; Stanley, N. Conditional Similarity Triplets Enable Covariate-Informed Representations of Single-Cell Data. arXiv 2024, arXiv:2406.08638.
- Hoffer, E.; Ailon, N. Deep Metric Learning Using Triplet Network. In Similarity-Based Pattern Recognition; Feragen, A., Pelillo, M., Loog, M., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 84–92.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. arXiv 2020, arXiv:2002.05709.
- Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. arXiv 2014, arXiv:1206.5538.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2014, arXiv:1312.6114.
- Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. arXiv 2019, arXiv:1906.02691.
- Champion, K.; Lusch, B.; Kutz, J.N.; Brunton, S.L. Data-driven discovery of coordinates and governing equations. Proc. Natl. Acad. Sci. USA 2019, 116, 22445–22451.
- Scheithe, J.; Licandro, R.; Rota, P.; Reiter, M.; Diem, M.; Kampel, M. Monitoring Acute Lymphoblastic Leukemia Therapy with Stacked Denoising Autoencoders. In Computer Aided Intervention and Diagnostics in Clinical and Medical Images; Peter, J.D., Fernandes, S.L., Eduardo Thomaz, C., Viriri, S., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 189–197.
- Inecik, K.; Meric, A.; König, L.; Theis, F.J. flowVI: Flow Cytometry Variational Inference. bioRxiv 2023, preprint.
- Driessen, A.; Unger, S.; Nguyen, A.P.; Ries, R.E.; Meshinchi, S.; Kreutmair, S.; Alberti, C.; Sumazin, P.; Aplenc, R.; Redell, M.S.; et al. Identification of single-cell blasts in pediatric acute myeloid leukemia using an autoencoder. Life Sci. Alliance 2024, 7, e202402674.
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
- Weijler, L.; Kowarsch, F.; Reiter, M.; Hermosilla, P.; Maurer-Granofszky, M.; Dworzak, M. FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data. arXiv 2023, arXiv:2311.03314.
- Kraus, O.; Kenyon-Dean, K.; Saberian, S.; Fallah, M.; McLean, P.; Leung, J.; Sharma, V.; Khan, A.; Balakrishnan, J.; Celik, S.; et al. Masked Autoencoders are Scalable Learners of Cellular Morphology. arXiv 2023, arXiv:2309.16064.
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. arXiv 2021, arXiv:2111.06377.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv 2022, arXiv:2112.10752.
- Corbetta, M.; Shulman, G.L. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 2002, 3, 201–215.
- Szałata, A.; Hrovatin, K.; Becker, S.; Tejada-Lapuerta, A.; Cui, H.; Wang, B.; Theis, F.J. Transformers in single-cell omics: A review and new perspectives. Nat. Methods 2024, 21, 1430–1443.
- Lee, J.; Lee, Y.; Kim, J.; Kosiorek, A.R.; Choi, S.; Teh, Y.W. Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks. arXiv 2019, arXiv:1810.00825.
- Xu, H.; Usuyama, N.; Bagga, J.; Zhang, S.; Rao, R.; Naumann, T.; Wong, C.; Gero, Z.; González, J.; Gu, Y.; et al. A whole-slide foundation model for digital pathology from real-world data. Nature 2024, 630, 181–188.
- Zimmermann, E.; Vorontsov, E.; Viret, J.; Casson, A.; Zelechowski, M.; Shaikovski, G.; Tenenholtz, N.; Hall, J.; Klimstra, D.; Yousfi, R.; et al. Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology. arXiv 2024, arXiv:2408.00738.
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent: A new approach to self-supervised Learning. arXiv 2020, arXiv:2006.07733.
- Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning Robust Visual Features without Supervision. arXiv 2024, arXiv:2304.07193.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv 2019, arXiv:1912.01703.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Available online: https://www.tensorflow.org/ (accessed on 27 January 2025).
- Bradbury, J.; Frostig, R.; Hawkins, P.; Johnson, M.J.; Leary, C.; Maclaurin, D.; Necula, G.; Paszke, A.; VanderPlas, J.; Wanderman-Milne, S.; et al. JAX: Composable Transformations of Python+NumPy Programs. Available online: http://github.com/jax-ml/jax (accessed on 27 January 2025).
- Bezanson, J.; Edelman, A.; Karpinski, S.; Shah, V.B. Julia: A Fresh Approach to Numerical Computing. SIAM Rev. 2017, 59, 65–98.
- Gao, K.; Mei, G.; Piccialli, F.; Cuomo, S.; Tu, J.; Huo, Z. Julia language in machine learning: Algorithms, applications, and open issues. Comput. Sci. Rev. 2020, 37, 100254.
- Poddar, M.; Marwaha, J.S.; Yuan, W.; Romero-Brufau, S.; Brat, G.A. An operational guide to translational clinical machine learning in academic medical centers. NPJ Digit. Med. 2024, 7, 129.
- Rajagopal, A.; Ayanian, S.; Ryu, A.J.; Qian, R.; Legler, S.R.; Peeler, E.A.; Issa, M.; Coons, T.J.; Kawamoto, K. Machine Learning Operations in Health Care: A Scoping Review. Mayo Clin. Proc. Digit. Health 2024, 2, 421–437.
- Bogdanoski, G.; Lucas, F.; Kern, W.; Czechowska, K. Translating the regulatory landscape of medical devices to create fit-for-purpose artificial intelligence (AI) cytometry solutions. Cytom. B Clin. Cytom. 2024, 106, 294–307.
- Ng, D.P.; Simonson, P.D.; Tarnok, A.; Lucas, F.; Kern, W.; Rolf, N.; Bogdanoski, G.; Green, C.; Brinkman, R.R.; Czechowska, K. Recommendations for using artificial intelligence in clinical flow cytometry. Cytom. B Clin. Cytom. 2024, 106, 228–238.
- Spies, N.C.; Farnsworth, C.W.; Wheeler, S.; McCudden, C.R. Validating, Implementing, and Monitoring Machine Learning Solutions in the Clinical Laboratory Safely and Effectively. Clin. Chem. 2024, 70, 1334–1343.
- Medical Devices; Laboratory Developed Tests. Federal Register, 2024. Available online: https://www.federalregister.gov/documents/2024/05/06/2024-08935/medical-devices-laboratory-developed-tests (accessed on 14 August 2024).
- Vial, J.P.; Lechevalier, N.; Lacombe, F.; Dumas, P.Y.; Bidet, A.; Leguay, T.; Vergez, F.; Pigneux, A.; Béné, M.C. Unsupervised Flow Cytometry Analysis Allows for an Accurate Identification of Minimal Residual Disease Assessment in Acute Myeloid Leukemia. Cancers 2021, 13, 629.
- Guess, T.; Potts, C.R.; Bhat, P.; Cartailler, J.A.; Brooks, A.; Holt, C.; Yenamandra, A.; Wheeler, F.C.; Savona, M.R.; Cartailler, J.P.; et al. Distinct Patterns of Clonal Evolution Drive Myelodysplastic Syndrome Progression to Secondary Acute Myeloid Leukemia. Blood Cancer Discov. 2022, 3, 316–329.
- Evrard, M.; Becht, E.; Fonseca, R.; Obers, A.; Park, S.L.; Ghabdan-Zanluqui, N.; Schroeder, J.; Christo, S.N.; Schienstock, D.; Lai, J.; et al. Single-cell protein expression profiling resolves circulating and resident memory T cell diversity across tissues and infection contexts. Immunity 2023, 56, 1664–1680.e9.
- Zhang, Y.; Wang, Z.; Jiang, Y.; Littler, D.R.; Gerstein, M.; Purcell, A.W.; Rossjohn, J.; Ou, H.Y.; Song, J. Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor–antigen recognition. Nat. Mach. Intell. 2024, 6, 1344–1358.
- Hashimoto, N.; Hanada, H.; Miyoshi, H.; Nagaishi, M.; Sato, K.; Hontani, H.; Ohshima, K.; Takeuchi, I. Multimodal Gated Mixture of Experts Using Whole Slide Image and Flow Cytometry for Multiple Instance Learning Classification of Lymphoma. J. Pathol. Inform. 2024, 15, 100359.
- Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
Algorithm | Class | Description | Advantages | Disadvantages | Common Uses
---|---|---|---|---|---
Support Vector Machines (SVMs) | Supervised | Finds a decision boundary (hyperplane) that maximizes the margin between classes (for classification) or fits the best line/hyperplane (for regression). Uses kernels to handle non-linear boundaries. | Effective in high-dimensional spaces; robust against overfitting with regularization; able to handle complex boundaries | Computationally expensive; relatively uninterpretable | General classification/regression
Decision Trees | Supervised | Uses a tree-like model of decisions based on feature values. Each internal node represents a test on a feature, each branch an outcome of the test, and each leaf a class/regression outcome. | Easy to interpret and visualize; handles numerical and categorical data; fast training and inference | Prone to overfitting if not pruned; limited in complexity; susceptible to class imbalance | Simple systems; applications requiring interpretability
Random Forest | Supervised | An ensemble of decision trees, aggregated by voting for classification or averaging for regression. Each tree is trained on a bootstrapped subset of the data with a random subset of features. | Minimal hyperparameter optimization; robust against outliers and noise; handles high-dimensional data | Less interpretable than a single tree; resource-intensive to train; typically less effective than boosting | General-purpose model for tabular data
Gradient-Boosted Trees (XGBoost) | Supervised | Sequentially builds an ensemble of weak prediction trees, where each subsequent tree attempts to correct the errors of the previous ones. XGBoost is a popular optimized framework for gradient boosting. | State of the art for tabular data; handles missing data and outliers well; highly tunable | Prone to overfitting; hyperparameter tuning is expensive; requires careful regularization | Highly effective models for tabular data
Neural Networks | Variable | Inspired by the structure of biological neurons. Consist of layers of interconnected “neurons” that learn hierarchical representations of data through backpropagation. | Capture complex non-linear relationships; highly flexible architectures; scale well with large datasets | Computationally intensive; require large training datasets; hyperparameter tuning is complex but crucial | Effective models for tabular, language, vision, and other data
K-means Clustering | Unsupervised | Groups data into K clusters by minimizing within-cluster variance. Iteratively updates cluster centroids and assignments until convergence. | Simple to implement; fast for moderate-sized datasets | Must specify the number of clusters; sensitive to outliers; assumes spherical, well-separated clusters; poor performance on varying cluster layouts | Customer segmentation; image compression; data exploration
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) | Unsupervised | Groups together points that are closely packed (points with many nearby neighbors), marking as outliers the points that lie alone in low-density regions. | Can find arbitrarily shaped clusters; robust against outliers/noise | Poor for clusters with varying densities; sensitive to hyperparameter choices | Geospatial data analysis; anomaly detection; clustering with irregular shapes/densities
Gaussian Mixture Models (GMMs) | Unsupervised | Assumes data are generated from a mixture of a finite number of Gaussian distributions with unknown parameters, each characterized by its mean and covariance. | Probabilistic cluster membership; can model overlapping clusters; flexible to distribution shape | Must specify the number of components; sensitive to initialization; can converge to local optima | Probabilistic clustering; anomaly detection; data distribution modeling
Principal Component Analysis (PCA) | Unsupervised | Transforms data into new orthogonal axes (principal components) that capture the directions of maximum variance. The top components retain most of the variance in the data. | Effective dimensionality reduction; speeds up subsequent training; removes correlation among features | Fails to capture non-linear relationships; principal components lack interpretability | Dimensionality reduction; preprocessing for other models
Non-Negative Matrix Factorization (NMF) | Unsupervised | Factorizes a non-negative data matrix into the product of two smaller non-negative matrices, interpreting data as “parts-based” additive combinations. | Produces interpretable decompositions; well suited to text and image data | Sensitive to initialization and local minima; only applicable to non-negative data; may require careful tuning | Topic modeling in text analysis; image feature extraction; recommender systems
Self-Organizing Maps (SOMs) | Unsupervised | A neural network that uses competitive learning to map high-dimensional data onto a low-dimensional (usually 2D) grid, preserving topological or neighborhood structure in the data. | Good for dimensionality reduction; preserves topological relationships; can reveal cluster structure visually | Sensitive to hyperparameters; harder to interpret than linear methods; computationally intensive | High-dimensional data visualization; exploratory data analysis
Uniform Manifold Approximation and Projection (UMAP) | Unsupervised | A graph-based dimensionality reduction method that approximates the manifold structure of the data. Preserves local and global structure; often used for visualization in 2D or 3D. | Preserves local and global data structure; fast, even on large datasets; produces visually interpretable embeddings | Embeddings are sensitive to hyperparameters; interpretation of axes is not straightforward; non-deterministic unless seeded | High-dimensional data visualization; exploratory data analysis
t-Distributed Stochastic Neighbor Embedding (t-SNE) | Unsupervised | A non-linear technique that converts distances between points into probabilities, aiming to preserve local neighborhoods in a lower-dimensional space (usually 2D). | Reveals cluster structure in high-dimensional data | Computationally intensive; requires hyperparameter tuning; potential for misleading visual artifacts | High-dimensional data visualization; exploratory data analysis
Auto-encoders | Unsupervised | Neural networks that learn to compress (encode) data into a latent space and then reconstruct (decode) it. Can be adapted for denoising, anomaly detection, or generative tasks. | Learn complex non-linear embeddings; can reduce dimensionality and remove noise; highly versatile | Prone to overfitting; architecture has significant performance impact; difficult to interpret | Dimensionality reduction; noise reduction; synthetic data generation
Transformers | Variable | A neural network architecture that uses self-attention to process input sequences in parallel, avoiding recurrence. Originally designed for NLP but extended to images and more. | State of the art in many tasks; highly parallelizable training; can capture long-range dependencies | Data- and compute-intensive training; larger models are slower at inference; simpler models may suffice for simpler tasks | Large language models; foundation models
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Spies, N.C.; Rangel, A.; English, P.; Morrison, M.; O’Fallon, B.; Ng, D.P. Machine Learning Methods in Clinical Flow Cytometry. Cancers 2025, 17, 483. https://doi.org/10.3390/cancers17030483