WO2024243056A1 - Systems and methods for classifying cells using distance metrics - Google Patents
Systems and methods for classifying cells using distance metrics
- Publication number
- WO2024243056A1, PCT/US2024/030003
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cell
- parameter
- cells
- class
- classes
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- a method for classifying one or more cells comprises receiving data associated with the one or more cells, the data including, for each respective cell, information associated with one or more measurable parameters of the respective cell; inputting at least a portion of the data into a machine learning model; and receiving, from the machine learning model, an indication of an identity of a cell class of a plurality of cell classes to which each respective cell belongs.
- the data includes, for each respective cell, the difference between (i) the value of at least one parameter for the respective cell, and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes.
- the data includes, for each respective cell, a distance on each of one or more multi-parameter plots between (i) the respective cell and (ii) one or more reference points. Each multi-parameter plot is associated with at least a first parameter and a second parameter. Each reference point corresponds to the combination of the average values of the first parameter and the second parameter for a distinct cell class.
- a method for classifying cells comprises: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with at least a first measurable parameter and a second measurable parameter; for each respective cell, determining, based on the received data, a value of the first measurable parameter and a value of the second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; automatically generating a report based on the determined cell class for each respective cell of the plurality of cells; and transmitting the report to the individual, a healthcare provider of the individual, or both.
- a method for classifying cells comprises: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with a plurality of measurable parameters; for each respective cell, determining, based on the received data, a value of a first measurable parameter and a value of a second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes to which the respective cell belongs, the determining being based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; in response to determining that at least a portion of the plurality of cells belong to a first cell class of the plurality of cells;
- FIG. 1 shows a system for implementing a method for classifying cells, according to aspects of the present disclosure.
- FIG. 2 shows a flowchart of a method for classifying cells, according to aspects of the present disclosure.
- FIG. 3 shows markers used for spectral flow cytometry, according to aspects of the present disclosure.
- FIG. 4 shows a representation of different cell classes that cells may be sorted into, according to aspects of the present disclosure.
- FIG. 5 shows example distance metric calculation for a machine learning model, according to aspects of the present disclosure.
- FIG. 6 shows example distance metric calculations for individual cells for a specific cell class, according to aspects of the present disclosure.
- FIG. 7A shows example distance metric calculations for a primary classification level, according to aspects of the present disclosure.
- FIG. 7B shows example distance metric calculations for a primary refinement classification level, according to aspects of the present disclosure.
- FIG. 1 is a block diagram of an example system 100 for implementing any of the herein-discussed features, methods, processes, etc.
- system 100 can be used to implement one or more machine learning models that classify cells as discussed herein.
- the system 100 can include one or more processing devices 110, which can each include any one or more of a processor 112, a memory 114, a display 116, a user input device 118, and/or other components.
- the memory 114 can include machine-readable instructions for executing one or more machine learning models.
- the processor 112 can execute these instructions to implement the one or more machine learning models.
- the memory 114 can also store data associated with the cells that are being analyzed (e.g., flow cytometry data).
- the processing device 110 can include any suitable processing device, such as general purpose computer systems, microprocessors, digital signal processors, micro-controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), field programmable logic devices (FPLDs), programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), mobile devices such as mobile telephones, personal digital assistants (PDAs), or tablet computers, local servers, remote servers, wearable computers, or the like.
- the memory device 114 can include any suitable memory device and/or machine-readable medium that is capable of storing, encoding, and/or carrying a set of instructions for execution by a processing device and that cause the processing device to perform and/or implement any of the features discussed herein. Solid-state memories, optical media, magnetic media, random access memory (RAM), read only memory (ROM), a floppy disk, a hard disk, a CD ROM, a DVD ROM, flash memory, or any other computer readable medium that is read from and/or written to by a magnetic, optical, or other reading and/or writing system that is coupled to the processing device can be used for the memory or memories.
- the display 116 can be used to display any information associated with the features disclosed herein, including the results of the classification analysis by the machine learning model.
- the display device 116 can be any known display technology, including but not limited to display devices using Liquid Crystal Display (LCD) or Light Emitting Diode (LED) technology.
- the user input device 118 can be used to allow the user to interact with the system 100 for any suitable purpose, including initiating, pausing, or terminating the analysis by the machine learning model; adjusting any parameters of the analysis, etc.
- the system 100 includes a flow cytometry system 120 that generates the data.
- the flow cytometry system 120 can generally be any suitable type of flow cytometry system.
- FIG. 2 shows a flowchart of a method 200 for classifying cells, according to aspects of the present disclosure.
- Step 210 of method 200 includes receiving data associated with one or more cells.
- the data includes flow cytometry data and is associated with one or more measurable parameters of the cells.
- the parameters can include parameters associated with scattering of light caused by the cells.
- the parameters can also include parameters associated with the presence of one or more biomarkers in the cells. These parameters can be associated with the presence and/or amount of a predetermined molecule in the cells, and can include an intensity of fluorescent emission from the cells, a color of the fluorescent emission from the cells, etc.
- the predetermined molecule can generally include any suitable molecule that may be used as a biomarker, such as a CD2 molecule, a CD3 molecule, a CD4 molecule, a CD5 molecule, a CD7 molecule, a CD8 molecule, a CD10 molecule, a CD11 molecule, a CD13 molecule, a CD14 molecule, a CD16 molecule, a CD19 molecule, a CD20 molecule, a CD22 molecule, a CD23 molecule, a CD33 molecule, a CD34 molecule, a CD45 molecule, a CD64 molecule, a CD117 molecule, an HLA-DR molecule, a cluster of differentiation molecule, an antigen, an antibody, an immunoglobulin chain (such as a kappa (κ) light chain, a lambda (λ) light chain, a gamma (γ) heavy chain, a delta (δ) heavy chain, an alpha (α)
- the biomarker can be any type of cellular marker that can be used to identify the cells, including proteins and markers associated with gene expression/transcription.
- Other types of parameters can also be used, such as parameters associated with surface markers of the cells, parameters associated with intracellular markers of the cells, parameters associated with the size of the cells, parameters associated with gene expression of the cells, etc.
- Step 220 of method 200 includes inputting at least a portion of the received data into one or more trained machine learning models.
- the one or more machine learning models can include one or more random forest models.
- Each of the one or more random forest models can include a plurality of decision trees and a voting module, where each decision tree makes an independent determination and the voting module selects one of those independent determinations as the output of the random forest model.
- Step 230 of the method 200 includes receiving from the one or more machine learning models an indication of the cell class to which each of the one or more cells belongs. In general, each of the cells could be classified into one or more cell classes of a plurality of cell classes.
- the plurality of cell classes includes lymphocytes, granulocytes, monocytes, B-cells (which may be a distinct class, or can be a sub-class of lymphocytes), T-cells (which may be a distinct class, or can be a sub-class of lymphocytes), other classes (and/or subclasses), or any combination thereof.
- the one or more machine learning models can place each cell into a cell class by determining the distance of each cell on a multi-parameter plot to a reference point on the plot, where each reference point corresponds to the average value of one or more parameters for cells within a respective cell class for that plot.
- some plots may be associated with two or more parameters of the measurable parameters of the cell.
- a first parameter of the plot is associated with scattering of light caused by the cells
- a second parameter of the plot is associated with the presence of a biomarker.
- both parameters are associated with scattering of light caused by the cells.
- both parameters are associated with the presence of a biomarker in the cells.
- each of the plots will have one or more reference points that each correspond to a predetermined cell class.
- the reference point for a cell class (which is previously determined using training data) is at a location on the plot that corresponds to the average value of one or more parameters for all cells within the cell class from the training data.
- the reference point is generally within the distribution of values of the one or more parameters for the cells within the cell class from the training data.
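The derivation of reference points from training data can be sketched as follows. This is a minimal illustration, assuming that each training cell is a tuple of parameter values with a known cell class; the function name `reference_points` and the sample values are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict
from statistics import fmean

def reference_points(cells, labels):
    """Return {cell_class: (mean_p1, mean_p2, ...)} from labeled training cells."""
    grouped = defaultdict(list)
    for values, label in zip(cells, labels):
        grouped[label].append(values)
    # Average each parameter (column) over all training cells of a class.
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }

# Hypothetical two-parameter values in arbitrary units, for illustration only.
training_cells = [(900, 100), (700, 300), (200, 800), (400, 600)]
training_labels = ["lymphocyte", "lymphocyte", "granulocyte", "granulocyte"]
refs = reference_points(training_cells, training_labels)
print(refs)
```

Because each reference point is a mean of the class's own training cells, it falls inside the range of values observed for that class on every parameter.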
- Most plots will include at least two reference points that correspond to two different cell classes.
- As used herein, references to the location of a respective cell on a multi-parameter plot generally refer to the location of the intersecting parameter values of the respective cell.
- the location of the n-th cell on a multi-parameter plot of a first parameter P1 and a second parameter P2 can be expressed as $(P1_n, P2_n)$, where $P1_n$ is the value of the first parameter for the n-th cell, and $P2_n$ is the value of the second parameter for the n-th cell.
- the location of the reference points uses similar nomenclature and can be referred to using mean parameter values.
- the reference point for the m-th cell class on the same multi-parameter plot can be expressed as $(\overline{P1}_m, \overline{P2}_m)$, where $\overline{P1}_m$ is the average value of the first parameter across all cells in the m-th cell class, and $\overline{P2}_m$ is the average value of the second parameter across all cells in the m-th cell class.
- for a plot of a single parameter, the location of the n-th cell on the plot can be expressed as $(P1_n)$, while the location of the reference point for the m-th cell class on the same plot can be expressed as $(\overline{P1}_m)$.
- for a plot of k parameters, the location of the n-th cell on the plot can be expressed as $(P1_n, P2_n, \ldots, Pk_n)$, while the location of the reference point for the m-th cell class on the same plot can be expressed as $(\overline{P1}_m, \overline{P2}_m, \ldots, \overline{Pk}_m)$.
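The nomenclature can be written out as a short worked formula. In this sketch, $P1_n$ and $P2_n$ are the first and second parameter values of the n-th cell, $\overline{P1}_m$ and $\overline{P2}_m$ are the class averages for the m-th cell class, and $C_m$ is a symbol assumed here (it is not used in the source) for the set of training cells in the m-th cell class. The distance $d_{n,m}$ is shown for the Euclidean case only, as one possible choice of metric.

```latex
\overline{P1}_m = \frac{1}{|C_m|} \sum_{i \in C_m} P1_i ,
\qquad
\overline{P2}_m = \frac{1}{|C_m|} \sum_{i \in C_m} P2_i

d_{n,m} = \sqrt{\left(P1_n - \overline{P1}_m\right)^2 + \left(P2_n - \overline{P2}_m\right)^2}
```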
- step 230 of the method 200 can include sub-steps 232, 234, and 236.
- Sub-step 232 can include, for each cell, determining the value of at least a first parameter and a second parameter. Depending on how many different plots and/or parameters will be used to classify the cell, additional parameter values can be determined.
- Sub-step 234 includes determining, for each respective cell, the distance between the cell and the one or more reference points on a given multi-parameter plot. Generally, sub-step 234 will be performed for each multi-parameter plot being used to classify the cells.
- Sub-step 236 includes, for each respective multi-parameter plot being used to classify the cells, placing each cell into a cell class based on the distance between the cell and the one or more reference points of the respective multi-parameter plot.
- the cell is placed into the cell class to which it is closest, e.g., the cell will belong to the cell class where the distance between the cell and the reference point for that cell class is the shortest.
- This distance, as measured on the multi-parameter plot, can be considered to be a Euclidean distance or a Mahalanobis distance.
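Placing a cell into the class of its nearest reference point can be sketched as below. This is an illustrative sketch rather than the disclosed implementation: the function names, the reference point values, and the two-dimensional closed-form Mahalanobis helper are all assumptions. In particular, the source does not say how a covariance matrix would be estimated, so `cov` is simply taken as an input.

```python
import math

def euclidean(cell, ref):
    """Straight-line distance between a cell and a reference point."""
    return math.dist(cell, ref)

def mahalanobis_2d(cell, ref, cov):
    """Mahalanobis distance on a two-parameter plot.

    cov is a 2x2 covariance matrix ((a, b), (c, d)), assumed to be estimated
    from the class's training cells; it is inverted in closed form.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = cell[0] - ref[0], cell[1] - ref[1]
    # (dx, dy) * inverse(cov) * (dx, dy)^T, written out for the 2x2 case.
    squared = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(squared)

def nearest_class(cell, refs, distance=euclidean):
    """Return the label of the reference point closest to the cell."""
    return min(refs, key=lambda label: distance(cell, refs[label]))

# Hypothetical reference points on a two-parameter plot (arbitrary units).
refs = {"lymphocyte": (800.0, 200.0), "granulocyte": (300.0, 700.0)}
print(nearest_class((650.0, 300.0), refs))  # lymphocyte
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which gives a quick sanity check for the helper.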
- the plots include a plot having a first reference point corresponding to a first cell class and a second reference point corresponding to a second cell class.
- the first cell class includes lymphocytes
- the second cell class includes granulocytes and monocytes.
- the first parameter is associated with the presence of a CD3 molecule in the cells
- the second parameter is associated with the presence of a CD19 molecule in the cells.
- the plots include a plot having a first reference point corresponding to a first cell class, a second reference point corresponding to a second cell class, and a third reference point corresponding to a third cell class.
- the first cell class includes lymphocytes
- the second cell class includes granulocytes
- the third cell class includes monocytes.
- the first cell class includes granulocytes and monocytes
- the second cell class includes T-cells
- the third cell class includes B-cells.
- the first parameter is associated with the presence of a CD45 molecule in the cells
- the second parameter is associated with an amount of side scatter caused by the cells.
- the first parameter is associated with the presence of a CD3 molecule in the cells
- the second parameter is associated with the presence of a CD19 molecule in the cells.
- the placement of the cells into cell classes can be done in stages.
- the one or more machine learning models can first place each respective cell into a distinct cell class of a plurality of cell classes, using the techniques discussed herein. These classes may include a cell class containing lymphocytes, a cell class containing granulocytes and monocytes, etc.
- the one or more machine learning models can then place each cell belonging to that certain cell class into one of a plurality of cell sub-classes of the certain cell class.
- the one or more machine learning models in the first stage can place each cell into a lymphocyte cell class, a monocyte cell class, or a granulocyte cell class.
- the one or more machine learning models in the second stage can place each cell in the lymphocyte class into an NK-cell sub-class, a T-cell sub-class, or a B-cell sub-class.
- additional stages can also be performed, for example by placing each cell in the T-cell sub-class into a CD4+ T-cell group (or sub-sub-class) or a CD8+ T-cell group (or sub-sub-class).
- This staged approach thus provides more granularity in the analysis of the cell sample, and can in some cases be performed automatically in response to earlier stages being completed.
- the classes (e.g., lymphocyte class), sub-classes (e.g., NK-cell sub-class), and sub-sub-classes (e.g., CD4+ T-cell group) referred to herein are provided only as examples, and generally any suitable classification of cells based on any one or more parameters can be used, including any suitable phenotypic classification and/or any suitable functional classification.
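The staged placement described above can be sketched as a two-level lookup. Everything specific in this sketch is an assumption made for illustration: the reference point values, the dictionary keys, the pairing of the CD45/side-scatter plot with the first stage and the CD3/CD19 plot with the second stage, and the use of a plain nearest-reference-point rule in place of a trained model.

```python
import math

# Hypothetical reference points; values are illustrative only.
STAGE_ONE = {  # assumed plot of (CD45, side scatter)
    "lymphocyte": (800.0, 200.0),
    "monocyte": (600.0, 500.0),
    "granulocyte": (300.0, 800.0),
}
STAGE_TWO = {  # assumed plot of (CD3, CD19), applied to lymphocytes only
    "T-cell": (900.0, 100.0),
    "B-cell": (100.0, 900.0),
    "NK-cell": (100.0, 100.0),
}

def nearest(point, refs):
    """Return the label of the reference point closest to the point."""
    return min(refs, key=lambda label: math.dist(point, refs[label]))

def classify_staged(cell):
    """cell maps parameter names to values; returns (class, sub_class or None)."""
    cell_class = nearest((cell["CD45"], cell["SSC"]), STAGE_ONE)
    if cell_class != "lymphocyte":
        # Only lymphocytes are refined further in this sketch.
        return cell_class, None
    return cell_class, nearest((cell["CD3"], cell["CD19"]), STAGE_TWO)

print(classify_staged({"CD45": 780.0, "SSC": 220.0, "CD3": 850.0, "CD19": 150.0}))
```

A further stage, such as splitting the T-cell sub-class into CD4+ and CD8+ groups, would follow the same pattern with another reference table.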
- the machine learning models can be trained to perform varying levels of the analysis. For example, in some implementations, the data that is input into the trained machine learning models includes the distances between each cell and the reference points of any plot being used.
- the machine learning models are trained to classify each cell into one or more cell classes based on these distances.
- the data that is input into the trained machine learning models includes the required parameter values for each cell.
- the machine learning models are trained to analyze the data to determine the distances between each cell and the reference points of any plot being used, and then classify each cell into one or more cell classes based on these determined distances.
- the data that is input into the trained machine learning models includes the raw data (e.g., raw flow cytometry data).
- the machine learning models are trained to analyze the data to determine the required parameter values for each cell, determine the distances between each cell and the reference points of any plot being used, and then classify each cell into one or more cell classes based on these determined distances.
- the one or more machine learning models can include one or more random forest models that correspond to one multi-parameter plot/set of parameters.
- a first random forest model can be trained to place cells into cell classes based on a first parameter and a second parameter
- a second random forest model can be trained to place cells into cell classes based on the first parameter and a third parameter
- a third random forest model can be trained to place cells into cell classes based on the second parameter and the third parameter, etc.
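The one-model-per-plot arrangement can be sketched by enumerating parameter pairs. The helper name `models_per_plot` and the placeholder factory are hypothetical; a real pipeline would construct and train a random forest for each pair (for example with scikit-learn's `RandomForestClassifier`), which is left out here to keep the sketch dependency-free.

```python
from itertools import combinations

def models_per_plot(parameters, make_model):
    """Return {(param_a, param_b): model}, one model per two-parameter plot."""
    return {pair: make_model(pair) for pair in combinations(parameters, 2)}

# Placeholder factory: it records the pair and leaves the forest untrained.
models = models_per_plot(
    ["P1", "P2", "P3"],
    make_model=lambda pair: {"parameters": pair, "forest": None},
)
for pair in models:
    print(pair)
```

For three parameters this yields the same three pairings listed above: first and second, first and third, second and third.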
- each random forest model includes a plurality of decision trees and a voting module. Each decision tree of a given random forest model is configured to generate an independent determination of which cell class each cell belongs to.
- each decision tree is configured to make an independent determination, for each respective cell, of which cell class the respective cell belongs to.
- the voting module selects one of the independent determinations as the final determination of the random forest model for each cell.
- the voting module is configured to select as the final determination the cell class which was the most frequent winner between the individual decision trees.
- the voting module can select the first cell class as the final determination of that random forest model.
- the voting module may select the final determination of the random forest model using other techniques, such as an average of the decision trees, a weighted average of the decision trees, etc.
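The voting module can be sketched as follows, assuming each decision tree reports a single cell class label per cell. The function names are hypothetical, and the weighted variant is only one possible reading of a weighted combination of the decision trees.

```python
from collections import Counter

def majority_vote(tree_votes):
    """Return the cell class chosen by the most decision trees.

    Ties are broken in favor of the class that was voted for first,
    which is an arbitrary choice made for this sketch.
    """
    if not tree_votes:
        raise ValueError("at least one vote is required")
    return Counter(tree_votes).most_common(1)[0][0]

def weighted_vote(tree_votes, weights):
    """Return the cell class with the largest total weight across trees."""
    if not tree_votes:
        raise ValueError("at least one vote is required")
    totals = Counter()
    for vote, weight in zip(tree_votes, weights):
        totals[vote] += weight
    return totals.most_common(1)[0][0]

votes = ["lymphocyte", "granulocyte", "lymphocyte"]
print(majority_vote(votes))                   # lymphocyte
print(weighted_vote(votes, [0.2, 0.7, 0.2]))  # granulocyte
```

The second call shows how weighting can overturn a simple majority when one tree is trusted more than the others.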
- a training data set can be generated to train the one or more machine learning models (e.g., to train the random forest models).
- the training data set includes, for each respective cell in the training data set, (i) a distance (on one or more multi-parameter plots) between the respective cell and one or more reference points, and (ii) a determination of which cell class the respective cell belongs to.
- the parameter values of each cell can be normalized with respect to the average parameter values for the different cell classes, such that the distances can be expressed as (i) a value that is greater than or equal to 0 and less than or equal to 1, (ii) a value expressed as a percentage, etc.
- the training data set could also include raw data (e.g., raw flow cytometry data and/or other data) instead of (or in addition to) the distances between the cells and the reference points.
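One way such a training row could be assembled is sketched below. The text does not specify the distance metric or the normalization, so this sketch assumes Euclidean distance on a two-parameter plot and divides each cell's distances by their sum so that every value lies between 0 and 1. The function names and the toy data are hypothetical.

```python
import math


def reference_points(labeled_cells):
    """Average (p1, p2) position per cell class.

    labeled_cells: list of ((p1, p2), class_label) pairs.
    """
    sums = {}
    for (p1, p2), label in labeled_cells:
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += p1
        acc[1] += p2
        acc[2] += 1
    return {label: (a[0] / a[2], a[1] / a[2]) for label, a in sums.items()}


def normalized_distances(cell, refs):
    """Euclidean distance from a cell to each reference point, scaled to [0, 1].

    Scaling by the sum of the distances is an assumption of this sketch.
    """
    raw = {label: math.dist(cell, point) for label, point in refs.items()}
    total = sum(raw.values())
    if total == 0:
        return {label: 0.0 for label in raw}
    return {label: d / total for label, d in raw.items()}


# Toy labeled cells on a two-parameter plot (values are made up).
labeled = [((0.0, 0.0), "A"), ((2.0, 0.0), "A"),
           ((10.0, 0.0), "B"), ((12.0, 0.0), "B")]
refs = reference_points(labeled)

# One training row: (i) normalized distances, (ii) the known class.
features = normalized_distances((3.0, 0.0), refs)
training_row = (features, "A")
```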
- the data used by the one or more machine learning models can include any suitable type of data associated with different measurable parameters of the cells.
- the data can include flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with generally any single cell analysis method that results in the output of one or more parameters for analysis, other types of data, etc.
- the machine learning models can be trained to receive and analyze multiple types of data, including multiple types of the data discussed herein.
- method 200 can include additional steps after all of the cells have been classified and the indication of the cell class(es) for each cell has been received.
- method 200 can include analyzing the indications and generating a graphical representation of the cell classes of the cells, generating a text description of the cells, generating a recommendation for clinical tests for the individual to whom the cells belong to undergo, generating a diagnosis for the individual, etc.
- a report can be automatically generated and transmitted to the individual, a healthcare provider of the individual, or both.
- the report can include, for example, a graphical representation of the determined cell class for each respective cell of the plurality of cells, a text description of the determined cell class for each respective cell of the plurality of cells, a recommendation for one or more clinical tests for the individual to undergo, a diagnosis for the individual, or any combination thereof.
- analysis can be automatically performed, and the report can be automatically generated and transmitted.
- method 200 can be used to sort each cell in a cell sample into one or more distinct cell classes of a plurality of cell classes.
- Method 200 thus provides for more detailed analysis of cell samples and recommendations regarding the same, allowing for better diagnostic mechanisms and improved treatment.
- Disclosed herein is an example of the features discussed herein.
- Introduction and Specific Aims Proposed herein is a clinical flow cytometry decision support system in which the machine learning component is trained from a high dimensional flow cytometry (spectral) data set which encompasses the majority of necessary backbone antigenic markers for applicability by the majority of flow cytometry labs for hematolymphoid disorders.
- Aim 1 will encompass the collection of high-parameter flow cytometry data from a variety of hematolymphoid disorders using the Sony ID7000.
- Aim 2 seeks to develop the machine learning models for cell population identification using the high dimensional flow cytometry dataset from Aim 1. The backbone of this process will be based on machine learning models for our lab-specific model.
- Aim 3 will involve the validation of the machine learning models on clinical flow cytometry data sets which were acquired using different instrumentation and different panels to test for broad lab applicability.
- Aim 1 Utilizes a high-parameter flow cytometry panel that simultaneously examines 29 antigenic markers using the Sony ID7000 7-laser spectral flow cytometer.
- This panel (FIG. 3) integrates a 4-tube screening panel and includes additional markers (CD123, CD71, and CD57) for additional cell population characterization.
- This panel has been tested on the Sony ID7000 with success.
- an additional 100 samples will be acquired from a variety of hematolymphoid disorders as well as normal patients from peripheral blood and bone marrow to gain a wide representation of cell populations. For all peripheral blood samples, at least 1 million cellular events will be collected, and for bone marrows, at least 5 million cellular events.
- Aim 2 Using the data from Aim 1, FlowSOM will be utilized for high-dimensional clustering for cell population identification using backbone markers (FIG. 3). Because of the complexity of the cell populations present as well as the number of events (>100 million), a 20x20 SOM with a detection of 80 metaclusters will be used to allow for low frequency population identification. The 80 metaclusters will be thoroughly characterized and annotated based on backbone marker expression. [0051] As the machine learning model is intended to be universally applicable, individual antigenic marker expression at a fluorescence level cannot be utilized for cell population classification. This is because each lab will have different cytometers, fluorophore-target combinations, and cytometer settings.
- distance metrics for each core antigenic marker are calculated relative to lymphocytes, granulocytes and monocytes (FIG. 5). These distance metrics can be normalized on a scale from 0 to 1, and are then used to train a random forest machine learning model for cell population classification at the class level. Additional rounds of distance metrics can also be calculated for non-core antigenic markers with additional machine learning models created.
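The feature table described above could be built roughly as follows. This is a sketch under stated assumptions: the per-marker distance is taken to be the absolute difference from the population mean, the 0-to-1 normalization is taken to be min-max scaling across cells, and the marker values and population means are invented for illustration. The resulting rows would then serve as input features for a classifier such as a random forest.

```python
def marker_distance_features(cells, population_means):
    """Absolute distance of each marker value from each reference population mean.

    cells: list of dicts mapping marker name -> measured value.
    population_means: dict mapping population -> {marker: mean value}.
    Returns one dict per cell keyed by (marker, population).
    """
    rows = []
    for cell in cells:
        row = {}
        for population, means in population_means.items():
            for marker, mean in means.items():
                row[(marker, population)] = abs(cell[marker] - mean)
        rows.append(row)
    return rows


def min_max_scale(rows):
    """Scale every feature column to the range [0, 1] across all cells."""
    keys = list(rows[0].keys())
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
         for k in keys}
        for r in rows
    ]


# Hypothetical mean expression for the three reference populations.
means = {
    "lymphocytes": {"CD45": 9.0, "SSC": 1.0},
    "granulocytes": {"CD45": 6.0, "SSC": 8.0},
    "monocytes": {"CD45": 7.0, "SSC": 4.0},
}
# Hypothetical cells.
cells = [
    {"CD45": 9.0, "SSC": 1.0},
    {"CD45": 6.0, "SSC": 8.0},
    {"CD45": 7.5, "SSC": 4.5},
]
scaled = min_max_scale(marker_distance_features(cells, means))
```

Because the features are distances relative to populations measured on the same instrument, rather than raw fluorescence values, they are less tied to a particular cytometer or fluorophore combination, which is the motivation given in the text.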
- New Data Predictions [0055] The initial machine learning model from above using CD45, SSC and FSC is used to broadly classify cells as lymphocytes, granulocytes, or monocytes (FIG. 6). The accuracy of this step is not very important, as it is only used for an initial classification and will be further refined.
- Distance metrics for core marker expression can then be calculated for each individual cell relative to the mean marker expression for lymphocytes, granulocytes, and monocytes for primary classification (FIG. 7A). These distance metrics can be used to predict primary cell classification using the machine learning model from above. Distance metrics can then be calculated using primary classifications for a refinement of the primary classifications (FIG. 7B). Additional distance metrics and machine learning predictions can be used for more granular population classifications such as secondary cell classifications (FIG. 4). [0056] The accuracy of the predictions will first be calculated using a 70:30 test to validation split for the training data from the Sony ID7000. Accuracy will be determined at each level of classification (FIG. 4). Confusion matrices will be examined for determination of misclassification patterns (ex.
- Aim 3 In-house collected clinical flow cytometry data, which was acquired using a 4-tube screening panel, will first be examined on a different instrument. The fluorophore-antigen markers are non-overlapping with the spectral panel and will allow for a sufficient proof-of-principle. Classification performance to the “Secondary” Level (FIG. 4) will then be examined. Upon successful results with the in-house data, the models will be tested on collaborator data. Performance will be assessed using visual examination of classifications of cells plotted on 2-dimensional flow cytometry dot plots, as well as comparing population frequencies between manual analysis and automated analysis.
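The comparison of population frequencies between manual and automated analysis can be illustrated with a short sketch. The function names and the label lists below are hypothetical; the text does not state which statistic is used, so this example simply reports the per-class difference in frequency.

```python
from collections import Counter


def population_frequencies(labels):
    """Fraction of cells assigned to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def frequency_differences(manual_labels, automated_labels):
    """Automated minus manual frequency for every class seen in either analysis."""
    manual = population_frequencies(manual_labels)
    automated = population_frequencies(automated_labels)
    classes = sorted(set(manual) | set(automated))
    return {c: automated.get(c, 0.0) - manual.get(c, 0.0) for c in classes}


# Made-up labels for ten cells from each analysis.
manual = ["lymphocyte"] * 6 + ["granulocyte"] * 3 + ["monocyte"] * 1
automated = ["lymphocyte"] * 5 + ["granulocyte"] * 3 + ["monocyte"] * 2
diffs = frequency_differences(manual, automated)
```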
- the cell classification can be accomplished by determining the value of one or more parameters for each cell being classified, and comparing that value to the average value of each of the one or more parameters for each of the possible cell classes that the cell could be classified into, without any actual construction of a multi-parameter plot.
- the possible different cell classes and parameters used to classify cells into those cell classes can be similar to or the same as the cell classes and parameters discussed herein with respect to the multi-parameter plots.
- the sub-steps of step 230 of method 200 can be generalized.
- Sub-step 232 can include determining the value of at least one parameter for each respective cell.
- Sub-step 234 can include determining, for each respective cell, the difference between (i) the value of the at least one parameter for the respective cell and (ii) the average value of the at least one parameter for each of the one or more cell classes that the respective cell may be placed into.
- Sub-step 236 can include placing the cell into a specific one or more of the cell classes based on the difference(s).
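Sub-steps 232, 234, and 236 can be sketched without any plot as follows. In the disclosure the placement is performed by trained machine learning models; the rule used here (choose the class with the smallest sum of absolute differences) is a stand-in assumed only for illustration, and the class averages and cell values are invented.

```python
def parameter_differences(cell, class_averages):
    """Sub-step 234: difference between the cell's values and each class average.

    cell: dict mapping parameter -> value (the output of sub-step 232).
    class_averages: dict mapping class -> {parameter: average value}.
    """
    return {
        cls: {param: cell[param] - avg for param, avg in averages.items()}
        for cls, averages in class_averages.items()
    }


def place_cell(cell, class_averages):
    """Sub-step 236: place the cell based on the differences.

    Assumed stand-in rule: smallest sum of absolute differences wins.
    """
    diffs = parameter_differences(cell, class_averages)
    totals = {cls: sum(abs(d) for d in per_param.values())
              for cls, per_param in diffs.items()}
    return min(totals, key=totals.get)


# Hypothetical class averages and one hypothetical cell.
averages = {
    "lymphocytes": {"CD45": 9.0, "SSC": 1.0},
    "granulocytes": {"CD45": 6.0, "SSC": 8.0},
    "monocytes": {"CD45": 7.0, "SSC": 4.0},
}
cell = {"CD45": 7.2, "SSC": 4.5}
assigned = place_cell(cell, averages)
```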
- each of the different “plots” discussed herein can correspond to a different machine learning model that is trained to classify cells into specific cell classes based on specific parameters. The machine learning models can be trained to perform varying levels of the analysis.
- the data that is input into the trained machine learning models includes the differences between the parameter values for the individual cells and the average parameter values for the potential cell classes that the cells may be placed into.
- the machine learning models are trained to classify each cell into one or more cell classes based on these differences.
- the data that is input into the trained machine learning models includes the required parameter values for each cell.
- the machine learning models are trained to analyze the data to determine the differences between the parameter values of each cell and the average parameter values for the classes of cells that each cell could be classified into.
- the raw data e.g., raw flow cytometry data
- the machine learning models are trained to analyze the data to determine the required parameter values for each cell, determine the differences between the parameter values of each cell and the average parameter values of the cell classes that each cell could be classified into, and then place each cell into one or more cell classes based on these determined differences.
- a training data set can be generated to train the one or more machine learning models (e.g., to train the random forest models).
- the training data set includes, for each respective cell in the training data set, (i) the difference between (a) the value of at least one parameter for the respective cell and (b) the average value of the at least one parameter for each of the cell classes that the model is being trained for, and (ii) a determination of which cell class of the cell classes for the model that the respective cell belongs to.
- the parameter values of each cell can be normalized with respect to the average parameter values for the different cell classes, such that the differences can be expressed as (i) a value that is greater than or equal to 0 and less than or equal to 1, (ii) a value expressed as a percentage, etc.
- the training data set could also include raw data (such as raw flow cytometry data and/or other data) instead of (or in addition to) the differences between the cell parameter values and the average parameter values for the cell classes.
- each of the machine learning models (which can be random forest models) is trained to classify cells into a cell class of a specific class of cells.
- the possible cell classes that a given model looks at become narrower (e.g., a first model that classifies cells as lymphocytes vs. granulocytes vs. monocytes is broader than a second model that classifies cells as granulocytes/monocytes vs. T-cells vs. B-cells)
- the model may utilize more parameters in classifying the cells.
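A cascade of progressively narrower models might be wired together as in the sketch below. The two "models" are placeholder threshold rules rather than trained random forests, and the thresholds, marker values, and function names are all hypothetical; the point is only the control flow, in which a broad model runs first and a narrower model, which may use additional parameters, refines one of its classes.

```python
def classify_hierarchically(cell, broad_model, refinement_models):
    """Run the broad model, then a narrower model if one exists for that class.

    Returns a tuple of labels from broadest to most specific.
    """
    primary = broad_model(cell)
    refine = refinement_models.get(primary)
    if refine is None:
        return (primary,)
    return (primary, refine(cell))


def broad_model(cell):
    """Placeholder for a trained first-level model (thresholds are made up)."""
    if cell["SSC"] < 2.0:
        return "lymphocytes"
    if cell["SSC"] < 6.0:
        return "monocytes"
    return "granulocytes"


def lymphocyte_model(cell):
    """Placeholder for a narrower model that uses additional parameters."""
    return "T-cells" if cell["CD3"] > cell["CD19"] else "B-cells"


refinements = {"lymphocytes": lymphocyte_model}

t_cell = classify_hierarchically(
    {"SSC": 1.0, "CD3": 8.0, "CD19": 0.5}, broad_model, refinements)
granulocyte = classify_hierarchically(
    {"SSC": 7.5, "CD3": 0.2, "CD19": 0.1}, broad_model, refinements)
```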
- any of the methods disclosed herein can be implemented using a system having a control system with one or more processors, and a memory device storing machine-readable instructions.
- the control system can be coupled to the memory device, and methods can be implemented when the machine-readable instructions are executed by at least one of the processors of the control system.
- the methods can also be implemented using a computer program product (such as a non-transitory computer readable medium) comprising instructions that when executed by a computer, cause the computer to carry out the steps of the methods.
- a method for classifying cells comprising: receiving data associated with one or more cells of an individual, the data including, for each respective cell, information associated with one or more measurable parameters of the respective cell; inputting at least a portion of the data into one or more machine learning models; and receiving, from the one or more machine learning models, an indication of a cell class of a plurality of cell classes to which each respective cell belongs.
- Alternative Implementation 2 The method of Alternative Implementation 1, where the data associated with the one or more cells includes flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with a single cell analysis technique, or any combination thereof.
- Alternative Implementation 13 The method of any one of Alternative Implementations 9 to 12, wherein the one or more parameters associated with the biomarker of the respective cell includes a presence of a predetermined molecule in the respective cell, an amount of the predetermined molecule in the respective cell, or both.
- Alternative Implementation 14 The method of Alternative Implementation 13, wherein the one or more parameters associated with the biomarker of the respective cell includes an intensity of fluorescent emission from the respective cell, a color of fluorescent emission from the respective cell, or both.
- the predetermined molecule includes a CD2 molecule, a CD3 molecule, a CD4 molecule, a CD5 molecule, a CD7 molecule, a CD8 molecule, a CD10 molecule, a CD11 molecule, a CD13 molecule, a CD14 molecule, a CD16 molecule, a CD19 molecule, a CD20 molecule, a CD22 molecule, a CD23 molecule, a CD33 molecule, a CD34 molecule, a CD45 molecule, a CD64 molecule, a CD117 molecule, an HLA-DR molecule, a cellular marker usable for identification of the respective cell, or any combination thereof.
- Alternative Implementation 16 includes a CD2 molecule, a CD3 molecule, a CD4 molecule, CD5 molecule, a CD7 molecule, a CD8 molecule, a CD10 molecule, a CD11 molecule, a CD13 molecule, a CD14 molecule, a CD16 molecule, a
- Alternative Implementation 17 The method of Alternative Implementation 16, wherein the immunoglobulin chain includes a kappa (κ) light chain, a lambda (λ) light chain, a gamma (γ) heavy chain, a delta (δ) heavy chain, an alpha (α) heavy chain, a mu (μ) heavy chain, an epsilon (ε) heavy chain, or any combination thereof.
- any one of Alternative Implementations 3 to 17, wherein the one or more distinct cell classes includes a first cell class and a second cell class.
- Alternative Implementation 19 The method of any one of Alternative Implementations 5 to 17, wherein the at least one plot includes a plot with a first reference point corresponding to a first cell class and a second reference point corresponding to a second cell class.
- Alternative Implementation 20 The method of Alternative Implementation 18 or Alternative Implementation 19, wherein the first cell class includes lymphocytes, and the second cell class includes granulocytes and monocytes.
- Alternative Implementation 21 Alternative Implementation 21.
- Alternative Implementation 22 or Alternative Implementation 23 wherein the first cell class includes lymphocytes, the second cell class includes granulocytes, and the third cell class includes monocytes.
- Alternative Implementation 25 The method of any one of Alternative Implementations 22 to 24, wherein the first parameter of the plot is associated with a presence of a CD45 molecule in the one or more cells, and the second parameter is associated with an amount of side scatter caused by the one or more cells.
- Alternative Implementation 26 The method of Alternative Implementation 22 or Alternative Implementation 23, wherein the first cell class includes granulocytes and monocytes, the second cell class includes T-cells, and the third cell class includes B-cells.
- Alternative Implementation 27 Alternative Implementation 27.
- Alternative Implementation 28 wherein the one or more distinct cell classes includes a first cell class that includes lymphocytes, a second cell class that includes granulocytes, and a third cell class that includes monocytes.
- Alternative Implementation 30 The method of Alternative Implementation 28, wherein the plot includes a first reference point corresponding to lymphocytes, a second reference point corresponding to granulocytes, and a third reference point corresponding to monocytes.
- Alternative Implementation 31 Alternative Implementation 31.
- Alternative Implementation 31 wherein the one or more distinct cell classes includes a first cell class that includes granulocytes and monocytes, a second cell class that includes T-cells, and a third cell class that includes B-cells.
- Alternative Implementation 34 The method of Alternative Implementation 31, wherein the plot includes a first reference point corresponding to granulocytes and monocytes and a second reference point corresponding to lymphocytes.
- Alternative Implementation 35 The method of Alternative Implementation 31, wherein the plot includes a first reference point corresponding to granulocytes and monocytes, a second reference point corresponding to T-cells, and a third reference point corresponding to B-cells.
- Alternative Implementation 36 Alternative Implementation 36.
- Alternative Implementation 37 The method of Alternative Implementation 36, wherein the one or more machine learning models is trained to sort the one or more cells into the at least one of the plurality of cell classes based at least in part on, for each respective cell, the difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes.
- Alternative Implementation 38 Alternative Implementation
- Alternative Implementation 39 The method of any one of Alternative Implementations 1 to 38, wherein the one or more machine learning models is trained to: determine whether each respective cell belongs to a first cell class of the plurality of cell classes or a second cell class of the plurality of cell classes; and determine whether each respective cell belonging to the first cell class of the plurality of cell classes belongs to a first sub-class or a second sub-class. [0105] Alternative Implementation 40.
- any one of Alternative Implementations 1 to 41 wherein the one or more machine learning models is trained to: determine, for each respective cell, a value of at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters; determine, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell, and (ii) an average value of the at least one parameter for each of one or more distinct cell classes; and based on the determined difference for each respective cell, place the respective cell into one of the plurality of cell classes.
- Alternative Implementation 43 Alternative Implementation 43.
- any one of Alternative Implementations 1 to 41 wherein the one or more machine learning models is trained to: determine, for each respective cell, a value of at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters; determine, for each respective cell, a distance on a plot between the respective cell and one or more reference points, each of the one or more reference points corresponding to a combination of an average value of the first parameter and an average value of the second parameter for a distinct cell class of a plurality of cell classes; and based on the distance between the respective cell and the one or more reference points, place the respective cell into one of the plurality of cell classes.
- Alternative Implementation 44 Alternative Implementation 44.
- the one or more machine learning models includes a random forest model configured to determine the cell class to which each respective cell belongs.
- Alternative Implementation 45 The method of Alternative Implementation 44, wherein the random forest model has a plurality of decision trees and a voting module, each of the plurality of decision trees being configured to generate an independent determination of the cell class to which each respective cell belongs, the voting module being configured to select the independent determination of one of the plurality of decision trees as a final determination of the cell class to which the respective cell belongs.
- Alternative Implementation 46 Alternative Implementation 46.
- any one of Alternative Implementations 1 to 36 wherein the one or more machine learning models is trained using a training data set, the training data set including (i) data associated with a plurality of cells other than the one or more cells, and (ii) for each respective cell of the plurality of cells, a determination of which cell class of the plurality of cell classes the respective cell belongs to.
- Alternative Implementation 47 The method of Alternative Implementation 46, wherein the data associated with the plurality of cells includes flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with a single cell analysis technique, or any combination thereof.
- Alternative Implementation 48 Alternative Implementation 48.
- any one of Alternative Implementations 1 to 47 wherein the one or more machine learning models is trained using a training data set, the training data set including, for each respective cell of a plurality of cells, (i) a difference between (a) a value of at least one parameter of the one or more measurable parameters for the respective cell and (b) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes, and (ii) a determination of which cell class of the plurality of cell classes the respective cell belongs to.
- Alternative Implementation 49 The method of Alternative Implementation 48, wherein the at least one parameter includes at least a first parameter and a second parameter.
- any one of Alternative Implementations 1 to 47 wherein the one or more machine learning models is trained using a training data set, the training data set including (i) for each respective cell of a plurality of cells, a distance on at least one plot between the respective cell and one or more reference points, and (ii) for each respective cell of the plurality of cells, a determination of which cell class of the plurality of cell classes the respective cell belongs to.
- the training data set including (i) for each respective cell of a plurality of cells, a distance on at least one plot between the respective cell and one or more reference points, and (ii) for each respective cell of the plurality of cells, a determination of which cell class of the plurality of cell classes the respective cell belongs to.
- each plot is associated with at least a first parameter of the one or more measurable parameters and a second parameter of the measurable parameters, and wherein, for each plot, each respective reference point corresponds to a combination of average values of at least the first parameter and the second parameter for a corresponding cell class of the plurality of cell classes.
- Alternative Implementation 52 The method of any one of Alternative Implementations 1 to 51, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a graphical representation of the cell classes of the one or more cells.
- any one of Alternative Implementations 1 to 56 wherein the data includes, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes, and wherein the one or more machine learning models are trained to, based on the difference for each respective cell and each of the at least one parameter, place the respective cell into one of the plurality of cell classes.
- Alternative Implementation 58 Alternative Implementation 58.
- any one of Alternative Implementations 1 to 57 wherein the data includes, for each respective cell, a distance on a plot between the respective cell and one or more reference points, each of the one or more reference points corresponding to a combination of an average value of a first parameter and an average value of a second parameter for a distinct cell class of a plurality of cell classes, and wherein the one or more machine learning models are trained to, based on the distance between the respective cell and the one or more reference points, place the respective cell into one of the plurality of cell classes.
- a method for classifying cells comprising: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with at least a first measurable parameter and a second measurable parameter; for each respective cell, determining, based on the received data, a value of the first measurable parameter and a value of the second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; automatically generating a report based on the determined cell class for each respective cell of the plurality of cells; and transmitting the report to the individual, a healthcare provider of the individual, or both.
- Alternative Implementation 60 The method of Alternative Implementation 59, wherein the report includes a graphical representation of the determined cell class for each respective cell of the plurality of cells, a text description of the determined cell class for each respective cell of the plurality of cells, a recommendation for one or more clinical tests for the individual to undergo, a diagnosis for the individual, or any combination thereof.
- Alternative Implementation 61 Alternative Implementation 61.
- a method for classifying cells comprising receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with a plurality of measurable parameters; for each respective cell, determining, based on the received data, a value of a first measurable parameter and a value of a second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes to which the respective cell belongs, the determining being based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; in response to determining that at least a portion of the plurality of cells belong to a first cell class of the plurality of cell classes, determining, for each respective cell belonging to
- Alternative Implementation 62 A system for classifying one or more cells comprising: a memory device having stored thereon machine-readable instructions; and a control system including one or more processors configured to execute the machine-readable instructions to implement the method of any one of Alternative Implementations 1 to 61.
- Alternative Implementation 63 A system for classifying one or more cells, the system comprising a control system configured to implement the method of any one of Alternative Implementations 1 to 61.
- Alternative Implementation 64 A computer program product comprising instructions which, when executed by a computer, cause the computer to carry out the method of any one of Alternative Implementations 1 to 61.
- Alternative Implementation 65 Alternative Implementation 65.
Abstract
A method for classifying one or more cells comprises receiving data associated with the one or more cells, the data including, for each respective cell, information associated with one or more measurable parameters of the respective cell; inputting at least a portion of the data into a machine learning model; and receiving, from the machine learning model, an indication of a cell class of a plurality of cell classes to which each respective cell belongs. The data can include, for each respective cell, the difference between (i) the value of at least one parameter for the respective cell, and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes.
Description
SYSTEMS AND METHODS FOR CLASSIFYING CELLS USING DISTANCE METRICS CROSS-REFERENCE TO RELATED APPLICATIONS [0001] This application claims priority to and the benefit of U.S. Provisional Patent Application No.63/503,411, filed May 19, 2023, which is hereby incorporated by reference herein in its entirety. TECHNICAL FIELD [0002] The present disclosure relates generally to systems and methods for classifying cells using distance metrics, and more particularly, to machine learning models trained to classify cells into a plurality of different cell classes based on differences between parameter values of the cells and mean parameter values of the cell classes. BACKGROUND [0003] The implementation of flow cytometry into clinical diagnostics has revolutionized the field of hematopathology and has improved diagnostic capabilities tremendously. However, interpretation of clinical flow cytometry data is inherently difficult due to complex and heterogenous immunophenotypes observed in multidimensional space. Additionally, classic flow cytometry analysis is fundamentally subjective when incorporating manual gating strategies for the identification of cell populations. While significant advances have been made in both technologies for flow cytometry acquisition as well as flow cytometry data analysis, these advances have been slow to make their way into clinical diagnostics. Certain approaches to flow cytometry analysis using machine learning models are useful. However, such models can have difficulty in predicting clinical diagnoses, and can be limited to use with specific flow cytometry panels. Thus, new systems and methods for classifying cells based on flow cytometry data (and/or other types of data) are needed. SUMMARY [0004] According to some implementations of the present disclosure, a method for classifying one or more cells comprises receiving data associated with the one or cells, the data including, for 1 4878-2377-2093.1
each respective cell, information associated with one or more measurable parameters of the respective cell; inputting at least a portion of the data into a machine learning model; and receiving, from the machine learning model, an indication of an identity of a cell class of a plurality of cell classes to which each respective cell belongs.
[0005] In some of these implementations the data includes, for each respective cell, the difference between (i) the value of at least one parameter for the respective cell, and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes.
[0006] In some of these implementations the data includes, for each respective cell, a distance on each of one or more multi-parameter plots between (i) the respective cell and (ii) one or more reference points. Each multi-parameter plot is associated with at least a first parameter and a second parameter. Each reference point corresponds to the combination of the average values of the first parameter and the second parameter for a distinct cell class.
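As a minimal sketch of the two kinds of input described in paragraphs [0005] and [0006], the following Python computes, for one cell, the per-parameter differences from each class average and the distance to each class reference point on a two-parameter plot. The class names, the parameter names (`CD45`, `SSC`), the numeric values, and the function names are illustrative assumptions rather than data or code from the disclosure, and a straight-line (Euclidean) distance is assumed.

```python
import math

# Hypothetical class averages (reference points) for two parameters.
# Names and values are illustrative assumptions only.
CLASS_MEANS = {
    "lymphocyte": {"CD45": 0.90, "SSC": 0.10},
    "monocyte": {"CD45": 0.70, "SSC": 0.40},
    "granulocyte": {"CD45": 0.50, "SSC": 0.80},
}


def difference_features(cell, class_means):
    """Signed difference between the cell's value and each class average, per parameter."""
    return {
        (cls, param): cell[param] - mean
        for cls, means in class_means.items()
        for param, mean in means.items()
    }


def distance_features(cell, class_means, params=("CD45", "SSC")):
    """Euclidean distance from the cell to each class reference point on the plot."""
    point = [cell[p] for p in params]
    return {
        cls: math.dist(point, [means[p] for p in params])
        for cls, means in class_means.items()
    }


cell = {"CD45": 0.85, "SSC": 0.15}
diffs = difference_features(cell, CLASS_MEANS)
dists = distance_features(cell, CLASS_MEANS)
```

Either dictionary could be flattened into a feature vector for a downstream model. With these made-up values the smallest distance is the one to the lymphocyte reference point.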
[0007] According to some implementations of the present disclosure, a method for classifying cells comprises: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with at least a first measurable parameter and a second measurable parameter; for each respective cell, determining, based on the received data, a value of the first measurable parameter and a value of the second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; automatically generating a report based on the determined cell class for each respective cell of the plurality of cells; and transmitting the report to the individual, a healthcare provider of the individual, or both.
[0008] According to some implementations of the present disclosure, a method for classifying cells comprises: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with a plurality of measurable parameters; for each respective cell, determining, based on the received data, a value of a first measurable parameter and a value of a second measurable
parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes to which the respective cell belongs, the determining being based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; in response to determining that at least a portion of the plurality of cells belong to a first cell class of the plurality of cell classes, determining, for each respective cell belonging to the first cell class, a value of a third measurable parameter and a value of a fourth measurable parameter; and determining, for each respective cell belonging to the first cell class, a distinct cell sub-class of a plurality of cell sub-classes of the first cell class to which the respective cell of the first cell class belongs, the determining being based on (i) a difference between the value of the third measurable parameter for the respective cell and an average value of the third measurable parameter for each of the plurality of cell sub-classes of the first cell class and (ii) a difference between the value of the fourth measurable parameter for the respective cell and an average value of the fourth measurable parameter for each of the plurality of cell sub-classes of the first cell class.
[0009] The above summary is not intended to represent each implementation or every aspect of the present disclosure. Additional features and benefits of the present disclosure are apparent from the detailed description and figures set forth below.
BRIEF DESCRIPTION OF THE FIGURES
[0010] The foregoing and other advantages of the present disclosure will become apparent upon reading the following detailed description and upon reference to the drawings.
[0011] FIG. 1 shows a system for implementing a method for classifying cells, according to aspects of the present disclosure.
[0012] FIG. 2 shows a flowchart of a method for classifying cells, according to aspects of the present disclosure.
[0013] FIG. 3 shows markers used for spectral flow cytometry, according to aspects of the present disclosure.
[0014] FIG. 4 shows a representation of different cell classes that cells may be sorted into, according to aspects of the present disclosure.
[0015] FIG. 5 shows an example distance metric calculation for a machine learning model, according to aspects of the present disclosure.
[0016] FIG. 6 shows example distance metric calculations for individual cells for a specific cell class, according to aspects of the present disclosure.
[0017] FIG. 7A shows example distance metric calculations for a primary classification level, according to aspects of the present disclosure.
[0018] FIG. 7B shows example distance metric calculations for a primary refinement classification level, according to aspects of the present disclosure.
[0019] While the present disclosure is susceptible to various modifications and alternative forms, specific implementations and embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the present disclosure is not intended to be limited to the particular forms disclosed. Rather, the present disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
DETAILED DESCRIPTION
[0020] FIG. 1 is a block diagram of an example system 100 for implementing any of the herein-discussed features, methods, processes, etc. For example, system 100 can be used to implement one or more machine learning models that classify cells as discussed herein. The system 100 can include one or more processing devices 110, which can each include any one or more of a processor 112, a memory 114, a display 116, a user input device 118, and/or other components. The memory 114 can include machine-readable instructions for executing one or more machine learning models. The processor 112 can execute these instructions to implement the one or more machine learning models. The memory 114 can also store data associated with the cells that are being analyzed (e.g., flow cytometry data).
[0021] The processing device 110 can include any suitable processing device, such as general purpose computer systems, microprocessors, digital signal processors, micro-controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), field programmable logic devices (FPLDs), programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), mobile devices such as mobile telephones, personal digital assistants (PDAs), or tablet computers, local servers, remote servers, wearable computers, or the like. The memory
device 114 can include any suitable memory device and/or machine-readable medium that is capable of storing, encoding, and/or carrying a set of instructions for execution by a processing device and that cause the processing device to perform and/or implement any of the features discussed herein. Solid-state memories, optical media, magnetic media, random access memory (RAM), read only memory (ROM), a floppy disk, a hard disk, a CD ROM, a DVD ROM, flash memory, or any other computer-readable medium that is read from and/or written to by a magnetic, optical, or other reading and/or writing system coupled to the processing device can be used for the memory or memories.
[0022] The display 116 can be used to display any information associated with the features disclosed herein, including the results of the classification analysis by the machine learning model. The display device 116 can be any known display technology, including but not limited to display devices using Liquid Crystal Display (LCD) or Light Emitting Diode (LED) technology. The user input device 118 can be used to allow the user to interact with the system 100 for any suitable purpose, including initiating, pausing, or terminating the analysis by the machine learning model; adjusting any parameters of the analysis, etc. In some implementations, the system 100 includes a flow cytometry system 120 that generates the data. The flow cytometry system 120 can generally be any suitable type of flow cytometry system. In other implementations, the system 100 does not include the flow cytometry system, but instead receives data from an external source.
[0023] FIG. 2 shows a flowchart of a method 200 for classifying cells, according to aspects of the present disclosure. Step 210 of method 200 includes receiving data associated with one or more cells. In some implementations, the data includes flow cytometry data and is associated with one or more measurable parameters of the cells.
The parameters can include parameters associated with scattering of light caused by the cells (e.g., a forward scatter amount, such as a forward scatter area and/or a forward scatter angle; a side scatter amount, such as a side scatter area and/or a side scatter angle; a forward scatter time-of-flight; a side scatter time-of-flight; etc.).
[0024] The parameters can also include parameters associated with the presence of one or more biomarkers in the cells. These parameters can be associated with the presence and/or amount of a predetermined molecule in the cells, and can include an intensity of fluorescent emission from the cells, a color of the fluorescent emission from the cells, etc. The predetermined molecule can generally include any suitable molecule that may be used as a biomarker, such as a CD2 molecule, a CD3 molecule, a CD4 molecule, a CD5 molecule, a CD7 molecule, a CD8 molecule, a CD10
molecule, a CD11 molecule, a CD13 molecule, a CD14 molecule, a CD16 molecule, a CD19 molecule, a CD20 molecule, a CD22 molecule, a CD23 molecule, a CD33 molecule, a CD34 molecule, a CD45 molecule, a CD64 molecule, a CD117 molecule, an HLA-DR molecule, a cluster of differentiation molecule, an antigen, an antibody, an immunoglobulin chain (such as a kappa (κ) light chain, a lambda (λ) light chain, a gamma (γ) heavy chain, a delta (δ) heavy chain, an alpha (α) heavy chain, a mu (μ) heavy chain, an epsilon (ε) heavy chain, etc.), or any combination thereof. In general, the biomarker can be any type of cellular marker that can be used to identify the cells, including proteins and markers associated with gene expression/transcription.
[0025] Other types of parameters can also be used, such as parameters associated with surface markers of the cells, parameters associated with intracellular markers of the cells, parameters associated with the size of the cells, parameters associated with gene expression of the cells, etc.
[0026] Step 220 of method 200 includes inputting at least a portion of the received data into one or more trained machine learning models. As discussed herein, the one or more machine learning models can include one or more random forest models. Each of the one or more random forest models can include a plurality of decision trees and a voting module, where each decision tree makes an independent determination and the voting module selects one of those independent determinations as the output of the random forest model.
[0027] Step 230 of the method 200 includes receiving from the one or more machine learning models an indication of the cell class to which each of the one or more cells belongs. In general, each of the cells could be classified into one or more cell classes of a plurality of cell classes.
In some implementations, the plurality of cell classes includes lymphocytes, granulocytes, monocytes, B-cells (which may be a distinct class, or can be a sub-class of lymphocytes), T-cells (which may be a distinct class, or can be a sub-class of lymphocytes), other classes (and/or sub-classes), or any combination thereof.
[0028] In some implementations, the one or more machine learning models can place each cell into a cell class by determining the distance of each cell on a multi-parameter plot to a reference point on the plot, where each reference point corresponds to the average value of one or more parameters for cells within a respective cell class for that plot. For example, some plots may be associated with two or more parameters of the measurable parameters of the cell. In some implementations, a first parameter of the plot is associated with scattering of light caused by the cells, and a second parameter of the plot is associated with the presence of a biomarker. In other
implementations, both parameters are associated with scattering of light caused by the cells. In further implementations, both parameters are associated with the presence of a biomarker in the cells.
[0029] In general, each of the plots will have one or more reference points that each correspond to a predetermined cell class. The reference point for a cell class (which is previously determined using training data) is at a location on the plot that corresponds to the average value of one or more parameters for all cells within the cell class from the training data. The reference point is generally within the distribution of values of the one or more parameters for the cells within the cell class from the training data. Most plots will include at least two reference points that correspond to two different cell classes.
[0030] As used herein, references to the location of a respective cell on a multi-parameter plot generally refer to the location of the intersecting parameter values of the respective cell. Thus, the location of the nth cell on a multi-parameter plot of a first parameter P1 and a second parameter P2 can be expressed as $(P1_n, P2_n)$, where $P1_n$ is the value of the first parameter for the nth cell, and $P2_n$ is the value of the second parameter for the nth cell. The location of the reference points uses similar nomenclature and can be referred to using mean parameter values. For example, the reference point for the mth cell class on the same multi-parameter plot can be expressed as $(\overline{P1_m}, \overline{P2_m})$, where $\overline{P1_m}$ is the average value of the first parameter across all cells in the mth cell class, and $\overline{P2_m}$ is the average value of the second parameter across all cells in the mth cell class. In implementations where the plot includes only a single parameter, the location of the nth cell on the plot can be expressed as $(P1_n)$, while the location of the reference point for the mth cell class on the same plot can be expressed as $(\overline{P1_m})$. In implementations where the plot includes X parameters, the location of the nth cell on the plot can be expressed as $(P1_n, P2_n, \ldots, PX_n)$, while the location of the reference point for the mth cell class on the same plot can be expressed as $(\overline{P1_m}, \overline{P2_m}, \ldots, \overline{PX_m})$.
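The separation between the nth cell and the reference point of the mth cell class can be written out explicitly. In the sketch below, $P_{x,n}$ denotes the value of the x-th parameter for the nth cell and $\overline{P_{x,m}}$ the average of that parameter over the mth cell class, following the description above. The symbols $d_{n,m}$, $\mathbf{p}_n$, $\bar{\mathbf{p}}_m$, and $\Sigma_m$ are introduced here for illustration only and do not appear in the original text. The two forms shown are the Euclidean and Mahalanobis distances that the description names for the classification sub-steps.

```latex
% Euclidean distance between cell n and the reference point of class m
% on a plot with X parameters
d_{n,m} = \sqrt{\sum_{x=1}^{X} \left( P_{x,n} - \overline{P_{x,m}} \right)^{2}}

% Mahalanobis distance; \Sigma_m is assumed here to be the covariance matrix
% of the parameter values of the class-m training cells
d^{\mathrm{M}}_{n,m} = \sqrt{ \left( \mathbf{p}_n - \bar{\mathbf{p}}_m \right)^{\top}
    \Sigma_m^{-1} \left( \mathbf{p}_n - \bar{\mathbf{p}}_m \right) }
```

For a two-parameter plot the first expression reduces to the familiar $\sqrt{(P_{1,n} - \overline{P_{1,m}})^2 + (P_{2,n} - \overline{P_{2,m}})^2}$, and the second expression equals the first whenever $\Sigma_m$ is the identity matrix.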
[0031] Step 230 of the method 200 can include sub-steps 232, 234, and 236. Sub-step 232 can include, for each cell, determining the value of at least a first parameter and a second parameter. Depending on how many different plots and/or parameters will be used to classify the cell, additional parameter values can be determined. Sub-step 234 includes determining, for each respective cell, the distance between the cell and the one or more reference points on a given multi-parameter plot. Generally, sub-step 234 will be performed for each multi-parameter plot being used to classify the cells. Sub-step 236 includes, for each respective multi-parameter plot being used to classify the cells, placing each cell into a cell class based on the distance between the cell and the one or more reference points of the respective multi-parameter plot. In some implementations, the cell is placed into the cell class to which it is closest, e.g., the cell will belong to the cell class where the distance between the cell and the reference point for that cell class is the shortest. This distance, as measured on the multi-parameter plot, can be considered to be a Euclidean distance or a Mahalanobis distance.
[0032] In some implementations, the plots include a plot having a first reference point corresponding to a first cell class and a second reference point corresponding to a second cell class. In some of these implementations, the first cell class includes lymphocytes, and the second cell class includes granulocytes and monocytes. In some of these implementations, the first parameter is associated with the presence of a CD3 molecule in the cells, and the second parameter is associated with the presence of a CD19 molecule in the cells.
[0033] In some implementations, the plots include a plot having a first reference point corresponding to a first cell class, a second reference point corresponding to a second cell class, and a third reference point corresponding to a third cell class. In some of these implementations, the first cell class includes lymphocytes, the second cell class includes granulocytes, and the third cell class includes monocytes. In some of these implementations, the first cell class includes granulocytes and monocytes, the second cell class includes T-cells, and the third cell class includes B-cells.
In some of these implementations, the first parameter is associated with the presence of a CD45 molecule in the cells, and the second parameter is associated with an amount of side scatter caused by the cells. In some of these implementations, the first parameter is associated with the presence of a CD3 molecule in the cells, and the second parameter is associated with the presence of a CD19 molecule in the cells.
[0034] In some implementations, the placement of the cells into cell classes can be done in stages. For example, the one or more machine learning models can first place each respective cell into a distinct cell class of a plurality of cell classes, using the techniques discussed herein. These classes may include a cell class containing lymphocytes, a cell class containing granulocytes and monocytes, etc. In response to at least a portion of the cells being sorted into a certain cell class, the one or more machine learning models can then place each cell belonging to that certain cell class into one of a plurality of cell sub-classes of the certain cell class.
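The staged placement described above can be sketched in Python as a nearest-reference-point rule applied twice: once on a CD45 versus side-scatter plot to pick a class, and then, only for cells placed in the lymphocyte class, on a CD3 versus CD19 plot to pick a sub-class. The reference-point coordinates, the dictionary and function names, and the use of plain Euclidean distance are assumptions made for illustration; in the disclosure the placement is performed by one or more trained machine learning models.

```python
import math

# Hypothetical reference points (class averages); values are illustrative only.
STAGE1_REFS = {  # plot of CD45 vs. side scatter
    "lymphocyte": (0.90, 0.10),
    "monocyte": (0.70, 0.40),
    "granulocyte": (0.50, 0.80),
}
STAGE2_REFS = {  # plot of CD3 vs. CD19, applied only to lymphocytes
    "T-cell": (0.80, 0.10),
    "B-cell": (0.10, 0.80),
}


def nearest_class(point, reference_points):
    """Return the class whose reference point is closest (Euclidean) to the point."""
    return min(reference_points, key=lambda cls: math.dist(point, reference_points[cls]))


def classify_staged(cell):
    """Stage 1 assigns a class; stage 2 runs only for cells placed in 'lymphocyte'."""
    cell_class = nearest_class((cell["CD45"], cell["SSC"]), STAGE1_REFS)
    sub_class = None
    if cell_class == "lymphocyte":
        sub_class = nearest_class((cell["CD3"], cell["CD19"]), STAGE2_REFS)
    return cell_class, sub_class


cells = [
    {"CD45": 0.88, "SSC": 0.12, "CD3": 0.75, "CD19": 0.05},
    {"CD45": 0.92, "SSC": 0.08, "CD3": 0.15, "CD19": 0.85},
    {"CD45": 0.48, "SSC": 0.82, "CD3": 0.05, "CD19": 0.05},
]
results = [classify_staged(c) for c in cells]
```

Swapping `math.dist` for a Mahalanobis distance would only change `nearest_class`; the staging logic would stay the same.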
[0035] For example, in one implementation, the one or more machine learning models in the first stage can place each cell into a lymphocyte cell class, a monocyte cell class, or a granulocyte cell class. In the second stage, the one or more machine learning models can place each cell in the lymphocyte class into an NK-cell sub-class, a T-cell sub-class, or a B-cell sub-class. Further stages can also be performed, for example by placing each cell in the T-cell sub-class into a CD4+ T-cell group (or sub-sub-class) or a CD8+ T-cell group (or sub-sub-class). This staged approach thus provides more granularity in the analysis of the cell sample, and can in some cases be performed automatically in response to earlier stages being completed. Those of skill in the art will recognize that the classes (e.g., lymphocyte class), sub-classes (e.g., NK-cell sub-class), and sub-sub-classes (e.g., CD4+ T-cell group) referred to herein are only provided as examples, and that generally any suitable classification of cells based on any one or more parameters can be used, including any suitable phenotypic classification and/or any suitable functional classification.
[0036] The machine learning models can be trained to perform varying levels of the analysis. For example, in some implementations, the data that is input into the trained machine learning models includes the distances between each cell and the reference points of any plot being used. The machine learning models are trained to classify each cell into one or more cell classes based on these distances. In other implementations, the data that is input into the trained machine learning models includes the required parameter values for each cell. The machine learning models are trained to analyze the data to determine the distances between each cell and the reference points of any plot being used, and then classify each cell into one or more cell classes based on these determined distances.
In further implementations, the raw data (e.g., raw flow cytometry data) is input into the machine learning models. The machine learning models are trained to analyze the data to determine the required parameter values for each cell, determine the distances between each cell and the reference points of any plot being used, and then classify each cell into one or more cell classes based on these determined distances.
[0037] In some implementations, the one or more machine learning models can include one or more random forest models that each correspond to one multi-parameter plot/set of parameters. For example, a first random forest model can be trained to place cells into cell classes based on a first parameter and a second parameter, a second random forest model can be trained to place cells into cell classes based on the first parameter and a third parameter, a third random forest model can be trained to place cells into cell classes based on the second parameter and the third parameter, etc.
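The pairing of parameters in paragraph [0037] can be illustrated with a short Python sketch that enumerates every two-parameter plot and builds, for one cell, the distance features that the model for that plot would receive. The parameter names `P1` to `P3` follow the text; the class names, averages, and the helper `pair_features` are hypothetical, and no random forest is actually trained here.

```python
import math
from itertools import combinations

PARAMETERS = ("P1", "P2", "P3")  # placeholder names following the text

# Hypothetical class averages per parameter; illustrative values only.
CLASS_MEANS = {
    "class_A": {"P1": 0.2, "P2": 0.3, "P3": 0.9},
    "class_B": {"P1": 0.8, "P2": 0.7, "P3": 0.1},
}


def pair_features(cell, pair, class_means):
    """Distances from the cell to every class reference point on the plot for one pair."""
    point = [cell[p] for p in pair]
    return [math.dist(point, [means[p] for p in pair]) for means in class_means.values()]


# One feature set per parameter pair; each set would feed its own model.
cell = {"P1": 0.25, "P2": 0.35, "P3": 0.80}
features_by_pair = {
    pair: pair_features(cell, pair, CLASS_MEANS)
    for pair in combinations(PARAMETERS, 2)
}
```

`itertools.combinations` yields the pairs in the same order as the text: (P1, P2), (P1, P3), (P2, P3).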
[0038] In general, each random forest model includes a plurality of decision trees and a voting module. Each decision tree of a given random forest model is configured to generate an independent determination of which cell class each cell belongs to. For example, if a random forest model is trained to place each cell into a first cell class (e.g., lymphocytes), a second cell class (e.g., granulocytes), or a third cell class (e.g., monocytes), each decision tree is configured to make an independent determination, for each respective cell, of which cell class the respective cell belongs to. The voting module then selects one of the independent determinations as the final determination of the random forest model for each cell. In some implementations, the voting module is configured to select as the final determination the cell class which was the most frequent winner among the individual decision trees. For example, if five decision trees place a cell into the first cell class, two decision trees place the cell into the second cell class, and one decision tree places the cell into the third cell class, the voting module can select the first cell class as the final determination of that random forest model. In other implementations, the voting module may select the final determination of the random forest model using other techniques, such as an average of the decision trees, a weighted average of the decision trees, etc.
[0039] A training data set can be generated to train the one or more machine learning models (e.g., to train the random forest models). In some implementations, the training data set includes, for each respective cell in the training data set, (i) a distance (on one or more multi-parameter plots) between the respective cell and one or more reference points, and (ii) a determination of which cell class the respective cell belongs to.
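The most-frequent-winner rule of the voting module in paragraph [0038] can be sketched as a plurality vote over the per-tree outputs. The function name `plurality_vote` and the tie-breaking behaviour (the class encountered first wins a tie) are assumptions of this sketch; the text does not specify how ties are resolved.

```python
from collections import Counter


def plurality_vote(tree_votes):
    """Return the class chosen by the most trees (ties go to the class seen first)."""
    return Counter(tree_votes).most_common(1)[0][0]


# The worked example from the text: five, two, and one trees respectively.
votes = ["first"] * 5 + ["second"] * 2 + ["third"]
winner = plurality_vote(votes)
```

An averaging or weighted variant, as mentioned in the text, would replace the simple count with a sum of per-tree weights or class probabilities.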
In some of these implementations, the parameter values of each cell can be normalized with respect to the average parameter values for the different cell classes, such that the distances can be expressed as (i) a value that is greater than or equal to 0 and less than or equal to 1, (ii) a value expressed as a percentage, etc. The training data set could also include raw data (e.g., raw flow cytometry data and/or other data) instead of (or in addition to) the distances between the cells and the reference points.
[0040] In general, the data used by the one or more machine learning models (to train the models and/or to use the models to classify new cells) can include any suitable type of data associated with different measurable parameters of the cells. The data can include flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with generally any single cell analysis method that results in the output of one or more parameters for analysis, other types of data, etc. In some implementations, the machine learning
models can be trained to receive and analyze multiple types of data, including multiple types of the data discussed herein. For example, a given machine learning model can be trained to classify a cell based on a first parameter that is measured using flow cytometry, and a second parameter that is measured using mass cytometry.
[0041] In some implementations, method 200 can include additional steps after all of the cells have been classified and the indication of the cell class(es) for each cell has been received. For example, method 200 can include analyzing the indications and generating a graphical representation of the cell classes of the cells, generating a text description of the cells, generating a recommendation for clinical tests to be undergone by the individual to whom the cells belong, generating a diagnosis for the individual, etc.
[0042] In some implementations, once each cell has been sorted into a cell class, a report can be automatically generated and transmitted to the individual, a healthcare provider of the individual, or both. The report can include, for example, a graphical representation of the determined cell class for each respective cell of the plurality of cells, a text description of the determined cell class for each respective cell of the plurality of cells, a recommendation for one or more clinical tests for the individual to undergo, a diagnosis for the individual, or any combination thereof. Thus, once a cell sample is collected from an individual, analysis can be automatically performed, and the report can be automatically generated and transmitted.
[0043] In general, method 200 can be used to sort each cell in a cell sample into one or more distinct cell classes of a plurality of cell classes. After it is determined which cell class each cell belongs to, a variety of different actions can be taken, including further sorting into sub-classes, analysis of the cell classes of all the cells in the cell sample, etc.
Method 200 thus provides for more detailed analysis of cell samples and recommendations regarding the same, allowing for better diagnostic mechanisms and improved treatment.
[0044] Disclosed herein is an example of the features discussed herein.
[0045] Introduction and Specific Aims
[0046] Proposed herein is a clinical flow cytometry decision support system in which the machine learning component is trained from a high-dimensional (spectral) flow cytometry data set which encompasses the majority of the backbone antigenic markers necessary for applicability by the majority of flow cytometry labs for hematolymphoid disorders.
[0047] Aim 1 will encompass the collection of high-parameter flow cytometry data from a variety of hematolymphoid disorders using the Sony ID7000. Aim 2 seeks to develop the machine learning models for cell population identification using the high-dimensional flow cytometry dataset from Aim 1. The backbone of this process will be based on machine learning models for our lab-specific model. Aim 3 will involve the validation of the machine learning models on clinical flow cytometry data sets which were acquired using different instrumentation and different panels to test for broad lab applicability.
[0048] Project Description
[0049] Aim 1: Utilizes a high-parameter flow cytometry panel that simultaneously examines 29 antigenic markers using the Sony ID7000 7-laser spectral flow cytometer. This panel (FIG. 3) integrates a 4-tube screening panel and includes additional markers (CD123, CD71, and CD57) for additional cell population characterization. This panel has been tested on the Sony ID7000 with success. For this Aim, an additional 100 samples will be acquired from a variety of hematolymphoid disorders as well as normal patients, from peripheral blood and bone marrow, to gain a wide representation of cell populations. For all peripheral blood samples, at least 1 million cellular events will be collected, and for bone marrows, at least 5 million cellular events.
[0050] Aim 2: Using the data from Aim 1, FlowSOM will be utilized for high-dimensional clustering for cell population identification using backbone markers (FIG. 3). Because of the complexity of the cell populations present as well as the number of events (>100 million), a 20x20 SOM with a detection of 80 metaclusters will be used to allow for low frequency population identification. The 80 metaclusters will be thoroughly characterized and annotated based on backbone marker expression.
[0051] As the machine learning model is intended to be universally applicable, individual antigenic marker expression at a fluorescence level cannot be utilized for cell population classification. This is due to the fact that each lab will have different cytometers, fluorophore-target combinations, and cytometer settings. This issue is not unique to clinical flow cytometry and has plagued large-scale meta-analyses of flow cytometry data on the research side as well. To overcome this, a new machine learning method which uses relative expression metrics based on low-level population characteristics with subsequent refinement of classification was developed. An overview of the steps is as follows.
[0052] Initial Model Training
[0053] From the initial clustering data, cells are classified at multiple levels, with the most basic being lymphocyte, granulocyte or monocyte (FIG. 4). A basic machine learning model using CD45, side scatter (SSC), and forward scatter (FSC) is then generated for classification of cells as lymphocytes, granulocytes, or monocytes. Next, distance metrics for each core antigenic marker are calculated relative to lymphocytes, granulocytes and monocytes (FIG. 5). These distance metrics can be normalized on a scale from 0 to 1, and are then used to train a random forest machine learning model for cell population classification at the class level. Additional rounds of distance metrics can also be calculated for non-core antigenic markers with additional machine learning models created.
[0054] New Data Predictions
[0055] The initial machine learning model from above using CD45, SSC and FSC is used to broadly classify cells as lymphocytes, granulocytes, or monocytes (FIG. 6). The accuracy of this step is not very important, as it is only used for an initial classification and will be further refined. Distance metrics for core marker expression can then be calculated for each individual cell relative to the mean marker expression for lymphocytes, granulocytes, and monocytes for primary classification (FIG. 7A). These distance metrics can be used to predict primary cell classification using the machine learning model from above. Distance metrics can then be calculated using primary classifications for a refinement of the primary classifications (FIG. 7B). Additional distance metrics and machine learning predictions can be used for more granular population classifications such as secondary cell classifications (FIG. 4).
[0056] The accuracy of the predictions will first be calculated using a 70:30 test to validation split for the training data from the Sony ID7000. Accuracy will be determined at each level of classification (FIG. 4).
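A hold-out evaluation of the kind described in paragraph [0056] can be sketched in Python as a shuffled 70:30 split, followed by an accuracy calculation and a tally of (true, predicted) label pairs, which is the raw material for inspecting misclassification patterns. The function names, the fixed random seed, and the example labels are hypothetical and are not results from the disclosure.

```python
import random
from collections import Counter


def split_70_30(items, seed=0):
    """Shuffle a copy of the items and split it 70:30."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy_and_confusion(true_labels, predicted_labels):
    """Overall accuracy plus a tally of (true, predicted) label pairs."""
    pairs = Counter(zip(true_labels, predicted_labels))
    correct = sum(count for (t, p), count in pairs.items() if t == p)
    return correct / len(true_labels), pairs


# Hypothetical held-out labels and predictions (not data from the disclosure).
true_labels = ["CD4+ T", "CD4+ T", "CD8+ T", "CD8+ T", "B"]
predicted = ["CD4+ T", "CD8+ T", "CD8+ T", "CD8+ T", "B"]
accuracy, confusion = accuracy_and_confusion(true_labels, predicted)

train, held_out = split_70_30(range(10))
```

Running the same two functions separately on the labels of each classification level would give the per-level accuracy that the text calls for.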
Confusion matrices will be examined for determination of misclassification patterns (e.g., a CD4+ T-cell being classified as a CD8+ T-cell). [0057] Aim 3: In-house collected clinical flow cytometry data, which was acquired using a 4-tube screening panel, will first be examined on a different instrument. The fluorophore-antigen markers are non-overlapping with the spectral panel and will allow for a sufficient proof-of-principle. Classification performance to the “Secondary” Level (FIG. 4) will then be examined. Upon successful results with the in-house data, the models will be tested on collaborator data. Performance will be assessed using visual examination of classifications of cells plotted on 2-dimensional flow cytometry dot plots, as well as comparing population frequencies between
manual analysis and automated analysis. Following completion of this aim, these methods will be incorporated into one or more existing decision support systems for flow cytometry analysis. [0058] As used herein, terms relating to multi-parameter plots, distances on the multi-parameter plots, cell locations on the multi-parameter plots, reference points on the multi-parameter plots, etc. are used to describe the process of classifying cells by comparing parameter values. However, those of skill in the art will understand that the one or more machine learning models may not actually construct multi-parameter plots and determine the distances on these plots to classify cells into cell classes. Instead, the cell classification can be accomplished by determining the value of one or more parameters for each cell being classified, and comparing that value to the average value of each of the one or more parameters for each of the possible cell classes that the cell could be classified into, without any actual construction of a multi-parameter plot. The possible different cell classes and parameters used to classify cells into those cell classes can be similar to or the same as the cell classes and parameters discussed herein with respect to the multi-parameter plots. [0059] Thus, the sub-steps of step 230 of method 200 can be generalized. Sub-step 232 can include determining the value of at least one parameter for each respective cell. Sub-step 234 can include determining, for each respective cell, the difference between (i) the value of the at least one parameter for the respective cell and (ii) the average value of the at least one parameter for each of the one or more cell classes that the respective cell may be placed into. Sub-step 236 can include placing the cell into a specific one or more of the cell classes based on the difference(s). 
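By way of non-limiting illustration only, the generalized sub-steps 232, 234, and 236 might be sketched in Python as follows. The function names, the parameter names, and all numeric values are hypothetical and do not appear in the disclosure. In particular, the nearest-average rule in `place_cell` is an assumed stand-in for the trained machine learning models described herein, chosen only because it is the simplest rule that acts on the differences.

```python
from statistics import mean

def class_means(labeled_cells):
    """Average each parameter within each cell class.

    labeled_cells: list of (class_name, {parameter_name: value}) pairs.
    """
    grouped = {}
    for cell_class, params in labeled_cells:
        for name, value in params.items():
            grouped.setdefault(cell_class, {}).setdefault(name, []).append(value)
    return {
        cell_class: {name: mean(values) for name, values in by_param.items()}
        for cell_class, by_param in grouped.items()
    }

def differences(cell, means):
    """Sub-step 234: difference between the cell's value and each class average."""
    return {
        cell_class: {name: cell[name] - avg for name, avg in averages.items()}
        for cell_class, averages in means.items()
    }

def place_cell(cell, means):
    """Sub-step 236 (simplified, assumed rule): pick the class whose averages
    are closest to the cell, by summed absolute difference."""
    diffs = differences(cell, means)
    return min(diffs, key=lambda c: sum(abs(d) for d in diffs[c].values()))

# Hypothetical reference cells with values already scaled to the range 0 to 1.
reference_cells = [
    ("lymphocyte", {"CD45": 0.9, "SSC": 0.1}),
    ("lymphocyte", {"CD45": 0.8, "SSC": 0.2}),
    ("granulocyte", {"CD45": 0.5, "SSC": 0.9}),
    ("granulocyte", {"CD45": 0.4, "SSC": 0.8}),
    ("monocyte", {"CD45": 0.7, "SSC": 0.5}),
    ("monocyte", {"CD45": 0.6, "SSC": 0.4}),
]
means = class_means(reference_cells)

# Sub-step 232: the parameter values determined for one new cell.
new_cell = {"CD45": 0.82, "SSC": 0.18}
print(place_cell(new_cell, means))
```

In an implementation that follows the disclosure more closely, the output of `differences` would be supplied as input features to a trained model (for example, a random forest) rather than to the fixed rule shown here.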
[0060] In general, each of the different “plots” discussed herein can correspond to a different machine learning model that is trained to classify cells into specific cell classes based on specific parameters. The machine learning models can be trained to perform varying levels of the analysis. For example, in some implementations, the data that is input into the trained machine learning models includes the differences between the parameter values for the individual cells and the average parameter values for the potential cell classes that the cells may be placed into. The machine learning models are trained to classify each cell into one or more cell classes based on these differences. In other implementations, the data that is input into the trained machine learning models includes the required parameter values for each cell. The machine learning models are trained to analyze the data to determine the differences between the parameter values of each cell and the average parameter values for the classes of cells that each cell could be classified into. In
further implementations, the raw data (e.g., raw flow cytometry data) is input into the machine learning models. The machine learning models are trained to analyze the data to determine the required parameter values for each cell, determine the differences between the parameter values of each cell and the average parameter values of the cell classes that each cell could be classified into, and then place each cell into one or more cell classes based on these determined differences. [0061] Similar to the description herein related to the analysis of multi-parameter plots, a training data set can be generated to train the one or more machine learning models (e.g., to train the random forest models). In some implementations, the training data set includes, for each respective cell in the training data set, (i) the difference between (a) the value of at least one parameter for the respective cell and (b) the average value of the at least one parameter for each of the cell classes that the model is being trained for, and (ii) a determination of which cell class of the cell classes for the model the respective cell belongs to. In some of these implementations, the parameter values of each cell can be normalized with respect to the average parameter values for the different cell classes, such that the differences can be expressed as (i) a value that is greater than or equal to 0 and less than or equal to 1, (ii) a value expressed as a percentage, etc. The training data set could also include raw data (such as raw flow cytometry data and/or other data) instead of (or in addition to) the differences between the cell parameter values and the average parameter values for the cell classes. [0062] In general, each of the machine learning models (which can be random forest models) is trained to classify cells into a cell class of a specific class of cells. 
As the possible cell classes that a given model looks at become narrower (e.g., a first model that classifies cells as lymphocytes vs. granulocytes vs. monocytes is broader than a second model that classifies cells as granulocytes/monocytes vs. T-cells vs. B-cells), the model may utilize more parameters in classifying the cells. Thus, for example, a model that makes a broad classification may only analyze two parameters of each cell, but a model that makes a narrower classification may analyze ten parameters of each cell. [0063] Generally, any of the methods disclosed herein can be implemented using a system having a control system with one or more processors, and a memory device storing machine-readable instructions. The control system can be coupled to the memory device, and methods can be implemented when the machine-readable instructions are executed by at least one of the processors of the control system. The methods can also be implemented using a computer program
product (such as a non-transitory computer readable medium) comprising instructions that when executed by a computer, cause the computer to carry out the steps of the methods. [0064] One or more elements or aspects or steps, or any portion(s) thereof, from one or more of any of claims or Alternative Implementations below can be combined with one or more elements or aspects or steps, or any portion(s) thereof, from one or more of any of the other claims or Alternative Implementations or combinations thereof, to form one or more additional implementations and/or claims of the present disclosure. [0065] ALTERNATIVE IMPLEMENTATIONS [0066] Alternative Implementation 1. A method for classifying cells, the method comprising: receiving data associated with one or more cells of an individual, the data including, for each respective cell, information associated with one or more measurable parameters of the respective cell; inputting at least a portion of the data into one or more machine learning models; and receiving, from the one or more machine learning models, an indication of a cell class of a plurality of cell classes to which each respective cell belongs. [0067] Alternative Implementation 2. The method of Alternative Implementation 1, where the data associated with the one or more cells includes flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with a single cell analysis technique, or any combination thereof. [0068] Alternative Implementation 3. The method of Alternative Implementation 1 or Alternative Implementation 2, wherein the data associated with the one or more cells includes, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes. [0069] Alternative Implementation 4. 
The method of Alternative Implementation 3, wherein the at least one parameter includes a first parameter and a second parameter. [0070] Alternative Implementation 5. The method of any one of Alternative Implementations 1 to 4, wherein the data associated with the one or more cells includes, for each respective cell, a distance on at least one plot between (i) the respective cell and (ii) one or more reference points, each respective reference point corresponding to a distinct cell class of the plurality of cell classes, each of the plots being associated with at least a first parameter of the one or more measurable parameters and a second parameter of the measurable parameters.
[0071] Alternative Implementation 6. The method of Alternative Implementation 5, wherein, for each plot, each respective reference point corresponds to a combination of average values of at least the first parameter and the second parameter for the corresponding cell class. [0072] Alternative Implementation 7. The method of any one of Alternative Implementations 4 to 6, wherein the first parameter is a parameter associated with scattering of light caused by the one or more cells, and the second parameter is a parameter associated with a presence of a biomarker in the one or more cells. [0073] Alternative Implementation 8. The method of any one of Alternative Implementations 4 to 6, wherein the first parameter is a parameter associated with scattering of light caused by the one or more cells, and the second parameter is an additional parameter associated with scattering of light caused by the one or more cells. [0074] Alternative Implementation 9. The method of any one of Alternative Implementations 4 to 6, wherein the first parameter is a parameter associated with a presence of a first biomarker in the one or more cells, and the second parameter is an additional parameter associated with a presence of a biomarker in the one or more cells. [0075] Alternative Implementation 10. The method of any one of Alternative Implementations 7 to 9, wherein the parameter associated with scattering of light caused by the respective cell includes a forward scatter amount, a side scatter amount, a forward scatter time-of-flight, or any combination thereof. [0076] Alternative Implementation 11. The method of Alternative Implementation 10, wherein the forward scatter amount includes a forward scatter area, a forward scatter angle, or both. [0077] Alternative Implementation 12. The method of Alternative Implementation 10 or Alternative Implementation 11, wherein the side scatter amount includes a side scatter area, a side scatter angle, or both. [0078] Alternative Implementation 13. 
The method of any one of Alternative Implementations 9 to 12, wherein the one or more parameters associated with the biomarker of the respective cell includes a presence of a predetermined molecule in the respective cell, an amount of the predetermined molecule in the respective cell, or both. [0079] Alternative Implementation 14. The method of Alternative Implementation 13, wherein the one or more parameters associated with the biomarker of the respective cell includes an
intensity of fluorescent emission from the respective cell, a color of fluorescent emission from the respective cell, or both. [0080] Alternative Implementation 15. The method of Alternative Implementation 13 or Alternative Implementation 14, wherein the predetermined molecule includes a CD2 molecule, a CD3 molecule, a CD4 molecule, a CD5 molecule, a CD7 molecule, a CD8 molecule, a CD10 molecule, a CD11 molecule, a CD13 molecule, a CD14 molecule, a CD16 molecule, a CD19 molecule, a CD20 molecule, a CD22 molecule, a CD23 molecule, a CD33 molecule, a CD34 molecule, a CD45 molecule, a CD64 molecule, a CD117 molecule, an HLA-DR molecule, a cellular marker usable for identification of the respective cell, or any combination thereof. [0081] Alternative Implementation 16. The method of any one of Alternative Implementations 13 to 15, wherein the predetermined molecule includes a cluster of differentiation molecule, an antigen, an antibody, an immunoglobulin chain, or any combination thereof. [0082] Alternative Implementation 17. The method of Alternative Implementation 16, wherein the immunoglobulin chain includes a kappa (κ) light chain, a lambda (λ) light chain, a gamma (γ) heavy chain, a delta (δ) heavy chain, an alpha (α) heavy chain, a mu (μ) heavy chain, an epsilon (ε) heavy chain, or any combination thereof. [0083] Alternative Implementation 18. The method of any one of Alternative Implementations 3 to 17, wherein the one or more distinct cell classes includes a first cell class and a second cell class. [0084] Alternative Implementation 19. The method of any one of Alternative Implementations 5 to 17, wherein the at least one plot includes a plot with a first reference point corresponding to a first cell class and a second reference point corresponding to a second cell class. [0085] Alternative Implementation 20. 
The method of Alternative Implementation 18 or Alternative Implementation 19, wherein the first cell class includes lymphocytes, and the second cell class includes granulocytes and monocytes. [0086] Alternative Implementation 21. The method of any one of Alternative Implementations 18 to 20, wherein the first parameter of the plot is associated with a presence of a CD3 molecule in the one or more cells, and the second parameter of the plot is associated with a presence of a CD19 molecule in the one or more cells. [0087] Alternative Implementation 22. The method of Alternative Implementation 18, wherein the one or more distinct cell classes further includes a third cell class.
[0088] Alternative Implementation 23. The method of Alternative Implementation 19, wherein the plot further includes a third reference point corresponding to a third cell class. [0089] Alternative Implementation 24. The method of Alternative Implementation 22 or Alternative Implementation 23, wherein the first cell class includes lymphocytes, the second cell class includes granulocytes, and the third cell class includes monocytes. [0090] Alternative Implementation 25. The method of any one of Alternative Implementations 22 to 24, wherein the first parameter of the plot is associated with a presence of a CD45 molecule in the one or more cells, and the second parameter is associated with an amount of side scatter caused by the one or more cells. [0091] Alternative Implementation 26. The method of Alternative Implementation 22 or Alternative Implementation 23, wherein the first cell class includes granulocytes and monocytes, the second cell class includes T-cells, and the third cell class includes B-cells. [0092] Alternative Implementation 27. The method of any one of Alternative Implementations 22, 23, and 26, wherein the first parameter of the plot is associated with a presence of a CD3 molecule in the one or more cells, and the second parameter is associated with a presence of a CD19 molecule in the one or more cells. [0093] Alternative Implementation 28. The method of any one of Alternative Implementations 3 to 27, wherein the first parameter is associated with a presence of a CD45 molecule in the one or more cells, and the second parameter is associated with an amount of side scatter caused by the one or more cells. [0094] Alternative Implementation 29. The method of Alternative Implementation 28, wherein the one or more distinct cell classes includes a first cell class that includes lymphocytes, a second cell class that includes granulocytes, and a third cell class that includes monocytes. [0095] Alternative Implementation 30. 
The method of Alternative Implementation 28, wherein the plot includes a first reference point corresponding to lymphocytes, a second reference point corresponding to granulocytes, and a third reference point corresponding to monocytes. [0096] Alternative Implementation 31. The method of any one of Alternative Implementations 3 to 30, wherein the at least one plot includes a plot where the first parameter is associated with a presence of a CD45 molecule in the one or more cells, and the second parameter is associated with a presence of a CD3 molecule in the one or more cells.
[0097] Alternative Implementation 32. The method of Alternative Implementation 31, wherein the one or more distinct cell classes includes a first cell class that includes granulocytes and monocytes, and a second cell class that includes lymphocytes. [0098] Alternative Implementation 33. The method of Alternative Implementation 31, wherein the one or more distinct cell classes includes a first cell class that includes granulocytes and monocytes, a second cell class that includes T-cells, and a third cell class that includes B-cells. [0099] Alternative Implementation 34. The method of Alternative Implementation 31, wherein the plot includes a first reference point corresponding to granulocytes and monocytes and a second reference point corresponding to lymphocytes. [0100] Alternative Implementation 35. The method of Alternative Implementation 31, wherein the plot includes a first reference point corresponding to granulocytes and monocytes, a second reference point corresponding to T-cells, and a third reference point corresponding to B-cells. [0101] Alternative Implementation 36. The method of any one of Alternative Implementations 1 to 28, wherein the one or more machine learning models is trained to sort each respective cell into at least one cell class of the plurality of cell classes. [0102] Alternative Implementation 37. The method of Alternative Implementation 36, wherein the one or more machine learning models is trained to sort the one or more cells into the at least one of the plurality of cell classes based at least in part on, for each respective cell, the difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes. [0103] Alternative Implementation 38. 
The method of Alternative Implementation 36, wherein the one or more machine learning models is trained to sort each respective cell into the at least one of the plurality of cell classes based at least in part on, for each respective cell, the distance on the at least one plot between the respective cell and the one or more reference points. [0104] Alternative Implementation 39. The method of any one of Alternative Implementations 1 to 38, wherein the one or more machine learning models is trained to: determine whether each respective cell belongs to a first cell class of the plurality of cell classes or a second cell class of the plurality of cell classes; and determine whether each respective cell belonging to the first cell class of the plurality of cell classes belongs to a first sub-class or a second sub-class.
[0105] Alternative Implementation 40. The method of Alternative Implementation 39, wherein the first cell class includes lymphocytes, and the second cell class includes granulocytes and monocytes. [0106] Alternative Implementation 41. The method of Alternative Implementation 39 or Alternative Implementation 40, wherein the first sub-class includes T-cells and the second sub-class includes B-cells. [0107] Alternative Implementation 42. The method of any one of Alternative Implementations 1 to 41, wherein the one or more machine learning models is trained to: determine, for each respective cell, a value of at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters; determine, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell, and (ii) an average value of the at least one parameter for each of one or more distinct cell classes; and based on the determined difference for each respective cell, place the respective cell into one of the plurality of cell classes. [0108] Alternative Implementation 43. The method of any one of Alternative Implementations 1 to 41, wherein the one or more machine learning models is trained to: determine, for each respective cell, a value of at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters; determine, for each respective cell, a distance on a plot between the respective cell and one or more reference points, each of the one or more reference points corresponding to a combination of an average value of the first parameter and an average value of the second parameter for a distinct cell class of a plurality of cell classes; and based on the distance between the respective cell and the one or more reference points, place the respective cell into one of the plurality of cell classes. 
[0109] Alternative Implementation 44. The method of any one of Alternative Implementations 1 to 34, wherein the one or more machine learning models includes a random forest model configured to determine the cell class to which each respective cell belongs. [0110] Alternative Implementation 45. The method of Alternative Implementation 44, wherein the random forest model has a plurality of decision trees and a voting module, each of the plurality of decision trees being configured to generate an independent determination of the cell class to which each respective cell belongs, the voting module being configured to select the independent
determination of one of the plurality of decision trees as a final determination of the cell class to which the respective cell belongs. [0111] Alternative Implementation 46. The method of any one of Alternative Implementations 1 to 36, wherein the one or more machine learning models is trained using a training data set, the training data set including (i) data associated with a plurality of cells other than the one or more cells, and (ii) for each respective cell of the plurality of cells, a determination of which cell class of the plurality of cell classes the respective cell belongs to. [0112] Alternative Implementation 47. The method of Alternative Implementation 46, wherein the data associated with the plurality of cells includes flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with a single cell analysis technique, or any combination thereof. [0113] Alternative Implementation 48. The method of any one of Alternative Implementations 1 to 47, wherein the one or more machine learning models is trained using a training data set, the training data set including, for each respective cell of a plurality of cells, (i) a difference between (a) a value of at least one parameter of the one or more measurable parameters for the respective cell and (b) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes, and (ii) a determination of which cell class of the plurality of cell classes the respective cell belongs to. [0114] Alternative Implementation 49. The method of Alternative Implementation 48, wherein the at least one parameter includes at least a first parameter and a second parameter. [0115] Alternative Implementation 50. 
The method of any one of Alternative Implementations 1 to 47, wherein the one or more machine learning models is trained using a training data set, the training data set including (i) for each respective cell of a plurality of cells, a distance on at least one plot between the respective cell and one or more reference points, and (ii) for each respective cell of the plurality of cells, a determination of which cell class of the plurality of cell classes the respective cell belongs to. [0116] Alternative Implementation 51. The method of Alternative Implementation 50, wherein each plot is associated with at least a first parameter of the one or more measurable parameters and a second parameter of the measurable parameters, and wherein, for each plot, each respective reference point corresponds to a combination of average values of at least the first parameter and the second parameter for a corresponding cell class of the plurality of cell classes.
[0117] Alternative Implementation 52. The method of any one of Alternative Implementations 1 to 51, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a graphical representation of the cell classes of the one or more cells. [0118] Alternative Implementation 53. The method of any one of Alternative Implementations 1 to 52, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a text description of the cell classes of the one or more cells. [0119] Alternative Implementation 54. The method of any one of Alternative Implementations 1 to 53, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a recommendation for one or more clinical tests for the individual to undergo. [0120] Alternative Implementation 55. The method of any one of Alternative Implementations 1 to 54, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a diagnosis for the individual. [0121] Alternative Implementation 56. The method of any one of Alternative Implementations 51 to 55, wherein the distance on the at least one plot between the respective cell and the one or more reference points is a Euclidean distance or a Mahalanobis distance. [0122] Alternative Implementation 57. 
The method of any one of Alternative Implementations 1 to 56, wherein the data includes, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes, and wherein the one or more machine learning models are trained to, based on the difference for each respective cell and each of the at least one parameter, place the respective cell into one of the plurality of cell classes. [0123] Alternative Implementation 58. The method of any one of Alternative Implementations 1 to 57, wherein the data includes, for each respective cell, a distance on a plot between the respective cell and one or more reference points, each of the one or more reference points corresponding to a combination of an average value of a first parameter and an average value of a second parameter for a distinct cell class of a plurality of cell classes, and wherein the one or more
machine learning models are trained to, based on the distance between the respective cell and the one or more reference points, place the respective cell into one of the plurality of cell classes. [0124] Alternative Implementation 59. A method for classifying cells, the method comprising: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with at least a first measurable parameter and a second measurable parameter; for each respective cell, determining, based on the received data, a value of the first measurable parameter and a value of the second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; automatically generating a report based on the determined cell class for each respective cell of the plurality of cells; and transmitting the report to the individual, a healthcare provider of the individual, or both. [0125] Alternative Implementation 60. The method of Alternative Implementation 59, wherein the report includes a graphical representation of the determined cell class for each respective cell of the plurality of cells, a text description of the determined cell class for each respective cell of the plurality of cells, a recommendation for one or more clinical tests for the individual to undergo, a diagnosis for the individual, or any combination thereof. [0126] Alternative Implementation 61. 
A method for classifying cells, the method comprising: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with a plurality of measurable parameters; for each respective cell, determining, based on the received data, a value of a first measurable parameter and a value of a second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes to which the respective cell belongs, the determining being based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; in response to determining that at least a portion of the plurality of
cells belong to a first cell class of the plurality of cell classes, determining, for each respective cell belonging to the first cell class, a value of a third measurable parameter and a value of a fourth measurable parameter; and determining, for each respective cell belonging to the first cell class, a distinct cell sub-class of a plurality of cell sub-classes of the first cell class to which the respective cell of the first cell class belongs, the determining being based on (i) a difference between the value of the third measurable parameter for the respective cell and an average value of the third measurable parameter for each of the plurality of cell sub-classes of the first cell class and (ii) a difference between the value of the fourth measurable parameter for the respective cell and an average value of the fourth measurable parameter for each of the plurality of cell sub-classes of the first cell class. [0127] Alternative Implementation 62. A system for classifying one or more cells comprising: a memory device having stored thereon machine-readable instructions; and a control system including one or more processors configured to execute the machine-readable instructions to implement the method of any one of Alternative Implementations 1 to 61. [0128] Alternative Implementation 63. A system for classifying one or more cells, the system comprising a control system configured to implement the method of any one of Alternative Implementations 1 to 61. [0129] Alternative Implementation 64. A computer program product comprising instructions which, when executed by a computer, cause the computer to carry out the method of any one of Alternative Implementations 1 to 61. [0130] Alternative Implementation 65. The computer program product of Alternative Implementation 64, wherein the computer program product is a non-transitory computer readable medium. 
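By way of non-limiting illustration only, the Euclidean and Mahalanobis distances referred to in Alternative Implementation 56 might be computed for a two-parameter plot as sketched below. The function names and numeric values are hypothetical and are not part of the disclosure. The sketch assumes that each reference point is the mean of a class's cells, consistent with Alternative Implementation 6, and that the Mahalanobis distance uses the sample covariance of that same class, which the disclosure does not specify.

```python
import math

def euclidean_distance(cell, reference_point):
    """Straight-line distance between a cell and a reference point on a plot."""
    return math.dist(cell, reference_point)

def class_statistics(cells):
    """Return the reference point (mean) and sample covariance (sxx, sxy, syy)
    for a list of (first_parameter, second_parameter) pairs from one class."""
    n = len(cells)
    if n < 2:
        raise ValueError("at least two cells are needed to estimate covariance")
    mean_x = sum(x for x, _ in cells) / n
    mean_y = sum(y for _, y in cells) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in cells) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in cells) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in cells) / (n - 1)
    return (mean_x, mean_y), (sxx, sxy, syy)

def mahalanobis_distance(cell, reference_point, covariance):
    """Distance that rescales each direction by the spread of the class.

    covariance is (sxx, sxy, syy); the 2x2 inverse is written out explicitly.
    """
    sxx, sxy, syy = covariance
    det = sxx * syy - sxy * sxy
    if sxx <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx = cell[0] - reference_point[0]
    dy = cell[1] - reference_point[1]
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)

# Hypothetical cells of one class, as (first parameter, second parameter).
example_class = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
reference_point, covariance = class_statistics(example_class)
cell = (3.0, 1.0)
print(euclidean_distance(cell, reference_point))
print(mahalanobis_distance(cell, reference_point, covariance))
```

When the covariance is the identity, the two distances coincide; they differ when a class is more spread out along one parameter than another, in which case the Mahalanobis distance discounts displacement along the direction of greater spread.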
[0131] While the present disclosure has been described with reference to one or more particular embodiments or implementations, those skilled in the art will recognize that many changes may be made thereto without departing from the spirit and scope of the present disclosure. Each of these implementations and obvious variations thereof is contemplated as falling within the spirit and scope of the present disclosure. It is also contemplated that additional implementations according to aspects of the present disclosure may combine any number of features from any of the implementations described herein.
Claims
WHAT IS CLAIMED IS:
1. A method for classifying cells, the method comprising: receiving data associated with one or more cells of an individual, the data including, for each respective cell, information associated with one or more measurable parameters of the respective cell; inputting at least a portion of the data into one or more machine learning models; and receiving, from the one or more machine learning models, an indication of a cell class of a plurality of cell classes to which each respective cell belongs.
2. The method of claim 1, wherein the data associated with the one or more cells includes flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with a single cell analysis technique, or any combination thereof.
3. The method of claim 1 or claim 2, wherein the data associated with the one or more cells includes, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes.
4. The method of claim 3, wherein the at least one parameter includes a first parameter and a second parameter.
5. The method of any one of claims 1 to 4, wherein the data associated with the one or more cells includes, for each respective cell, a distance on at least one plot between (i) the respective cell and (ii) one or more reference points, each respective reference point corresponding to a distinct cell class of the plurality of cell classes, each of the plots being associated with at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters.
6. The method of claim 5, wherein, for each plot, each respective reference point corresponds to a combination of average values of at least the first parameter and the second parameter for the corresponding cell class.
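As an illustration of the reference points recited in claims 5 and 6, the following is a minimal sketch, not the applicant's implementation. It assumes each reference point is the per-class mean of two parameters and that the distance is a plain Euclidean distance. The function names `class_centroids` and `distance_features` and all numeric values are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(labeled_cells):
    """Return {class_name: (mean_x, mean_y)} from (x, y, class_name) tuples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for x, y, name in labeled_cells:
        acc = sums[name]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in sums.items()}


def distance_features(cell, centroids):
    """Return {class_name: Euclidean distance from the cell to that class's reference point}."""
    x, y = cell
    return {name: math.hypot(x - cx, y - cy) for name, (cx, cy) in centroids.items()}


# Hypothetical values in arbitrary units: (CD45 signal, side scatter, label).
reference_cells = [
    (8.0, 1.0, "lymphocyte"), (9.0, 2.0, "lymphocyte"),
    (5.0, 8.0, "granulocyte"), (6.0, 9.0, "granulocyte"),
    (7.0, 4.0, "monocyte"), (7.0, 5.0, "monocyte"),
]
centroids = class_centroids(reference_cells)
features = distance_features((8.5, 1.5), centroids)
nearest = min(features, key=features.get)
```

In this sketch the `features` dictionary is the kind of per-cell distance data that could be supplied to a machine learning model, and `nearest` shows the simplest possible use of it.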
7. The method of any one of claims 4 to 6, wherein the first parameter is a parameter associated with scattering of light caused by the one or more cells, and the second parameter is a parameter associated with a presence of a biomarker in the one or more cells.
8. The method of any one of claims 4 to 6, wherein the first parameter is a parameter associated with scattering of light caused by the one or more cells, and the second parameter is an additional parameter associated with scattering of light caused by the one or more cells.
9. The method of any one of claims 4 to 6, wherein the first parameter is a parameter associated with a presence of a first biomarker in the one or more cells, and the second parameter is an additional parameter associated with a presence of a biomarker in the one or more cells.
10. The method of any one of claims 7 to 9, wherein the parameter associated with scattering of light caused by the respective cell includes a forward scatter amount, a side scatter amount, a forward scatter time-of-flight, or any combination thereof.
11. The method of claim 10, wherein the forward scatter amount includes a forward scatter area, a forward scatter angle, or both.
12. The method of claim 10 or claim 11, wherein the side scatter amount includes a side scatter area, a side scatter angle, or both.
13. The method of any one of claims 9 to 12, wherein the one or more parameters associated with the biomarker of the respective cell includes a presence of a predetermined molecule in the respective cell, an amount of the predetermined molecule in the respective cell, or both.
14. The method of claim 13, wherein the one or more parameters associated with the biomarker of the respective cell includes an intensity of fluorescent emission from the respective cell, a color of fluorescent emission from the respective cell, or both.
15. The method of claim 13 or claim 14, wherein the predetermined molecule includes a CD2 molecule, a CD3 molecule, a CD4 molecule, a CD5 molecule, a CD7 molecule, a CD8 molecule, a CD10 molecule, a CD11 molecule, a CD13 molecule, a CD14 molecule, a CD16 molecule, a CD19 molecule, a CD20 molecule, a CD22 molecule, a CD23 molecule, a CD33 molecule, a CD34 molecule, a CD45 molecule, a CD64 molecule, a CD117 molecule, an HLA-DR molecule, a cellular marker usable for identification of the respective cell, or any combination thereof.
16. The method of any one of claims 13 to 15, wherein the predetermined molecule includes a cluster of differentiation molecule, an antigen, an antibody, an immunoglobulin chain, or any combination thereof.
17. The method of claim 16, wherein the immunoglobulin chain includes a kappa (κ) light chain, a lambda (λ) light chain, a gamma (γ) heavy chain, a delta (δ) heavy chain, an alpha (α) heavy chain, a mu (μ) heavy chain, an epsilon (ε) heavy chain, or any combination thereof.
18. The method of any one of claims 3 to 17, wherein the one or more distinct cell classes includes a first cell class and a second cell class.
19. The method of any one of claims 5 to 17, wherein the at least one plot includes a plot with a first reference point corresponding to a first cell class and a second reference point corresponding to a second cell class.
20. The method of claim 18 or claim 19, wherein the first cell class includes lymphocytes, and the second cell class includes granulocytes and monocytes.
21. The method of any one of claims 18 to 20, wherein the first parameter of the plot is associated with a presence of a CD3 molecule in the one or more cells, and the second parameter of the plot is associated with a presence of a CD19 molecule in the one or more cells.
22. The method of claim 18, wherein the one or more distinct cell classes further includes a third cell class.
23. The method of claim 19, wherein the plot further includes a third reference point corresponding to a third cell class.
24. The method of claim 22 or claim 23, wherein the first cell class includes lymphocytes, the second cell class includes granulocytes, and the third cell class includes monocytes.
25. The method of any one of claims 22 to 24, wherein the first parameter of the plot is associated with a presence of a CD45 molecule in the one or more cells, and the second parameter is associated with an amount of side scatter caused by the one or more cells.
26. The method of claim 22 or claim 23, wherein the first cell class includes granulocytes and monocytes, the second cell class includes T-cells, and the third cell class includes B-cells.
27. The method of any one of claims 22, 23, and 26, wherein the first parameter of the plot is associated with a presence of a CD3 molecule in the one or more cells, and the second parameter is associated with a presence of a CD19 molecule in the one or more cells.
28. The method of any one of claims 3 to 27, wherein the first parameter is associated with a presence of a CD45 molecule in the one or more cells, and the second parameter is associated with an amount of side scatter caused by the one or more cells.
29. The method of claim 28, wherein the one or more distinct cell classes includes a first cell class that includes lymphocytes, a second cell class that includes granulocytes, and a third cell class that includes monocytes.
30. The method of claim 28, wherein the plot includes a first reference point corresponding to lymphocytes, a second reference point corresponding to granulocytes, and a third reference point corresponding to monocytes.
31. The method of any one of claims 3 to 30, wherein the at least one plot includes a plot where the first parameter is associated with a presence of a CD45 molecule in the one or more cells, and the second parameter is associated with a presence of a CD3 molecule in the one or more cells.
32. The method of claim 31, wherein the one or more distinct cell classes includes a first cell class that includes granulocytes and monocytes, and a second cell class that includes lymphocytes.
33. The method of claim 31, wherein the one or more distinct cell classes includes a first cell class that includes granulocytes and monocytes, a second cell class that includes T-cells, and a third cell class that includes B-cells.
34. The method of claim 31, wherein the plot includes a first reference point corresponding to granulocytes and monocytes and a second reference point corresponding to lymphocytes.
35. The method of claim 31, wherein the plot includes a first reference point corresponding to granulocytes and monocytes, a second reference point corresponding to T-cells, and a third reference point corresponding to B-cells.
36. The method of any one of claims 1 to 28, wherein the one or more machine learning models is trained to sort each respective cell into at least one cell class of the plurality of cell classes.
37. The method of claim 36, wherein the one or more machine learning models is trained to sort the one or more cells into the at least one of the plurality of cell classes based at least in part on, for each respective cell, the difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes.
38. The method of claim 36, wherein the one or more machine learning models is trained to sort each respective cell into the at least one of the plurality of cell classes based at least in part on, for each respective cell, the distance on the at least one plot between the respective cell and the one or more reference points.
39. The method of any one of claims 1 to 38, wherein the one or more machine learning models is trained to: determine whether each respective cell belongs to a first cell class of the plurality of cell classes or a second cell class of the plurality of cell classes; and determine whether each respective cell belonging to the first cell class of the plurality of cell classes belongs to a first sub-class or a second sub-class.
40. The method of claim 39, wherein the first cell class includes lymphocytes, and the second cell class includes granulocytes and monocytes.
41. The method of claim 39 or claim 40, wherein the first sub-class includes T-cells and the second sub-class includes B-cells.
42. The method of any one of claims 1 to 41, wherein the one or more machine learning models is trained to: determine, for each respective cell, a value of at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters; determine, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell, and (ii) an average value of the at least one parameter for each of one or more distinct cell classes; and based on the determined difference for each respective cell, place the respective cell into one of the plurality of cell classes.
43. The method of any one of claims 1 to 41, wherein the one or more machine learning models is trained to: determine, for each respective cell, a value of at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters; determine, for each respective cell, a distance on a plot between the respective cell and one or more reference points, each of the one or more reference points corresponding to a combination of an average value of the first parameter and an average value of the second parameter for a distinct cell class of a plurality of cell classes; and based on the distance between the respective cell and the one or more reference points, place the respective cell into one of the plurality of cell classes.
44. The method of any one of claims 1 to 34, wherein the one or more machine learning models includes a random forest model configured to determine the cell class to which each respective cell belongs.
45. The method of claim 44, wherein the random forest model has a plurality of decision trees and a voting module, each of the plurality of decision trees being configured to generate an independent determination of the cell class to which each respective cell belongs, the voting module being configured to select the independent determination of one of the plurality of decision trees as a final determination of the cell class to which the respective cell belongs.
46. The method of any one of claims 1 to 36, wherein the one or more machine learning models is trained using a training data set, the training data set including (i) data associated with a plurality of cells other than the one or more cells, and (ii) for each respective cell of the plurality of cells, a determination of which cell class of the plurality of cell classes the respective cell belongs to.
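The tree-and-vote arrangement of claim 45 can be sketched as follows. This is an illustrative assumption rather than the claimed voting module: it uses simple majority voting, the usual choice for random forest classifiers, and the three "trees" are hypothetical hand-written threshold rules standing in for trained decision trees. The feature names `d_lymph` and `d_gran` (distances to two class reference points) are likewise invented for the example.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the most trees (ties go to the label seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical stand-ins for trained decision trees; each maps a feature dict to a label.
def tree_a(f):
    return "lymphocyte" if f["d_lymph"] < f["d_gran"] else "granulocyte"


def tree_b(f):
    return "lymphocyte" if f["d_lymph"] < 2.0 else "granulocyte"


def tree_c(f):
    return "granulocyte" if f["d_gran"] < 3.0 else "lymphocyte"


cell_features = {"d_lymph": 1.0, "d_gran": 2.5}
votes = [tree(cell_features) for tree in (tree_a, tree_b, tree_c)]
final_class = majority_vote(votes)
```

Here two of the three stand-in trees vote for the same class, so that class becomes the final determination.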
47. The method of claim 46, wherein the data associated with the plurality of cells includes flow cytometry data, mass cytometry data, single cell RNA-sequencing data, multiplex imaging data, data associated with a single cell analysis technique, or any combination thereof.
48. The method of any one of claims 1 to 47, wherein the one or more machine learning models is trained using a training data set, the training data set including, for each respective cell of a plurality of cells, (i) a difference between (a) a value of at least one parameter of the one or more measurable parameters for the respective cell and (b) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes, and (ii) a determination of which cell class of the plurality of cell classes the respective cell belongs to.
49. The method of claim 48, wherein the at least one parameter includes at least a first parameter and a second parameter.
50. The method of any one of claims 1 to 47, wherein the one or more machine learning models is trained using a training data set, the training data set including (i) for each respective cell of a plurality of cells, a distance on at least one plot between the respective cell and one or more reference points, and (ii) for each respective cell of the plurality of cells, a determination of which cell class of the plurality of cell classes the respective cell belongs to.
51. The method of claim 50, wherein each plot is associated with at least a first parameter of the one or more measurable parameters and a second parameter of the one or more measurable parameters, and wherein, for each plot, each respective reference point corresponds to a combination of average values of at least the first parameter and the second parameter for a corresponding cell class of the plurality of cell classes.
52. The method of any one of claims 1 to 51, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a graphical representation of the cell classes of the one or more cells.
53. The method of any one of claims 1 to 52, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a text description of the cell classes of the one or more cells.
54. The method of any one of claims 1 to 53, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a recommendation for one or more clinical tests for the individual to undergo.
55. The method of any one of claims 1 to 54, further comprising: analyzing the indication of the cell class to which each of the one or more cells belongs; and based at least in part on the analysis, generating a diagnosis for the individual.
56. The method of any one of claims 51 to 55, wherein the distance on the at least one plot between the respective cell and the one or more reference points is a Euclidean distance or a Mahalanobis distance.
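To show how the two distances named in claim 56 differ, the sketch below computes both for a two-parameter plot. It assumes the Mahalanobis distance uses a 2x2 covariance matrix estimated for the class around its reference point. The helper names and the numeric values are hypothetical.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points on a two-parameter plot."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def mahalanobis_2d(p, mean, cov):
    """Mahalanobis distance from p to mean for a 2x2 covariance ((a, b), (c, d))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive-definite")
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Quadratic form with the inverse covariance (1/det) * [[d, -b], [-c, a]].
    squared = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(squared)


# Hypothetical class whose first parameter varies more than its second.
reference_point = (0.0, 0.0)
covariance = ((4.0, 0.0), (0.0, 1.0))

along_wide_axis = (2.0, 0.0)
along_narrow_axis = (0.0, 2.0)
```

Both example cells are the same Euclidean distance from the reference point, but the Mahalanobis distance treats the cell displaced along the high-variance axis as closer to the class, because that displacement is less unusual for the class.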
57. The method of any one of claims 1 to 56, wherein the data includes, for each respective cell, a difference between (i) a value of at least one parameter of the one or more measurable parameters for the respective cell and (ii) an average value of the at least one parameter for each of one or more distinct cell classes of the plurality of cell classes, and wherein the one or more machine learning models are trained to, based on the difference for each respective cell and each of the at least one parameter, place the respective cell into one of the plurality of cell classes.
58. The method of any one of claims 1 to 57, wherein the data includes, for each respective cell, a distance on a plot between the respective cell and one or more reference points, each of the one or more reference points corresponding to a combination of an average value of a first parameter and an average value of a second parameter for a distinct cell class of a plurality of cell classes, and wherein the one or more machine learning models are trained to, based on the distance between the respective cell and the one or more reference points, place the respective cell into one of the plurality of cell classes.
59. A method for classifying cells, the method comprising: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with at least a first measurable parameter and a second measurable parameter; for each respective cell, determining, based on the received data, a value of the first measurable parameter and a value of the second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; automatically generating a report based on the determined cell class for each respective cell of the plurality of cells; and transmitting the report to the individual, a healthcare provider of the individual, or both.
60. The method of claim 59, wherein the report includes a graphical representation of the determined cell class for each respective cell of the plurality of cells, a text description of the determined cell class for each respective cell of the plurality of cells, a recommendation for one or more clinical tests for the individual to undergo, a diagnosis for the individual, or any combination thereof.
61. A method for classifying cells, the method comprising: receiving data associated with a cell sample of an individual, the cell sample including a plurality of cells, the data including, for each respective cell, information associated with a plurality of measurable parameters; for each respective cell, determining, based on the received data, a value of a first measurable parameter and a value of a second measurable parameter; determining, for each respective cell, a distinct cell class of a plurality of cell classes to which the respective cell belongs, the determining being based on (i) a difference between the value of the first measurable parameter for the respective cell and an average value of the first measurable parameter for each of the plurality of cell classes and (ii) a difference between the value of the second measurable parameter for the respective cell and an average value of the second measurable parameter for each of the plurality of cell classes; in response to determining that at least a portion of the plurality of cells belong to a first cell class of the plurality of cell classes, determining, for each respective cell belonging to the first cell class, a value of a third measurable parameter and a value of a fourth measurable parameter; and determining, for each respective cell belonging to the first cell class, a distinct cell sub-class of a plurality of cell sub-classes of the first cell class to which the respective cell of the first cell class belongs, the determining being based on (i) a difference between the value of the third measurable parameter for the respective cell and an average value of the third measurable parameter for each of the plurality of cell sub-classes of the first cell class and (ii) a difference between the value of the fourth measurable parameter for the respective cell and an average value of the fourth measurable parameter for each of the plurality of cell sub-classes of the first cell class.
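The two-stage flow of claim 61 can be sketched as a nearest-mean rule applied twice. This is one possible reading, not the claimed method itself: it assumes that "based on a difference from an average value" is realized by picking the class whose mean is closest in Euclidean distance. The class means, the parameter keys `p1` to `p4`, and the marker pairings noted in the comments are hypothetical.

```python
import math


def nearest_mean(point, means):
    """Return the name whose mean point is closest to the given point."""
    return min(means, key=lambda name: math.dist(point, means[name]))


# Hypothetical average values in arbitrary units.
CLASS_MEANS = {                      # (first parameter, second parameter), e.g. CD45 and side scatter
    "lymphocyte": (8.5, 1.5),
    "granulocyte/monocyte": (6.0, 7.0),
}
SUBCLASS_MEANS = {                   # (third parameter, fourth parameter), e.g. CD3 and CD19
    "T-cell": (9.0, 1.0),
    "B-cell": (1.0, 9.0),
}


def classify(cell):
    """Return (cell_class, sub_class); sub_class is None outside the first cell class."""
    cell_class = nearest_mean((cell["p1"], cell["p2"]), CLASS_MEANS)
    if cell_class != "lymphocyte":
        return cell_class, None
    # The third and fourth parameters are consulted only for first-class cells.
    sub_class = nearest_mean((cell["p3"], cell["p4"]), SUBCLASS_MEANS)
    return cell_class, sub_class


result_a = classify({"p1": 8.0, "p2": 2.0, "p3": 8.0, "p4": 1.5})
result_b = classify({"p1": 5.5, "p2": 8.0, "p3": 0.5, "p4": 0.5})
```

Restricting the second stage to cells already placed in the first class mirrors the "in response to determining" condition in the claim.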
62. A system for classifying one or more cells comprising: a memory device having stored thereon machine-readable instructions; and a control system including one or more processors configured to execute the machine-readable instructions to implement the method of any one of claims 1 to 61.
63. A system for classifying one or more cells, the system comprising a control system configured to implement the method of any one of claims 1 to 61.
64. A computer program product comprising instructions which, when executed by a computer, cause the computer to carry out the method of any one of claims 1 to 61.
65. The computer program product of claim 64, wherein the computer program product is a non-transitory computer readable medium.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363503411P | 2023-05-19 | 2023-05-19 | |
US63/503,411 | 2023-05-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024243056A1 (en) | 2024-11-28 |
Family
ID=93589793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2024/030003 WO2024243056A1 (en) | 2023-05-19 | 2024-05-17 | Systems and methods for classifying cells using distance metrics |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024243056A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022053624A1 (en) * | 2020-09-10 | 2022-03-17 | Oxford NanoImaging Limited | Cell classification algorithms, and use of such algorithms to inform and optimise medical treatments |
US20220335736A1 (en) * | 2021-04-16 | 2022-10-20 | Hamid Reza TIZHOOSH | Systems and methods for automatically classifying cell types in medical images |
US20220392613A1 (en) * | 2019-08-30 | 2022-12-08 | Juno Therapeutics, Inc. | Machine learning methods for classifying cells |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8682810B2 (en) | Method and system for analysis of flow cytometry data using support vector machines | |
US20160169786A1 (en) | Automated flow cytometry analysis method and system | |
Qian et al. | Elucidation of seventeen human peripheral blood B‐cell subsets and quantification of the tetanus response using a density‐based method for the automated identification of cell populations in multidimensional flow cytometry data | |
US20240044904A1 (en) | System, method, and article for detecting abnormal cells using multi-dimensional analysis | |
US9880155B2 (en) | System, method, and article for detecting abnormal cells using multi-dimensional analysis | |
EP3867625A1 (en) | Adaptive sorting for particle analyzers | |
US12130223B2 (en) | Optimized sorting gates | |
Costa et al. | A new automated flow cytometry data analysis approach for the diagnostic screening of neoplastic B-cell disorders in peripheral blood samples with absolute lymphocytosis | |
EP3230887A1 (en) | Automated flow cytometry analysis method and system | |
WO2024243056A1 (en) | Systems and methods for classifying cells using distance metrics | |
Khowawisetsut et al. | Data analysis and presentation in flow cytometry. | |
US20240038338A1 (en) | System and method for automated flow cytometry data analysis and interpretation | |
Pura et al. | Team: A multiple testing algorithm on the aggregation tree for flow cytometry analysis | |
US20240192210A1 (en) | Systems and methods for comprehensive and standardized immune system phenotyping and automated cell classification | |
Bashashati et al. | A pipeline for automated analysis of flow cytometry data: preliminary results on lymphoma sub-type diagnosis | |
Xu | Machine Learning for Flow Cytometry Data Analysis | |
WO2024196932A1 (en) | Systems and methods for classifying cells | |
Zhang et al. | An automatic analysis and quality assurance method for lymphocyte subset identification | |
CN119265307A (en) | Natural killer cell marker, screening device and application | |
Mohamed | Using Probability Binning and Bayesian Inference to measure Euclidean Distance of Flow Cytometric data | |
MATTON | Automating flow cytometry data analysis using clustering techniques | |
Van Gassen | Development of machine learning techniques for flow cytometry data | |
Thairu | Quality control and analysis of flow cytometry data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24811684; Country of ref document: EP; Kind code of ref document: A1 |