CN109101783B - Cancer network marker determination method and system based on probability model - Google Patents
- Publication number: CN109101783B (application CN201810920673.7A)
- Authority: CN (China)
- Prior art keywords: sample, gene, disease, likelihood, normal
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a cancer network marker determination method and system based on a probability model. The method comprises: converting the acquired gene expression data matrices of all the normal samples and disease samples into likelihood matrices by using a probability density function, and constructing a normal sample distribution function from the likelihood matrices of all the normal samples; then substituting each element of the likelihood matrix of each disease sample into the normal sample distribution function to determine a significant difference gene set for each disease sample; and mapping the significant difference gene set of each disease sample onto a protein-protein interaction network to determine the network marker of each disease sample. With the method or system provided by the invention, cancer network markers can be obtained accurately and effectively and used to classify the disease into subtypes, supporting accurate diagnosis and treatment of the disease.
Description
Technical Field
The invention relates to the technical field of gene detection, and in particular to a cancer network marker determination method and system based on a probability model.
Background
Research has shown that cancer develops through the combined action of multiple genes. Traditional gene expression profile data suffer from high noise, small sample sizes and imbalanced positive and negative samples, so combining expression profile data with biological networks to determine cancer network markers has become a promising approach. In addition, network markers are more efficient and more stable than traditional single-gene markers.
Disclosure of Invention
Taking into account the heterogeneity among samples and the fact that the same disease differs among patients owing to factors such as pathogenesis, the invention provides a cancer network marker determination method and system based on a probability model. The invention can obtain cancer network markers accurately and effectively and use them to classify diseases, supporting accurate diagnosis and treatment.
To achieve this purpose, the invention provides the following scheme:
A cancer network marker determination method based on a probabilistic model, comprising:
acquiring gene expression data matrices of a plurality of normal samples and a plurality of disease samples, the elements of each gene expression data matrix being gene expression levels;
converting, by using a probability density function, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices, the elements of both kinds of likelihood matrix being gene likelihoods;
constructing a normal sample distribution function from all the normal sample likelihood matrices;
substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine a significant difference gene set for each disease sample;
and mapping the significant difference gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample.
Optionally, the cancer network marker determination method further comprises:
classifying the disease samples into different subtypes according to the network markers of each disease sample and the known cancer subtype prior data.
Optionally, converting, by using a probability density function, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices specifically comprises:
constructing a gene likelihood calculation model by using a probability density function, the model expressing the likelihood $\lambda_i$ of gene $i$ in terms of the expression level of the $i$-th gene in the $j$-th sample, $f_i^1$, the normal distribution curve of gene $i$ in the disease samples, and $f_i^2$, the normal distribution curve of gene $i$ in the normal samples;
and converting, according to the gene likelihood calculation model, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices.
Optionally, the constructing a normal sample distribution function according to all the normal sample likelihood matrices specifically includes:
calculating the mean and variance of each gene's likelihood from all the normal sample likelihood matrices;
and constructing, from that mean and variance, a normal distribution function of each gene's likelihood under the normal samples.
Optionally, substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significant difference gene set of each disease sample specifically comprises:
substituting each element of the disease sample likelihood matrix in turn into the normal sample distribution function, and calculating a probability value for each gene in each disease sample;
determining whether the probability value is less than or equal to a set threshold;
if so, determining the genes whose probability values are less than or equal to the set threshold as the significant difference genes of the disease sample.
Optionally, mapping the significant difference gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample specifically comprises:
mapping the significant difference gene set of each disease sample in turn onto a protein-protein interaction network and, according to the interaction relationships between genes, taking the five screened genes with the largest number of connected genes, together with their first-order neighbor nodes, as the network marker of the disease sample.
The present invention also provides a cancer network marker determination system based on a probabilistic model, comprising:
a gene expression data matrix acquisition module, configured to acquire gene expression data matrices of a plurality of normal samples and a plurality of disease samples, the elements of each gene expression data matrix being gene expression levels;
a gene expression data matrix transformation module, configured to convert, by using a probability density function, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices, the elements of both kinds of likelihood matrix being gene likelihoods;
a normal sample distribution function building module, configured to build a normal sample distribution function from all the normal sample likelihood matrices;
a significant difference gene set determining module, configured to substitute each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine a significant difference gene set for each disease sample;
and a network marker determining module, configured to map the significant difference gene set of each disease sample in turn onto a protein-protein interaction network and determine the network marker of each disease sample.
Optionally, the cancer network marker determination system further comprises:
a disease subtype classification module, configured to classify the disease samples into different subtypes according to the network markers of each disease sample and known prior data on cancer subtypes.
Optionally, the gene expression data matrix transformation module specifically includes:
a gene likelihood calculation model building unit, configured to build a gene likelihood calculation model by using a probability density function, the model expressing the likelihood $\lambda_i$ of gene $i$ in terms of the expression level of the $i$-th gene in the $j$-th sample, $f_i^1$, the normal distribution curve of gene $i$ in the disease samples, and $f_i^2$, the normal distribution curve of gene $i$ in the normal samples;
and a transformation unit, configured to convert, according to the gene likelihood calculation model, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices.
Optionally, the significantly different gene set determining module specifically includes:
the probability value calculating unit is used for sequentially substituting each element in the disease sample likelihood matrix into the normal sample distribution function and calculating the probability value of each gene in each disease sample;
the judging unit is used for judging whether the probability value is less than or equal to a set threshold value or not;
and the significant difference gene set determining unit is used for determining the genes corresponding to the probability values which are less than or equal to the set threshold value as the significant difference genes of the disease sample.
According to the specific embodiments provided herein, the invention achieves the following technical effects:
The invention provides a cancer network marker determination method and system based on a probability model. The method comprises: acquiring gene expression data matrices of a plurality of normal samples and a plurality of disease samples, and using a probability density function to convert the matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices; then constructing a normal sample distribution function from all the normal sample likelihood matrices, substituting each element of each disease sample likelihood matrix in turn into that function, and determining a significant difference gene set for each disease sample; and finally mapping the significant difference gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample. With the method or system provided by the invention, cancer network markers can be obtained accurately and effectively and used to classify the disease into subtypes, supporting accurate diagnosis and treatment.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a cancer network marker determination method based on a probabilistic model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of determining cancer network markers based on a probabilistic model according to the present invention;
FIG. 3 is a schematic diagram of a network marker selected by the present invention;
FIG. 4 is a graph of the relationships among some of the markers obtained for each subtype of the cancer UCEC;
FIG. 5 is a graph of the subtype classification results for the cancer UCEC;
FIG. 6 is a graph of the number of samples in each subtype of the cancer UCEC;
FIG. 7 is a graph of the survival curves of each subtype of the cancer UCEC;
FIG. 8 is a schematic structural diagram of a cancer network marker determination system based on a probabilistic model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
To make the above objects, features and advantages of the present invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
To overcome data noise, the present invention assumes that the expression profile data for each gene in a particular population or phenotype follows a normal distribution. Based on this assumption, the original gene expression profile data matrix can be converted into a likelihood matrix. The invention determines the significant difference genes in each disease sample through the likelihood matrix, and projects the significant difference genes into a protein-protein interaction (PPI) network to obtain the network marker of each disease sample.
Because of factors such as differing etiology, the same disease varies among patients, and traditional disease classification cannot adequately represent all disease samples. A finer sub-classification of these classical diseases is therefore of great biological importance for diagnosis and treatment. The markers of all the disease samples are integrated into a single likelihood matrix over the cancer markers, and the disease samples are classified into different subtypes using the ConsensusClusterPlus package for R, combined with existing cancer subtype information.
Based on the above, the main idea of the present invention is to introduce a probability density function and combine it with a single-sample approach to screen the network markers of each disease sample, and then to classify cancer into subtypes using these sample-specific markers together with the clinical information of the samples.
FIG. 1 is a schematic flowchart of a cancer network marker determination method based on a probabilistic model according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps.
Step 101: acquiring gene expression data matrixes of a plurality of normal samples and a plurality of disease samples; the elements in the gene expression data matrix are gene expression levels.
Step 102: converting the gene expression data matrix of each normal sample into a normal sample likelihood matrix by using a probability density function, and converting the gene expression data matrix of each disease sample into a disease sample likelihood matrix; elements in the normal sample likelihood matrix and the disease sample likelihood matrix are both gene likelihoods.
Step 103: constructing a normal sample distribution function from all the normal sample likelihood matrices.
Step 104: substituting each element of each disease sample likelihood matrix into the normal sample distribution function to determine a significant difference gene set for each disease sample.
Step 105: mapping the significant difference gene set of each disease sample into a protein-protein interaction network, and determining network markers of each disease sample.
Step 106: classifying the disease samples into different subtypes according to the network markers of each disease sample and the known cancer subtype prior data.
The data in the gene expression data matrices of the disease samples in step 101 are obtained from The Cancer Genome Atlas (TCGA) database.
Step 102 specifically includes:
constructing a gene likelihood calculation model by using a probability density function. The model expresses the likelihood $\lambda_i$ of gene $i$ in terms of the expression level of the $i$-th gene in the $j$-th sample (where $i$ is the gene index and $j$ the sample index), $f_i^1$, the normal distribution curve of gene $i$ in the disease samples, and $f_i^2$, the normal distribution curve of gene $i$ in the normal samples; the superscripts 1 and 2 denote disease and normal, respectively.
Specifically, the mean and variance of each gene's expression level are computed separately over the normal samples and over the disease samples, and from them the normal distribution curves $f_i^2$ (normal samples) and $f_i^1$ (disease samples) of each gene are constructed, where the normal distribution function is

$$f(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

with $x$ the expression level, $\mu$ the mean and $\sigma$ the standard deviation. The gene likelihood calculation model is then constructed from the normal distribution curves $f_i^2$ and $f_i^1$ of each gene.
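The per-gene curve fitting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `normal_pdf` and `fit_gene_curves`, the use of the sample standard deviation (n - 1 denominator), and the toy data are all hypothetical.

```python
import math
from statistics import mean, stdev

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at expression level x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_gene_curves(expr):
    """expr maps gene name -> expression levels of that gene in one group of
    samples (all normal or all disease). Returns gene -> (mu, sigma).
    ASSUMPTION: sigma is the sample standard deviation (needs >= 2 samples)."""
    return {gene: (mean(vals), stdev(vals)) for gene, vals in expr.items()}

# Hypothetical toy data for a single gene g1.
normal_expr = {"g1": [5.0, 6.0, 7.0]}
disease_expr = {"g1": [9.0, 10.0, 11.0]}

f2 = fit_gene_curves(normal_expr)    # parameters of f_i^2 (normal samples)
f1 = fit_gene_curves(disease_expr)   # parameters of f_i^1 (disease samples)
mu2, sd2 = f2["g1"]
print(mu2, sd2)                              # 6.0 1.0
print(round(normal_pdf(6.0, mu2, sd2), 4))   # 0.3989
```

Each gene thus gets two fitted curves, one per phenotype, which are the inputs the gene likelihood calculation model combines.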
The gene expression data matrix of each normal sample is then converted into a normal sample likelihood matrix according to the gene likelihood calculation model, and the gene expression data matrix of each disease sample is converted into a disease sample likelihood matrix.
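The formula of the gene likelihood calculation model is not reproduced in this text, so the sketch below uses a hypothetical stand-in, the log-likelihood ratio $\log f_i^1(x) - \log f_i^2(x)$, purely to show the shape of the matrix conversion. The combining rule, the function names (`log_normal_pdf`, `likelihood_matrix`) and the toy data are assumptions; the patent's actual formula may differ.

```python
import math
from statistics import mean, stdev

def log_normal_pdf(x, mu, sigma):
    # log of the N(mu, sigma^2) density; working in logs avoids underflow
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def likelihood_matrix(expr, normal_cols, disease_cols):
    """expr: gene -> {sample: expression level}.
    Returns gene -> {sample: lambda}. HYPOTHETICAL: lambda is taken to be the
    log-likelihood ratio log f_i^1(x) - log f_i^2(x), a stand-in formula."""
    out = {}
    for gene, row in expr.items():
        n_vals = [row[s] for s in normal_cols]
        d_vals = [row[s] for s in disease_cols]
        mu2, sd2 = mean(n_vals), stdev(n_vals)   # f_i^2: normal samples
        mu1, sd1 = mean(d_vals), stdev(d_vals)   # f_i^1: disease samples
        out[gene] = {
            s: log_normal_pdf(x, mu1, sd1) - log_normal_pdf(x, mu2, sd2)
            for s, x in row.items()
        }
    return out

expr = {"g1": {"n1": 5.0, "n2": 6.0, "n3": 7.0, "d1": 9.0, "d2": 10.0, "d3": 11.0}}
lam = likelihood_matrix(expr, ["n1", "n2", "n3"], ["d1", "d2", "d3"])
print(round(lam["g1"]["n2"], 2), round(lam["g1"]["d2"], 2))  # -8.0 8.0
```

Every sample, normal or disease, is converted with the same per-gene curves, so both likelihood matrices share one scale.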
Step 103 specifically comprises:
calculating the mean and variance of each gene's likelihood from all the normal sample likelihood matrices;
and constructing, from that mean and variance, the distribution function of each gene's likelihood under the normal samples. This distribution function is a normal distribution function.
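Step 103 reduces to estimating two parameters per gene over the normal samples only. A minimal sketch, assuming the likelihood matrix is stored as a nested dict and that the sample variance (n - 1 denominator) is used; the function name `normal_likelihood_params` and the toy values are hypothetical.

```python
from statistics import mean, variance

def normal_likelihood_params(lam, normal_cols):
    """lam: gene -> {sample: likelihood}. For each gene, return (mean, variance)
    of its likelihood over the normal samples only; these two numbers define
    the per-gene normal distribution used as the reference in step 104."""
    params = {}
    for gene, row in lam.items():
        vals = [row[s] for s in normal_cols]
        params[gene] = (mean(vals), variance(vals))  # ASSUMPTION: sample variance
    return params

# Hypothetical likelihood values for one gene; d1 is a disease sample and is ignored.
lam = {"g1": {"n1": -1.0, "n2": 0.0, "n3": 1.0, "d1": 4.0}}
print(normal_likelihood_params(lam, ["n1", "n2", "n3"]))  # {'g1': (0.0, 1.0)}
```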
Step 104 computes the significant difference gene set of each disease sample following a single-sample approach: a probability density function is constructed from the normal samples in the likelihood matrix, and for each disease sample, each gene is tested for whether it differs significantly from the normal samples, which screens out the significant difference genes.
Step 104 specifically includes:
substituting each element of the disease sample likelihood matrix into the normal sample distribution function, and calculating the probability value p of each gene in each disease sample;
determining whether the probability value p is less than or equal to a set threshold, which here is set to 0.05;
if so, determining the genes whose probability values p are less than or equal to the set threshold as the significant difference genes of the disease sample.
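The screening rule of step 104 can be sketched as below. The text only says a probability value p is obtained by substituting the likelihood into the normal-sample distribution and compared with 0.05; reading p as a two-sided tail probability is an assumption, as are the function names `tail_p_value` and `significant_genes` and the toy data.

```python
import math

def tail_p_value(x, mu, var):
    """Two-sided tail probability of x under N(mu, var).
    ASSUMPTION: a two-sided test is one plausible reading of 'probability value'."""
    z = abs(x - mu) / math.sqrt(var)
    return math.erfc(z / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

def significant_genes(disease_lam, params, threshold=0.05):
    """disease_lam: gene -> likelihood in ONE disease sample.
    params: gene -> (mean, variance) of the likelihood over the normal samples.
    Genes with p <= threshold form this sample's significant difference gene set."""
    return {g for g, x in disease_lam.items()
            if tail_p_value(x, *params[g]) <= threshold}

params = {"g1": (0.0, 1.0), "g2": (0.0, 1.0)}
d1 = {"g1": 2.5, "g2": 0.3}                   # hypothetical disease sample
print(round(tail_p_value(2.5, 0.0, 1.0), 4))  # 0.0124
print(significant_genes(d1, params))          # {'g1'}
```

Because each disease sample is tested on its own against the normal reference, every sample receives its own gene set, which is what makes the resulting markers sample-specific.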
Protein-protein interaction (PPI) network information is obtained from the STRING database. STRING is a widely used database of protein-protein interactions; it contains experimentally verified direct physical interactions between proteins as well as interactions predicted by mining PubMed abstracts and by other bioinformatics methods.
Step 105 specifically includes:
mapping the significant difference gene set of the disease sample onto the protein-protein interaction network and, according to the interaction relationships between genes, taking the five screened genes with the largest number of connected genes, together with their first-order neighbor nodes, as the network marker of the disease sample. This removes false-positive genes from the differential gene set, so that noise in the gene expression data, small sample sizes and imbalanced positive and negative samples do not produce false-positive markers.
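The top-five-plus-neighbors rule of step 105 can be sketched as follows. Two readings are assumptions: "number of connected genes" is taken as a gene's degree in the PPI subnetwork induced by the significant genes, and first-order neighbors are taken within that same subnetwork. The function name `network_marker`, the tie-breaking by gene name, and the toy edges (standing in for interactions exported from STRING) are hypothetical.

```python
def network_marker(sig_genes, ppi_edges, top_k=5):
    """sig_genes: significant difference genes of one disease sample.
    ppi_edges: iterable of (gene_a, gene_b) interactions.
    Returns (hubs, marker): the top_k genes by degree in the induced subnetwork
    and the marker set = hubs plus their first-order neighbors (ASSUMED reading)."""
    sig = set(sig_genes)
    neighbors = {g: set() for g in sig}
    for a, b in ppi_edges:
        if a in sig and b in sig and a != b:   # keep edges inside the subnetwork
            neighbors[a].add(b)
            neighbors[b].add(a)
    # rank by degree (descending); ties broken by gene name for determinism
    hubs = sorted(sig, key=lambda g: (-len(neighbors[g]), g))[:top_k]
    marker = set(hubs)
    for h in hubs:
        marker |= neighbors[h]
    return hubs, marker

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "X")]
# top_k=1 only to keep the toy example small; the text uses five hubs
hubs, marker = network_marker({"A", "B", "C", "D", "E"}, edges, top_k=1)
print(hubs, sorted(marker))  # ['A'] ['A', 'B', 'C', 'D']
```

In the toy run, gene E has no interaction partner among the significant genes and is left out, illustrating how the network context filters isolated, possibly false-positive, genes.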
Step 106 specifically comprises classifying the disease samples into different subtypes using the ConsensusClusterPlus package for R, based on the cancer network markers of each disease sample and prior knowledge of cancer subtypes, and performing survival analysis on each resulting subtype using the clinical data of the disease samples. The clinical data of the disease samples are also obtained from the TCGA database.
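Before clustering, the sample-specific markers are integrated into one matrix. A minimal sketch, assuming the integrated matrix holds the likelihood values of the union of all marker genes (one row per gene, one column per disease sample) and is handed to R as a CSV file; the function names `integrated_matrix` and `to_csv` and the toy data are hypothetical. ConsensusClusterPlus clusters the columns (items) of its input matrix, which is why samples are placed in columns here; the R-side call and its parameters are not specified in the text.

```python
import csv
import io

def integrated_matrix(markers, lam, disease_cols):
    """markers: disease sample -> set of its network-marker genes.
    lam: gene -> {sample: likelihood}.
    Returns (genes, rows): the union of all sample-specific markers and a
    gene x sample matrix of likelihood values restricted to those genes."""
    genes = sorted(set().union(*markers.values()))
    rows = [[lam[g][s] for s in disease_cols] for g in genes]
    return genes, rows

def to_csv(genes, rows, disease_cols):
    # genes as rows, samples as columns, ready to be read into R
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene"] + list(disease_cols))
    for gene, row in zip(genes, rows):
        writer.writerow([gene] + row)
    return buf.getvalue()

lam = {"g1": {"d1": 1.0, "d2": 2.0}, "g2": {"d1": 3.0, "d2": 4.0},
       "g3": {"d1": 5.0, "d2": 6.0}}
markers = {"d1": {"g1", "g2"}, "d2": {"g2"}}
genes, rows = integrated_matrix(markers, lam, ["d1", "d2"])
print(to_csv(genes, rows, ["d1", "d2"]), end="")
# gene,d1,d2
# g1,1.0,2.0
# g2,3.0,4.0
```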
On this basis, researchers can use this concept to study cancer marker acquisition and cancer subtype classification in greater depth, and thereby work toward accurate diagnosis and treatment of diseases.
A specific data example is provided below to illustrate the present invention.
FIG. 2 is a schematic diagram of determining cancer network markers based on a probabilistic model according to the present invention. As shown in FIG. 2, the details are as follows:
Conversion of the gene expression matrix into a likelihood matrix
Table 1 mRNA gene expression matrix
Table 1 shows an mRNA gene expression matrix containing 8 samples: n1, n2, n3 and n4 are normal tissue samples, and d1, d2, d3 and d4 are diseased tissue samples. g1, g2, g3, g4 and g5 are the names of the mRNAs, and the table entries are gene expression values. The transformed likelihood matrix is:
Table 2 Likelihood matrix
Table 2 is the likelihood matrix obtained by computing the gene likelihood for each of the 5 genes in each of these 8 samples; the table entries are the transformed likelihood values.
Obtaining differentially expressed genes for each disease sample
Using the likelihood matrix transformed from the mRNA gene expression matrix, and assuming that the likelihoods of the normal samples still follow a normal distribution, each gene in each disease sample is tested for a significant difference from the normal samples, yielding a differentially expressed gene set (p < 0.05) for each disease sample, as shown in Table 3:
Table 3 Selected differential genes
As shown in Table 3, for each of the four disease samples (d1, d2, d3, d4), each gene was tested for a significant difference from the normal samples (p < 0.05); bold entries indicate genes that differ significantly in the corresponding sample.
Network marker acquisition
Because differential genes identified from gene expression levels alone may include false positives, the interaction relationships between genes in the PPI network are used to remove them. In the network, if a gene is significantly different and many of the genes directly connected to it are also differential genes, that gene is considered relatively stable and is used as a cancer marker of the sample. The screening criterion is that the five genes with the most connections in the differential gene network, together with the first-order nodes connected to them, form the network marker; the dark squares in FIG. 3 are the selected network markers.
Classification of different subtypes of cancer
Based on the network marker information obtained for each disease sample, combined with existing cancer subtype knowledge and the clinical data of the disease samples, the endometrial cancer (UCEC) data are classified into different subtypes, as shown in FIG. 4. FIG. 5 shows the resulting subtype classification of UCEC, FIG. 6 the number of samples in each subtype, and FIG. 7 the survival curve of each subtype. The survival difference between subtypes is characterized by a p value, and p < 0.05 indicates a significant survival difference between the cancer subtypes.
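Per-subtype survival curves such as those in FIG. 7 are commonly drawn with the Kaplan-Meier estimator, $S(t)=\prod_{t_k\le t}(1-d_k/n_k)$, where $d_k$ is the number of deaths at event time $t_k$ and $n_k$ the number of patients still at risk. The text does not name the estimator or the test behind its p value, so this is an assumption; the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    Returns [(t, S(t))] at each distinct event time (Kaplan-Meier estimate)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all patients sharing this time (deaths and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed   # censored patients leave the risk set too
    return curve

# Hypothetical follow-up data for one subtype.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```

One curve would be computed per subtype; comparing the curves (for example with a log-rank test) then yields a p value of the kind reported for FIG. 7.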
In order to achieve the above object, the present invention also provides a cancer network marker determination system based on a probabilistic model.
Fig. 8 is a schematic structural diagram of a cancer network marker determining system based on a probabilistic model according to an embodiment of the present invention, and as shown in fig. 8, the cancer network marker determining system according to the embodiment of the present invention includes:
a gene expression data matrix obtaining module 100, configured to obtain gene expression data matrices of multiple normal samples and multiple disease samples; the elements in the gene expression data matrix are gene expression levels.
A gene expression data matrix transformation module 200, configured to transform the gene expression data matrix of each normal sample into a normal sample likelihood matrix and transform the gene expression data matrix of each disease sample into a disease sample likelihood matrix by using a probability density function; elements in the normal sample likelihood matrix and the disease sample likelihood matrix are both gene likelihoods.
A normal sample distribution function constructing module 300, configured to construct a normal sample distribution function according to all the normal sample likelihood matrices.
A significant difference gene set determining module 400, configured to substitute each element of the likelihood matrix of each disease sample into the normal sample distribution function and determine a significant difference gene set for each disease sample.
A network marker determination module 500, configured to map the significant difference gene set of each disease sample onto a protein-protein interaction network and determine the network marker of each disease sample.
A disease subtype classification module 600 for classifying the disease samples into different subtypes according to the network markers of each disease sample and the known cancer subtype prior data.
The gene expression data matrix transformation module 200 specifically includes:
a gene likelihood calculation model building unit, configured to build a gene likelihood calculation model by using a probability density function, the model expressing the likelihood $\lambda_i$ of gene $i$ in terms of the expression level of the $i$-th gene in the $j$-th sample, $f_i^1$, the normal distribution curve of gene $i$ in the disease samples, and $f_i^2$, the normal distribution curve of gene $i$ in the normal samples.
And the transformation unit is used for transforming the gene expression data matrix of each normal sample into a normal sample likelihood matrix according to the gene likelihood calculation model, and transforming the gene expression data matrix of each disease sample into a disease sample likelihood matrix.
The significantly different gene set determining module 400 specifically includes:
A probability value calculating unit, configured to substitute each element of the disease sample likelihood matrix into the normal sample distribution function and calculate the probability value of each gene in each disease sample.
A judging unit, configured to determine whether the probability value is less than or equal to a set threshold.
A significant difference gene set determining unit, configured to determine the genes whose probability values are less than or equal to the set threshold as the significant difference genes of the disease sample.
The cancer network marker determination method and system based on a probability model provided by the invention account for the fact that the same disease differs among patients owing to factors such as pathogenesis; by classifying the disease into subtypes, they help improve its diagnosis and treatment, which matters for both cancer network marker acquisition and cancer subtype classification. Unlike traditional cancer markers shared by all disease samples, the invention obtains a specific network marker for each disease sample and can identify the cancer subtype to which each disease sample belongs. This supports more accurate diagnosis and treatment, and helps screen the mRNAs that play key roles in the onset and progression of the disease.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and core idea of the present invention. A person of ordinary skill in the art may change the specific embodiments and the scope of application in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the invention.
Claims (9)
1. A method for determining cancer network markers based on a probabilistic model, the method comprising:
acquiring gene expression data matrices of a plurality of normal samples and a plurality of disease samples, the elements of each gene expression data matrix being gene expression levels;
converting, by using a probability density function, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices, the elements of both kinds of likelihood matrix being gene likelihoods;
constructing a normal sample distribution function from all the normal sample likelihood matrices;
substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine a significant difference gene set for each disease sample;
and mapping the significant difference gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample.
2. The method for determining cancer network markers according to claim 1, wherein converting the gene expression data matrices of all normal samples into normal sample likelihood matrices and the gene expression data matrices of all disease samples into disease sample likelihood matrices by means of a probability density function specifically comprises:
constructing a gene likelihood calculation model by means of a probability density function, the model being given by [equation not reproduced], wherein $\lambda_i$ denotes the likelihood of gene $i$, the model's input is the expression level of the $i$-th gene in the $j$-th sample, $f_i^{1}$ denotes the normal distribution curve of gene $i$ in the disease samples, and $f_i^{2}$ denotes the normal distribution curve of gene $i$ in the normal samples; and
converting the gene expression data matrices of all normal samples into normal sample likelihood matrices and the gene expression data matrices of all disease samples into disease sample likelihood matrices according to the gene likelihood calculation model.
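The claim does not reproduce the likelihood formula, so the following Python sketch only illustrates the general idea under an explicit assumption. Each gene gets one normal curve fitted on the disease samples ($f_i^{1}$) and one fitted on the normal samples ($f_i^{2}$), and the likelihood is assumed, hypothetically, to be the normalized ratio $\lambda = f_i^{1}(x) / (f_i^{1}(x) + f_i^{2}(x))$. Matrices are lists of rows, one row per gene and one column per sample; the function names are illustrative and not taken from the patent.

```python
from statistics import NormalDist, mean, stdev

# Matrices are lists of rows: matrix[i][j] is the value for gene i in sample j.


def fit_gene_normals(matrix, min_sd=1e-9):
    """Fit one normal curve per gene from its values across samples (needs >= 2 samples)."""
    # A floor on the standard deviation avoids a degenerate curve for constant genes.
    return [NormalDist(mean(row), max(stdev(row), min_sd)) for row in matrix]


def likelihood_matrix(matrix, disease_curves, normal_curves):
    """Convert an expression matrix into a likelihood matrix.

    ASSUMPTION: lambda = f1(x) / (f1(x) + f2(x)), where f1 is the gene's disease
    curve and f2 its normal curve; the patent's own formula is not reproduced.
    """
    result = []
    for row, f1, f2 in zip(matrix, disease_curves, normal_curves):
        new_row = []
        for x in row:
            p1, p2 = f1.pdf(x), f2.pdf(x)
            total = p1 + p2
            # If both densities underflow to zero, treat the value as uninformative.
            new_row.append(p1 / total if total > 0 else 0.5)
        result.append(new_row)
    return result
```

With this choice, values near 1 mean an expression level resembles the disease curve and values near 0 resemble the normal curve; any other combination of the two densities would slot into `likelihood_matrix` the same way.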
3. The method for determining cancer network markers according to claim 1, wherein constructing a normal sample distribution function from all the normal sample likelihood matrices specifically comprises:
calculating the mean and variance of each gene's likelihood over all the normal sample likelihood matrices; and
constructing, from that mean and variance, a normal distribution function of each gene's likelihood under the normal samples.
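Claim 3 reduces the normal sample distribution function to one normal curve per gene, parameterized by the mean and variance of that gene's likelihood across the normal samples. A minimal standard-library Python sketch, assuming the normal sample likelihood matrices are stacked as one row per gene and one column per normal sample; the variance floor `min_var` is an added safeguard, not part of the claim.

```python
from statistics import NormalDist, mean, variance


def normal_likelihood_distributions(normal_likelihoods, min_var=1e-12):
    """Build one normal distribution per gene from its likelihoods in normal samples.

    normal_likelihoods[i][j] is the likelihood of gene i in normal sample j.
    """
    dists = []
    for row in normal_likelihoods:
        mu = mean(row)
        var = max(variance(row), min_var)  # sample variance, floored to stay > 0
        dists.append(NormalDist(mu, var ** 0.5))
    return dists
```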
4. The method for determining cancer network markers according to claim 1, wherein substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample specifically comprises:
substituting each element of the disease sample likelihood matrix in turn into the normal sample distribution function to calculate a probability value for each gene in each disease sample;
determining whether each probability value is less than or equal to a set threshold; and
if so, determining the genes whose probability values are less than or equal to the set threshold as the significantly differential genes of that disease sample.
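Claim 4 does not say how the probability value is computed from the normal sample distribution function. One common reading, assumed here, is a two-sided tail probability: how likely a likelihood at least this far from the normal-sample mean would be if the gene behaved as in normal samples. The threshold of 0.05 and the gene names are placeholders. The input is one disease sample, i.e. one column of the disease sample likelihood matrix.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def significant_genes(sample_likelihoods, normal_dists, gene_names, threshold=0.05):
    """Return the genes of one disease sample whose probability value <= threshold.

    ASSUMPTION: the probability value is the two-sided tail probability of the
    sample's likelihood under that gene's normal-sample distribution.
    """
    selected = []
    for name, lam, dist in zip(gene_names, sample_likelihoods, normal_dists):
        z = (lam - dist.mean) / dist.stdev
        p_value = 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
        if p_value <= threshold:
            selected.append(name)
    return selected
```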
5. The method for determining cancer network markers according to claim 1, wherein mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample specifically comprises:
mapping the significantly differential gene set of each disease sample in turn onto the protein-protein interaction network and, according to the interactions among the genes, selecting the five genes connected to the largest number of screened genes, together with their first-order neighbor nodes, as the network marker of that disease sample.
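Claim 5 ranks genes by how many other screened genes they connect to in the protein-protein interaction network. The sketch below assumes the network is given as an undirected edge list, counts only interactions where both ends are significant in the sample, and takes first-order neighbors from that same sample-specific subnetwork; all three are interpretation choices, not details stated in the claim. `top_k=5` follows the claim, and ties are broken alphabetically so the output is reproducible.

```python
from collections import defaultdict


def network_marker(sig_genes, ppi_edges, top_k=5):
    """Return (hub genes, marker gene set) for one disease sample.

    sig_genes: significantly differential genes of the sample.
    ppi_edges: iterable of (gene_a, gene_b) undirected interactions.
    """
    sig = set(sig_genes)
    adj = defaultdict(set)
    for a, b in ppi_edges:
        # Keep only interactions whose both ends are significant in this sample.
        if a in sig and b in sig and a != b:
            adj[a].add(b)
            adj[b].add(a)
    # Rank by number of connected significant genes; ties broken by gene name.
    hubs = sorted(sig, key=lambda g: (-len(adj[g]), g))[:top_k]
    marker = set(hubs)
    for hub in hubs:
        marker |= adj[hub]  # add first-order neighbors of each hub
    return hubs, marker
```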
6. A cancer network marker determination system based on a probabilistic model, comprising:
a gene expression data matrix acquisition module, configured to acquire gene expression data matrices of a plurality of normal samples and a plurality of disease samples, the elements of each gene expression data matrix being gene expression levels;
a gene expression data matrix conversion module, configured to convert, by means of a probability density function, the gene expression data matrices of all normal samples into normal sample likelihood matrices and the gene expression data matrices of all disease samples into disease sample likelihood matrices, the elements of both kinds of likelihood matrix being gene likelihoods;
a normal sample distribution function construction module, configured to construct a normal sample distribution function from all the normal sample likelihood matrices;
a significantly differential gene set determination module, configured to substitute each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample; and
a network marker determination module, configured to map the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network and determine the network marker of each disease sample.
7. The cancer network marker determination system according to claim 6, further comprising:
a disease subtype classification module, configured to classify the disease samples into different subtypes according to the network marker of each disease sample and known prior data on cancer subtypes.
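Claim 7 leaves open how network markers are combined with prior subtype data. As one hypothetical option, each sample could be assigned to the known subtype whose prior gene signature overlaps most with its network marker, scored by Jaccard similarity. The rule, the function names, and the subtype signatures used to exercise it are invented for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity of two gene collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_subtype(marker, subtype_signatures):
    """Return the subtype whose signature gene set is most similar to the marker.

    HYPOTHETICAL rule: highest Jaccard similarity wins; ties go to the
    alphabetically first subtype name because max() keeps the first maximum.
    """
    return max(sorted(subtype_signatures),
               key=lambda name: jaccard(marker, subtype_signatures[name]))
```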
8. The cancer network marker determination system according to claim 6, wherein the gene expression data matrix conversion module specifically comprises:
a gene likelihood calculation model construction unit, configured to construct a gene likelihood calculation model by means of a probability density function, the model being given by [equation not reproduced], wherein $\lambda_i$ denotes the likelihood of gene $i$, the model's input is the expression level of the $i$-th gene in the $j$-th sample, $f_i^{1}$ denotes the normal distribution curve of gene $i$ in the disease samples, and $f_i^{2}$ denotes the normal distribution curve of gene $i$ in the normal samples; and
a conversion unit, configured to convert the gene expression data matrices of all normal samples into normal sample likelihood matrices and the gene expression data matrices of all disease samples into disease sample likelihood matrices according to the gene likelihood calculation model.
9. The cancer network marker determination system according to claim 6, wherein the significantly differential gene set determination module specifically comprises:
a probability value calculation unit, configured to substitute each element of the disease sample likelihood matrix in turn into the normal sample distribution function and calculate a probability value for each gene in each disease sample;
a judging unit, configured to determine whether each probability value is less than or equal to a set threshold; and
a significantly differential gene set determination unit, configured to determine the genes whose probability values are less than or equal to the set threshold as the significantly differential genes of that disease sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810920673.7A CN109101783B (en) | 2018-08-14 | 2018-08-14 | Cancer network marker determination method and system based on probability model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109101783A | 2018-12-28 |
CN109101783B | 2020-09-04 |
Family ID: 64849535
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810920673.7A (Expired - Fee Related) CN109101783B (en) | 2018-08-14 | 2018-08-14 | Cancer network marker determination method and system based on probability model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109101783B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110010204B (en) * | 2019-04-04 | 2022-12-02 | 中南大学 | Fusion network and multi-scoring strategy based prognostic biomarker identification method |
CN110444248B (en) * | 2019-07-22 | 2021-09-24 | 山东大学 | Cancer biomolecule marker screening method and system based on network topology parameters |
CN110797083B (en) * | 2019-09-18 | 2023-04-18 | 中南大学 | Biomarker identification method based on multiple networks |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103268431A (en) * | 2013-05-21 | 2013-08-28 | 中山大学 | Cancer hypotype biomarker detecting system based on student t distribution |
CN103473416A (en) * | 2013-09-13 | 2013-12-25 | 中国人民解放军国防科学技术大学 | Protein-protein interaction model building method and device |
WO2013192504A1 (en) * | 2012-06-22 | 2013-12-27 | The Trustees Of Dartmouth College | Novel vista-ig constructs and the use of vista-ig for treatment of autoimmune, allergic and inflammatory disorders |
CN105117617A (en) * | 2015-08-26 | 2015-12-02 | 大连海事大学 | Method for screening environmentally sensitive biomolecules |
CN106295246A (en) * | 2016-08-07 | 2017-01-04 | 吉林大学 | Find the lncRNA relevant to tumor and predict its function |
CN107025387A (en) * | 2017-03-29 | 2017-08-08 | 电子科技大学 | One kind is used for biomarker for cancer and knows method for distinguishing |
CN108181471A (en) * | 2017-12-15 | 2018-06-19 | 新疆医科大学第附属医院 | A kind of detection marker of dissection of aorta and marker appraisal procedure |
CN108345768A (en) * | 2017-01-20 | 2018-07-31 | 深圳华大生命科学研究院 | A kind of method and marker combination of determining infant's intestinal flora maturity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180211013A1 (en) * | 2017-01-25 | 2018-07-26 | International Business Machines Corporation | Patient Communication Priority By Compliance Dates, Risk Scores, and Organizational Goals |
2018
- 2018-08-14: application CN201810920673.7A (CN109101783B) filed in CN; status: not active, Expired - Fee Related
Non-Patent Citations (4)
Title |
---|
Accurate and Reliable Cancer Classification Based on Probabilistic Inference of Pathway Activity; Junjie Su et al.; PLoS ONE; 2009-12-07; vol. 4, no. 12; pp. 1-10 * |
Learning Gaussian Graphical Models of Gene Networks with False Discovery Rate Control; Jose M. Pena et al.; European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics; 2008-12-31; vol. 4973; pp. 165-176 * |
Personalized characterization of diseases using sample-specific networks; Xiaoping Liu et al.; Nucleic Acids Research; 2016-09-04; vol. 44, no. 22; pp. 1-18 * |
Selection of serum tumor markers in the diagnosis of pancreatic cancer (血清肿瘤标志物在胰腺癌诊断中的选择); 高云朝; 上海医学 (Shanghai Medical Journal); 2005-12-31; vol. 28, no. 04; pp. 330-331 * |
Also Published As
Publication number | Publication date |
---|---|
CN109101783A (en) | 2018-12-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112435714B (en) | Tumor immune subtype classification method and system | |
CN110444248B (en) | Cancer biomolecule marker screening method and system based on network topology parameters | |
CN109101783B (en) | Cancer network marker determination method and system based on probability model | |
CN108694991B (en) | Drug repositioning method based on integration of multiple transcriptome datasets and drug target information | |
CN106599616B (en) | Ultra-low-frequency mutation site determination method based on duplex-seq | |
CN109994200A (en) | Multi-omics cancer data integrative analysis method based on similarity fusion | |
CN111883223B (en) | Report interpretation method and system for structural variation in patient sample data | |
CN113053535B (en) | Medical information prediction system and medical information prediction method | |
CN107301328B (en) | Cancer subtype accurate discovery and evolution analysis method based on data flow clustering | |
CN114530249A (en) | Disease risk assessment model construction method based on intestinal microorganisms and application | |
CN106055922A (en) | Hybrid network gene screening method based on gene expression data | |
CN107169264B (en) | Complex disease diagnosis system | |
CN108920903B (en) | lncRNA-disease association prediction method and system based on naive Bayes | |
CN115631847B (en) | Early lung cancer diagnosis system, storage medium and device based on multi-omics features | |
CN117912570B (en) | Classification feature determining method and system based on gene co-expression network | |
CN115881232A (en) | scRNA-seq cell type annotation method based on graph neural network and feature fusion | |
CN116564409A (en) | Machine learning-based identification method for sequencing data of transcriptome of metastatic breast cancer | |
CN110223786B (en) | Method and system for predicting drug-drug interaction based on nonnegative tensor decomposition | |
CN116344046A (en) | Method for quantifying the stability of individual health states based on multi-omics data | |
KR102462746B1 (en) | Method And System For Constructing Cancer Patient Specific Gene Networks And Finding Prognostic Gene Pairs | |
Joshi et al. | Delimiting continuity: Comparison of target enrichment and double digest restriction‐site associated DNA sequencing for delineating admixing parapatric Melitaea butterflies | |
Zhou et al. | Accurate integration of multiple heterogeneous single-cell RNA-seq data sets by learning contrastive biological variation | |
CN115116542B (en) | Metagenome-based sample-specific species interaction network construction method and system | |
CN116631641B (en) | Disease prediction device integrating self-adaptive similar patient diagrams | |
CN110797083B (en) | Biomarker identification method based on multiple networks |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2020-09-04; termination date: 2021-08-14 |