CN111613268B - Method for determining gene expression regulation mechanism based on single cell transcriptome data - Google Patents

Method for determining gene expression regulation mechanism based on single cell transcriptome data

Info

Publication number
CN111613268B
Authority
CN
China
Prior art keywords
cell
gene
expression
determining
central
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010464757.1A
Other languages
Chinese (zh)
Other versions
CN111613268A (en
Inventor
孙小强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202010464757.1A
Publication of CN111613268A
Application granted
Publication of CN111613268B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Genetics & Genomics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application provides a method for determining a gene expression regulation mechanism based on single-cell transcriptome data, which comprises the following steps: determining the specific high-expression genes of the central cell and the specific high-expression genes of the neighbor cell according to the single-cell transcriptome data; determining a first sub-network between the central cell and the neighbor cell according to the specific high-expression genes of the central cell and the neighbor cell and the pairing information between the ligand and the receptor; determining a second sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the transcription factor and the target gene; determining a third sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the receptor and the transcription factor; and determining an intercellular multilayer signal network of the central cell and the neighbor cell according to the first sub-network, the second sub-network and the third sub-network, so as to reveal how the neighbor cell regulates the gene expression of the central cell.

Description

Method for determining gene expression regulation mechanism based on single cell transcriptome data
Technical Field
The application relates to the field of bioinformatics, in particular to a method for determining a gene expression regulation and control mechanism based on single-cell transcriptome data.
Background
Gene expression underlies complex life phenomena. It is a coordinated process with multi-level, multi-factor and spatiotemporal characteristics, and understanding the rules that govern it helps to explain mechanisms such as growth and development and the occurrence and progression of disease.
Changes in gene expression level affect changes in cell function and fate, and therefore, the establishment of corresponding cell signaling networks is required for the study of gene expression regulation mechanisms. The signal networks involved in the regulation of gene expression include both intercellular signaling and intracellular signaling and gene activation. Therefore, a systematic, multi-layered, intercellular and intracellular signaling network is needed to elucidate the regulatory mechanisms of gene expression.
Currently, there are two main approaches to studying gene expression regulation mechanisms: the first relies on traditional experimental studies, and the second on high-throughput technologies (RNA-Seq). Both approaches focus primarily on one or a few linear signaling pathways at the molecular level.
Both methods have a number of disadvantages in their application:
1) Gene expression is regulated by a very complex signal network formed by interwoven signaling pathways composed of functional molecules such as ligands, receptors, transcription factors and target genes; one or a few linear pathways are not enough to clarify the regulation mechanism of gene expression.
2) The influence of the cellular microenvironment on the regulation of gene expression levels is neglected.
3) Traditional high-throughput sequencing techniques ignore cell-specific differences. Conventional transcriptome sequencing usually reflects the average expression level of genes across all cells in a region, so a cell-specific functional molecule with a special regulatory role may be mistaken for a molecule without regulatory significance because its expression level is lower than that of functional molecules widely expressed in other cells. Thus, subtle, specific biological effects caused by cell-type differences are easily overlooked when traditional transcriptome sequencing is used.
4) Traditional experimental research is inefficient. Although experimental research is accurate and its results are relatively reliable, the high resource consumption of experiments and the complexity of cell signal networks mean that only specific signaling pathways can be studied. Therefore, the regulation mechanism of gene expression cannot be comprehensively and systematically elucidated by conventional experimental studies.
Therefore, it is difficult to comprehensively and systematically reveal the regulation mechanism of gene expression with current methods.
Disclosure of Invention
The embodiments of the present application are directed to a method for determining a gene expression regulation mechanism based on single-cell transcriptome data, so as to comprehensively and systematically disclose the gene expression regulation mechanism.
In order to achieve the above object, the embodiments of the present application are implemented as follows:
In a first aspect, the embodiments of the present application provide a method for determining a gene expression regulation mechanism based on single-cell transcriptome data, comprising: determining a specific high-expression gene of a central cell and a specific high-expression gene of a neighbor cell, wherein the central cell represents the cell type whose gene expression regulation mechanism is to be studied, and the neighbor cell represents a cell type that may influence the gene expression of the central cell; determining a first sub-network between the central cell and the neighbor cell according to the specific high-expression genes of the central cell and the specific high-expression genes of the neighbor cell and pairing information between a ligand and a receptor; determining a second sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the transcription factors and the target genes; determining a third sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the receptor and the transcription factor; and determining an intercellular multilayer signal network of the central cell and the neighbor cell according to the first sub-network, the second sub-network and the third sub-network, so as to reveal how the neighbor cell regulates the gene expression of the central cell.
In the embodiment of the application, a first sub-network, a second sub-network and a third sub-network are respectively constructed by determining the specific high-expression genes of the central cell and the specific high-expression genes of the neighbor cell and combining them with pairing information between a ligand and a receptor, interaction information between a transcription factor and a target gene, and interaction information between the receptor and the transcription factor; the intercellular multilayer signal network of the central cell and the neighbor cell is obtained after integration. Therefore, the gene regulation mechanism can be displayed more comprehensively and systematically, the influence of cell specificity on the gene expression regulation mechanism can be accurately reflected, and a new tool is provided for analyzing how the cell microenvironment mediates the regulation of a gene of interest.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before the determining the specific high-expression genes of the central cell and the specific high-expression genes of the neighbor cell, the method further includes: determining an expression matrix of the single cells according to the single-cell transcriptome data, wherein each row represents a gene and each column represents a cell; filtering the expression matrix of the single cells, and normalizing the filtered expression matrix, so as to preprocess the single-cell transcriptome data; performing dimension reduction, clustering and cell type identification on the preprocessed single-cell transcriptome data to determine a data set comprising multiple cell types; and determining the central cell and the neighbor cell from the data set.
In this implementation, the expression matrix of the single cells is determined according to the single-cell transcriptome data and is then filtered and normalized, so that data quality control can be realized, the influence of sequencing depth/library size and of outliers/extreme values on the sequencing result can be reduced, possible technical noise can be corrected, and so on. Performing dimension reduction, clustering and cell type identification on the preprocessed single-cell transcriptome data to determine a data set comprising multiple cell types helps to classify cell types accurately and facilitates the study of the gene expression regulation mechanism.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the determining the specific high-expression gene of the central cell and the specific high-expression gene of the neighboring cell includes: determining a specific expression matrix of the central cell and a specific expression matrix of the neighbor cell according to the clustering result of the single-cell transcriptome data; and determining the specific high-expression genes of the central cells and the specific high-expression genes of the neighbor cells according to the specific expression matrix of the central cells and the specific expression matrix of the neighbor cells and preset screening conditions.
In this implementation, the specific high-expression genes of the central cell and of the neighbor cell can be accurately determined by combining the specific expression matrix of the central cell and the specific expression matrix of the neighbor cell with the preset screening conditions.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the determining a first subnetwork between the central cell and the neighbor cell according to the specific high-expression gene of the central cell and the specific high-expression gene of the neighbor cell, and the pairing information between the ligand and the receptor includes: obtaining a first relationship list comprising pairing information between a ligand and a receptor; according to the first relation list, determining a high-expression receptor from the specific high-expression genes of the central cell, and determining a high-expression ligand from the specific high-expression genes of the neighbor cells; establishing a first sub-network between the central cell and the neighbor cells based on the highly expressed receptor and the highly expressed ligand.
In this implementation, a first relationship list including information on pairs of ligands and receptors is determined, a high-expression receptor is determined from the specific high-expression genes of the central cell, and a high-expression ligand is determined from the specific high-expression genes of the neighbor cells, so that a first subnetwork between the central cell and the neighbor cells is established, and a signal network formed by the neighbor cells through the pairs of ligands and receptors of the central cell can be accurately reflected.
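As a minimal sketch of this implementation, the following Python snippet filters a ligand-receptor pair list down to the pairs whose ligand is highly expressed in the neighbor cell and whose receptor is highly expressed in the central cell. The function name, the gene symbols and the toy inputs are hypothetical and are used only for illustration; the actual first relationship list would come from a curated ligand-receptor resource.

```python
def build_ligand_receptor_subnetwork(lr_pairs, neighbor_high_genes, central_high_genes):
    """Return sorted (ligand, receptor) edges of the first sub-network.

    An edge is kept when the ligand is a specific high-expression gene of the
    neighbor cell and the receptor is a specific high-expression gene of the
    central cell.
    """
    neighbor = set(neighbor_high_genes)
    central = set(central_high_genes)
    return sorted(
        (ligand, receptor)
        for ligand, receptor in set(lr_pairs)
        if ligand in neighbor and receptor in central
    )


# Hypothetical toy inputs (placeholder symbols, not real annotations).
lr_pairs = [("L1", "R1"), ("L2", "R1"), ("L3", "R2"), ("L4", "R3")]
neighbor_high = ["L1", "L3", "L4"]   # specific high-expression genes of the neighbor cell
central_high = ["R1", "R3", "TF1"]   # specific high-expression genes of the central cell

first_subnetwork = build_ligand_receptor_subnetwork(lr_pairs, neighbor_high, central_high)
print(first_subnetwork)
```

Using sets for the two gene lists keeps each membership check constant-time, which matters when the relationship list contains thousands of pairs.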
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, the determining a second subnetwork of the central cell according to the specific highly expressed gene of the central cell and the interaction information between the transcription factor and the target gene includes: acquiring a second relation list containing interaction information between the transcription factor and the target gene; determining a first gene set containing all target genes in the central cell, a second gene set containing high-expression target genes in the central cell and a third gene set containing target genes corresponding to the specified transcription factors; determining a fourth gene set comprising significantly activated transcription factors according to the first gene set, the second gene set and the third gene set; establishing a second subnetwork of the central cells according to the second relationship list, the second gene set, and the fourth gene set.
In this implementation, by determining a second relationship list including interaction information of the transcription factor and the target genes, and determining a fourth gene set including a significantly activated transcription factor according to a first gene set including all target genes in the central cell, a second gene set including target genes highly expressed in the central cell, and a third gene set including target genes corresponding to the specified transcription factor, a second subnetwork of the central cell can be established by combining the second relationship list, the second gene set, and the fourth gene set, so as to comprehensively and accurately reflect a signal network between the transcription factor and the target genes in the central cell.
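The passage above does not state which statistical test is used to decide that a transcription factor is significantly activated. The sketch below assumes a one-sided hypergeometric over-representation test and a significance threshold of 0.05; both are assumptions made for illustration. The background is the first gene set, the highly expressed targets are the second gene set, and the targets of each transcription factor form the third gene set. All function names and toy gene symbols are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which are highly expressed."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def significant_tfs(tf_targets, all_targets, high_targets, alpha=0.05):
    """Return {tf: p_value} for transcription factors enriched in high targets.

    all_targets  -> first gene set (all target genes in the central cell)
    high_targets -> second gene set (highly expressed target genes)
    tf_targets   -> maps each TF to its third gene set (targets of that TF)
    The returned keys play the role of the fourth gene set.
    """
    background = set(all_targets)
    high = set(high_targets) & background
    result = {}
    for tf, targets in tf_targets.items():
        targets = set(targets) & background
        if not targets:
            continue
        overlap = len(targets & high)
        p = hypergeom_sf(overlap, len(background), len(high), len(targets))
        if p < alpha:  # alpha is an assumed threshold
            result[tf] = p
    return result


# Hypothetical toy data: 20 background targets, 5 of them highly expressed.
all_targets = [f"g{i}" for i in range(20)]
high_targets = ["g0", "g1", "g2", "g3", "g4"]
tf_targets = {"TF_A": ["g0", "g1", "g2", "g3"], "TF_B": ["g10", "g11", "g12", "g13"]}

activated = significant_tfs(tf_targets, all_targets, high_targets)
# Second sub-network: edges from activated TFs to their highly expressed targets.
second_subnetwork = sorted(
    (tf, g) for tf in activated for g in tf_targets[tf] if g in set(high_targets)
)
print(activated, second_subnetwork)
```

The same pattern could in principle be applied one layer upstream to the fifth to eighth gene sets, with receptors in place of transcription factors and activated transcription factors in place of highly expressed target genes.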
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the determining a third subnetwork of the central cell according to the specific highly expressed gene of the central cell and the interaction information between the receptor and the transcription factor includes: obtaining a third relation list containing interaction information between the receptor and the transcription factor; determining a fifth gene set containing all transcription factors in the central cells, a sixth gene set containing activated transcription factors in the central cells and a seventh gene set containing the transcription factors corresponding to the specified receptors; determining an eighth gene set comprising significantly activated receptors according to the fifth gene set, the sixth gene set and the seventh gene set; establishing a third subnetwork of the central cells according to the third relational list, the sixth gene set and the eighth gene set.
In this implementation, a third subnetwork of the central cell can be established by determining a third relational list containing information about interaction between the receptors and the transcription factors, and determining an eighth gene set containing significantly activated receptors from a fifth gene set containing all the transcription factors in the central cell, a sixth gene set containing the transcription factors activated in the central cell, and a seventh gene set containing the transcription factors corresponding to the specified receptors, so as to comprehensively and accurately reflect a signal network between the receptors and the transcription factors in the central cell.
With reference to the first aspect or any one of the first to the fifth possible implementation manners of the first aspect, in a sixth possible implementation manner of the first aspect, the determining an intercellular multilayer signal network of the central cell and the neighboring cells according to the first sub-network, the second sub-network, and the third sub-network includes: updating the first, second, and third sub-networks according to an upstream-downstream relationship among the first, second, and third sub-networks; and integrating the updated first sub-network, the updated second sub-network and the updated third sub-network to determine the intercellular multilayer signal network of the central cell and the neighbor cells.
Updating each sub-network according to the upstream and downstream relations among the first sub-network, the second sub-network and the third sub-network, and integrating the updated sub-networks to determine the intercellular multilayer signal network of the central cell and the neighbor cells, so that the signal network relation between the central cell and the neighbor cells can be systematically and comprehensively reflected, and the method is favorable for systematically and deeply researching the regulation and control path and mechanism of gene expression in the central cell.
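One plausible reading of updating the sub-networks according to their upstream-downstream relationship is to keep only the edges that lie on a complete ligand to receptor to transcription factor to target gene path. The following sketch implements that reading by pruning repeatedly until nothing changes. The pruning rule, the function name and the toy edges are assumptions and are not taken from the text.

```python
def integrate_subnetworks(lr_edges, rtf_edges, tftg_edges):
    """Prune three edge lists so every remaining edge is on a complete path.

    lr_edges   : (ligand, receptor) edges, first sub-network
    rtf_edges  : (receptor, tf) edges, third sub-network
    tftg_edges : (tf, target) edges, second sub-network
    """
    lr, rtf, tftg = set(lr_edges), set(rtf_edges), set(tftg_edges)
    while True:
        lr_receptors = {r for _, r in lr}
        rtf_receptors = {r for r, _ in rtf}
        rtf_tfs = {t for _, t in rtf}
        tftg_tfs = {t for t, _ in tftg}
        # A ligand-receptor edge needs a downstream receptor-TF edge.
        new_lr = {(l, r) for l, r in lr if r in rtf_receptors}
        # A receptor-TF edge needs an upstream ligand and a downstream target.
        new_rtf = {(r, t) for r, t in rtf if r in lr_receptors and t in tftg_tfs}
        # A TF-target edge needs an upstream receptor-TF edge.
        new_tftg = {(t, g) for t, g in tftg if t in rtf_tfs}
        if (new_lr, new_rtf, new_tftg) == (lr, rtf, tftg):
            break  # the edge sets only shrink, so this loop terminates
        lr, rtf, tftg = new_lr, new_rtf, new_tftg
    return {
        "ligand_receptor": sorted(lr),
        "receptor_tf": sorted(rtf),
        "tf_target": sorted(tftg),
    }


# Hypothetical toy sub-networks.
network = integrate_subnetworks(
    lr_edges=[("L1", "R1"), ("L4", "R3")],
    rtf_edges=[("R1", "TF_A"), ("R2", "TF_A"), ("R3", "TF_C")],
    tftg_edges=[("TF_A", "g0"), ("TF_A", "g1"), ("TF_B", "g5")],
)
print(network)
```

In this toy case the edge L4 to R3 is removed only in the second pass, after R3 to TF_C has been dropped for lacking a target gene, which is why the pruning is iterated rather than applied once.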
In a second aspect, the present application provides an apparatus for determining a gene expression regulation mechanism based on single-cell transcriptome data, comprising: a high-expression gene screening unit, configured to determine a specific high-expression gene of a central cell and a specific high-expression gene of a neighbor cell, wherein the central cell represents the cell type whose gene expression regulation mechanism is to be studied, and the neighbor cell represents a cell type that may influence the gene expression of the central cell; a first sub-network construction unit, configured to determine a first sub-network between the central cell and the neighbor cell according to the specific high-expression genes of the central cell and the neighbor cell, and pairing information between a ligand and a receptor; a second sub-network construction unit, configured to determine a second sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the transcription factors and the target genes; a third sub-network construction unit, configured to determine a third sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the receptor and the transcription factor; and a multilayer network model construction unit, configured to determine an intercellular multilayer signal network of the central cell and the neighbor cell according to the first sub-network, the second sub-network and the third sub-network, so as to reveal how the neighbor cell regulates the gene expression of the central cell.
In a third aspect, embodiments of the present application provide a storage medium storing one or more programs, which are executable by one or more processors to implement the method for determining a gene expression regulation mechanism based on single-cell transcriptome data according to any one of the first aspect or possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory is configured to store information including program instructions, and the processor is configured to control execution of the program instructions, where the program instructions are loaded and executed by the processor, to implement the method for determining a single-cell transcriptome-data-based gene expression regulation mechanism according to any one of the first aspect or possible implementations of the first aspect.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a flow chart of a method for determining a gene expression control mechanism based on single-cell transcriptome data according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of the overall process of the method for determining the gene expression control mechanism based on single-cell transcriptome data provided in the embodiment of the present application.
FIG. 3 is a violin plot of cell-specific gene expression.
FIG. 4 is a schematic diagram of the cell-specific expression of the gene of interest ACE2 and the multi-layer signal network between the central cell and the neighboring cells.
Fig. 5 is a block diagram illustrating a structure of a device for determining a gene expression control mechanism based on single-cell transcriptome data according to an embodiment of the present application.
Fig. 6 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Reference numerals: 10-a device for determining a gene expression regulation mechanism based on single-cell transcriptome data; 11-high expression gene screening unit; 12-a first sub-network building unit; 13-a second sub-network building unit; 14-a third sub-network building unit; 15-a multi-layer network model building unit; 20-an electronic device; 21-a memory; 22-a communication module; 23-a bus; 24-a processor.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Gene expression is regulated by a very complex signaling network. The signal network is usually formed by interwoven signaling pathways composed of functional molecules such as ligands, receptors, transcription factors and target genes. Since there is also crosstalk between signaling pathways, there is no strict one-to-one correspondence between functional molecules: for example, the same receptor can bind a plurality of structurally similar ligands, an activated receptor can change the conformation of a plurality of downstream molecules, and a transcription factor can activate the expression of a plurality of target genes; in addition, the same functional molecule may appear in different pathways and perform different biological functions, and different ligands acting on the same receptor may produce different or even opposite effects. Therefore, a comprehensive signaling network integrating multiple pathways is needed to elucidate gene expression regulation mechanisms more systematically.
Furthermore, cells in multicellular tissues or organisms are not independent individuals: multiple cell types often coexist and interact in the microenvironment, and their function and fate are often coordinated by their local environment and neighboring cells. The cellular microenvironment includes various types of cells and chemical molecules. Intercellular secretion of signal molecules (indirect information exchange) is one of the modes of information transmission in the cellular microenvironment; it is not limited to the interaction between ligands and receptors across cells, but also includes signal transduction at the cell surface, the cascade amplification of signal molecules inside the cell, and the interaction between downstream transcription factors and target genes. Therefore, the regulation mechanism of gene expression can be studied comprehensively and systematically by studying the cell microenvironment.
Moreover, conventional transcriptome sequencing often reflects the average expression level of genes across all cells in a region, so a cell-specific functional molecule with a special regulatory role may be mistaken for a molecule without regulatory significance because its expression level is lower than that of functional molecules widely expressed in other cells. Subtle, specific biological effects caused by cell type differences are therefore easily overlooked when traditional transcriptome sequencing is used. Single-cell transcriptome data can be used to identify cell types and quantify cell-type-specific gene expression in mixed cell populations, so that interactions within the microenvironment can be resolved and the intracellular and intercellular signaling pathways mediated by the microenvironment can be clarified.
Therefore, in order to comprehensively and systematically explore the regulation mechanism of gene expression, the embodiment of the application provides a method for determining the regulation mechanism of gene expression based on the single-cell transcriptome data.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for determining a gene expression control mechanism based on single-cell transcriptome data according to an embodiment of the present disclosure. The method for determining a gene expression regulatory mechanism based on single-cell transcriptome data may include step S10, step S20, step S30, step S40, and step S50.
In order to take into account the subtle, specific biological effects caused by cell type differences in the intercellular signaling network for comprehensive and systematic study of the gene expression regulation mechanism, the single-cell transcriptome data approach can be used in this embodiment.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the overall process of determining the gene expression regulation mechanism based on single-cell transcriptome data according to the embodiment of the present application. The process mainly comprises a single-cell transcriptome data analysis process, a process of constructing the signal pathway sub-networks of the multilayer structure, and a process of constructing the multilayer signal network of the cell microenvironment. The analysis of the single-cell transcriptome data can comprise data quality control (Quality control), normalization (Normalization), dimension reduction (Dimensional reduction), cell clustering analysis (Cell clustering), marker identification (Marker identification) and the like. The construction of the signal pathway sub-networks of the multilayer structure may include: construction of the ligand-receptor sub-network (Ligand-Receptor sub-network, the first sub-network); construction of the transcription factor-target gene sub-network (TF-target gene sub-network, the second sub-network); construction of the receptor-transcription factor sub-network (Receptor-TF sub-network, the third sub-network); and construction of the multilayer signal network (Multilayer signaling network).
Before step S10 is executed, data preprocessing may be performed on the single-cell transcriptome data, and dimension reduction, clustering and cell type identification may be performed on the single-cell transcriptome data after data preprocessing, so as to determine a data set including multiple cell types.
In this embodiment, the R package Seurat can be used to analyze single-cell transcriptome data, and the analysis process may include two parts: data preprocessing and downstream analysis.
Illustratively, the data preprocessing process may include the steps of: data quality control, normalization/scaling (Normalization/scaling), and data correction and integration (Correction and Integration).
For example, assuming a total of n cells, m genes, in the single-cell transcriptome data, the single-cell transcriptome data may be converted into an m × n matrix, where rows represent each gene and columns represent each cell.
In order to realize data quality control, the expression matrix of single cells can be filtered. Illustratively, data quality control can be based on the number of expressed genes (e.g. filtering out cells that express more than 2500 or fewer than 200 genes), the gene expression profile (e.g. filtering out genes expressed in fewer than 3 cells), and the proportion of mitochondrial/ribosomal genes (e.g. filtering out cells in which the proportion of ribosomal/mitochondrial gene expression exceeds 10%). It should be noted that these data quality control methods and standards are only exemplary and should not be considered as limiting the present application; they may be selected according to actual needs.
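The filtering described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the Seurat implementation: the function name `quality_control`, the `MT-` prefix used to recognize mitochondrial genes, and the choice to filter cells before genes are assumptions made for the example, while the default thresholds are the exemplary values from the text.

```python
def quality_control(matrix, gene_names, min_genes=200, max_genes=2500,
                    min_cells=3, max_mito_fraction=0.10, mito_prefix="MT-"):
    """Filter a genes x cells count matrix given as a list of rows (one row per gene).

    Assumptions for this sketch: mitochondrial genes are recognized by name
    prefix, and cells are filtered before genes.
    """
    n_cells = len(matrix[0])
    keep_cells = []
    for j in range(n_cells):
        column = [row[j] for row in matrix]
        n_detected = sum(1 for v in column if v > 0)      # genes expressed in this cell
        total = sum(column)
        mito = sum(v for v, g in zip(column, gene_names) if g.startswith(mito_prefix))
        mito_fraction = mito / total if total > 0 else 0.0
        if min_genes <= n_detected <= max_genes and mito_fraction <= max_mito_fraction:
            keep_cells.append(j)
    # Keep genes that are expressed in at least `min_cells` of the retained cells.
    keep_genes = [i for i, row in enumerate(matrix)
                  if sum(1 for j in keep_cells if row[j] > 0) >= min_cells]
    filtered = [[matrix[i][j] for j in keep_cells] for i in keep_genes]
    return filtered, [gene_names[i] for i in keep_genes], keep_cells
```

The function returns the filtered matrix together with the retained gene names and the indices of the retained cells, so that cell-level annotations can be subset in the same way.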
In this example, each value in the expression matrix may represent the successful capture, reverse transcription and sequencing of mRNA molecules in a cell. In practice, even if the same cell is sequenced twice, the sequencing depth (total counts) obtained may differ. Thus, to reduce the impact of sequencing depth and/or library size on the sequencing results, the expression matrix can be normalized. For example, the expression matrix is normalized by the LogNormalize algorithm to obtain relative abundances of gene expression that are comparable between cells. In order to facilitate the subsequent downstream analysis, the highly variable genes may be calculated by using the vst algorithm (of course, the highly variable genes may also be calculated when the subsequent downstream analysis is performed, which is not limited herein). In addition, in order to exclude the influence of outliers and extreme values of gene expression, the expression matrix may be z-score transformed (scaled) so that, for each gene, the mean of its expression over all cells is 0 and the variance is 1. It should be noted that this normalization method is only exemplary and should not be considered as limiting the application; different manners may be selected according to actual needs. In addition, whether to perform the z-score scaling can be decided according to actual needs.
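As a minimal sketch of the two transformations just described, the following Python functions apply a LogNormalize-style normalization (each count is divided by the total counts of its cell, multiplied by a scale factor, and passed through log(1 + x); 10,000 is the default scale factor documented for Seurat) and a per-gene z-score. The function names and the use of the sample variance are assumptions of this example.

```python
import math

def log_normalize(matrix, scale_factor=10000.0):
    """LogNormalize-style normalization of a genes x cells count matrix."""
    n_cells = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]  # library size per cell
    return [[math.log1p(row[j] / totals[j] * scale_factor) if totals[j] > 0 else 0.0
             for j in range(n_cells)]
            for row in matrix]

def z_score_rows(matrix):
    """Scale each gene (row) to mean 0 and variance 1 across cells."""
    scaled = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        # Sample variance (n - 1 denominator); an assumption of this sketch.
        var = sum((v - mean) ** 2 for v in row) / (n - 1) if n > 1 else 0.0
        sd = math.sqrt(var)
        scaled.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return scaled
```

Genes with zero variance are mapped to 0 instead of raising a division error, which is a design choice of the sketch.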
When the single-cell transcriptome data contains multiple data sets, batch effects also need to be considered when merging the data sets. In addition, single-cell sequencing may exhibit various kinds of technical and biological noise, such as cell stress states and sequencing capture failure (dropout). Corresponding measures (for example, software such as ComBat and LIGER) can be taken to reduce the influence of batch effects and other noise, so as to ensure the accuracy of the data as far as possible.
After data preprocessing is performed on the single-cell transcriptome data, downstream analysis can be performed on the single-cell transcriptome data after data preprocessing. In this embodiment, the downstream analysis may include dimension reduction, clustering, and cell type identification.
For example, the normalized expression matrix may be subjected to linear dimensionality reduction by Principal Component Analysis (PCA), or by other methods, which is not limited herein.
After linear dimensionality reduction, a subset of the principal components can be selected as required to represent the entire data set for cluster analysis. For example, the selected principal components are clustered by using the Louvain algorithm, and the clustering result can be visualized through t-SNE. Of course, the cluster analysis or the visualization of the clustering result may be implemented in other manners, which is not limited herein.
Then, the clustering result is annotated, either with single-cell marker information collected from the literature, databases and similar sources, or with automatic annotation software, to obtain the different cell types, so that the cell types are identified and a data set comprising a plurality of cell types is determined.
Preprocessing the single-cell transcriptome data realizes data quality control, reduces the influence of sequencing depth/library size and of outliers/extreme values on the sequencing results, and corrects possible technical noise. Performing dimension reduction, clustering and cell type identification on the preprocessed single-cell transcriptome data to determine a data set comprising multiple cell types helps to classify the cell types accurately and facilitates the study of the gene expression regulation mechanism.
After data pre-processing and downstream analysis (including dimension reduction, clustering, and cell type identification) of the single-cell transcriptome data is completed, central cells and neighbor cells can be determined from the data set.
Generally, when the expression regulation mechanism of a gene of interest needs to be investigated because the gene is specifically expressed, the cell type that specifically expresses the gene can be taken as the central cell, and the cell types that may influence gene expression in the central cell can be taken as neighbor cells. The central cell and the neighbor cells can thus be determined from the data set. However, this manner should not be considered as limiting the present application, and the central cell and neighbor cells may be determined in many other ways. For example, when the effect of the cellular microenvironment on gene expression is to be studied, one particular cell type may be designated as the central cell and other particular cell types as neighbor cells.
After the central cell and the neighbor cells are determined, step S10 may be performed.
Step S10: determining the specific high-expression genes of the central cell and the specific high-expression genes of the neighbor cells, wherein the central cell represents the cell type whose gene expression regulation mechanism is to be studied, and the neighbor cells represent the cell types that may influence gene expression of the central cell.
In this embodiment, the specific expression matrix of the central cell and the specific expression matrix of the neighbor cells can be determined according to the clustering result of the single-cell transcriptome data. Then, the specific high expression genes of the central cells and the specific high expression genes of the neighbor cells can be determined according to the specific expression matrix of the central cells and the specific expression matrix of the neighbor cells and preset screening conditions.
Illustratively, a cell type-specific expression matrix can be obtained according to the clustering result of the single-cell transcriptome data, and the expression ratio of a certain gene (e.g., a gene of interest or other genes) in a specified type of cell (i.e., the cell proportion of the specified type of cell in which the gene expression exceeds a set threshold) can be calculated.
For example, the preset screening conditions may be: the larger of the gene's expression ratios in the two types of cells (e.g. the central cell and a neighbor cell of the central cell) needs to exceed a certain threshold (e.g. 0.1); the difference between the expression ratios of the gene in the two types of cells needs to exceed a certain threshold (e.g. 0.1); and the difference between the mean expression values of the gene in the two types of cells needs to exceed a certain threshold (e.g. 0.25). Of course, such screening conditions are merely exemplary and should not be construed as limiting the present application; they may be set based on a combination of factors such as actual requirements, types of genes, and types of cells.
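The exemplary screening conditions can be written as a small predicate. The sketch below is only illustrative: the function names are hypothetical, a cell is counted as expressing the gene when its value exceeds `threshold` (0 by default), and the differences are taken in the direction "cell type A minus cell type B" so that the test selects genes that are higher in A, which is an assumption about how the conditions are applied.

```python
def expression_ratio(values, threshold=0.0):
    """Fraction of cells in which the gene's expression exceeds `threshold`."""
    return sum(1 for v in values if v > threshold) / len(values)

def passes_screening(expr_a, expr_b, min_ratio=0.1, min_ratio_diff=0.1,
                     min_mean_diff=0.25):
    """Check whether a gene is a candidate high-expression gene of cell type A.

    expr_a, expr_b: expression values of one gene in the cells of type A and B.
    """
    ratio_a, ratio_b = expression_ratio(expr_a), expression_ratio(expr_b)
    mean_a = sum(expr_a) / len(expr_a)
    mean_b = sum(expr_b) / len(expr_b)
    return (max(ratio_a, ratio_b) > min_ratio          # condition 1: larger ratio
            and ratio_a - ratio_b > min_ratio_diff     # condition 2: ratio difference
            and mean_a - mean_b > min_mean_diff)       # condition 3: mean difference
```

Genes that pass this predicate would then go on to the statistical test described in the following paragraph.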
Illustratively, genes in a cell type-specific expression matrix may be screened according to the screening conditions. Then, the expression matrix corresponding to the screened genes can be normalized (for example, by using the LogNormalize algorithm), and a t-test is performed to check the reliability of the differential expression of each gene. For example, if the p-value of the t-test is less than 0.05, it can be determined that the gene is highly expressed in the cell type, and thus that the gene is a specific high-expression gene of the cell type.
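The text does not say which variant of the t-test is used. As one possibility, the sketch below computes the Welch (unequal-variance) t statistic and its degrees of freedom with the Python standard library only; the function name is hypothetical. Turning the statistic into a p-value requires the cumulative distribution function of Student's t distribution, which a statistics library provides (for instance `scipy.stats.ttest_ind(a, b, equal_var=False)`).

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values and a non-zero combined variance.
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na   # squared standard error of mean A
    vb = variance(sample_b) / nb   # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

A positive statistic indicates higher mean expression in the first sample.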
In this way, the specific high-expression gene of the central cell and the specific high-expression gene of the neighbor cell can be accurately determined.
Of course, the method for determining the specific high-expression genes of the central cell and of the neighbor cells is not limited to the above, and other methods may be used. For example, the fold-change method analyzes differences in gene expression level by the fold difference of expression values: the ratio of the expression levels of a gene under two conditions is calculated, a threshold for the ratio is set, and genes whose ratio exceeds the threshold are judged to be differentially expressed. In addition, statistical methods such as the Wilcoxon rank sum test and SAM can be used. An appropriate method can be selected according to actual needs, which is not limited herein.
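The fold-change method mentioned above reduces to a ratio and a threshold. The following sketch assumes mean expression values per gene are already available as dictionaries; the function name, the threshold of 2 and the small pseudocount that avoids division by zero are illustrative assumptions, not values given in the text.

```python
def fold_change_genes(mean_expr_a, mean_expr_b, threshold=2.0, pseudocount=1e-9):
    """Return genes whose mean expression in condition A exceeds that in
    condition B by more than `threshold`-fold."""
    selected = []
    for gene, value_a in mean_expr_a.items():
        value_b = mean_expr_b.get(gene, 0.0)
        # The pseudocount keeps the ratio defined when value_b is 0.
        ratio = (value_a + pseudocount) / (value_b + pseudocount)
        if ratio > threshold:
            selected.append(gene)
    return sorted(selected)
```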
After the specific high expression genes of the central cell and the specific high expression genes of the neighbor cells are determined, step S20 may be performed.
Step S20: determining a first sub-network between the central cell and the neighbor cells according to the specific high-expression genes of the central cell, the specific high-expression genes of the neighbor cells, and the pairing information between ligands and receptors.
In this embodiment, a first relationship list including pairing information between the ligand and the receptor may be obtained, and the high-expression receptor may be determined from the specific high-expression gene of the central cell and the high-expression ligand may be determined from the specific high-expression gene of the neighbor cell according to the first relationship list, so as to establish a first subnetwork between the central cell and the neighbor cell according to the high-expression receptor and the high-expression ligand.
Illustratively, pairing information on ligands and receptors can be collected from databases such as DLRP, IUPHAR, HPMR, HPRD, and STRING, and the collected ligand-receptor pairing information (for example, 2557 pairs) is collated to obtain a first relationship list containing the interaction relationships between the 2557 ligand-receptor pairs, which can be expressed as: $E_{LR} = \{(Ligand_i, Receptor_i)\}$.
It should be noted that the 2557 ligand-receptor pairs are only exemplary. The collected pairing information may differ depending on the databases used, and the number of ligand-receptor pairs may change as the pairing information in the databases is updated, which is not limited herein. In addition, the first relationship list may be one that has already been stored, which is not limited herein.
According to the first relationship list $E_{LR}$, the highly expressed ligands can be determined from the specific high-expression genes of the neighbor cells, denoted $L_{up}$, and the highly expressed receptors can be determined from the specific high-expression genes of the central cell, denoted $R_{up}$.
A first subnetwork between the central cell and the neighbor cells can thus be established, denoted:
$$N_{LR} = E_{LR} \cap (L_{up} \times R_{up})$$
By obtaining a first relationship list containing the pairing information of ligands and receptors, determining the highly expressed receptors from the specific high-expression genes of the central cell, and determining the highly expressed ligands from the specific high-expression genes of the neighbor cells, a first sub-network between the central cell and the neighbor cells is established, which can accurately reflect the signal network formed by the pairing of ligands of the neighbor cells with receptors of the central cell.
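In set terms, the first sub-network keeps exactly those ligand-receptor pairs whose ligand is highly expressed in the neighbor cell and whose receptor is highly expressed in the central cell. A minimal sketch, with a hypothetical function name and placeholder gene names in the usage example:

```python
def build_ligand_receptor_subnetwork(pair_list, neighbor_high_genes, central_high_genes):
    """Restrict a ligand-receptor pair list to highly expressed partners.

    pair_list: iterable of (ligand, receptor) tuples (the first relationship list).
    neighbor_high_genes: specific high-expression genes of the neighbor cell.
    central_high_genes: specific high-expression genes of the central cell.
    """
    pairs = list(pair_list)
    ligands_up = {lig for lig, _ in pairs} & set(neighbor_high_genes)
    receptors_up = {rec for _, rec in pairs} & set(central_high_genes)
    # Intersection of the pair list with (highly expressed ligands x receptors).
    return {(lig, rec) for lig, rec in pairs
            if lig in ligands_up and rec in receptors_up}
```

For example, with the placeholder pairs `[("LIG1", "REC1"), ("LIG2", "REC2")]`, neighbor genes `{"LIG1", "LIG2"}` and central genes `{"REC1"}`, only the edge `("LIG1", "REC1")` is retained.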
After the specific high expression genes of the central cell and the specific high expression genes of the neighbor cells are determined, step S30 may be performed.
Step S30: determining a second sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between transcription factors and target genes.
In this embodiment, a second relationship list containing information on the interaction between the transcription factor and the target gene may be acquired. And a first gene set comprising all target genes in the central cell, a second gene set comprising target genes highly expressed in the central cell, and a third gene set comprising target genes corresponding to the specified transcription factors can be determined. Then, a fourth gene set containing a transcription factor which is significantly activated is determined according to the first gene set, the second gene set and the third gene set, so that a second sub-network of the central cell is established according to the second relation list, the second gene set and the fourth gene set.
Illustratively, the interaction information between transcription factors and target genes can be collected from databases such as TRED and KEGG, and the collected interaction information (e.g. 8869 pairs) is collated to obtain a second relationship list containing the interaction relationships between the 8869 transcription factor-target gene pairs, which can be expressed as: $E_{TT} = \{(TF_i, TG_i)\}$.
It should be noted that the 8869 transcription factor-target gene pairs are merely exemplary and are not limited herein. In addition, the second relationship list may be one that has already been stored, which is not limited herein.
According to the second relationship list $E_{TT}$, a first gene set containing all target genes in the central cell can be determined, denoted $TG_{all}$; a second gene set containing the highly expressed target genes in the central cell (i.e. the set of all highly expressed target genes in the central cell) can be determined, denoted $TG_{up}$; and a third gene set containing the target genes corresponding to a specified transcription factor $TF_i$ can be determined, denoted $TG_{TF_i}$.
Then, based on the first gene set $TG_{all}$, the second gene set $TG_{up}$ and the third gene set $TG_{TF_i}$, the activation of each transcription factor can be tested to determine a fourth gene set containing the significantly activated transcription factors, denoted $TF_{act}$.
Illustratively, Fisher's exact test can be used to verify the activation of a transcription factor. Specifically, the activation probability of the transcription factor is calculated as follows:
$$P = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{a+b+c+d}{a+c}}$$
wherein $\binom{n}{k}$ denotes a binomial coefficient; $a = |TG_{up} \cap TG_{TF_i}|$, with $TG_{TF_i}$ being the third gene set (the target genes of the specified transcription factor), is the number of highly expressed target genes regulated by the specified transcription factor in the central cell; $b = |TG_{up}| - a$ is the number of highly expressed target genes in the central cell that are not regulated by the specified transcription factor; $c = |TG_{TF_i}| - a$ is the number of non-highly expressed target genes regulated by the specified transcription factor in the central cell; and $d = |TG_{all}| - a - b - c$ is the number of target genes in the central cell that are neither highly expressed nor regulated by the specified transcription factor.
Illustratively, a transcription factor can be determined to be a significantly activated transcription factor when its activation probability P is less than 0.05. From this, a fourth gene set comprising all significantly activated transcription factors in the central cell can be determined, denoted $TF_{act}$.
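For illustration, the probability of a 2x2 table with cells a, b, c and d as defined above can be computed directly from binomial coefficients. The sketch assumes that the activation probability is the hypergeometric probability of the observed table with fixed margins (the textbook form of Fisher's exact test); the function name is hypothetical. Statistical packages usually report a p-value that sums this probability over all tables at least as extreme as the observed one, so their output can differ from the single-table value.

```python
from math import comb

def fisher_point_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

# Usage: a transcription factor would be called significantly activated
# when the probability falls below the exemplary threshold of 0.05.
is_activated = fisher_point_probability(5, 0, 0, 15) < 0.05
```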
It should be noted that the manner of determining the fourth gene set including the significantly activated transcription factor may also be another manner, for example, the manner of verifying the activation state of the transcription factor by using the chi-square test to determine the fourth gene set including all the significantly activated transcription factors in the central cell, and therefore, the manner of determining the fourth gene set should not be considered as a limitation of the present application.
After the fourth gene set is determined, a second subnetwork of the central cell can be established according to the second relationship list $E_{TT}$, the second gene set $TG_{up}$ and the fourth gene set $TF_{act}$ (the significantly activated transcription factors), denoted:
$$N_{TT} = E_{TT} \cap (TF_{act} \times TG_{up})$$
By obtaining a second relationship list containing the interaction information of transcription factors and target genes, determining a fourth gene set containing the significantly activated transcription factors from the first gene set (all target genes in the central cell), the second gene set (the highly expressed target genes in the central cell) and the third gene set (the target genes corresponding to a specified transcription factor), and combining the second relationship list, the second gene set and the fourth gene set, a second sub-network of the central cell can be established, which comprehensively and accurately reflects the signal network between transcription factors and target genes in the central cell.
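Putting step S30 together, the sketch below derives the gene sets from a transcription factor-target gene pair list, tests each transcription factor and keeps the edges from activated factors to highly expressed targets. Several points are assumptions of the example rather than statements of the method: the function names are hypothetical, "all target genes" is taken to be every target gene in the pair list, and the test uses the one-sided form of Fisher's exact test (summing over tables with at least the observed overlap), chosen here so that factors whose targets are depleted among the highly expressed genes are not flagged.

```python
from math import comb

def enrichment_p_value(a, b, c, d):
    """One-sided Fisher's exact test: probability of an overlap of at least `a`."""
    row, col, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row, k) * comb(n - row, col - k)
               for k in range(a, min(row, col) + 1))
    return tail / comb(n, col)

def build_tf_target_subnetwork(tf_target_pairs, high_genes, alpha=0.05):
    """Return (activated transcription factors, TF-target edges of the sub-network)."""
    pairs = list(tf_target_pairs)
    tg_all = {tg for _, tg in pairs}            # first gene set
    tg_up = tg_all & set(high_genes)            # second gene set
    targets_of = {}
    for tf, tg in pairs:
        targets_of.setdefault(tf, set()).add(tg)
    activated = set()                           # fourth gene set
    for tf, tg_tf in targets_of.items():        # tg_tf: third gene set of this TF
        a = len(tg_up & tg_tf)
        b = len(tg_up) - a
        c = len(tg_tf) - a
        d = len(tg_all) - a - b - c
        if enrichment_p_value(a, b, c, d) < alpha:
            activated.add(tf)
    network = {(tf, tg) for tf, tg in pairs if tf in activated and tg in tg_up}
    return activated, network
```

The returned edge set corresponds to the intersection of the pair list with the product of the activated transcription factors and the highly expressed target genes.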
After the specific high expression genes of the central cell and the specific high expression genes of the neighbor cells are determined, step S40 may be performed.
Step S40: determining a third sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between receptors and transcription factors.
In this embodiment, a third relationship list including information on interaction between receptors and transcription factors may be obtained, a fifth gene set including all transcription factors in the central cell, a sixth gene set including transcription factors activated in the central cell, a seventh gene set including transcription factors corresponding to the designated receptors may be determined, and an eighth gene set including significantly activated receptors may be determined according to the fifth gene set, the sixth gene set, and the seventh gene set. Then, a third subnetwork of central cells is established according to the third relational list, the sixth gene set and the eighth gene set.
Illustratively, information about the interaction between receptors and transcription factors can be collected from the STRING database, and the collected information is screened to retain the receptors and downstream transcription factors with the shortest distance (for example, 39141 pairs), so as to obtain a third relationship list containing the interaction relationships between the 39141 receptor-transcription factor pairs, which can be written as: $E_{RT} = \{(Receptor_i, TF_i)\}$.
It should be noted that the 39141 receptor-transcription factor pairs are merely exemplary and are not limiting. In addition, the third relationship list may be one that has already been stored, which is not limited herein.
According to the third relationship list $E_{RT}$, a fifth gene set containing all transcription factors in the central cell can be determined, denoted $TF_{all}$; a sixth gene set containing the transcription factors activated in the central cell (i.e. all significantly activated transcription factors in the central cell) can be determined, denoted $TF_A$; and a seventh gene set containing the transcription factors corresponding to a specified receptor $R_i$ can be determined, denoted $TF_{R_i}$. The sixth gene set $TF_A$ is the fourth gene set $TF_{act}$.
Then, based on the fifth gene set $TF_{all}$, the sixth gene set $TF_A$ (i.e. the fourth gene set $TF_{act}$) and the seventh gene set $TF_{R_i}$, the activation of each receptor can be tested to determine an eighth gene set containing the significantly activated receptors, denoted $R_{act}$.
Illustratively, Fisher's exact test can also be used to verify receptor activation. Specifically, the activation probability of the receptor is calculated as follows:
$$P = \frac{\binom{x+y}{x}\binom{m+n}{m}}{\binom{x+y+m+n}{x+m}}$$
wherein $x = |TF_A \cap TF_{R_i}|$, with $TF_A$ being the sixth gene set (equal to the fourth gene set) and $TF_{R_i}$ the seventh gene set (the transcription factors corresponding to the specified receptor), is the number of activated transcription factors regulated by the specified receptor in the central cell; $y = |TF_A| - x$ is the number of activated transcription factors in the central cell that are not regulated by the specified receptor; $m = |TF_{R_i}| - x$ is the number of non-activated transcription factors regulated by the specified receptor in the central cell; and $n = |TF_{all}| - x - y - m$ is the number of transcription factors in the central cell that are neither activated nor regulated by the specified receptor.
Illustratively, a receptor can be determined to be a significantly activated receptor when its activation probability P is less than 0.05. From this, an eighth gene set comprising all significantly activated receptors in the central cell can be determined, denoted $R_{act}$.
It should be noted that the activation status of the receptor can also be verified in other ways, such as chi-square test, to determine the eighth gene set comprising all significantly activated receptors in the central cell, which is not limited herein.
After the eighth gene set is determined, a third subnetwork of the central cell can be established according to the third relationship list $E_{RT}$, the sixth gene set $TF_A$ (i.e. the fourth gene set $TF_{act}$) and the eighth gene set $R_{act}$ (the significantly activated receptors), denoted:
$$N_{RT} = E_{RT} \cap (R_{act} \times TF_A)$$
namely,
$$N_{RT} = E_{RT} \cap (R_{act} \times TF_{act})$$
By determining a third relation list containing interaction information of the receptors and the transcription factors, and determining an eighth gene set containing significantly activated receptors according to a fifth gene set containing all the transcription factors in the central cells, a sixth gene set containing the transcription factors activated in the central cells and a seventh gene set containing the transcription factors corresponding to the specified receptors, a third subnetwork of the central cells can be established so as to comprehensively and accurately reflect a signal network between the receptors and the transcription factors in the central cells.
It should be noted that there is no strict order between step S20 and steps S30 and S40, but step S30 must be executed before step S40. That is, step S20 may be executed first, followed by step S30 and step S40; or step S30 may be executed first, then step S20, then step S40; or step S30 may be executed first, then step S40, then step S20. Step S20 may also be executed simultaneously with step S30 or with step S40. It is only necessary to ensure that step S40 is executed after step S30, and the order should not otherwise be considered as limiting the present application.
After establishing the first, second and third sub-networks, step S50 may be performed.
Step S50: determining an intercellular multilayer signal network of the central cell and the neighbor cells according to the first sub-network, the second sub-network and the third sub-network, so as to reveal how the neighbor cells regulate gene expression of the central cell.
In this embodiment, the first sub-network, the second sub-network and the third sub-network may be updated according to the upstream and downstream relationship among the first sub-network, the second sub-network and the third sub-network. And then integrating the updated first sub-network, the updated second sub-network and the updated third sub-network to determine an intercellular multilayer signal network of the central cell and the neighbor cells.
Illustratively, the first sub-network may reveal the intercellular signaling pathways between ligands and receptors, the second sub-network may reveal the intracellular signaling pathways between transcription factors and target genes, and the third sub-network may reveal the intracellular signaling pathways between receptors and transcription factors. When a gene expression regulation mechanism is explored, the ligand of the neighbor cell binds to the receptor of the central cell, the receptor interacts with the transcription factor, and the transcription factor acts on the target gene, so the upstream-downstream relationship is: first sub-network, then third sub-network, then second sub-network. Thus, the first, second and third sub-networks may be updated according to this upstream-downstream relationship.
For example, the first, second and third sub-networks may specifically be updated as follows.
The receptors that are both significantly activated and highly expressed in the central cell are determined, denoted $R = R_{up} \cap R_{act}$, where $R_{up}$ is the set of highly expressed receptors and $R_{act}$ is the eighth gene set (the significantly activated receptors).
Thus, with $L_{up}$ denoting the highly expressed ligands of the neighbor cells, the first subnetwork (ligand and receptor) is updated to:
$$N_{LR} = E_{LR} \cap (L_{up} \times R)$$
while the third subnetwork (receptor and transcription factor) can be updated to:
$$N_{RT} = E_{RT} \cap (R \times TF_A)$$
The transcription factors appearing in $N_{RT}$ are denoted $TF$, and the second subnetwork (transcription factor and target gene) can be updated to:
$$N_{TT} = E_{TT} \cap (TF \times TG_{up})$$
wherein $TG_{up}$ represents the target genes highly expressed in the central cell, i.e. the second gene set.
After updating the first, second, and third subnetworks, the updated subnetworks may be integrated to determine an intercellular multi-layered signal network of the central cell and the neighboring cells.
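The update and integration can be sketched as follows. The function and parameter names are hypothetical, the sub-networks are represented as sets of edges, and the enumeration of complete ligand-receptor-transcription factor-target gene paths is one possible way of presenting the integrated network, not something prescribed by the text.

```python
def integrate_multilayer_network(lig_rec, rec_tf, tf_tg, activated_receptors, tg_up):
    """Update the three sub-networks along the upstream-downstream order.

    lig_rec: (ligand, receptor) edges of the first sub-network.
    rec_tf:  (receptor, TF) edges of the third sub-network.
    tf_tg:   (TF, target gene) edges of the second relationship list or sub-network.
    activated_receptors: significantly activated receptors of the central cell.
    tg_up:   highly expressed target genes of the central cell.
    """
    # Receptors that are highly expressed (present in lig_rec) and activated.
    receptors = {rec for _, rec in lig_rec} & set(activated_receptors)
    n_lr = {(lig, rec) for lig, rec in lig_rec if rec in receptors}
    n_rt = {(rec, tf) for rec, tf in rec_tf if rec in receptors}
    tfs = {tf for _, tf in n_rt}   # transcription factors downstream of those receptors
    n_tt = {(tf, tg) for tf, tg in tf_tg if tf in tfs and tg in tg_up}
    return n_lr, n_rt, n_tt

def signal_paths(n_lr, n_rt, n_tt):
    """List every complete ligand -> receptor -> TF -> target gene path."""
    return sorted((lig, rec, tf, tg)
                  for lig, rec in n_lr
                  for rec2, tf in n_rt if rec2 == rec
                  for tf2, tg in n_tt if tf2 == tf)
```

Each returned path is one candidate route by which a neighbor-cell ligand could influence a highly expressed target gene of the central cell.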
In addition, in order to facilitate observation and research, the intercellular multilayer signal network of the central cell and the neighbor cells can be visualized and visually presented.
By updating each sub-network according to the upstream-downstream relationship among the first, second and third sub-networks and integrating the updated sub-networks, the intercellular multilayer signal network of the central cell and the neighbor cells is determined, so that the signal network relationship between the central cell and the neighbor cells can be reflected systematically and comprehensively, and the regulation pathways and mechanisms of gene expression in the central cell can be studied systematically and in depth.
It should be noted that this embodiment is described by taking a central cell and neighbor cells as an example, and the number of neighbor cells (which provide ligands) is not limited. In the actual study of a gene expression regulation mechanism, a large number of neighbor cells are usually involved, and the network can reflect which cells regulate gene expression in the central cell and what the regulation mechanism is. In addition, the interaction relationships among ligands, receptors, transcription factors and target genes are expressed as a multilayer, multi-pathway network structure, so that the gene regulation mechanism can be displayed comprehensively and systematically, the influence of cell specificity on the gene expression regulation mechanism can be accurately reflected, and a new tool is provided for analyzing the microenvironment-mediated regulation of genes of interest.
Referring to FIG. 3 and FIG. 4, FIG. 3 is a violin plot of gene expression, and FIG. 4 is a schematic diagram of the cell-specific expression of the gene of interest ACE2 and of the multilayer signal network between the central cell and the neighbor cells.
As can be seen from FIG. 3, the method for determining a gene expression regulation mechanism based on single-cell transcriptome data provided in the embodiment of the present application can identify specific cell types, which provides a completely new perspective for studying the gene expression regulation mechanism. The cell types can be annotated by marker genes (age, SFTPC, etc. in FIG. 3).
Part A of FIG. 4 shows that the present method can map gene expression levels to specific cell types, excluding interference from differences in overall expression levels of a variety of cells. Compared with the traditional analysis method, the method can capture specific biological effects at the resolution of a single cell so as to explore the influence of cell heterogeneity on gene expression.
Part B of FIG. 4 shows the mechanism by which the microenvironment signal network centered on AT2 cells regulates the ACE2 expression level. The intercellular multilayer signal network shows not only the influence of multiple signaling pathways on the ACE2 expression level, but also the specific effects of various cells on the ACE2 expression level. In the figure, SARS-CoV-2 denotes the novel coronavirus; the labels "Transcription factor", "Ligand" and "Receptor" denote transcription factors, ligands and receptors, respectively; the nucleotide chain denotes a gene; Mast cells, AT1 cells, etc. are neighbor cells, and AT2 cells are the central cells.
Referring to FIG. 5, based on the same inventive concept, the embodiment of the present application further provides an apparatus 10 for determining a gene expression regulation mechanism based on single-cell transcriptome data, comprising:
a high-expression gene screening unit 11 for determining the specific high-expression genes of a central cell and the specific high-expression genes of neighbor cells, wherein the central cell represents the cell type whose gene expression regulation mechanism is to be studied, and the neighbor cells represent the cell types that may influence gene expression of the central cell;
a first sub-network constructing unit 12, configured to determine a first sub-network between the central cell and the neighbor cell according to the specific high-expression genes of the central cell and the specific high-expression genes of the neighbor cell, and pairing information between a ligand and a receptor;
a second subnetwork constructing unit 13 for determining a second subnetwork of the central cell according to the specific highly expressed gene of the central cell and the interaction information between the transcription factor and the target gene;
a third sub-network constructing unit 14 for determining a third sub-network of the central cell according to the specific highly expressed gene of the central cell and the interaction information between the receptor and the transcription factor;
and a multilayer network model constructing unit 15, configured to determine an intercellular multilayer signal network of the central cell and the neighbor cells according to the first sub-network, the second sub-network, and the third sub-network, so as to reveal the regulation of the neighbor cells on the gene expression of the central cell.
In this embodiment, the apparatus 10 for determining a gene expression regulation and control mechanism based on single-cell transcriptome data further includes a data processing and cell determining unit, configured to perform data preprocessing on the single-cell transcriptome data before the high-expression gene screening unit 11 determines the specific high-expression gene of the central cell and the specific high-expression gene of the neighbor cell; performing dimensionality reduction, clustering and cell type identification on the single-cell transcription group data subjected to data preprocessing to determine a data set comprising multiple cell types; determining the central cell and the neighbor cells from the data set.
In this embodiment, the data processing and cell determining unit is further configured to determine a single-cell expression matrix from the single-cell transcriptome data, in which each row represents a gene and each column represents a cell; and to filter the single-cell expression matrix and normalize the filtered matrix, thereby preprocessing the single-cell transcriptome data.
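Illustratively, and only as a non-limiting sketch, the filtering and normalization described above could look as follows. The function name `preprocess`, the thresholds `min_cells` and `min_genes`, the scale factor and the log transform are assumptions chosen for illustration; this embodiment does not fix them.

```python
import math

def preprocess(counts, min_cells=3, min_genes=200, scale=1e4):
    """Filter and normalize a genes x cells count matrix (list of rows).

    Rows are genes and columns are cells, as in the embodiment.
    All thresholds are illustrative assumptions, not values from the source.
    """
    n_genes = len(counts)
    n_cells = len(counts[0]) if counts else 0
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    keep_genes = [g for g in range(n_genes)
                  if sum(1 for v in counts[g] if v > 0) >= min_cells]
    # Keep cells in which at least `min_genes` genes are detected.
    keep_cells = [c for c in range(n_cells)
                  if sum(1 for g in range(n_genes) if counts[g][c] > 0) >= min_genes]
    filtered = [[counts[g][c] for c in keep_cells] for g in keep_genes]
    # Per-cell library size after filtering.
    lib = [sum(row[j] for row in filtered) for j in range(len(keep_cells))]
    # Library-size normalization followed by log(1 + x).
    normalized = [[math.log1p(row[j] / lib[j] * scale) if lib[j] > 0 else 0.0
                   for j in range(len(keep_cells))]
                  for row in filtered]
    return normalized, keep_genes, keep_cells
```

The returned index lists record which genes and cells survived filtering, so that gene names and cluster labels can be subset consistently in later steps.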
In this embodiment, the high-expression gene screening unit 11 is further configured to determine a specific expression matrix of the central cell and a specific expression matrix of the neighbor cells according to the clustering result of the single-cell transcriptome data; and to determine the specific highly expressed genes of the central cell and of the neighbor cells from these specific expression matrices according to preset screening conditions.
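Illustratively, one possible form of the preset screening conditions is sketched below. The criteria used here (a log2 fold-change threshold against all other cells and a minimum detection fraction) and the names `specific_high_genes`, `min_log2fc` and `min_pct` are assumptions for illustration; the screening conditions actually used are not specified in this passage.

```python
import math

def specific_high_genes(expr, genes, labels, cell_type,
                        min_log2fc=1.0, min_pct=0.1):
    """Select genes specifically highly expressed in one cell type.

    expr   : genes x cells normalized matrix (list of rows)
    genes  : gene name for each row
    labels : cluster / cell-type label for each column
    The thresholds are hypothetical screening conditions.
    """
    in_idx = [j for j, lab in enumerate(labels) if lab == cell_type]
    out_idx = [j for j, lab in enumerate(labels) if lab != cell_type]
    if not in_idx or not out_idx:
        raise ValueError("need cells both inside and outside the cell type")
    selected = []
    for gene, row in zip(genes, expr):
        mean_in = sum(row[j] for j in in_idx) / len(in_idx)
        mean_out = sum(row[j] for j in out_idx) / len(out_idx)
        # Fraction of cells of this type in which the gene is detected.
        pct_in = sum(1 for j in in_idx if row[j] > 0) / len(in_idx)
        # A pseudocount of 1 avoids division by zero.
        log2fc = math.log2((mean_in + 1.0) / (mean_out + 1.0))
        if log2fc >= min_log2fc and pct_in >= min_pct:
            selected.append(gene)
    return selected
```

Calling the function once with the central cell type and once with the neighbor cell type yields the two gene lists used by the sub-network constructing units.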
In this embodiment, the first sub-network constructing unit 12 is further configured to: obtain a first relationship list containing pairing information between ligands and receptors; determine, according to the first relationship list, highly expressed receptors from the specific highly expressed genes of the central cell and highly expressed ligands from the specific highly expressed genes of the neighbor cells; and establish the first sub-network between the central cell and the neighbor cells based on the highly expressed receptors and the highly expressed ligands.
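Illustratively, the construction of the first sub-network can be sketched as a filter over the ligand-receptor pairing list. The function name `build_ligand_receptor_network` and the representation of the network as a list of (ligand, receptor) edges are assumptions made for this sketch.

```python
def build_ligand_receptor_network(lr_pairs, neighbor_high_genes, central_high_genes):
    """Return ligand -> receptor edges between neighbor and central cells.

    lr_pairs            : iterable of (ligand, receptor) pairs (first relationship list)
    neighbor_high_genes : specific highly expressed genes of the neighbor cells
    central_high_genes  : specific highly expressed genes of the central cell
    """
    neighbor = set(neighbor_high_genes)
    central = set(central_high_genes)
    # Keep a pair only if the ligand is highly expressed in the neighbor
    # cells and the receptor is highly expressed in the central cell.
    edges = {(lig, rec) for lig, rec in lr_pairs
             if lig in neighbor and rec in central}
    return sorted(edges)
```

Each retained edge represents a candidate signal sent by the neighbor cells and received by the central cell.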
In this embodiment, the second sub-network constructing unit 13 is further configured to: obtain a second relationship list containing interaction information between transcription factors and target genes; determine a first gene set containing all target genes in the central cell, a second gene set containing the highly expressed target genes in the central cell, and a third gene set containing the target genes corresponding to a specified transcription factor; determine, according to the first, second and third gene sets, a fourth gene set comprising significantly activated transcription factors; and establish the second sub-network of the central cell according to the second relationship list, the second gene set and the fourth gene set.
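Illustratively, the determination of significantly activated transcription factors from the three gene sets can be sketched as an over-representation test. This passage does not name the statistical test, so the one-sided hypergeometric test below (equivalent to a one-sided Fisher's exact test) is an assumption, as are the names `hypergeom_upper_tail` and `activated_tfs` and the significance level `alpha`.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

def activated_tfs(tf_targets, all_targets, high_targets, alpha=0.05):
    """Return {transcription factor: p-value} for significantly enriched TFs.

    tf_targets   : dict mapping each TF to its target genes (second relationship list)
    all_targets  : first gene set (all target genes in the central cell)
    high_targets : second gene set (highly expressed target genes)
    """
    first = set(all_targets)
    second = set(high_targets) & first
    result = {}
    for tf, targets in tf_targets.items():
        third = set(targets) & first          # third gene set for this TF
        overlap = len(second & third)
        if overlap == 0:
            continue
        p = hypergeom_upper_tail(overlap, len(first), len(third), len(second))
        if p < alpha:
            result[tf] = p
    return result
```

The keys of the returned dictionary play the role of the fourth gene set. No multiple-testing correction is applied in this sketch; whether one is needed depends on the screening conditions chosen in practice.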
In this embodiment, the third sub-network constructing unit 14 is further configured to: obtain a third relationship list containing interaction information between receptors and transcription factors; determine a fifth gene set containing all transcription factors in the central cell, a sixth gene set containing the activated transcription factors in the central cell, and a seventh gene set containing the transcription factors corresponding to a specified receptor; determine, according to the fifth, sixth and seventh gene sets, an eighth gene set comprising significantly activated receptors; and establish the third sub-network of the central cell according to the third relationship list, the sixth gene set and the eighth gene set.
In this embodiment, the multi-layer network model constructing unit 15 is further configured to update the first sub-network, the second sub-network, and the third sub-network according to an upstream-downstream relationship among the first sub-network, the second sub-network, and the third sub-network; and integrating the updated first sub-network, the updated second sub-network and the updated third sub-network to determine the intercellular multilayer signal network of the central cell and the neighbor cells.
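Illustratively, updating the three sub-networks according to their upstream-downstream relationship can be sketched as pruning edges that cannot be connected across layers. The edge-list representation and the function name `integrate_layers` are assumptions made for this sketch; the exact update rule is not detailed in this passage.

```python
def integrate_layers(lig_rec, rec_tf, tf_target):
    """Prune three edge lists so that they form connected signal paths.

    lig_rec   : (ligand, receptor) edges, first sub-network
    rec_tf    : (receptor, transcription factor) edges, third sub-network
    tf_target : (transcription factor, target gene) edges, second sub-network
    """
    receptors_with_ligand = {r for _, r in lig_rec}
    tfs_with_target = {t for t, _ in tf_target}
    # A receptor-TF edge is kept only if its receptor has an upstream
    # ligand and its TF has a downstream target gene.
    rec_tf_kept = [(r, t) for r, t in rec_tf
                   if r in receptors_with_ligand and t in tfs_with_target]
    linked_receptors = {r for r, _ in rec_tf_kept}
    linked_tfs = {t for _, t in rec_tf_kept}
    return {
        "ligand_receptor": [(l, r) for l, r in lig_rec if r in linked_receptors],
        "receptor_tf": rec_tf_kept,
        "tf_target": [(t, g) for t, g in tf_target if t in linked_tfs],
    }
```

After pruning, every remaining edge lies on at least one complete ligand, receptor, transcription factor, target gene path, which corresponds to the integrated intercellular multilayer signal network.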
Referring to fig. 6, fig. 6 is a block diagram of an electronic device 20 according to an embodiment of the present disclosure.
In this embodiment, the electronic device 20 may be a terminal device, such as a personal computer, a notebook computer, etc., and is not limited herein. Of course, the electronic device 20 may also be a server, such as a network server, a cloud server, a server cluster, and the like, which is not limited herein.
Illustratively, the electronic device 20 may include: a communication module 22 connected to external devices via a network; one or more processors 24 for executing program instructions; a bus 23; and a memory 21 in one of various forms, such as a magnetic disk, a ROM (Read-Only Memory), a RAM (Random Access Memory), or any combination thereof. The memory 21, the communication module 22 and the processor 24 are connected by the bus 23.
Illustratively, the memory 21 stores a program. The processor 24 may load the program from the memory 21 and run it to perform the method for determining a gene expression regulation mechanism based on single-cell transcriptome data, so as to reveal the regulation mechanism of gene expression comprehensively and systematically.
Also, the present embodiments provide a storage medium storing one or more programs, which are executable by one or more processors to implement the method for determining a gene expression regulation mechanism based on single-cell transcriptome data as described in the present embodiments.
In summary, the embodiments of the present application provide a method for determining a gene expression regulation mechanism based on single-cell transcriptome data. The method determines the specific highly expressed genes of a central cell and of a neighbor cell; constructs a first sub-network, a second sub-network and a third sub-network by combining, respectively, pairing information between ligands and receptors, interaction information between transcription factors and target genes, and interaction information between receptors and transcription factors; and integrates the three sub-networks to obtain an intercellular multilayer signal network of the central cell and the neighbor cells. The gene regulation mechanism can therefore be presented more comprehensively and systematically, the influence of cell specificity on the gene expression regulation mechanism can be reflected accurately, and a new tool is provided for analyzing how the cell microenvironment mediates the regulation of genes of interest.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for determining a gene expression control mechanism based on single cell transcriptome data, comprising:
determining a specific high-expression gene of a central cell and a specific high-expression gene of a neighbor cell, wherein the central cell represents the cell type whose gene expression regulation mechanism is to be studied, and the neighbor cell represents a cell type that may influence the gene expression of the central cell;
determining a first sub-network between the central cell and the neighbor cell according to the specific high-expression genes of the central cell and the specific high-expression genes of the neighbor cell and pairing information between a ligand and a receptor;
determining a second sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the transcription factors and the target genes;
determining a third sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the receptor and the transcription factor;
and determining an intercellular multilayer signal network of the central cell and the neighbor cells according to the first sub-network, the second sub-network and the third sub-network so as to reveal the regulation of the gene expression of the central cell by the neighbor cells.
2. The method for determining the gene expression regulatory mechanism based on single-cell transcriptome data of claim 1, wherein before said determining the specific highly expressed genes of the central cell and the specific highly expressed genes of the neighbor cells, said method further comprises:
determining an expression matrix of the single cell according to the single cell transcriptome data, wherein a row represents each gene and a column represents each cell;
filtering the expression matrix of the single cell, and normalizing the filtered expression matrix of the single cell, so as to preprocess the single-cell transcriptome data;
performing dimensionality reduction, clustering and cell type identification on the preprocessed single-cell transcriptome data to determine a data set comprising multiple cell types;
determining the central cell and the neighbor cells from the data set.
3. The method for determining the gene expression regulation mechanism based on single-cell transcriptome data of claim 2, wherein said determining the specific high-expression gene of the central cell and the specific high-expression gene of the neighbor cell comprises:
determining a specific expression matrix of the central cell and a specific expression matrix of the neighbor cell according to the clustering result of the single-cell transcriptome data;
and determining the specific high-expression genes of the central cells and the specific high-expression genes of the neighbor cells according to the specific expression matrix of the central cells and the specific expression matrix of the neighbor cells and preset screening conditions.
4. The method for determining the gene expression control mechanism based on single-cell transcriptome data of claim 1, wherein said determining the first subnetwork between the central cell and the neighbor cell according to the specific high-expression gene of the central cell and the specific high-expression gene of the neighbor cell, and the pairing information between the ligand and the receptor comprises:
obtaining a first relationship list comprising pairing information between a ligand and a receptor;
according to the first relation list, determining a high-expression receptor from the specific high-expression genes of the central cell, and determining a high-expression ligand from the specific high-expression genes of the neighbor cells;
establishing a first subnetwork between the central cell and the neighbor cells based on the highly expressed receptor and the highly expressed ligand.
5. The method for determining the gene expression control mechanism based on single-cell transcriptome data of claim 1, wherein determining the second subnetwork of the central cell according to the specific highly expressed gene of the central cell and the interaction information between the transcription factor and the target gene comprises:
acquiring a second relation list containing interaction information between the transcription factor and the target gene;
determining a first gene set containing all target genes in the central cell, a second gene set containing high-expression target genes in the central cell and a third gene set containing target genes corresponding to the specified transcription factors;
determining a fourth gene set comprising significantly activated transcription factors according to the first gene set, the second gene set and the third gene set;
establishing a second subnetwork of the central cells according to the second relationship list, the second gene set, and the fourth gene set.
6. The method for determining the gene expression control mechanism based on single-cell transcriptome data of claim 1, wherein said determining the third subnetwork of said central cell according to the specific highly expressed gene of said central cell and the interaction information between the receptor and the transcription factor comprises:
obtaining a third relation list containing interaction information between the receptor and the transcription factor;
determining a fifth gene set containing all transcription factors in the central cells, a sixth gene set containing activated transcription factors in the central cells and a seventh gene set containing the transcription factors corresponding to the specified receptors;
determining an eighth gene set comprising significantly activated receptors according to the fifth gene set, the sixth gene set and the seventh gene set;
establishing a third subnetwork of the central cells according to the third relational list, the sixth gene set and the eighth gene set.
7. The method for determining the gene expression regulatory mechanism based on single-cell transcriptome data of any one of claims 1 to 6, wherein said determining the intercellular multilayer signal network of the central cell and the neighbor cells according to the first sub-network, the second sub-network and the third sub-network comprises:
updating the first, second, and third sub-networks according to an upstream-downstream relationship among the first, second, and third sub-networks;
and integrating the updated first sub-network, the updated second sub-network and the updated third sub-network to determine the intercellular multilayer signal network of the central cell and the neighbor cells.
8. An apparatus for determining a gene expression control mechanism based on single-cell transcriptome data, comprising:
a high-expression gene screening unit for determining a specific high-expression gene of a central cell and a specific high-expression gene of a neighbor cell, wherein the central cell represents the cell type whose gene expression regulation mechanism is to be studied, and the neighbor cell represents a cell type that may influence the gene expression of the central cell;
a first sub-network construction unit, configured to determine a first sub-network between the central cell and the neighbor cell according to the specific high-expression genes of the central cell and the neighbor cell, and pairing information between a ligand and a receptor;
a second sub-network construction unit, which is used for determining a second sub-network of the central cell according to the specific high-expression genes of the central cell and the interaction information between the transcription factors and the target genes;
a third subnetwork construction unit, configured to determine a third subnetwork of the central cell according to the specific highly expressed gene of the central cell and the interaction information between the receptor and the transcription factor;
and the multilayer network model construction unit is used for determining an intercellular multilayer signal network of the central cell and the neighbor cells according to the first sub-network, the second sub-network and the third sub-network so as to reveal the regulation and control of the neighbor cells on the gene expression of the central cell.
9. A storage medium storing one or more programs executable by one or more processors to implement the method for determining the gene expression regulatory mechanism based on single-cell transcriptome data of any one of claims 1 to 7.
10. An electronic device comprising a memory for storing information including program instructions and a processor for controlling execution of the program instructions, characterized in that: the program instructions, when loaded and executed by a processor, implement the method for determining a gene expression regulation mechanism based on single-cell transcriptome data of any one of claims 1 to 7.
CN202010464757.1A 2020-05-27 2020-05-27 Method for determining gene expression regulation mechanism based on single cell transcriptome data Active CN111613268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010464757.1A CN111613268B (en) 2020-05-27 2020-05-27 Method for determining gene expression regulation mechanism based on single cell transcriptome data

Publications (2)

Publication Number Publication Date
CN111613268A CN111613268A (en) 2020-09-01
CN111613268B true CN111613268B (en) 2023-02-24

Family

ID=72203129

Country Status (1)

Country Link
CN (1) CN111613268B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112466403B (en) * 2020-12-31 2022-06-14 广州基迪奥生物科技有限公司 Cell communication analysis method and system
CN112820353B (en) * 2021-01-22 2023-10-03 中山大学 Method and system for analyzing cell fate conversion key transcription factors
CN113178233B (en) * 2021-04-27 2023-04-28 西安电子科技大学 Large-scale single-cell transcriptome data efficient clustering method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874704A (en) * 2017-01-04 2017-06-20 湖南大学 The sub- recognition methods of key regulatory in a kind of common regulated and control network of gene based on linear model
CN109467596A (en) * 2018-11-12 2019-03-15 湖北省农业科学院畜牧兽医研究所 Application of the transcription factor SP 1 in regulation pig RTL1 gene expression
CN109637588A (en) * 2018-12-29 2019-04-16 北京百迈客生物科技有限公司 A method of gene regulatory network is constructed based on full transcript profile high-flux sequence
CN109726352A (en) * 2018-12-12 2019-05-07 青岛大学 A kind of construction method of the gene regulatory network based on Differential Equation Model
CN109979538A (en) * 2019-03-28 2019-07-05 广州基迪奥生物科技有限公司 A kind of analysis method based on the unicellular transcript profile sequencing data of 10X

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Effect of Dynamic Interaction between microRNA and Transcription Factor on Gene Expression;Xiaoqiang Sun et al.;《Research Article》;20161110;第1-2页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant