CN109448787B - Protein subnuclear localization method for feature extraction and fusion based on improved PSSM - Google Patents
- Publication number
- CN109448787B (application CN201811187766.XA)
- Authority
- CN
- China
- Prior art keywords
- pssm
- protein
- features
- improved
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a protein subnuclear localization method that performs feature extraction and fusion based on an improved PSSM, and relates to the technical field of biology and information. The method first applies a Z-SoftMax function to standardize the position-specific scoring matrix (PSSM), which carries the evolutionary information of a protein sequence. It then uses the proposed SC-PSSM-C and SC-PSSM-R algorithms to extract features from the PSSM in different directions and at different jump intervals, which yields feature vectors of fixed length. Finally, a W-SVM classifier with optimized parameters performs the classification prediction. The method compensates for the limited and one-sided nature of traditional feature extraction and improves the accuracy of protein subnuclear localization.
Description
Technical Field
The invention relates to the technical field of biology and information, and in particular to a protein subnuclear localization method that performs feature extraction and fusion based on an improved PSSM.
Background
With the spread and improvement of human genome sequencing technology, protein sequences are being produced in large quantities. Over the last 20 years, determining the function of newly sequenced proteins has become one of the hot topics in bioinformatics research. The function of a protein depends on its location in the cell, so determining the subcellular localization of a protein is considered an important step in understanding its function. Protein subnuclear localization information can provide important clues for the prevention, diagnosis and treatment of diseases. In recent years, with the rapid development of computer science, predicting protein subnuclear localization with machine learning methods has become a focus of bioinformatics research, because it overcomes the high research and development cost and low prediction speed of traditional methods.
At present, the key parts of protein subcellular localization prediction research are the extraction of feature information and the construction of the classification model. Experiments in a large number of published papers show that evolutionary information plays an important role in subnuclear localization prediction when it is used to extract protein features. How to convert the evolutionary information extracted from an ordered sequence into an effective feature vector of fixed dimension remains a difficulty of current research. The most effective improved algorithms based on evolutionary information currently include PSSM-CC, proposed by Dong Q and Zhou S in 2009; "A multiple information fusion method for predicting subcellular locations of two different types of bacterial proteins"; and the k-segmented-bigrams-PSSM algorithm, proposed jointly in 2015 by researchers from the University of Tokyo, Griffith University in Australia, and the University of the South Pacific.
In summary, the technical problems of the prior art are as follows. Although these models provide more information about amino acid interactions in the protein sequence, they remain limited to the discriminative information in a single column or row, or in two columns or rows at a variable spacing; the extracted features are too one-sided to express the overall characteristics of the protein sequence. Effective feature extraction affects the classification result of the classifier. Samples in proteomic data generally have high-dimensional features, and it remains challenging to select features effectively, remove irrelevant features, and relieve the "curse of dimensionality". In addition, data sets in proteomics are often imbalanced, for example multipass membrane protein data sets; this imbalance leads to low prediction accuracy for classes with few samples, and it has become a difficult and key research topic in proteomics. Building on previous work, the invention studies these problems further and provides a novel machine learning method, so that the prediction accuracy of minority classes approaches that of majority classes in the final result and the overall recognition performance improves.
Disclosure of Invention
In view of the above problems in the prior art, the invention provides a protein subnuclear localization method that performs feature extraction and fusion based on an improved position-specific scoring matrix (PSSM). The new feature extraction and fusion method improves the prediction recognition rate for subnuclear proteins.
In order to achieve the technical purpose and achieve the technical effect, the invention is realized by the following technical scheme:
The protein subnuclear localization method for performing feature extraction and fusion based on the improved PSSM comprises the following steps:
step 1: acquiring a protein data set, determining whether the acquired data set is a single-label problem or a multi-label problem, converting the data set into the standard FASTA format for the single-label case, and labeling the categories of all samples;
step 2: setting the iteration parameter to 3, setting the E-value for the alignment search of each protein to 0.001, and calculating the PSSM of each piece of data;
step 3: constructing a feature set from the PSSMs obtained in step 2 using different feature expressions, so as to extract richer complementary information;
step 4: selecting features from those obtained in step 3 using an improved maximum information coefficient;
step 5: judging whether the feature set obtained in step 4 is a balanced data set; if so, skipping this step, and if not, performing sampling processing;
whether the data set is balanced is judged by setting a threshold on the difference between the classes;
step 6: constructing a classification model for the data set obtained in step 4.
Further, in step 1, the acquired data set is screened by a threshold set on the length of each piece of data, the length being required to exceed 50.
Further, when the PSSM of each piece of data is calculated, each protein is denoted by P, where $P = [P_1, P_2, \dots, P_{20}]$, $P_j = [P_{1j}, P_{2j}, \dots, P_{Lj}]$ $(j = 1, 2, \dots, 20)$, and L represents the length of each protein.
Further, constructing the feature set from the PSSMs obtained in step 2 using different feature expressions comprises the following steps:
performing dimension unification on the PSSM obtained in step 2 [formula image not reproduced], wherein c represents the number of classes and x represents a value of the original PSSM;
standardizing the dimension-unified data set by the formula $z = (x - \mu)/\sigma$, wherein x is the corresponding value obtained from the dimension unification (step 3.1), $\mu$ is the mean, and $\sigma$ is the standard deviation;
performing feature extraction with the SC-PSSM-R algorithm on the processed data set [formula image not reproduced], wherein r = 0 denotes two adjacent peptides, r = 1 denotes two peptides at a distance of 1, and so on;
extracting column-direction features from the dimension-unified and standardized data set [formula image and its expanded form not reproduced], wherein the term in the formula represents the difference between the position-specific scoring matrix values corresponding to the two peptides;
traversing the score-specific evolutionary information in different directions and at different jump intervals by setting the weights with a step length of 0.01, seeking the best feature set, and analyzing the effect of the preliminary feature fusion under different weights.
Further, selecting features from the obtained features using the improved maximum information coefficient comprises the following steps:
sorting the obtained maximum information coefficient scores in order, analyzing the score distribution for different data sets, setting different thresholds, and selecting the corresponding features;
performing the maximum information coefficient operation on the selected features again; unlike the first pass, the resulting scores are used as the weights of the features to form a new feature set.
Further, the constructing a classification model for the data set obtained in step 4 includes the following steps:
training classification models with different parameters according to the characteristics of different data sets, and optimizing the parameters by a global and local parameter optimization method;
and putting the processed protein test set data into a corresponding trained classification model for final classification prediction.
The invention has the following beneficial effects. The invention relates to a protein subnuclear localization method that performs feature extraction and fusion based on an improved PSSM. First, the acquired protein data set is preprocessed and the position-specific scoring matrix (PSSM) of each sequence is calculated. Second, the PSSM is standardized with the Z-SoftMax function, which avoids the null values produced by traditional processing. Then, local and global features of the rows and columns of the processed PSSM are extracted at different interval jump values r by the SC-PSSM-R and SC-PSSM-C algorithms. Next, the weighted and fused SC-PSSM-R and SC-PSSM-C feature matrices undergo two passes of the improved maximum information coefficient, for feature selection and score weighting. Finally, the classifier with optimized parameters performs the final prediction and evaluation. The proposed method not only extracts effective features of the PSSM in different directions and at different jump intervals and enhances the complementarity of the effective information, but also removes redundancy through the improved feature selection method. Feature extraction is the prerequisite of classification, and effective feature extraction improves the recognition rate of the classifier. Compared with traditional PSSM-based methods, the method extracts richer and more effective protein features.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a general flowchart of the protein subnuclear localization method for feature extraction and fusion based on the improved PSSM according to an embodiment of the present invention;
FIG. 2 is a flowchart of an implementation of the protein subnuclear localization method for feature extraction and fusion based on the improved PSSM according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIGS. 1 and 2, a protein subnuclear localization method for feature extraction and fusion based on the improved PSSM comprises the following steps:
Step 1: acquire a protein data set, determine whether it is a single-label or a multi-label problem (the invention mainly addresses the single-label problem), convert the data set into the standard FASTA format, and label the category of every sample.
In step 1, the acquired data set is screened by a threshold set on the length of each piece of data (generally, the length must be greater than 50).
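The screening in step 1 can be sketched as follows. This is a minimal illustration, assuming plain FASTA text as input; the function names `read_fasta` and `screen_by_length` are hypothetical and do not come from the patent.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def screen_by_length(
    records: List[Tuple[str, str]], min_length: int = 50
) -> List[Tuple[str, str]]:
    """Keep only sequences whose length is greater than min_length."""
    return [(h, s) for h, s in records if len(s) > min_length]
```

Calling `screen_by_length(read_fasta(open(path)))` on a FASTA file would return only the records longer than 50 residues; the class label of each record still has to be attached separately.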
Step 2: set the iteration parameter to 3 and the E-value for the alignment search of each protein to 0.001, and calculate the PSSM of each piece of data. Each protein is denoted by P, where $P = [P_1, P_2, \dots, P_{20}]$, $P_j = [P_{1j}, P_{2j}, \dots, P_{Lj}]$ $(j = 1, 2, \dots, 20)$, and L represents the length of each protein.
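A possible way to carry out step 2 is sketched below. The text does not name the search tool; the sketch assumes NCBI PSI-BLAST, and mapping the stated E-value onto the `-evalue` option (rather than `-inclusion_ethresh`) is also an assumption. The helper names and the database argument are hypothetical.

```python
from typing import List


def psiblast_command(query_fasta: str, database: str, out_pssm: str) -> List[str]:
    """Build (but do not run) a PSI-BLAST command line for one protein."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", "3",   # iteration parameter from step 2
        "-evalue", "0.001",       # E-value from step 2 (assumed option)
        "-out_ascii_pssm", out_pssm,
    ]


def parse_ascii_pssm(text: str) -> List[List[int]]:
    """Return the L x 20 score matrix from the text of an ASCII PSSM file."""
    matrix: List[List[int]] = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with the position and the residue letter,
        # followed by the 20 log-odds scores.
        if len(tokens) >= 22 and tokens[0].isdigit() and tokens[1].isalpha():
            matrix.append([int(t) for t in tokens[2:22]])
    return matrix
```

The command list can be passed to `subprocess.run`; the parser keeps only the 20 score columns, so each protein of length L yields an L x 20 matrix that matches the definition of P above.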
Step 3: transform the position-specific scoring matrices obtained in step 2, and extract the corresponding features to construct a feature set.
The first step of step 3: process the PSSM obtained in step 2 so that its dimensions are unified [formula image not reproduced], where c represents the number of classes and x represents a value of the original PSSM.
The second step: standardize the data set obtained from the first step by the formula $z = (x - \mu)/\sigma$, where x is the value after step 3.1, $\mu$ is the mean, and $\sigma$ is the standard deviation.
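The two preprocessing sub-steps can be illustrated as follows. The dimension-unification formula is not reproduced in this text, so the row-wise softmax below is only an assumed stand-in for it; computing the mean and standard deviation over the whole matrix is likewise an assumption. Only $z = (x - \mu)/\sigma$ is taken from the description.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(pssm: Matrix) -> Matrix:
    """Map each row of scores to positive values that sum to 1 (assumed form)."""
    out = []
    for row in pssm:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def z_score(matrix: Matrix) -> Matrix:
    """Apply z = (x - mu) / sigma using the mean and std of all entries."""
    values = [v for row in matrix for v in row]
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0.0:
        # A constant matrix has no spread; return zeros instead of dividing by 0.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - mu) / sigma for v in row] for row in matrix]
```

Because the softmax output is always finite and positive, the subsequent standardization never sees missing or infinite entries, which is one plausible reading of why a softmax-type transform is applied first.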
The third step: perform feature extraction with the SC-PSSM-R algorithm on the data set processed in the second step [formula image not reproduced].
When r = 0, the formula pairs two adjacent peptides; when r = 1, it pairs two peptides at a distance of 1; and so on.
The fourth step: extract column-direction features from the data set processed in the second step of step 3 [formula image and its expanded form not reproduced]. The term in the formula represents the difference between the position-specific scoring matrix values corresponding to the two peptides, and r has the same meaning as in the third step.
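The SC-PSSM-R and column-direction formulas are not reproduced in this text, so the block below is not the patent's formula. It is a generic gapped squared-difference descriptor, offered only to show how a jump interval r turns an L x 20 matrix of variable length L into a vector of fixed length; all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gapped_difference_features(pssm: Matrix, r: int) -> List[float]:
    """Average squared difference between rows i and i + r + 1, per column.

    r = 0 pairs adjacent residues, r = 1 pairs residues one position apart.
    The output length equals the number of columns, whatever the protein length.
    """
    step = r + 1
    n_pairs = len(pssm) - step
    if n_pairs <= 0:
        raise ValueError("protein too short for this gap")
    n_cols = len(pssm[0])
    feats = []
    for j in range(n_cols):
        total = sum((pssm[i][j] - pssm[i + step][j]) ** 2 for i in range(n_pairs))
        feats.append(total / n_pairs)
    return feats


def multi_gap_features(pssm: Matrix, max_r: int) -> List[float]:
    """Concatenate the features for r = 0, 1, ..., max_r."""
    out: List[float] = []
    for r in range(max_r + 1):
        out.extend(gapped_difference_features(pssm, r))
    return out
```

For a 20-column PSSM, `multi_gap_features(pssm, max_r)` returns 20 x (max_r + 1) values regardless of L, which is the fixed-dimension property that the description asks of the extracted features.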
The fifth step of step 3: traverse the fused score-specific evolutionary information in different directions and at different jump intervals by setting the weights with a step length of 0.01, and search for the best feature set. As shown in FIG. 2, the weights are updated continuously, the effect of the preliminary feature fusion under each weight is analyzed, and the optimal CRC-PSSM feature set is selected by comparison.
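The weight traversal can be sketched as a one-dimensional grid search. The sketch assumes that the two feature groups are weighted by w and 1 - w and then concatenated, which the text does not state; `evaluate` is a hypothetical callback (for example, a cross-validated accuracy) supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple


def fuse(row_feats: Sequence[float], col_feats: Sequence[float], w: float) -> List[float]:
    """Weight the two feature vectors by w and (1 - w), then concatenate."""
    return [w * v for v in row_feats] + [(1.0 - w) * v for v in col_feats]


def search_fusion_weight(
    row_set: Sequence[Sequence[float]],
    col_set: Sequence[Sequence[float]],
    evaluate: Callable[[List[List[float]]], float],
    steps: int = 100,
) -> Tuple[float, float]:
    """Try w = 0.00, 0.01, ..., 1.00 and return (best_w, best_score)."""
    best_w, best_score = 0.0, float("-inf")
    for k in range(steps + 1):
        w = k / steps  # an integer counter avoids floating-point drift
        fused = [fuse(a, b, w) for a, b in zip(row_set, col_set)]
        score = evaluate(fused)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

With the default `steps=100` the loop visits 101 weights, matching a step length of 0.01; ties are resolved in favor of the smallest weight.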
Step 4: select features from those chosen in the fifth step of step 3 using the improved maximum information coefficient.
The first step: sort the maximum information coefficient scores obtained in step 4 in order, analyze the score distribution of each feature, set different thresholds for different data sets, and select the corresponding features.
The second step: perform the maximum information coefficient operation again on the features obtained in the first step. Unlike the first step, the resulting scores are used as the weights of the features, and the weighted features are taken as the new features.
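The two passes of step 4 can be sketched with a pluggable relevance score. The text does not describe how the "improved" maximum information coefficient is computed, so the scorer is left as a parameter; `abs_pearson` below is only a stand-in used to make the example runnable and is not MIC.

```python
import math
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[Sequence[float], Sequence[int]], float]


def abs_pearson(x: Sequence[float], y: Sequence[int]) -> float:
    """Stand-in relevance score: absolute Pearson correlation (not MIC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def select_and_weight(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    scorer: Scorer,
    threshold: float,
) -> Tuple[List[int], List[List[float]]]:
    """Pass 1: threshold on sorted scores. Pass 2: rescore and weight."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]

    # Pass 1: score every feature, sort by score, keep those above threshold.
    scores = [scorer(col, y) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    kept = [j for j in ranked if scores[j] >= threshold]

    # Pass 2: score the kept features again and use the scores as weights.
    weights = [scorer(columns[j], y) for j in kept]
    weighted = [[w * row[j] for j, w in zip(kept, weights)] for row in X]
    return kept, weighted
```

A real MIC implementation from an external library could be passed as `scorer` in place of `abs_pearson`; the threshold would then be chosen per data set from the score distribution, as the description suggests.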
Step 5: judge whether the feature set obtained in the second step of step 4 is a balanced data set (by setting a class difference threshold and checking whether the difference between the classes exceeds it). If it is balanced, skip this step; if not, perform sampling processing.
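The balance check and the sampling in step 5 can be sketched as follows. The text only says "sampling processing", so random oversampling of the smaller classes is an assumption, as is measuring imbalance by the gap between the largest and smallest class; the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def is_balanced(labels: Sequence[int], max_difference: int) -> bool:
    """True if the largest and smallest class sizes differ by at most the threshold."""
    counts = Counter(labels)
    return max(counts.values()) - min(counts.values()) <= max_difference


def random_oversample(
    X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Duplicate random samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out
```

The sampling should be applied to the training portion only, so that duplicated samples do not leak into the test set.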
Step 6: construct a classification model for the data set obtained in step 4.
Classification models with different parameters are trained according to the characteristics of the different data sets, and the parameters are optimized by a global and local parameter optimization method.
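One common reading of "global and local parameter optimization" is a coarse grid search followed by a finer search around the coarse optimum. The sketch below follows that reading; the choice of the two parameters C and gamma (as for an RBF-kernel SVM), the exponent ranges, and the `objective` callback are all assumptions and are not taken from the patent.

```python
from typing import Callable, Iterable, Tuple


def coarse_to_fine_search(
    objective: Callable[[float, float], float],
    c_exponents: Iterable[int] = range(-5, 16, 2),
    gamma_exponents: Iterable[int] = range(-15, 4, 2),
    fine_step: float = 0.25,
) -> Tuple[float, float, float]:
    """Return (C, gamma, score) that maximizes objective(C, gamma)."""
    gamma_exponents = list(gamma_exponents)
    # Global stage: coarse grid over powers of two.
    _, a0, b0 = max(
        ((objective(2.0 ** a, 2.0 ** b), a, b)
         for a in c_exponents for b in gamma_exponents),
        key=lambda t: t[0],
    )
    # Local stage: finer grid around the coarse optimum
    # (with the default step the offsets span -2 .. +2).
    offsets = [k * fine_step for k in range(-8, 9)]
    score, a1, b1 = max(
        ((objective(2.0 ** (a0 + da), 2.0 ** (b0 + db)), a0 + da, b0 + db)
         for da in offsets for db in offsets),
        key=lambda t: t[0],
    )
    return 2.0 ** a1, 2.0 ** b1, score
```

In practice `objective` would train the classifier with the given parameters and return a validation score, for example the cross-validated overall accuracy on the training set.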
The classification model constructed in the above steps is applied to protein subcellular localization.
Example 2
The invention was verified experimentally on the public apoptotic protein data set ZD98. ZD98 was established by Zhou and Doctor in 2003 and contains apoptotic protein sequences from 4 subcellular locations: cytoplasmic proteins (CY), plasma membrane-bound proteins (ME), mitochondrial proteins (MI) and other proteins (OTHER). In Table 1, OA denotes the overall correct recognition rate. The results in the table were obtained by fusing the features strictly according to the feature extraction method and fusion strategy described above; for feature selection, only the traditional linear discriminant analysis algorithm was used for dimension reduction, and the results are superior to those of traditional feature extraction methods. As Table 1 shows, the proposed algorithm achieves better values on these objective evaluation indices than the other algorithms.
Table 1. Fusion results for different fusion methods
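The overall correct recognition rate (OA) and the per-class rates for the four ZD98 locations can be computed as sketched below. The function names are hypothetical, and the snippet reports no results from the patent.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def overall_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """OA: fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_accuracy(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}
```

Reporting the per-class rates next to OA makes it visible whether minority classes are recognized about as well as majority classes, which is the stated aim of the method.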
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (5)
1. A protein subnuclear localization method for feature extraction and fusion based on an improved PSSM, characterized in that the method comprises the following steps:
step 1: acquiring a protein data set, determining whether the acquired data set is a single-label problem or a multi-label problem, converting the data set into the standard FASTA format for the single-label case, and labeling the categories of all samples;
step 2: setting the iteration parameter to 3, setting the E-value for the alignment search of each protein to 0.001, and calculating the PSSM of each piece of data;
step 3: constructing a feature set from the PSSMs obtained in step 2 using different feature expressions, so as to extract richer complementary information;
step 4: selecting features from those obtained in step 3 using an improved maximum information coefficient to obtain a feature set;
step 5: judging the difference between the classes by setting a class difference threshold, and thereby judging whether the feature set obtained in step 4 is a balanced data set; if so, skipping this step, and if not, performing sampling processing;
step 6: constructing a classification model for the data set obtained in step 4;
wherein constructing the feature set from the PSSMs obtained in step 2 using different feature expressions comprises the following steps:
performing dimension unification on the PSSM obtained in step 2 [formula image not reproduced], wherein c represents the number of classes and x represents a value of the original PSSM;
standardizing the dimension-unified data set by the formula $z = (x - \mu)/\sigma$, wherein x is the corresponding value after the dimension unification, $\mu$ is the mean, and $\sigma$ is the standard deviation;
performing feature extraction with the SC-PSSM-R algorithm on the standardized data set [formula image not reproduced], wherein r = 0 denotes two adjacent peptides, r = 1 denotes two peptides at a distance of 1, and so on;
extracting column-direction features from the dimension-unified and standardized data set [formula image and its expanded form not reproduced], wherein the term in the formula represents the difference between the position-specific scoring matrix values corresponding to the two peptides;
traversing the score-specific evolutionary information in different directions and at different jump intervals by setting the weights with a step length of 0.01, seeking the best feature set, and analyzing the effect of the preliminary feature fusion under different weights.
2. The method for protein sub-nuclear localization based on improved PSSM for feature extraction and fusion of claim 1, wherein: in step 1, the acquired data set is screened by a threshold set on the length of each piece of data, the length being required to exceed 50.
3. The method for protein sub-nuclear localization based on improved PSSM for feature extraction and fusion of claim 1, wherein: when the PSSM of each piece of data is calculated, each protein is denoted by P, where $P = [P_1, P_2, \dots, P_{20}]$, $P_j = [P_{1j}, P_{2j}, \dots, P_{Lj}]$ $(j = 1, 2, \dots, 20)$, and L represents the length of each protein.
4. The method for protein sub-nuclear localization based on improved PSSM for feature extraction and fusion of claim 1, wherein: selecting features from those obtained in step 3 using the improved maximum information coefficient comprises the following steps:
the first step: sorting the obtained maximum information coefficient scores in order, analyzing the score distribution for different data sets, setting different thresholds, and selecting the corresponding features;
the second step: performing the maximum information coefficient operation on the selected features again, wherein, unlike the first step, the resulting scores are used as the weights of the features to form a new feature set.
5. The method for protein sub-nuclear localization based on improved PSSM for feature extraction and fusion of claim 1, wherein: the method for constructing the classification model of the data set obtained in the step 4 comprises the following steps:
training classification models with different parameters according to the characteristics of different data sets, and optimizing the parameters by a global and local parameter optimization method;
and putting the processed protein test set data into a corresponding trained classification model for final classification prediction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811187766.XA CN109448787B (en) | 2018-10-12 | 2018-10-12 | Protein subnuclear localization method for feature extraction and fusion based on improved PSSM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109448787A (en) | 2019-03-08 |
CN109448787B (en) | 2021-10-08 |
Family
ID=65546092
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811187766.XA (Expired - Fee Related), published as CN109448787B (en) | 2018-10-12 | 2018-10-12 | Protein subnuclear localization method for feature extraction and fusion based on improved PSSM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109448787B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390995B (en) * | 2019-07-01 | 2022-03-11 | 上海交通大学 | Alpha spiral transmembrane protein topological structure prediction method and device |
CN110827923B (en) * | 2019-11-06 | 2021-03-02 | 吉林大学 | Semen protein prediction method based on convolutional neural network |
CN112242179A (en) * | 2020-09-09 | 2021-01-19 | 天津大学 | Method for identifying type of membrane protein |
CN113724779B (en) * | 2021-09-02 | 2022-06-17 | 东北林业大学 | SNAREs protein identification method, system, storage medium and equipment based on machine learning technology |
CN113764043B (en) * | 2021-09-10 | 2022-05-20 | 东北林业大学 | Vesicle transport protein identification method and identification equipment based on position specificity scoring matrix |
CN116130005B (en) * | 2023-01-30 | 2023-06-16 | 深圳新合睿恩生物医疗科技有限公司 | Tandem design method and device for multi-epitope vaccine, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101794351A (en) * | 2010-03-09 | 2010-08-04 | 哈尔滨工业大学 | Protein secondary structure engineering prediction method based on large margin nearest central point |
CN105046103A (en) * | 2015-07-03 | 2015-11-11 | 景德镇陶瓷学院 | Novel representation method for protein sequence fusing genetic information |
CN108595909A (en) * | 2018-03-29 | 2018-09-28 | 山东师范大学 | TA targeting proteins prediction techniques based on integrated classifier |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10448588B2 (en) * | 2013-03-15 | 2019-10-22 | Syngenta Participations Ag | Haploid induction compositions and methods for use therefor |
Non-Patent Citations (3)
Title |
---|
Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM;Yunyun Liang;《Computational and Mathematical Methods in Medicine》;20151215;1-9 * |
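Research on Protein Subnuclear Localization Based on Feature Fusion and Dimensionality Reduction Algorithms;Liu Shuhui;《China Master's Theses Full-text Database, Basic Sciences》;20170215(No. 02);Sections 1.1.2, 1.4, 2.1.4 * |
Research on the Construction of an Effector Protein Database and Prediction Methods;An Yi;《China Master's Theses Full-text Database, Information Science and Technology》;20180215(No. 02);I138-1075 * |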
Also Published As
Publication number | Publication date |
---|---|
CN109448787A (en) | 2019-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109448787B (en) | Protein subnuclear localization method for feature extraction and fusion based on improved PSSM | |
Bock et al. | Whole-proteome interaction mining | |
CN107463795A (en) | A kind of prediction algorithm for identifying tyrosine posttranslational modification site | |
CN102760210A (en) | Adenosine triphosphate binding site predicting method for protein | |
CN113488104B (en) | Cancer driving gene prediction method and system based on local and global network centrality analysis | |
CN113764034B (en) | Method, device, equipment and medium for predicting potential BGC in genome sequence | |
CN112489723B (en) | DNA binding protein prediction method based on local evolution information | |
CN112116950B (en) | Protein folding identification method based on depth measurement learning | |
CN115472221A (en) | Protein fitness prediction method based on deep learning | |
CN115206437A (en) | Intelligent screening system for mitochondrial effect molecules and construction method and application thereof | |
CN118038995B (en) | Method and system for predicting small open reading window coding polypeptide capacity in non-coding RNA | |
Yu et al. | SOMRuler: a novel interpretable transmembrane helices predictor | |
CN113053461B (en) | Gene cluster directional mining method based on target | |
CN107301323B (en) | Method for constructing classification model related to psoriasis | |
CN117637035A (en) | Classification model and method for multiple groups of credible integration of students based on graph neural network | |
CN113764031A (en) | Prediction method of N6 methyladenosine locus in trans-tissue/species RNA | |
CN114758721B (en) | Deep learning-based transcription factor binding site positioning method | |
CN111128300A (en) | Protein interaction influence judgment method based on mutation information | |
CN116386733A (en) | Protein function prediction method based on multi-view multi-scale multi-attention mechanism | |
CN112966702A (en) | Method and apparatus for classifying protein-ligand complex | |
CN114627964B (en) | Prediction enhancer based on multi-core learning and intensity classification method and classification equipment thereof | |
KR102166070B1 (en) | Analysis method for efficiency of programmable nuclease and apparatus for the same | |
CN114300036A (en) | Genetic variation pathogenicity prediction method and device, storage medium and computer equipment | |
JP3936851B2 (en) | Clustering result evaluation method and clustering result display method | |
CN113343918A (en) | Power equipment identification method, system, medium and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20211008 |