CN109448787B - Protein subnuclear localization method for feature extraction and fusion based on improved PSSM - Google Patents

Info

Publication number
CN109448787B
CN109448787B CN201811187766.XA CN201811187766A CN109448787B CN 109448787 B CN109448787 B CN 109448787B CN 201811187766 A CN201811187766 A CN 201811187766A CN 109448787 B CN109448787 B CN 109448787B
Authority
CN
China
Prior art keywords
pssm
protein
features
improved
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201811187766.XA
Other languages
Chinese (zh)
Other versions
CN109448787A (en)
Inventor
聂仁灿
阮小利
周冬明
贺康建
李华光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-10-12
Filing date
2018-10-12
Publication date
2021-10-08
Application filed by Yunnan University YNU
Priority to CN201811187766.XA
Publication of CN109448787A
Application granted
Publication of CN109448787B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a protein subnuclear localization method that performs feature extraction and fusion based on an improved PSSM, and relates to the technical fields of biology and information. The method first uses a Z-SoftMax function to standardize the position-specific scoring matrix (PSSM), which carries the evolutionary information of a protein sequence. It then applies the proposed SC-PSSM-C and SC-PSSM-R algorithms to extract features of the PSSM in different directions and at different jump intervals, which yields features of fixed length. Finally, a W-SVM classifier with optimized parameters performs the classification prediction. The method compensates for the limited and one-sided features produced by traditional feature extraction and improves protein subnuclear localization.

Description

Protein subnuclear localization method for feature extraction and fusion based on improved PSSM
Technical Field
The invention relates to the technical fields of biology and information, and in particular to a protein subnuclear localization method that performs feature extraction and fusion based on an improved PSSM.
Background
With the spread and improvement of human genome sequencing technology, protein sequences are being produced in large quantities. Over the last 20 years, understanding the function of newly sequenced proteins has become one of the hot spots of bioinformatics research. The function of a protein depends on its location in the cell, so determining the subcellular localization of a protein is considered an important step in understanding its function. Protein subnuclear localization information can provide important clues for the prevention, diagnosis and treatment of diseases. In recent years, with the rapid development of computer science, predicting protein subnuclear localization with machine learning has become a hot topic in bioinformatics, because it can overcome the high research and development cost and low prediction speed of traditional methods.
At present, the key parts of protein subcellular localization prediction are the extraction of feature information and the construction of a classification model. Experiments in a large number of published papers show that evolutionary information plays an important role in subnuclear localization prediction when it is used to extract protein features. How to convert the evolutionary information of an ordered sequence into an effective feature vector of fixed dimension is a difficult point of current research. The most effective improved algorithms based on evolutionary information currently include PSSM-CC, proposed by Dong Q and Zhou S in 2009; the method of "A multiple information fusion method for predicting subcellular locations of two different types of bacterial proteins"; and the k-segmented-bigrams-PSSM algorithm, proposed in 2015 by Jin Cheng and jointly by Tokyo University, Griffith University in Australia, and Nantaiyang University.
In summary, the technical problems of the prior art are as follows. Although these models provide more information about amino acid interactions in the protein sequence, they remain limited to the discriminative information in a single column or row, or in two columns or rows at a variable spacing; the extracted features are too narrow to express the overall characteristics of the protein sequence. The quality of the extracted features affects the result of the classifier. Samples in proteomic data are generally high-dimensional, and how to select features effectively, remove irrelevant features and relieve the "curse of dimensionality" remains a challenge. In addition, proteomics data sets are often imbalanced (for example, multipass membrane protein data sets). The imbalance lowers prediction accuracy for classes with few samples, and it has become a difficult and key research topic in proteomics. Building on previous work, the invention studies these problems further and proposes a new machine learning method, so that the prediction accuracy for minority classes approaches that for majority classes and the overall recognition performance improves.
Disclosure of Invention
In view of the above problems in the prior art, the invention provides a protein subnuclear localization method that performs feature extraction and fusion based on an improved position-specific scoring matrix (PSSM). It proposes a new feature extraction and fusion method to improve the recognition rate of subnuclear protein prediction.
To achieve this technical purpose and effect, the invention is realized by the following technical scheme:
The protein subnuclear localization method for feature extraction and fusion based on the improved PSSM comprises the following steps:
Step 1: acquire a protein data set, determine whether it poses a single-label or a multi-label problem, convert the single-label data set into standard FASTA format, and label the class of every sample;
Step 2: set the number of iterations to 3 and the E-value for the alignment search of each protein to 0.001, and compute the PSSM of each sequence;
Step 3: construct a feature set from the PSSMs obtained in step 2 using several different feature representations, so as to extract richer complementary information;
Step 4: select features from those obtained in step 3 using an improved maximal information coefficient;
Step 5: determine whether the feature set obtained in step 4 is a balanced data set; if so, skip this step; if not, apply sampling;
whether the data set is balanced is determined by comparing the differences between the classes against a set threshold;
Step 6: construct a classification model for the data set obtained in step 4.
Further, in step 1, the acquired data set is screened by a threshold on the length of each sequence, the length threshold being greater than 50.
Further, in computing the PSSM of each sequence, each protein is denoted by P, where P = [P1, P2, ..., P20], Pj = [P1j, P2j, ..., PLj] (j = 1, 2, ..., 20), and L is the length of the protein.
Further, constructing the feature set from the PSSMs obtained in step 2 using different feature representations comprises the following steps:
unifying the dimension of the PSSM obtained in step 2, using the formula:
Figure BDA0001826683820000041
where c is the number of classes and x is a value of the original PSSM;
standardizing the dimension-unified data set using the formula z = (x - μ)/σ, where x is the corresponding value after dimension unification, μ is the mean and σ is the standard deviation;
applying SC-PSSM-R feature extraction to the processed data set, using the formula:
Figure BDA0001826683820000042
wherein
Figure BDA0001826683820000043
when r is 0, the two residues are adjacent; when r is 1, the two residues are separated by a distance of 1; and so on;
extracting column-direction features from the dimension-unified and standardized data set, using the formula:
Figure BDA0001826683820000044
the above formula can be expanded to:
Figure BDA0001826683820000045
wherein
Figure BDA0001826683820000046
represents the difference between the position-specific scoring matrix values corresponding to the two residues;
and setting a fusion weight and varying it with a step size of 0.01 to traverse the score-specific evolutionary information in different directions and at different jump intervals, seeking the best feature set and analyzing the effect of the preliminary feature fusion under different weights.
Further, selecting features from the obtained features using the improved maximal information coefficient comprises the following steps:
sorting the obtained maximal information coefficient scores, analyzing the score distribution for each data set, setting a threshold suited to each data set, and selecting the corresponding features;
and computing the maximal information coefficient again on the selected features; unlike the previous step, the resulting scores are used as weights of the features to form a new feature set.
Further, constructing a classification model for the data set obtained in step 4 comprises the following steps:
training classification models with different parameters according to the characteristics of each data set, and optimizing the parameters with a global and local parameter optimization method;
and feeding the processed protein test set into the corresponding trained classification model for the final classification prediction.
The beneficial effects of the invention are as follows. The invention relates to a protein subnuclear localization method that performs feature extraction and fusion based on an improved PSSM. First, the acquired protein data set is preprocessed and the position-specific scoring matrix (PSSM) of each sequence is computed. Second, the PSSM is standardized with the Z-Softmax function, which avoids the null data that arise during processing in traditional methods. Then, local and global features of the rows and columns of the processed PSSM are extracted by setting different interval jump values R, using the SC-PSSM-R and SC-PSSM-L algorithms. Next, the improved maximal information coefficient is applied twice to the weighted and fused SC-PSSM-R and SC-PSSM-L feature matrices, once for feature selection and once for score weighting. Finally, the classifier with optimized parameters performs the final prediction and evaluation. The proposed method not only extracts effective features of the position-specific scoring matrix in different directions and at different jump intervals, which enhances the complementarity of the useful information, but also removes redundancy through the improved feature selection method. Feature extraction is the prerequisite of classification, and effective feature extraction improves the recognition rate of the classifier. Compared with traditional PSSM-based methods, the method extracts richer and more effective protein features.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a general flowchart of the protein subnuclear localization method for feature extraction and fusion based on the improved PSSM according to an embodiment of the invention;
FIG. 2 is a flowchart of a specific implementation of the protein subnuclear localization method for feature extraction and fusion based on the improved PSSM according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIGS. 1-2, a protein subnuclear localization method for feature extraction and fusion based on the improved PSSM comprises the following steps:
Step 1: acquire a protein data set, determine whether it poses a single-label or a multi-label problem (the invention mainly addresses the single-label problem), convert the data set into standard FASTA format, and label the class of every sample.
In step 1, the acquired data set is screened by a threshold on the length of each sequence (generally, the length must be greater than 50).
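The screening in step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `read_fasta` and `filter_by_length` are hypothetical, and the sketch assumes that "length greater than 50" means sequences of 50 residues or fewer are discarded.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def filter_by_length(records: Dict[str, str], min_length: int = 50) -> Dict[str, str]:
    """Keep only sequences strictly longer than min_length (assumed reading)."""
    return {h: s for h, s in records.items() if len(s) > min_length}


if __name__ == "__main__":
    demo = [">short", "ACDEFG", ">long", "A" * 30, "C" * 30]
    print(list(filter_by_length(read_fasta(demo))))
```

In practice the lines would come from an open FASTA file rather than an in-memory list.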
Step 2: set the number of iterations to 3 and the E-value for the alignment search of each protein to 0.001, and compute the PSSM of each sequence. Each protein is denoted by P, where P = [P1, P2, ..., P20], Pj = [P1j, P2j, ..., PLj] (j = 1, 2, ..., 20), and L is the length of the protein.
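The text does not name the search tool. A common way to obtain a PSSM with these settings is PSI-BLAST from NCBI BLAST+, and the sketch below only assembles such a command line under that assumption. The file names and the database name are hypothetical. The text also does not say whether 0.001 refers to the reporting E-value (`-evalue`) or to the inclusion threshold (`-inclusion_ethresh`); the sketch uses `-evalue`.

```python
from typing import List


def build_psiblast_command(query_fasta: str, database: str, pssm_out: str,
                           iterations: int = 3, evalue: float = 0.001) -> List[str]:
    """Assemble a PSI-BLAST call that writes an ASCII PSSM for one query."""
    return [
        "psiblast",
        "-query", query_fasta,               # single-sequence FASTA file
        "-db", database,                     # protein database (hypothetical name)
        "-num_iterations", str(iterations),  # 3 iterations, as in step 2
        "-evalue", str(evalue),              # 0.001, as in step 2 (assumed flag)
        "-out_ascii_pssm", pssm_out,         # L x 20 scoring matrix in text form
    ]


if __name__ == "__main__":
    print(" ".join(build_psiblast_command("protein.fasta", "swissprot", "protein.pssm")))
```

The resulting list can be passed to `subprocess.run` on a machine where BLAST+ and a formatted database are installed.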
Step 3: transform the position-specific scoring matrices obtained in step 2 and extract the corresponding features to construct a feature set.
The first step of step 3: process the PSSM obtained in step 2 to unify its dimension, using the formula:
Figure BDA0001826683820000081
where c is the number of classes and x is a value of the original PSSM.
The second step: standardize the data set whose dimension was unified in the first step, using the formula z = (x - μ)/σ, where x is a value produced by the first step, μ is the mean and σ is the standard deviation.
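The standardization z = (x - μ)/σ can be sketched as below. The text does not say whether μ and σ are taken over the whole matrix or per row or column; this sketch assumes the whole matrix and the population standard deviation, and the name `zscore_matrix` is hypothetical. The dimension-unification formula of the first step is available only as an image and is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def zscore_matrix(pssm: Matrix) -> Matrix:
    """Apply z = (x - mu) / sigma with mu, sigma computed over all entries."""
    values = [x for row in pssm for x in row]
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))
    if sigma == 0.0:
        # A constant matrix has no spread; return zeros instead of dividing by 0.
        return [[0.0 for _ in row] for row in pssm]
    return [[(x - mu) / sigma for x in row] for row in pssm]


if __name__ == "__main__":
    for row in zscore_matrix([[1.0, 2.0], [3.0, 4.0]]):
        print(row)
```

Guarding against σ = 0 is a design choice of the sketch; it keeps a constant matrix from producing undefined values.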
The third step: apply SC-PSSM-R feature extraction to the data set processed in the second step, using the formula:
Figure BDA0001826683820000082
where m, n = 1, 2, ..., 20, and
Figure BDA0001826683820000083
When r is 0, the two residues are adjacent; when r is 1, the two residues are separated by a distance of 1; and so on.
The fourth step: extract column-direction features from the data set processed in the second step of step 3, using the formula:
Figure BDA0001826683820000084
The formula can be expanded to:
Figure BDA0001826683820000085
wherein
Figure BDA0001826683820000086
represents the difference between the position-specific scoring matrix values corresponding to the two residues, and r has the same meaning as in the preceding steps.
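The exact formulas of the third and fourth steps survive only as images, so they cannot be reproduced here. The sketch below is a hypothetical stand-in that illustrates the general idea described in the text: for each jump interval r and each of the 20 columns, average the squared difference between the scores of two residues separated by r intermediate positions. The function name and the squared-difference form are assumptions, not the patent's formula. The output length depends only on the number of columns and on the largest r, not on the sequence length L.

```python
from typing import List

Matrix = List[List[float]]


def gapped_difference_features(pssm: Matrix, max_r: int) -> List[float]:
    """Hypothetical column-wise features from score differences at gaps r = 0..max_r."""
    length = len(pssm)
    n_cols = len(pssm[0])
    features: List[float] = []
    for r in range(max_r + 1):
        gap = r + 1  # r = 0 pairs adjacent residues, r = 1 skips one position
        n_pairs = length - gap
        if n_pairs <= 0:
            raise ValueError("sequence too short for the requested jump interval")
        for j in range(n_cols):
            total = sum((pssm[i][j] - pssm[i + gap][j]) ** 2 for i in range(n_pairs))
            features.append(total / n_pairs)
    return features


if __name__ == "__main__":
    toy = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]  # 3 residues, 2 columns
    print(gapped_difference_features(toy, max_r=1))
```

With 20 columns and jump intervals 0 to max_r, every protein yields 20 × (max_r + 1) values regardless of its length.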
The fifth step of step 3: set a fusion weight and vary it with a step size of 0.01 to traverse the fused score-specific evolutionary information in different directions and at different jump intervals, searching for the best feature set. As shown in FIG. 2, the weight is updated continuously, the effect of the preliminary feature fusion under each weight is analyzed, and the optimal CRC-PSSM feature set is selected by comparison.
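The weight traversal can be sketched as a one-dimensional grid search. The fusion rule is not spelled out in the text; the sketch assumes a single weight w in [0, 1] that scales the row-direction features by w and the column-direction features by 1 - w before concatenation. `fuse`, `search_fusion_weight` and the caller-supplied `score_fn` (for example, a cross-validated accuracy) are hypothetical names.

```python
from typing import Callable, List, Tuple


def fuse(row_feats: List[float], col_feats: List[float], w: float) -> List[float]:
    """Weighted concatenation of two feature vectors (assumed fusion rule)."""
    return [w * x for x in row_feats] + [(1.0 - w) * x for x in col_feats]


def search_fusion_weight(score_fn: Callable[[float], float],
                         step: float = 0.01) -> Tuple[float, float]:
    """Try w = 0, step, 2*step, ..., 1 and return the best (weight, score)."""
    n_steps = round(1.0 / step)
    best_w, best_score = 0.0, float("-inf")
    for k in range(n_steps + 1):
        w = k * step  # multiply rather than accumulate to limit rounding drift
        score = score_fn(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


if __name__ == "__main__":
    # Toy score with a single peak; a real score_fn would train and evaluate
    # a classifier on features produced by fuse(..., w).
    print(search_fusion_weight(lambda w: -(w - 0.37) ** 2))
```

With a step of 0.01 the search evaluates 101 candidate weights, so the cost is dominated by how expensive `score_fn` is.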
Step 4: select features from those chosen in the fifth step of step 3 using the improved maximal information coefficient.
The first step: sort the maximal information coefficient scores obtained in step 4, analyze the score distribution of the features, set a threshold suited to each data set, and select the corresponding features.
The second step: compute the maximal information coefficient again on the features obtained in the first step; unlike the first step, the resulting scores are used as weights of the features, and the weighted features form the new feature set.
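The two passes of step 4 can be sketched as below. Computing the maximal information coefficient itself is outside the sketch: the first-pass scores are passed in, and the second pass is delegated to a caller-supplied `rescore_fn`. The text does not describe how the coefficient is "improved", so nothing about that is assumed. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def select_and_weight(samples: Matrix,
                      first_scores: Sequence[float],
                      threshold: float,
                      rescore_fn: Callable[[Matrix], Sequence[float]]
                      ) -> Tuple[List[int], Matrix]:
    """Pass 1: keep features scoring >= threshold. Pass 2: weight them by new scores."""
    ranked = sorted(enumerate(first_scores), key=lambda pair: pair[1], reverse=True)
    kept = [j for j, score in ranked if score >= threshold]
    selected = [[row[j] for j in kept] for row in samples]
    weights = rescore_fn(selected)  # one score per kept feature
    weighted = [[w * x for w, x in zip(weights, row)] for row in selected]
    return kept, weighted


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    # Placeholder second-pass scores; a real rescore_fn would recompute the
    # coefficient between each kept feature and the class labels.
    print(select_and_weight(data, [0.2, 0.9, 0.5], 0.4, lambda m: [0.5, 1.0]))
```

The returned indices are ordered by descending first-pass score, which records the ranking that the first step describes.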
Step 5: determine whether the feature set obtained in the second step of step 4 is a balanced data set (by setting a class-difference threshold and checking whether the difference between the classes exceeds it); if it is balanced, skip this step; if not, apply sampling.
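The balance check and the sampling of step 5 can be sketched as follows. The text names neither the measure of class difference nor the sampling method; the sketch assumes the difference between the largest and smallest class counts, and random oversampling of the smaller classes. Both function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def is_balanced(labels: Sequence[str], max_diff: int) -> bool:
    """True if the largest and smallest class differ by at most max_diff samples."""
    counts = Counter(labels)
    return max(counts.values()) - min(counts.values()) <= max_diff


def random_oversample(samples: Sequence[list], labels: Sequence[str],
                      seed: int = 0) -> Tuple[List[list], List[str]]:
    """Duplicate random minority-class samples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


if __name__ == "__main__":
    x = [[0.1], [0.2], [0.3], [0.9]]
    y = ["CY", "CY", "CY", "ME"]
    if not is_balanced(y, max_diff=1):
        x, y = random_oversample(x, y)
    print(Counter(y))
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the test set.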
Step 6: construct a classification model for the data set obtained in step 4.
Train classification models with different parameters according to the characteristics of each data set, and optimize the parameters with a global and local parameter optimization method.
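The text does not describe the "global and local" optimization in detail. One common reading is a coarse grid search over the whole parameter range followed by a fine grid search around the best coarse point, and the sketch below illustrates that reading for two SVM-style parameters, C and gamma, searched on a base-2 exponent scale. The parameter names, the ranges and the caller-supplied `score_fn` are assumptions.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def coarse_to_fine_search(score_fn: Callable[[float, float], float],
                          coarse_exps: Sequence[int] = range(-5, 16, 2),
                          fine_step: float = 0.25) -> Tuple[float, float]:
    """Global stage: coarse grid of exponents. Local stage: fine grid around the best."""
    def score(point: Tuple[float, float]) -> float:
        return score_fn(2.0 ** point[0], 2.0 ** point[1])

    # Global stage: every pair of coarse exponents for (C, gamma).
    best = max(product(coarse_exps, coarse_exps), key=score)
    # Local stage: exponents within +/- 1 of the coarse optimum.
    offsets = [k * fine_step for k in range(-4, 5)]
    fine = [(best[0] + a, best[1] + b) for a in offsets for b in offsets]
    c_exp, g_exp = max(fine, key=score)
    return 2.0 ** c_exp, 2.0 ** g_exp


if __name__ == "__main__":
    # Toy score peaking at C = 2**3.5, gamma = 2**-2.25; a real score_fn
    # would return the cross-validated accuracy of a classifier.
    toy = lambda c, g: -(abs(c - 2.0 ** 3.5) + abs(g - 2.0 ** -2.25))
    print(coarse_to_fine_search(toy))
```

The two-stage search evaluates far fewer points than a single fine grid over the full range, at the risk of missing a narrow optimum that falls between coarse grid points.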
The classification model constructed in the above steps is applied to protein subcellular localization.
Example 2
The invention was verified experimentally on the public apoptosis protein data set ZD98. ZD98 was established by Zhou and Doctor in 2003 and contains apoptosis protein sequences from 4 subcellular locations: cytoplasmic proteins (CY), plasma membrane-bound proteins (ME), mitochondrial proteins (MI) and other proteins (OTHER). In Table 1, OA denotes the overall rate of correct recognition. The results in the table were obtained by fusing features strictly according to the feature extraction method and fusion strategy described above, with feature selection limited to dimension reduction by the traditional linear discriminant analysis algorithm; even so, the results are superior to those of traditional feature extraction methods. As Table 1 shows, the proposed algorithm achieves better values on these objective evaluation indices than the other algorithms.
Table 1. Fusion results for different fusion methods
Figure BDA0001826683820000101
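For reference, the overall recognition rate OA is the fraction of all test samples that are classified correctly. The sketch below computes it together with a per-class rate. The labels in the demonstration are invented for illustration and are not results from Table 1.

```python
from collections import Counter
from typing import Dict, Sequence


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """OA = number of correct predictions / number of samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of the samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


if __name__ == "__main__":
    truth = ["CY", "CY", "ME", "MI"]      # made-up labels
    predicted = ["CY", "ME", "ME", "MI"]  # made-up predictions
    print(overall_accuracy(truth, predicted))
    print(per_class_accuracy(truth, predicted))
```

Reporting the per-class rate alongside OA matters for imbalanced data sets, where a high OA can hide poor accuracy on the small classes.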
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A protein subnuclear localization method for feature extraction and fusion based on an improved PSSM, characterized in that the method comprises the following steps:
Step 1: acquiring a protein data set, determining whether the acquired data set poses a single-label or a multi-label problem, converting the single-label data set into standard FASTA format, and labeling the class of every sample;
Step 2: setting the number of iterations to 3 and the E-value for the alignment search of each protein to 0.001, and computing the PSSM of each sequence;
Step 3: constructing a feature set from the PSSMs obtained in step 2 using several different feature representations, so as to extract richer complementary information;
Step 4: selecting features from those obtained in step 3 using an improved maximal information coefficient to obtain a feature set;
Step 5: determining, by setting a class-difference threshold and evaluating the difference between the classes, whether the feature set obtained in step 4 is a balanced data set; if so, skipping this step; if not, applying sampling;
Step 6: constructing a classification model for the data set obtained in step 4;
wherein constructing the feature set from the PSSMs obtained in step 2 using different feature representations comprises the following steps:
unifying the dimension of the PSSM obtained in step 2, using the formula:
Figure FDA0003221405180000011
where c is the number of classes and x is a value of the original PSSM;
standardizing the dimension-unified data set using the formula z = (x - μ)/σ, where x is the corresponding value after dimension unification, μ is the mean and σ is the standard deviation;
applying SC-PSSM-R feature extraction to the standardized data set, using the formula:
Figure FDA0003221405180000021
Figure FDA0003221405180000022
wherein
Figure FDA0003221405180000023
when r is 0, the two residues are adjacent; when r is 1, the two residues are separated by a distance of 1; and so on;
extracting column-direction features from the dimension-unified and standardized data set, using the formula:
Figure FDA0003221405180000024
the above formula can be expanded to:
Figure FDA0003221405180000025
wherein
Figure FDA0003221405180000026
represents the difference between the position-specific scoring matrix values corresponding to the two residues;
and setting a fusion weight and varying it with a step size of 0.01 to traverse the score-specific evolutionary information in different directions and at different jump intervals, seeking the best feature set and analyzing the effect of the preliminary feature fusion under different weights.
2. The protein subnuclear localization method for feature extraction and fusion based on an improved PSSM of claim 1, wherein: in step 1, the acquired data set is screened by a threshold on the length of each sequence, the length threshold being greater than 50.
3. The protein subnuclear localization method for feature extraction and fusion based on an improved PSSM of claim 1, wherein: in computing the PSSM of each sequence, each protein is denoted by P, where P = [P1, P2, ..., P20], Pj = [P1j, P2j, ..., PLj] (j = 1, 2, ..., 20), and L is the length of the protein.
4. The protein subnuclear localization method for feature extraction and fusion based on an improved PSSM of claim 1, wherein: selecting features from those obtained in step 3 using the improved maximal information coefficient comprises the following steps:
the first step: sorting the obtained maximal information coefficient scores, analyzing the score distribution for each data set, setting a threshold suited to each data set, and selecting the corresponding features;
the second step: computing the maximal information coefficient again on the selected features; unlike the first step, the resulting scores are used as weights of the features to form a new feature set.
5. The protein subnuclear localization method for feature extraction and fusion based on an improved PSSM of claim 1, wherein: constructing the classification model for the data set obtained in step 4 comprises the following steps:
training classification models with different parameters according to the characteristics of each data set, and optimizing the parameters with a global and local parameter optimization method;
and feeding the processed protein test set into the corresponding trained classification model for the final classification prediction.
CN201811187766.XA 2018-10-12 2018-10-12 Protein subnuclear localization method for feature extraction and fusion based on improved PSSM Expired - Fee Related CN109448787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811187766.XA CN109448787B (en) 2018-10-12 2018-10-12 Protein subnuclear localization method for feature extraction and fusion based on improved PSSM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811187766.XA CN109448787B (en) 2018-10-12 2018-10-12 Protein subnuclear localization method for feature extraction and fusion based on improved PSSM

Publications (2)

Publication Number Publication Date
CN109448787A CN109448787A (en) 2019-03-08
CN109448787B CN109448787B (en) 2021-10-08

Family

ID=65546092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811187766.XA Expired - Fee Related CN109448787B (en) 2018-10-12 2018-10-12 Protein subnuclear localization method for feature extraction and fusion based on improved PSSM

Country Status (1)

Country Link
CN (1) CN109448787B (en)



Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20211008)