Multifaceted Protein-Protein Interaction Prediction Based On Siamese Residual RCNN
doi: 10.1093/bioinformatics/btz328
ISMB/ECCB 2019
Abstract
Motivation: Sequence-based protein–protein interaction (PPI) prediction represents a fundamental
computational biology problem. To address this problem, extensive research efforts have been
made to extract predefined features from the sequences. Based on these features, statistical algorithms are then trained to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage of the PPI information.
Results: We present an end-to-end framework, PIPR (Protein–Protein Interaction Prediction Based
on Siamese Residual RCNN), for PPI prediction using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in the Siamese architecture, leveraging both robust local features and contextualized information, which are significant for capturing the mutual influence of protein sequences. PIPR relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios.
Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows promising performance on the more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.
Availability and implementation: The implementation is available at https://github.com/muhaochen/seq_ppi.git.
Contact: muhaochen@ucla.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
predict PPIs in the form of binary classification. These approaches provide feasible solutions to the problem. However, the extracted features used in these approaches only have limited coverage on interaction information, as they are dedicated to specific facets of the protein profiles.

To mitigate the inadequacy of statistical learning methods, deep learning algorithms provide the powerful functionality to process large-scale data and automatically extract useful features for objective tasks (LeCun et al., 2015). Recently, deep learning architectures have produced powerful systems to address several bioinformatics problems related to single nucleotide sequences, such as genetic variant detection (Anderson, 2018), DNA function classification (Quang and Xie,

wide usability of our approach. Especially on the binding affinity estimation of mutated proteins, PIPR is able to respond to the subtle changes of point mutations and provides the best estimation with the smallest errors.

2 Related works

Sequence-based approaches provide a critical solution to the binary PPI prediction task. Homology-based methods (Philipp et al., 2016) rely on BLAST to map a pair of sequences to known interacting proteins. Alternatively, other works address the task with statistical learning models, including SVM (Guo et al., 2008; You et al.,
DNN-PPI does not incorporate physicochemical properties into the amino acid representations, and does not employ a Siamese learning architecture to fully characterize pairwise relations of sequences.

interaction type of two proteins. (iii) Binding affinity estimation aims at producing a regression model to estimate the strength of the binding interaction.
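The three problem settings (binary PPI classification, interaction type classification and binding affinity regression) correspond to three different training objectives. The following NumPy sketch is purely illustrative of those loss functions and is not the paper's implementation; all inputs are hypothetical:

```python
import numpy as np

def binary_cross_entropy(p, y):
    """(i) Binary PPI prediction: p is the predicted interaction
    probability for a protein pair, y is the 0/1 label."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(scores, label):
    """(ii) Interaction type prediction: softmax over the type scores,
    then cross-entropy against the true class index."""
    e = np.exp(scores - scores.max())        # numerically stable softmax
    probs = e / e.sum()
    return -np.log(probs[label])

def mse(pred, target):
    """(iii) Binding affinity estimation: squared error on the
    regressed affinity value."""
    return (pred - target) ** 2
```

A maximally uncertain binary prediction (p = 0.5) yields a loss of ln 2, and a uniform softmax over m classes yields ln m, which is the "zero-information" baseline for each classification task.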
[Figure 2 near here: architecture diagram showing two input protein sequences, pre-trained embeddings, stacked RCNN units (convolution, max-pooling, residual GRU) and a multi-layer perceptron trained with binary or categorical cross-entropy.]

Fig. 2. The structure of our residual RCNN encoder is shown on the right, and the RCNN unit is shown on the left. Each RCNN unit contains a convolution-pooling layer followed by a bidirectional residual GRU.
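The convolution-pooling stage of such an RCNN unit can be sketched as follows. This is a minimal NumPy illustration, not the paper's code; the filter count, filter width and pool size below are hypothetical:

```python
import numpy as np

def conv1d(x, kernels, bias):
    """'Valid' 1-D convolution over a (length, channels) sequence;
    kernels has shape (n_filters, width, channels)."""
    n_filters, width, _ = kernels.shape
    length = x.shape[0] - width + 1
    out = np.empty((length, n_filters))
    for t in range(length):
        window = x[t:t + width]              # (width, channels) slice
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return out

def max_pool(x, size):
    """Non-overlapping temporal max-pooling; a ragged tail is dropped."""
    length = (x.shape[0] // size) * size
    return x[:length].reshape(-1, size, x.shape[1]).max(axis=1)
```

Chaining conv1d and max_pool, and feeding the pooled features into a residual bidirectional GRU, reproduces the structure of one unit in the figure.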
max-pooling are fed into the bidirectional gated recurrent units in our RCNN encoder.

The residual gated recurrent units. The gated recurrent unit model (GRU) represents an alternative to the long short-term memory (LSTM) network (Cho et al., 2014), which consecutively characterizes the sequential information without using separated memory cells (Dhingra et al., 2017). Each unit consists of two types of gates to track the state of the sequence, i.e. the reset gate r_t and the update gate z_t. Given the embedding v_t of an incoming item (either a pre-trained amino acid embedding, or an output of the previous layer), GRU updates the hidden state h_t^{(3)} of the sequence as a linear combination of the previous state h_{t-1}^{(3)} and the candidate state \tilde{h}_t^{(3)} of a new item v_t, which is calculated as below:

h_t^{(3)} = \mathrm{GRU}(v_t) = z_t \circ \tilde{h}_t^{(3)} + (1 - z_t) \circ h_{t-1}^{(3)}

z_t = \sigma(M_z v_t + N_z h_{t-1}^{(3)} + b_z)

\tilde{h}_t^{(3)} = \tanh(M_s v_t + r_t \circ N_s h_{t-1}^{(3)} + b_s)

r_t = \sigma(M_r v_t + N_r h_{t-1}^{(3)} + b_r)

The notation \circ thereof denotes the element-wise multiplication. The update gate z_t balances the information of the previous sequence and the new item, where capitalized M and N denote different weight matrices, b denotes bias vectors and \sigma is the sigmoid function. The candidate state \tilde{h}_t^{(3)} is calculated similarly to that in a traditional recurrent unit, and the reset gate r_t controls how much information of the past sequence contributes to \tilde{h}_t^{(3)}. Note that GRU generally performs comparably to LSTM in sequence encoding tasks, but is less complex and requires much fewer computational resources for training.

A bidirectional GRU layer characterizes the sequential information in two directions. It contains the forward encoding process \overrightarrow{\mathrm{GRU}} that reads the input vector sequence X = [v_1, v_2, ..., v_l] from v_1 to v_l, and a backward encoding process \overleftarrow{\mathrm{GRU}} that reads in the opposite direction. The encoding results of both processes are concatenated for each input item v_t, i.e.

h_t^{(4)} = \mathrm{BiGRU}(v_t) = [\overrightarrow{\mathrm{GRU}}(v_t); \overleftarrow{\mathrm{GRU}}(v_t)]

The residual mechanism passes on an identity mapping of the GRU inputs to its output side through a residual shortcut (He et al., 2016). By adding the forwarded input values to the outputs, the corresponding neural layer is only required to capture the difference between the input and output values. This mechanism aims at improving the learning process of non-linear neural layers by increasing the sensitivity of the optimization gradients (He et al., 2016; Kim et al., 2016), as well as preventing the model from the vanishing gradient problem. It has been widely deployed in deep learning architectures for various tasks of image recognition (He et al., 2016), document classification (Conneau et al., 2017) and speech recognition (Zhang et al., 2017). In our deep RCNN, the bidirectional GRU is incorporated with the residual mechanism, and will pass on the following outputs to its subsequent neural network layer:

h_t^{(5)} = \mathrm{ResGRU}(v_t) = [\overrightarrow{\mathrm{GRU}}(v_t) + v_t; \overleftarrow{\mathrm{GRU}}(v_t) + v_t]

In our development, we have found that the residual mechanism is able to drastically simplify the training process, and largely decreases the epochs of parameter updates for the model to converge.

3.2.2 Protein sequence encoding
Figure 2 shows the entire structure of our RCNN encoder. The RCNN encoder E_{RCNN}(S) alternately stacks multiple occurrences of the above two intermediary neural network components. A convolution layer serves as the first encoding layer to extract local features from the input sequence. On top of that, a residual GRU layer takes in the preserved local features, whose outputs are passed to another convolution layer. Repeating these two components in the network structure conducts an automatic multi-granular feature aggregation process on the protein sequence, while preserving the sequential and contextualized information on each granularity of the selected features. The last residual GRU layer is followed by another convolution layer for a final round of local feature selection to produce the last hidden states H' = [h'_1, h'_2, ..., h'_{|H'|}]. Note that the dimensionality of the last hidden states does not need to equal that of the previous hidden states. A high-level sequence embedding of the entire protein sequence is obtained from the global average-pooling (Lin et al., 2013) of H', i.e.

E_{RCNN}(S) = \frac{1}{|H'|} \sum_{i=1}^{|H'|} h'_i

3.2.3 Pre-trained amino acid embeddings
To support inputting the non-numerical sequence information, we provide a useful embedding method to represent each amino acid
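The gate equations of the residual bidirectional GRU described above can be transcribed almost literally into code. The following NumPy sketch is for illustration only (random weights, hypothetical dimensions), not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRU:
    """One GRU direction, implementing the z_t / r_t / candidate-state
    equations; the M* matrices act on the input, N* on the prior state."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        def mat():
            return 0.1 * rng.standard_normal((dim, dim))
        self.Mz, self.Nz, self.bz = mat(), mat(), np.zeros(dim)
        self.Mr, self.Nr, self.br = mat(), mat(), np.zeros(dim)
        self.Ms, self.Ns, self.bs = mat(), mat(), np.zeros(dim)

    def step(self, v, h_prev):
        z = sigmoid(self.Mz @ v + self.Nz @ h_prev + self.bz)   # update gate
        r = sigmoid(self.Mr @ v + self.Nr @ h_prev + self.br)   # reset gate
        h_tilde = np.tanh(self.Ms @ v + r * (self.Ns @ h_prev) + self.bs)
        return z * h_tilde + (1.0 - z) * h_prev                 # new state

    def run(self, seq):
        h, states = np.zeros(seq.shape[1]), []
        for v in seq:                        # seq: (length, dim)
            h = self.step(v, h)
            states.append(h)
        return np.array(states)

def res_bigru_encode(seq, fwd, bwd):
    """ResGRU(v_t) = [GRU_fwd(v_t) + v_t ; GRU_bwd(v_t) + v_t],
    followed by global average-pooling over the sequence."""
    hf = fwd.run(seq)                    # forward reading v_1 .. v_l
    hb = bwd.run(seq[::-1])[::-1]        # backward reading, re-aligned to t
    h = np.concatenate([hf + seq, hb + seq], axis=1)
    return h.mean(axis=0)                # one embedding per sequence
```

Note that the residual addition requires each direction's hidden dimensionality to equal the input dimensionality, so that GRU(v_t) + v_t is well defined.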
where n is the negative sampling size, and a'_{c,k} is a negative sample that does not co-occur with a_{c,t} in the same context.

The second part a_{ph} represents the similarity of electrostaticity and hydrophobicity among amino acids. The 20 amino acids can be clustered into 7 classes based on their dipoles and volumes of the side chains to reflect this property. Thus, a_{ph} is a one-hot encoding based on the classification defined by Shen et al. (2007).

3.3 Learning architecture and learning objectives
Our framework characterizes the interactions in the following two stages.

3.3.1 Siamese architecture
Given a pair of proteins p = (S_1, S_2) ∈ I, the same RCNN encoder is used to obtain the sequence embeddings E_{RCNN}(S_1) and E_{RCNN}(S_2) of both proteins. Both sequence embeddings are combined using element-wise multiplication, i.e. E_{RCNN}(S_1) ∘ E_{RCNN}(S_2). This is a commonly used operation to infer the relation of sequence embeddings (Hashemifar et al., 2018; Jiang et al., 2018; Rocktäschel et al., 2016; Tai et al., 2015). Note that some works use the concatenation of sequence embeddings (Sun et al., 2017; Yin and Schütze, 2015) instead of multiplication, which we find to be less effective in modeling the symmetric relations of proteins.

3.3.2 Learning objectives
An MLP with leaky ReLU (Maas et al., 2013) is applied to the previous sequence pair representation, whose output ŝ_p is either a vector or a scalar, depending on whether the model solves a classification or a regression task for the protein pair p. The entire learning architecture is trained to optimize the following two types of losses according to different PPI prediction problems.

i. Cross-entropy loss is optimized for the two classification problems, i.e. binary prediction and interaction type prediction. In this case, the MLP output ŝ_p is a vector, whose dimensionality equals the number of classes m. ŝ_p is normalized by a softmax function, where the i-th dimension s_i^{(p)} = \exp(\hat{s}_i^{(p)}) / \sum_j \exp(\hat{s}_j^{(p)}) corresponds to the confidence score for the i-th class. The learning objective is to minimize the following cross-entropy loss, where c_p is a one-hot indicator for the class label of protein pair p.

4.1 Datasets
Guo's datasets. Guo et al. (2008) generate several datasets from different species for the binary prediction of PPIs. Each dataset contains a balanced number of positive and negative samples. Among these resources, the Yeast dataset is a widely used benchmark by most state-of-the-art methods (Hashemifar et al., 2018; Wong et al., 2015; You et al., 2013, 2014). There are 2497 proteins forming 11 188 cases of PPIs, with half of them representing the positive cases, and the other half the negative cases. The positive cases are selected from the database of interacting proteins DIP_20070219 (Salwinski et al., 2004), where proteins with fewer than 50 amino acids or ≥40% sequence identity are excluded. We use the full protein sequences in our model, which are obtained from UniProt (Consortium et al., 2018). The negative cases are generated by randomly pairing the proteins without evidence of interaction, and filtered by their sub-cellular locations. In other words, non-interactive pairs residing in the same location are excluded.

In addition, we combine the data for Caenorhabditis elegans, Escherichia coli and Drosophila melanogaster as the multi-species dataset. We use the cluster analysis of the CD-HIT (Li and Godzik, 2006) program to generate non-redundant subsets. Proteins with fewer than 50 amino acids or high sequence identity (40, 25, 10 or 1%) are removed.

STRING datasets. The STRING database (Szklarczyk et al., 2016) annotates PPIs with their types. There are seven types of interactions: activation, binding, catalysis, expression, inhibition, post-translational modification (ptmod) and reaction. We download all interaction pairs for Homo sapiens from database version 10.5 (Szklarczyk et al., 2016), along with their full protein sequences. Among the corresponding proteins, we randomly select 3000 proteins and 8000 proteins that share <40% of sequence identity to generate two subsets. In this process, we randomly sample instances of different interaction types to ensure a balanced class distribution. Eventually, the two generated datasets, denoted by SHS27k and SHS148k, contain 26 945 cases and 148 051 cases of interactions, respectively. We use these two datasets for the PPI type prediction task.

SKEMPI dataset. We obtain the protein binding affinity data from SKEMPI (the structural database of kinetics and energetics of mutant protein interactions) (Moal and Fernández-Recio, 2012) for the affinity estimation task. It contains 3047 binding affinity changes upon mutation of protein sub-units within a protein
complex. The binding affinity is measured by the equilibrium dissociation constant (Kd), reflecting the strength of biomolecular interactions. A smaller Kd value means a higher binding affinity. Each protein complex contains single or multiple amino acid substitutions. The sequence of the protein complex is retrieved from the Protein Data Bank (PDB) (Berman et al., 2000). We manually replace the mutated amino acids. For duplicate entries, we take the average Kd. The final dataset results in the binding affinity of 2792 mutant protein complexes, along with 158 wild-types.

test the model so as to ensure an unbiased evaluation. We aggregate six metrics on the test cases of each fold, i.e. the overall accuracy, precision, sensitivity, specificity, F1 and Matthews correlation coefficient (MCC) on positive cases. All these metrics are preferred to be higher to indicate better performance. Based on the reported accuracy over 5 folds, we also conduct two-tailed Welch's t-tests (Welch, 1947) to evaluate the significance of the improvement on different pairs of approaches. The P-values are adjusted by the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995) to control the false discovery rate for multiple hypothesis testing.
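The six per-fold metrics can all be derived from the confusion-matrix counts of the positive class. A quick sketch, with purely hypothetical counts in the usage below:

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity (recall), specificity, F1 and
    Matthews correlation coefficient from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": acc, "precision": prec, "sensitivity": sens,
            "specificity": spec, "f1": f1, "mcc": mcc}
```

For example, binary_metrics(tp=40, fp=10, tn=45, fn=5) gives an accuracy of 0.85 and an MCC of about 0.70. Unlike accuracy, MCC remains informative under class imbalance, which is why it is reported alongside the other metrics.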
4.2 Binary PPI prediction
Binary PPI prediction is the primary task targeted by a handful of

Results. As shown in Table 1, the CNN-based architecture, DPPI, demonstrates state-of-the-art performance over other base-
Table 1. Evaluation of binary PPI prediction on the Yeast dataset based on 5-fold cross-validation. We report the mean and SD for the test sets

Methods Accuracy (%) Precision (%) Sensitivity (%) Specificity (%) F1-score (%) MCC (%)
SVM-AC 87.35 ± 1.38 87.82 ± 4.84 87.30 ± 5.23 87.41 ± 6.33 87.34 ± 1.33 75.09 ± 2.51
kNN-CTD 86.15 ± 1.17 90.24 ± 1.34 81.03 ± 1.74 NA 85.39 ± 1.51 NA
EELM-PCA 86.99 ± 0.29 87.59 ± 0.32 86.15 ± 0.43 NA 86.86 ± 0.37 77.36 ± 0.44
SVM-MCD 91.36 ± 0.4 91.94 ± 0.69 90.67 ± 0.77 NA 91.3 ± 0.73 84.21 ± 0.66
MLP 94.43 ± 0.3 96.65 ± 0.59 92.06 ± 0.36 NA 94.3 ± 0.45 88.97 ± 0.62
RF-LPQ 93.92 ± 0.36 96.45 ± 0.45 91.10 ± 0.31 NA 93.7 ± 0.37 88.56 ± 0.63
SAE 67.17 ± 0.62 66.90 ± 1.42 68.06 ± 2.50 66.30 ± 2.27 67.44 ± 1.08 34.39 ± 1.25
DNN-PPI 76.61 ± 0.51 75.1 ± 0.66 79.63 ± 1.34 73.59 ± 1.28 77.29 ± 0.66 53.32 ± 1.05
DPPI 94.55 96.68 92.24 NA 94.41 NA
Table 4. Accuracy (%) and fold changes over the zero rule for PPI interaction type prediction on two STRING datasets based on 10-fold cross-validation

Methods Rand Zero rule SVM RF AdaBoost kNN Logistic SVM RF AdaBoost kNN Logistic SCNN SRGRU PIPR
SHS27k 14.28 16.70 33.17 44.82 28.67 35.44 25.47 35.56 45.76 31.81 35.56 30.57 55.54 51.06 59.56
(fold) — 1.00 1.99 2.68 1.72 2.12 1.52 2.13 2.74 1.90 2.13 1.83 3.33 3.06 3.57
SHS148k 14.28 16.21 28.17 36.01 27.87 33.81 24.96 31.37 36.65 29.67 33.13 26.96 55.29 54.05 61.91
(fold) — 1.00 1.74 2.22 1.72 2.09 1.54 1.94 2.26 1.83 2.04 1.66 3.41 3.33 3.82

MSE (×10⁻²) 1.70 2.20 1.77 1.98 1.86 1.84 1.49 1.84 0.87 0.95 0.63
MAE (×10⁻²) 9.56 11.81 9.81 11.15 10.20 11.04 9.06 10.69 6.49 7.08 5.48
Corr 0.564 0.353 0.546 0.451 0.501 0.501 0.640 0.508 0.831 0.812 0.873

Note: Each measurement is an average of the test sets over 10-fold cross-validation. Each boldfaced number indicates the best of the corresponding metric.
He,K. et al. (2016) Deep residual learning for image recognition. In: CVPR, pp. 770–778.
Ho,Y. et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415, 180–183.
Hu,B. et al. (2014) Convolutional neural network architectures for matching natural language sentences. In: Ghahramani,Z. et al. (eds) Advances in Neural Information Processing Systems 27. Curran Associates, Inc., pp. 2042–2050.
Huang,Y.-A. et al. (2015) Using weighted sparse representation model combined with discrete cosine transformation to predict protein–protein interactions from protein sequence. BioMed Res. Int., 2015, 902198.
James,G. et al. (2013) An Introduction to Statistical Learning. Vol. 112. Springer, New York.
Jansen,R. et al. (2003) A Bayesian networks approach for predicting protein–
Reddi,S.J. et al. (2018) On the convergence of Adam and Beyond. In: International Conference on Learning Representations, pp. 1–23. OpenReview, Amherst, MA.
Rocktäschel,T. et al. (2016) Reasoning about entailment with neural attention. In: International Conference on Learning Representations (ICLR), pp. 1–9. OpenReview, Amherst, MA.
Salwinski,L. et al. (2004) The database of interacting proteins: 2004 update. Nucleic Acids Res., 32, D449–D451.
Shen,J. et al. (2007) Predicting protein–protein interactions based only on sequences information. Proc. Natl. Acad. Sci. USA, 104, 4337–4341.
Silberberg,Y. et al. (2014) A method for predicting protein–protein interaction types. PLoS One, 9, e90904.
Skrabanek,L. et al. (2008) Computational prediction of protein–protein inter-