GPCR‐CA: A cellular automaton image approach for predicting G‐protein–coupled receptor functional classes

X Xiao, P Wang, KC Chou - Journal of computational chemistry, 2009 - Wiley Online Library
Abstract
Given an uncharacterized protein sequence, how can we identify whether it is a G‐protein–coupled receptor (GPCR) or not? If it is, which functional family class does it belong to? It is important to address these questions because GPCRs are among the most frequent targets of therapeutic drugs, and the information thus obtained is very useful for “comparative and evolutionary pharmacology,” a technique often used in drug development. Here, we present a web‐server predictor called “GPCR‐CA,” where “CA” stands for “Cellular Automaton” (Wolfram, S. Nature 1984, 311, 419), meaning that CA images are used to reveal the pattern features hidden in long and complicated protein sequences. The gray‐level co‐occurrence matrix factors extracted from the CA images are then used to represent the protein samples through their pseudo amino acid composition (Chou, K.C. Proteins 2001, 43, 246). GPCR‐CA is a two‐layer predictor: the first‐layer prediction engine identifies a query protein as GPCR or non‐GPCR; if it is a GPCR, the process continues automatically with the second‐layer prediction engine, which identifies its type among the following six functional classes: (a) rhodopsin‐like, (b) secretin‐like, (c) metabotropic/glutamate/pheromone, (d) fungal pheromone, (e) cAMP receptor, and (f) frizzled/smoothened family. The overall success rates of the predictor for the first and second layers are over 91% and 83%, respectively. These rates were obtained through rigorous jackknife cross‐validation tests on a newly constructed stringent benchmark dataset in which none of the proteins has ≥40% pairwise sequence identity to any other in the same subset. GPCR‐CA is freely accessible at http://218.65.61.89:8080/bioinfo/GPCR‐CA, where one can get the desired two‐layer results for a query protein sequence within about 20 seconds. © 2008 Wiley Periodicals, Inc. J Comput Chem 2009
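The feature pipeline named in the abstract (sequence, then cellular automaton image, then gray-level co-occurrence matrix factors) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the abstract: the 5-bit residue encoding, the rule number, the number of evolution steps, the co-occurrence offset, the choice of contrast and energy as features, and all function names are hypothetical. It is not the GPCR-CA implementation.

```python
# Illustrative sketch only. The encoding, rule number, step count, offset,
# and feature choice are assumptions, not the published GPCR-CA settings.

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode_sequence(seq, bits=5):
    """Map each residue to a fixed-width binary code.

    Hypothetical scheme: the code is the residue's index in ALPHABET.
    Non-standard residue letters raise ValueError.
    """
    row = []
    for residue in seq:
        index = ALPHABET.index(residue)
        row.extend((index >> shift) & 1 for shift in reversed(range(bits)))
    return row


def ca_step(row, rule):
    """Apply one step of an elementary (Wolfram) cellular automaton,
    using periodic boundaries."""
    n = len(row)
    out = []
    for i in range(n):
        # Neighborhood value = 4*left + 2*center + right; that bit of the
        # rule number gives the cell's next state.
        neighborhood = (row[(i - 1) % n] << 2) | (row[i] << 1) | row[(i + 1) % n]
        out.append((rule >> neighborhood) & 1)
    return out


def ca_image(seq, rule=84, steps=8):
    """Stack the encoded sequence and its successive CA states into a
    binary image (list of rows). The rule and step count are assumed."""
    rows = [encode_sequence(seq)]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule))
    return rows


def glcm(image, levels=2, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                counts[image[y][x]][image[y2][x2]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Two common texture descriptors computed from a normalized GLCM."""
    levels = len(p)
    pairs = [(i, j) for i in range(levels) for j in range(levels)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in pairs)
    energy = sum(p[i][j] ** 2 for i, j in pairs)
    return {"contrast": contrast, "energy": energy}


# Arbitrary 20-residue example string, used only to exercise the functions.
image = ca_image("MNGTEGPNFYVPFSNKTGVV")
features = glcm_features(glcm(image))
```

In a two-layer setup like the one described, a feature vector of this kind would be passed first to a GPCR versus non-GPCR classifier and, for predicted GPCRs, to a six-class classifier; the classifiers themselves are not specified in the abstract and are left out of the sketch.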