Nothing Special   »   [go: up one dir, main page]

Proteins: Automated Prediction of Ligand-Binding Sites in Proteins

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

proteins

STRUCTURE O FUNCTION O BIOINFORMATICS

Automated prediction of ligand-binding


sites in proteins
Rodney Harris, Arthur J. Olson, and David S. Goodsell*
Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037

ABSTRACT INTRODUCTION

Methods for the prediction of protein function have progressed hand-in-hand with
We present a method, termed
the growth of structural information. For decades, visualization and simulation have
AutoLigand, for the predic-
tion of ligand-binding sites been used to test hypotheses about molecular recognition and catalysis, based on the
in proteins of known struc- crystallographic and NMR structures of macromolecules. One of the major successes of
ture. The method searches this work is rational drug design, wherein the knowledge of molecular interaction is
the space surrounding the applied to predict new, more effective inhibitors for a given target. Currently, with the
protein and finds the contig- growing body of structural data from structural genomics, these studies have expanded
uous envelope with the speci- to include de novo function prediction. The Protein Data Bank (http://www.pdb.org,
fied volume of atoms, which Oct 3, 2006 release) has over 500 structures designated as ‘‘hypothetical proteins,’’ for
has the largest possible inter- which a structure is known but the function is not. Methods are needed to predict
action energy with the pro- functions for these proteins and many others that will be provided in the near future
tein. It uses a full atomic by structural genomics projects.1–4 Both of these applications—rational drug design
representation, with atom
and function prediction—require a reliable method for identifying and characterizing
types for carbon, hydrogen,
oxygen, nitrogen and sulfur the primary ligand-binding site of a protein.
(and others, if desired), and Existing methods for ligand-site identification can be divided into three broad cate-
is designed to minimize the gories: genomic comparisons, ligand shape matching, and site geometry searching.
need for artificial geometry. Genomic techniques predict ligand-binding sites by the comparison of homologous
Testing on a set of 187 genomes or functional site residues and may include surface geometry.5–7 These tech-
diverse protein-ligand com- niques are based on the hypothesis that important binding site residues are conserved
plexes has shown that the within the gene families. Ligand-shape methods, on the other hand, are based on the
method is successful in pre- chemistry of a ligand, building three-dimensional templates or databases to represent
dicting the location and ap- characteristic distributions of hydrogen bonds, hydrophobicity, and other chemical pa-
proximate volume of the
rameters.8–12 However, the majority of binding-site prediction methods, including the
binding site in 73% of cases.
one described here, analyze the geometry of the receptor.
Additional testing was per-
formed on a set of 96 pro- Since most binding sites are found in clefts or pockets on a protein surface,13 man-
tein-ligand complexes with ual examination of a protein surface is a remarkably effective method for finding a
crystallographic structures of binding site. Thus, much of the work on structure-based binding site prediction has
apo and holo forms, and focused on geometric methods that look for clefts or pockets. The receptor geometry
AutoLigand was able to pre- methods can be divided into pure geometric methods,14–19 geometry with energy field
dict the binding site in 80% methods,20–22 and geometry mixed with other methods.23–25 Below, we describe
of the apo structures. three methods that are similar to the approach presented in this report. Each of these
Proteins 2008; 70:1506–1517. methods makes predictions based on the receptor structure only, with the addition of
V
C 2007 Wiley-Liss, Inc. sequence conservation information in one case.
The PocketFinder algorithm23 calculates a Lennard–Jones potential over a 1 Å grid,
Key words: functionalsitepre- clamps all unfavorable regions and smoothes the clamped potential grid, and then con-
diction; structural genomics; tours the grid to identify pockets. Finally, pockets of volumes smaller than 100 Å3 are
rational drug design; Auto- ignored. The method uses several tunable parameters, including a smoothing parameter
Dock.
and the contour level, which were derived from the analysis of a training step. The authors

*Correspondence to: David S. Goodsell, The Scripps Research Institute—MB5, 10550 N. Torrey Pines Road, La Jolla, CA 92037.
E-mail: goodsell@scripps.edu
Received 19 December 2006; Revised 9 May 2007; Accepted 30 May 2007
Published online 1 October 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21645

1506 PROTEINS V
C 2007 WILEY-LISS, INC.
Prediction of Ligand-Binding Sites

report that in 80.9% of their test cases, at least 50% of the optimal site for binding, and the optimal shape for the
ligand overlapped with the largest predicted pocket, and in ligand, given this number of atoms. This approach was
11.8%, the ligand overlapped the second largest pocket. presented in our previous report, using a less compre-
Q-SiteFinder24 uses the GRID method26 to evaluate hensive version of the algorithm.29
the van der Waals potential of a methyl probe. A thresh-
old is then applied to identify the best binding sites, and
a clustering method is used to identify sites in close spa- METHODS
tial proximity. The method has several tunable parame-
Overview
ters, including the threshold and a ‘‘connection range’’
for clustering. The program is very efficient, quickly pro- AutoLigand identifies a contiguous envelope of maxi-
ducing clusters of favorable energy points. The authors mal affinity for a given macromolecular structure, using
report that in 90% of proteins tested, the known ligand a grid-based representation of the binding affinity poten-
was identified in one of the top three predicted sites. tial. We use the term ‘‘envelope’’ to represent a contigu-
SURFNET-ConSurf25 is a two-stage method that first ous region in space filled with points that represent
identifies clefts and then trims them according to residue potential atomic centers for ligand atoms. In brief, the
conservation. The SURFNET code27 is a pure geometry user first calculates an affinity potential grid around the
method that identifies clefts by filling the space with protein, and then a flood-fill and a site-optimization pro-
spheres and then clusters the spheres within pockets. cess are used to identify the best contiguous envelope
Using SURFNET, the authors found that the most within the energy grid. As shown in Figure 1, the results
ligand-binding sites occur in the largest pocket. In the may be quite different than results from a simple con-
second part of the method, the spheres are trimmed touring method. In real ligands, the different atoms in
based on their distance from conserved residues scored the full molecule rarely hold positions of equal binding
from the ConSurf-HSSP database.28 By mixing a pure affinity—some will provide strong contributions through
geometry method with a genomic method, this program hydrogen bonds or salt bridges, and others are present
combines the strengths of both methods to generate bet- merely to act as bridges between these high-affinity frag-
ter results than these methods achieve individually. The ments. A simple contouring algorithm will identify the
method has many tunable parameters that define the locations of the high-affinity fragments, but not the loca-
placement of spheres and incorporation of genomic in- tion (or need for) of the bridging atoms. In addition,
formation. The authors report that in 75% of proteins very small changes in contour thresholds often have very
tested, the known ligands will show at least a 20% over- large effects on the shape of the enclosed space. AutoLi-
lap with one of the four largest pockets. gand is designed to address these problems: it finds the
We present a binding site prediction method, termed contiguous envelope that achieves the highest total affin-
AutoLigand, that has several advantages. It uses a full ity, wherein some of the individual points may be of
atomic representation, with atom types for carbon, lower affinity offset by points of higher affinity. Thus,
hydrogen, oxygen, nitrogen, and sulfur (and others, if AutoLigand will consistently find an envelope that enclo-
desired), to yield a chemically detailed prediction of the ses the desired number of atoms, with the best total
shape of a ligand within a predicted binding site. It is binding energy, whereas a contouring method cannot
designed to minimize the need for artificial geometry: ensure that a similar solution will be found.
the user chooses (1) the particular force field that will be
used, and (2) the number of atoms in the ligand. The
Affinity potential grids
method then searches the entire protein and finds the
contiguous envelope with the specified volume of atoms, Affinity potentials are precalculated by AutoGrid,30
which has the largest possible interaction energy with the using the default AutoGrid parameters in all cases. The
protein. It is our hypothesis that protein-binding sites protein is placed within a large grid, and a probe atom is
have evolved to create a region of maximal affinity for traced through the full grid, tabulating the interaction
the given size and shape of their ligand(s), and that energy at each point. The AutoGrid3 empirical free
by identifying the contiguous region of maximal-binding energy force field includes terms for dispersion/repulsion,
affinity, we will identify functional ligand-binding sites. hydrogen bonding, electrostatics, and desolvation. Affin-
We envision two ways to use the method. With a pro- ity potentials were created for six atom types: aliphatic
tein of unknown function, the user will test a wide range carbon, aromatic carbon, hydrogen, oxygen, nitrogen,
of ligand volumes, choosing the one that shows the best and sulfur, using a grid spacing of 1 Å and a smoothing
overall binding strength to identify the predicted ligand value of 0.5 Å (see Fig. 1). The parameters for nitrogen
that binds within the predicted binding site. This and oxygen are so similar that they were combined into
approach is presented in this manuscript. The method a single type for this analysis. Finally, the affinity grids
may also be used in rational drug design, starting with a were combined by choosing the atom type with strongest
desired size for the inhibitor, and then identifying the affinity at each point. Two grids were created for use by

DOI 10.1002/prot PROTEINS 1507


R. Harris et al.

Figure 1
HIV-1 protease (PDB file 1hps). The backbone of the HIV-1 protease is shown with a blue tube. (a) The inhibitor is shown with balls-and-sticks, with carbon gray,
nitrogen blue, oxygen red, and hydrogen light blue. (b) The affinity potential for carbon is shown with white surfaces, and the potential for oxygen is shown with red
surfaces. (c) An AutoLigand envelope with 200 points is shown as spheres with carbon gray, oxygen/nitrogen in magenta, sulfur in yellow, and hydrogen in light blue. All
molecular figures in this report were generated using the Python Molecular Viewer.34

AutoLigand: one holds the combined affinity values, and points is attained. This flood-fill envelope is necessarily
the other stores the atom type that gives the best affinity contiguous due to being constructed only from neighbor
at each point. points.
The grid spacing of 1 Å has the possibility of causing To speed up this process, we actually initiate flood fills
significant artifacts, given the steeply rising potential of at a reduced set of points. Every other point in each
the repulsive part of the interaction energy. The smooth- direction is used for small envelopes of less that 50
ing process, however, is designed to address this problem, points (i.e., stepping through the grid in 2 Å incre-
as described in our previous work.30 We tested this in ments), and every fourth point is used for larger enve-
AutoLigand by generating HIV-1 protease envelopes from lopes (stepping through the grid in 4 Å increments),
maps translated by 0.5 Å in each of the three coordinate yielding identical results as the exhaustive search, which
directions. In all cases, the maps were very similar in uses every point (data not shown). Because nearby start-
form and total interaction energies. ing points may coalesce to the same flood solution with
the same total energy, each new starting point is also
required to start outside of the existing flood fills. The 10
AutoLigand
best flood fill solutions are then used in the next step.
AutoLigand identifies optimal envelopes within the The second step starts with each of the flood-fill enve-
grid maps in three steps: (1) flood fill, (2) local migra- lopes, which may have some grid points in common, and
tion, and (3) a ray-casting neighborhood search. allows the envelope to migrate via an ‘‘affinotaxis’’ move-
The migration and ray-casting steps are performed in ment (analogous to the chemotaxis movement of an
iterative cycles until a stable envelope is generated. Sam- amoeba). The movement of the constant-volume enve-
ple envelopes from each step in the process are shown in lope occurs by adding the neighbor point with the best
Figure 2. energy/volume value and removing the worst point from
In the first step, AutoLigand moves to each point in the current envelope list.
the combined affinity map and does a simple flood fill But, before this point-exchange movement can take
around that point. Points are added one at a time to the place, the point to be removed must be checked to see if
envelope by searching all orthogonal neighboring points its removal would break the contiguity requirement with
and adding the one of best binding energy. This process any other points in the envelope. The check is performed
is repeated until the user-specified number of envelope in two passes. The first pass checks the point in the enve-

1508 PROTEINS DOI 10.1002/prot


Prediction of Ligand-Binding Sites

ger and is only necessary if the much faster initial check


flags the point as nonremovable.
The initial removable/nonremovable states are assigned
as follows. The number and relative location of all of the
adjacent orthogonal and diagonal neighbor points to the
point being tested are determined. If there is only one
orthogonal neighbor, the test point is flagged removable
as it is at the end of a chain and will not disrupt conti-
nuity if it is removed. If there are two orthogonal neigh-
bors, then there must be a third point present that is di-
agonal to the test point and orthogonal to both of the
neighbor points (this forms a square among the four
points) for the test point to be flagged removable. The
presence of the diagonal point forming the connecting
square ensures that neither of the two orthogonal neigh-
bors will be disconnected from the contiguous envelope
if the test point is removed. The general state rule is that
there must be additional points diagonal to the test point
such that each orthogonal neighbor is connected through
at least one square so that no point is disconnected from
the contiguous envelope if the test point is removed. The
minimum number of connecting squares is one less than
the number of orthogonal neighbors; that is, one square
for two orthogonal neighbors, two squares for three or-
thogonal neighbors, and so forth.
After the first pass, each point assigned a nonremov-
able state must be rechecked to determine if it is part of
a loop structure, which would then allow it to be
removed and not leave any other points disconnected
from the contiguous volume. This is necessary because
the first pass only checks points adjacent to the test
point. A nonremovable flagged point is part of a loop
structure if all of its orthogonal neighbor points are able
to be linked to each other through points in the contigu-
ous envelope. Any nonremovable flagged point found as
part of a loop structure is reflagged as removable. The
state of any point must be rechecked after any movement
of the envelope during migration.
The flood-fill envelopes are allowed to migrate one
Figure 2 point at a time until the envelope converges on a low
Summary of the AutoLigand method. Optimal envelopes are found in three
steps, as shown here for a 60-point envelope in HIV protease. First (upper
energy/volume solution. If the lowest affinity point is
image), points are added around a seed point using a simple flood fill nonremovable, then the second lowest affinity point is
algorithm, typically yielding a compact envelope in the neighborhood of the seed used. The migration of the envelope stops when the total
point. Then, a migration step is performed, removing the worst point in the
envelope and adding the best neighboring point. The second image shows that energy/volume is not improved with further changes.
half a dozen points were removed and replaced elsewhere in this case. The final The final step in AutoLigand probes with a longer
step is a ray-casting process that searches at greater distances from the envelope.
Three sample rays are shown here in green. The final envelope is shown at the
reach than the shell of adjacent neighbors to look for
bottom. nearby pockets of higher affinity. AutoLigand extends a
set of linear rays of up to 10 grid points away from the
edge points of the migrated envelope to search for higher
affinity sites. In keeping with the amoeba analogy, this is
lope list and assigns a removable/nonremovable state to like protruding pseudopods in search of better affinity
the point based only upon the presence or absence of its zones. A ray is extended in the six orthogonal directions
orthogonal and diagonal neighbors. The second pass re- from each point in the volume in turn, and the total
evaluates any point flagged as nonremovable to check if energy is summed for all its points. The ray extension is
it is part of a loop structure and reflags it as removable if terminated when it intersects the receptor, intersects the
it is found to be in a loop. This second check takes lon- envelope, reaches the edge of the grid, or reaches the

DOI 10.1002/prot PROTEINS 1509


R. Harris et al.

maximum user-defined length (a maximum length of 10 remove alternate locations and to have consistent naming
points was used here). Sample rays from one particular of atoms. Hydrogen atoms were then added automati-
point in an envelope are shown in Figure 2. The total cally using Babel, and then nonpolar hydrogen atoms
energy value for each ray for each length is ranked and if were merged.
the highest ranked ray is better than the lowest total The test set of 96 protein complexes with apo and
value for the same number of removable points within holo forms was taken from the work of Gunasekaran and
the envelope, the set of ray grid points is added to the Nussinov.31 Coordinates for the apo and holo structures
envelope and the worst envelope points are removed. were taken from the Protein Data Bank, superimposed
Iterative cycles of migration and ray extension are per- using all alpha carbon positions, and then treated simi-
formed until the envelope converges on a consistent low larly to the other test set.
energy solution. We use an intersection/union (I/U) ratio to quantify
the overlap between a predicted envelope and a known
ligand in ligand-receptor test systems. The intersection is
Points versus atomic volumes
the number of points within the envelope that are within
A challenging aspect of the method is the difference 1 Å of the ligand. The union is the total footprint of
between the number of points in the contiguous enve- both the envelope and the ligand, including both regions
lope, and the number of atoms that it represents. In that overlap and regions that do not. The value of I/U is
these affinity grids, each point in the grid represents a close to 1.0 when the envelope overlaps all the ligand
possible position for an atom center, and so the envelope without any extra fill exterior to the ligand. Envelopes
is a representation of the collection of atom centers, not that protrude into nonligand occupied spaces produce a
a representation of the space-filling volume of the mole- lower value.
cule. The difference is underscored by a simple example:
imagine a hypothetical envelope of four points on a 1 Å
grid. If these points are arranged in a square, the block-
Comparison of envelopes
shaped envelope may represent a diatomic molecule,
with atoms placed on two diagonal points—the remain- In applications of the AutoLigand method, it will be
ing two points are too close to represent real atoms in a useful to compare envelopes from different proteins, for
small molecule. If, however, the four points are arranged instance, to assign the similar function to proteins with
in a line-shaped envelope, we can imagine a molecule similar ligand-binding envelopes or to explore the effect
composed of three atoms in a line that will fit within the of protein motion on the local characteristics of the
envelope represented by predicted points. Thus, the shape binding site. To compare the envelopes generated from
of the predicted set of points will affect the actual num- different receptor proteins, we have used a modified ver-
ber of atoms that it represents: blocky sets of points rep- sion of AutoDock to overlap two envelopes. AutoDock
resent fewer atoms than the linear sets of points. uses a grid-based technique for the fast evaluation of
We have solved this problem by calculating an atomic interaction energies during a docking simulation. In our
volume for each set of points. Atomic volumes are esti- modified version of AutoDock, we have developed a dif-
mated using the grid representation, by counting the ferent set of potential grids that describe the overlap of
number of points within a 1 Å radius to the points in two sets of points.
the envelope. However, it is conceptually difficult to do The overlap is performed in two steps. First, the set of
the search and optimization of the ligand envelope keep- points in one envelope is chosen as the ‘‘target’’ envelope,
ing the atomic volume of the envelope constant—as the embedded in a grid, and an overlap potential is created
shape of the envelope changes during the search—a given on that grid. The potential has a value of 21.0 for dis-
volume may correspond to different numbers of points tances of 0–1.0 Å from the nearest point, and values
depending on how extended or blocky it is. Instead, we from 21.0 to 0.0 at distances between 1.0 and 2.0 Å
perform the search using a constant number of points, from the nearest point. (Negative values are used for easy
but optimize the envelope based on the value of total compatibility with the existing free energy evaluation
energy/atomic volume. used in the AutoDock code). A separate overlap grid
is calculated for each atom type represented in the
Atomic coordinates and overlap calculation
envelope.
The second envelope is then overlapped on the target
The test set of 187 protein complexes was taken from envelope using the docking methods in AutoDock. If
the reported calibration of the AutoDock energy func- there is a perfect overlap, each point in the envelope will
tion.30 In that study, coordinates were obtained from the overlap a portion of the overlap grid with a value of 21,
Protein Data Bank (http://www.pdb.org). These com- so that the total docking score will be equal to the num-
plexes were checked and corrected if necessary for the ber of points in the envelope. For less perfect overlaps,
proper biological unit, and files were regularized to the score is a direct measure of the intersection volume.

1510 PROTEINS DOI 10.1002/prot


Prediction of Ligand-Binding Sites

Software the best solutions without a priori knowledge of the I/U


overlap.
AutoLigand and data-processing scripts were written in
The data are replotted in Figure 4(a), presenting total
Python 2.4 and they are available by request from the
energy/volume versus volume, using the predicted atomic
authors. Visualization and analysis were performed
volume of the envelope instead of the number of points.
within the Python Molecule Viewer and the visual-pro-
This reorders points on the graph based on how blocky
gramming environment Vision.32 To a first approxima-
or linear the envelope is, and a clear signal appears for
tion, computation times with the current version of
use in ligand volume prediction. The energy/volume rises
AutoLigand increase linearly with the number of points
rapidly and then declines as the size of each successive
in the envelope, requiring  2 min for a 100-point enve-
envelope is increased. The insets show what is happening
lope to 35 min for 600-point envelope, using a 3.4 GHz
as the binding site is filled. At 20 envelope points
Intel XEON-EMT CPU.
(volume 5 78 Å3), only part of the ligand is covered and
the envelope has not yet taken advantage of the entire
RESULTS AND DISCUSSION binding site. At the peak, 70 points (volume 5 183 Å3),
the envelope has completely engulfed the ligand. When
General approach the envelope becomes too large, 180 points (volume 5
Figure 1 demonstrates the AutoLigand approach and 391 Å3), the additional points fit tightly into regions of
contrasts it with the simple isocontour methods. The lower affinity (mostly hydrogens in this case), thus reduc-
central image shows typical affinity potentials calculated ing the energy/volume value. The points colored red in
for oxygen and carbon probes around HIV-1 protease. the graph show that the I/U values also peak at the maxi-
When isocontoured, these types of affinity potentials mal energy/volume, thus verifying the prediction.
show a characteristic set of disconnected, small favorable This spike-shaped signal is particularly clear because
regions. It is apparent that a large ligand, such as the one the arabinose-binding protein has an enclosed binding
shown in Figure 1(a), would need to span between sev- pocket, and similar profiles were found to be a character-
eral of these small regions of high affinity to achieve the istic signature of small enclosed binding sites. In other
highest total affinity. receptor–ligand systems, wherein the ligand-binding
Figure 1(c) shows an AutoLigand envelope composed pocket is in a cleft or tunnel, larger envelopes often
of 200 points. The envelope fills most of the high affinity ‘‘leak’’ out onto the surface of the receptor, producing
region at the center of the receptor and has overlapped long chains of points that increase the total volume with-
most of the ligand molecule. The areas of mismatch out adding much to the total affinity, and thus decrease
between the fill volume and the ligand identify possible the energy/volume values in the plot.
regions, wherein the improvements could be made to the For example, Figure 4(b) presents results for the HIV-1
inhibitor, by removing atoms in the ligand that extend protease (PDB 1hps). The plot of energy/volume versus
out of the envelope and adding atoms to regions in the volume for this large ligand–receptor system shows a
envelope that are not occupied ligand atoms. peak at a higher volume without a sharp spike shape. An
envelope of 240 points (volume 5 665 Å3) only covers
part of the ligand, but when 350 points (volume 5 854
Points and atomic volumes
Å3) are used, most of the ligand is covered. The ligand-
The consequences of using a volume metric instead of binding pocket of the HIV protease is much more open
number of points is shown in Figure 3, for a particularly than the binding pocket of the arabinose-binding protein
well-behaved system: L-arabinose bound to L-arabinose- and when 500 points (volume 5 1189 Å3) are used, the
binding protein (PDB 1abe). In this protein, the binding envelope extends out onto the surface of the receptor.
site is completely buried, providing a particularly easy The long snake-like protrusion follows a cleft of moder-
test case for location and size prediction. Envelopes were ate affinity along the surface of the receptor. Long narrow
generated varying from 10 to 600 points in increments of structures increase the total volume more than clustered
10 points. Figure 3 shows a plot of the total energy/point structures and thus reduce the overall energy/volume of
versus the number of points. This plot starts high at very the envelope.
small envelopes, as the envelope chooses the very best
region of affinity, then it drops as the number of points Site prediction in a set of diverse complexes
is increased, and the larger envelopes start to include
less-favorable points. This result is not unexpected: a We tested AutoLigand on a set of 187 ligand-receptor
similar profile was obtained in a survey of ligand-binding systems (see Table I). The set contains large, medium,
energetics by Kuntz and coworkers.33 The points that and small ligands and receptors with diverse chemical
represent the actual binding site are colored by I/U over- properties, providing a varied test of the ability of Auto-
lap. On the basis of this graph, it would be difficult to Ligand to identify ligand-binding sites. For each receptor,
make a prediction as to which of these points represent envelopes were generated varying from 10 to 600 points

DOI 10.1002/prot PROTEINS 1511


R. Harris et al.

beyond the minimum and maximum receptor atom


coordinates.
In each case, we initiated the search with the 10 best
flood fill solutions, but in many cases, these converged to
the same final envelope after the migration and ray-cast-
ing optimization. Both the smaller and larger envelope
volumes tended to lead to multiple results, whereas the
mid-sized envelopes typically led to one or two con-
verged results. Looking at the many widely scattered
small pockets of high affinity in Figure 1(a), it is not sur-
prising that small envelope volumes tend to produce
many different solutions. The mid-sized envelope vol-
umes tend to converge to the same solutions because the
method has a fairly large radius of convergence and the
initial flood fill solutions may be fairly close, which pulls
the starting fills into the same affinity pocket. But for
larger envelopes, the nonoverlapping criterion forces the
10 initial start points to be distant from one another,
outside of the radius of convergence of the optimization
methods, so that larger envelopes tend to give multiple
Figure 3 solutions.
AutoLigand analysis is done using the number of points. Each circle in the graph Figure 5 compiles the results from the whole set of
represents one AutoLigand run, using a range of envelope sizes from 10 to 600 complexes by selecting the highest energy/volume value
points. The protein is L-arabinose-binding protein (PDB entry 1abe). Colors
represent intersection/union values with red for I/U 5 1.0 (best) and blue for at each volume bin and placing them in a single row. For
I/U 5 0.0 (worst). Multiple solutions are found at each envelope size, since the instance, this is equivalent to starting with the graphs in
algorithm starts with multiple-seed points scattered through the space around
the protein. The steeply falling set of points ranging from blue to red
Figure 4(a,b) and then taking the uppermost point in
corresponds to envelopes in the binding site, and the more gradually falling sets each volume bin. Figure 5(a) is colored by the I/U ratio
of points in blue corresponds to envelopes elsewhere on the surface of the protein. of each envelope. This graph shows that in most cases,
by scanning through a range of envelope volumes, Auto-
Ligand will find the binding site of the protein in one or
in increments of 10 points. In all cases, we used a 1 Å more of them. In 85% (159/187) of cases, at least one of
spacing and the search space encompassed the entire re- these high energy/volume points will have an I/U value
ceptor molecule with an additional 2 Å in each direction of greater than 0.2. When a less stringent criterion of

Figure 4
AutoLigand analysis is done using the volume of the envelope. (a) The envelope for L-arabinose-binding protein is calculated by optimizing the value of total energy/
volume. As in Figure 3, the data points are colored by the I/U values of each envelope. Insets show three sample envelopes, colored by predicted atom type as in Figure
1(c). The ligand, L-arabinose, is shown with balls-and-sticks. (b) A similar analysis of HIV-1 protease (PDB entry 1hps). Insets show three sample envelopes.

1512 PROTEINS DOI 10.1002/prot


Prediction of Ligand-Binding Sites

choose the largest volume that shows an energy/volume


Table I greater than 0.8 of this value. This threshold of 0.8 was
PDB Accession Codes for Protein Complexes
determined empirically, as shown in Figure 6. Four paral-
1a4w 1aem 1bmn 1dwb 1hhj 1lgr 1rbp 2ctc 3er5 5tln lel experiments were performed, calibrating the threshold
1aaq 1aen 1bra 1dwc 1hhk 1lyb 1rne 2dbl 3ptb 5tmn on three-quarters of the complexes and testing the pre-
1abe 1aeo 1bto 1dwd 1hih 1mcb 1sbp 2er6 3tmn 6abp
1abf 1aeq 1bzm 1ebg 1hld 1mcf 1sre 2er7 4dfr 6cpa
dictive ability on the remaining quarter of the complexes.
1ac4 1aes 1cbx 1eed 1hos 1mch 1stp 2gbp 4er1 6cpp In each of the experiments, a broad peak was found
1ac8 1aet 1cil 1ela 1hps 1mcj 1tmn 2ifb 4er2 6tmn
1adb 1aeu 1cim 1elc 1hpx 1mcs 1tng 2phh 4er4 7abp
1adc 1aev 1cin 1ent 1hsl 1mnc 1tnh 2tsc 4hmg 7cpa
1add 1ajv 1cnw 1epo 1hte 1nnb 1tni 2upj 4hvp 7cpp
1adf 1ajx 1cnx 1epp 1htf 1nsc 1tnj 2vaa 4phv 7hvp
1ae8 1apb 1cny 1etr 1htg 1nsd 1tnk 2vab 4tln 7upj
1aeb 1apt 1ctt 1ets 1hvi 1okl 1tnl 2wea 4tmn 8abp
1aed 1apu 1dbb 1ett 1hvj 1pgp 1tpp 2web 4tsl 8cpp
1aee 1apv 1dbj 1fkf 1hvk 1ppc 1ulb 2wec 5abp 9aat
1aef 1apw 1dbk 1gno 1hvl 1pph 1uvs 2xis 5cna 9abp
1aeg 1avn 1dbm 1hbv 1hvr 1ppk 1vac 2ypi 5cpp 9hvp
1aeh 1bap 1dif 1hdt 1lde 1ppl 1xig 3bto 5er2
1aej 1bid 1dih 1hhg 1ldm 1ppm 2acs 3cpa 5hvp
1aek 1bmm 1dog 1hhi 1ldy 1pso 2cpp 3er3 5tim

success is used, requiring a simple intersection of 0.2 of


the ligand with the envelope instead of the more strin-
gent I/U measurement, AutoLigand identifies the ligand
site in 99% (185/187) of the cases. The use of I/U values
is more informative as it penalizes overly large envelopes
that overlap the ligand-binding site merely by flooding
over a large surface area of the receptor.
Figure 5(b) is a more stringent test of AutoLigand,
testing if the method is able to predict a volume and
location for the binding site. The graph is colored by
energy/volume for the best envelope in each size bin.
Ideally, we would want to be able to pick the envelope
with the best energy/volume and find that it corresponds
to the proper ligand position and volume. In practice,
these graphs are dominated by peaks at very small vol-
umes (on the order of 5–6 atoms) because small high-
affinity crevasses may be found in nearly all proteins. In
proteins with enclosed binding sites and small ligands,
this large peak at small volume is sharp and well defined.
In these cases, the sharp peak is by far the major feature
of the graph [as in Fig. 4(a) and the lowest complexes in
Fig. 5(b)], so that site and volume prediction is obvious.
With medium and large ligands, however, the graphs
tend to have a large peak at small volumes overlaid with Figure 5
a broader peak that corresponds to the binding site vol- AutoLigand analysis of 187 ligand-receptor test systems. Each row corresponds to
one ligand-receptor complex, with rows sorted by the number of atoms in the
ume. This pattern may be seen in Figure 5(b), wherein ligand. Each point in the graph is assigned by looking at 40 Å bins of volume in
the larger ligands show a long tail of favorable energy a graph like those shown in Figure 4 and picking the uppermost point (the
from left to right, and the smaller ligands fall to less envelope with the largest energy/volume) within each volume bin. Then, the
graph is colored by the appropriate property of the envelope represented by that
favorable energies more rapidly. The broad form of these point. (a) Colored by the intersection/union, with red for I/U > 0.2 and blue
peaks is due to the more open nature of these binding for I/U 5 0.0. (b) Colored by the energy/volume, normalized so that the
greatest value found for the complex is colored red and zero energy is colored
sites, which allows the envelopes to spread into neighbor- blue. The ragged edge on the right side of each graph and the scattering of
ing valleys on the protein surface. empty points throughout the graphs are due to the lack of regular sampling
when the volume of the envelope is calculated from the number of points. The
To automate the selection of a predicted ligand size PDB accession codes are given on the left side, and the two complexes shown in
from the energy/volume graphs, we identify the highest Figure 4 are highlighted in red.
energy/volume value found for the system and then

DOI 10.1002/prot PROTEINS 1513


R. Harris et al.

Site prediction using apo-enzyme structures

One ultimate goal of this work is to predict binding


sites in proteins of unknown function. To test the success
of AutoLigand in these types of applications, we used a
test set of 96 ligand/receptor systems wherein both
ligand-bound holo forms and ligand-free apo forms are
known. This set was compiled by Gunasekaran and
Nussinov,31 and is separated into three increasingly chal-
lenging categories: Class I, which shows small conforma-
tional changes of <0.5Å upon ligand binding; an inter-
mediate Class II with displacements between 0.5 and
2.0 Å, and Class III with conformational changes >2.0 Å
(Table II).
Figure 8 compiles the results of AutoLigand analysis of
this set of holo/apo structures. As in Figure 5, the graphs
include the points of best energy/volume from each vol-
ume bin and color the points by the I/U ratio of each
Figure 6 envelope. The I/U values for the apo structures were cal-
Threshold for prediction of ligand volume. A range of thresholds were tested for culated by comparing the predicted envelope with the
choosing the envelope of the largest volume that is within a given percentage of
the maximal energy/volume found for all envelopes. known ligand conformation in the overlapped holo
structure.
As we might expect, AutoLigand performs well with
the Class I structures, finding envelopes with nonzero
between about 0.65 and 0.85. When the threshold value
values of I/U in 90% (36/40) of the holo structures and
is below 0.6 of the highest energy/volume value, at least
83% (33/40) of the apo structures. The Class II structures
some of the ligand is overlapped in 56% of the systems,
showed similar results, finding envelopes that overlapped
which corresponds simply to selecting the largest volume
the known location in 89% (31/35) of holo structures
tested in 56% of the cases (the larger ligands). When
and 86% (30/35) of the apo structures. When the Class
the threshold approaches the maximum energy/volume,
III structures were tested, performance decreased some-
the number of correctly predicted envelopes drops to the
what for the apo structures, but still gave good predic-
number of small tight-binding ligand-receptor systems
tion in the majority, finding envelopes with nonzero I/U
included in the test set [as seen in Fig. 4(a)], wherein the
for 95% (20/21) of the holo structures and dropping to
highest energy/volume value is at the best overlap vol-
67% (14/21) of the apo structures.
ume. We chose a threshold of 0.8 at the upper side of
this range, so that the predicted values of ligand size
would be near the peak in the energy/volume graphs.
At the optimal threshold of 0.8 of the highest energy/
volume value, a ligand site with nonzero I/U is found in
72.7% of the cases (72.1%, 72.1%, 74.3%, and 72.3% for
the four cross-validation experiments), and a site with I/U
greater than 0.2 is found in 49.2% of the cases (50.7%,
50.7%, 48.6%, and 46.8% for the four experiments).
Figure 7 shows a plot of the envelope volume versus the
ligand volume for the points with at least 0.2 I/U. A clear
trend between the predicted volume and the actual ligand
volume can be seen.
AutoLigand did not find the binding location in a few
systems. The failures were found with oligomeric pro-
teins, including two Fab fragments (1mcf and 1mcj),
xylose isomerase (2xis), and lactate dehydrogenase
(1ldm). The false hits for these systems landed at the
interface between the subunits of the protein, in a deep
groove formed between the subunits. When AutoLigand
Figure 7
was rerun using only a single subunit instead of the Predicted envelope volumes and observed ligand volumes for the 92 envelopes
entire biological unit, the proper binding site was found with greater than 0.2 I/U.
in all cases (data not shown).

1514 PROTEINS DOI 10.1002/prot


Prediction of Ligand-Binding Sites

Figure 10
Figure 8
Overlap of HIV protease envelopes. (a) A sample overlap of a 300-point 1aaq
AutoLigand analysis of holo ligand-receptor complexes and corresponding apo envelope onto a 300-point 1hvk envelope. (b) Composite of fifteen 300-point
structures. Figures are generated similarly to Figure 5(a), with each point HIV-1 protease envelopes, colored by predicted atom type as in Figure 1(a). The
colored by the I/U value of the envelope. (a) Holo enzyme structures separated larger spheres show regions of the overlapped volumes that are most similar
into three classes based on the amount of conformational change induced by between the different envelopes, and the smallest spheres show regions that are
ligand binding. In each class, the smallest ligands are at the bottom and the found in only 1 of 15.
largest are at the top. (b) Corresponding apo enzyme structures.

Figure 9 demonstrates the types of problems that may the ligand binds. Thus, the AutoLigand envelope from
be encountered with flexible proteins that undergo con- the holo structure fully overlaps the ligand position, but
formational change upon ligand binding. In retinol-bind- the half of the binding site is filled with the phenylala-
ing protein, a phenylalanine sidechain is displaced when nine sidechain in the apo structure, so that the envelope

Figure 9
Comparison of envelopes generated for the structures of human retinol-binding protein. (a) Holo enzyme form bound to retinol (PDB entry 1brp). (b) Apo form of the
enzyme (PDB entry 1brq). Retinol is shown in magenta and a particularly mobile phenylalanine is shown in green.

DOI 10.1002/prot PROTEINS 1515


R. Harris et al.

Overlapping envelopes
Table II
PDB Accession Codes for Holo-Apo Protein Structures Our test set included 15 different HIV-1 protease
complexes with various ligands. The 300-point enve-
Holo Apo Holo Apo Holo Apo Holo Apo
Class I
lopes for each of these 15 systems were overlapped using
154l 153l 1bk9 1psj 1dmy 1dmx 1nft 1tfa a modified method in AutoDock3, as described in the
1a0t 1a0s 1bm7 1bmz 1duc 1dun 1pnf 1png methods. Table III shows the percentage of overlap for
1a8u 1a7u 1br6 1rtc 1dud 1dup 1pnl 1pnk each of these comparisons. The average overlap between
1afa 1afd 1bso 2blg 1eus 1eur 1rca 1aqp
1aha 1ahc 1bxq 3app 1gmp 1gmq 1tal 2alp
the different envelopes was found to be 92.6%. This
1aqm 1aqh 1byq 1yer 1hor 1dea 1vps 1vpn result indicates that the method is capable of finding
1arm 1yme 1com 2chs 1hvq 2hvm 1xzb 1xza the ligand site consistently for different crystal struc-
1awb 2hhm 1dcp 1dco 1icm 1ifb 2enb 1ena tures of the same receptor. It also implies that the
1b5d 1b49 1did 1xla 1kev 1ped 4tim 1ag1
1bj0 1bjz 1dil 2sil 1log 1loe 5enl 3enl various ligands only induce differences of about 7.4% in
the highest affinity zone for the HIV-1 protease recep-
Class II tor. Figure 10(a) shows the overlap of two sample enve-
1a26 2paw 1epb 1epa 1jul 1igs 1ra1 5dfr lopes with a moderate overlap value. Figure 10(b) shows
1adl 1alb 1fga 2fgf 1kel 1kem 1ses 1sry
1ai2 3icd 1fkl 1fkk 1ltt 1lts 1ubw 1ubv a composite of all 15 HIV-1 protease envelopes. The
1aj0 1ajz 1fut 1fus 1mjl 1mjk 1xva 1bhj larger spheres depict points of higher occupancy com-
1alw 1alv 1gca 1gcg 1mzm 1mzl 2gal 1bkz mon in most of the envelopes and the single dots show
1aq7 2ptn 1gd1 2gd1 1nhk 2nck 2izf 2izd points filled by only a single result. The distinctive cross
1beh 1bd9 1igb 1amp 1pgn 2pgd 3pca 2pcd
1ben 1trz 1jdc 1jda 1ptr 1ptq 6cha 4cha shape of typical protease inhibitors shows up, with the
1ddt 1sgk 1jef 135l 1qpq 1qpo P1 and P10 positions extending up and down from the
center and the P2 and P20 positions extending to left
Class III and right. The strong preference for a hydrogen-bond
1a9p 1a9o 1cen 1ceo 1mpj 3ins 3gal 1bkz
1aj7 2rcs 1ebg 1ebh 1o0r 1fgx 4fua 1fua acceptor at the center is also seen to be a common de-
1ake 4ake 1hex 1xaa 1swd 1swa 7tim 1ypi terminant of binding across these enzyme structures.
1anf 1omp 1hii 1hsi 1vrt 1rtj This hydrogen-bonding position is occupied by a con-
1brp 1brq 1hnl 1lz4 2bgu 2bgt
served water molecule in most structures, which is dis-
1byb 1bya 1lca 4tms 2cht 2chs
placed by an oxygen atom in cyclic urea inhibitors. We
are currently exploring the use of this technique to
compare envelopes from different mutants of the same
receptor to characterize the features of the binding site
is shifted, overlapping only part of the known ligand that change during the development of drug resistance
position. In these cases, a dynamic model is needed to and compare the binding sites of different proteins that
explore the range of possible conformations of sidechains bind to the same ligand to characterize the determinants
to make a fully correct prediction. of ligand specificity.

Table III
Percentage Overlap for Aligned HIV-1 Protease Envelopes

1aaq 1hbv 1hih 1hps 1hpx 1htf 1htg 1hvi 1hvj 1hvk 1hvl 1hvr 4phv 5hvp 9hvp
1aaq 95.8 93.0 97.8 98.3 95.4 95.9 94.4 96.9 94.6 95.7 94.1 96.8 97.2 96.7
1hbv 91.3 87.1 94.3 90.4 91.4 89.4 86.8 92.0 91.1 93.6 94.0 89.4 91.1 92.1
1hih 86.5 87.2 89.3 90.8 75.4 81.3 90.6 90.5 91.0 91.0 82.8 86.5 80.3 86.6
1hps 93.4 93.8 89.3 93.6 84.1 92.2 92.1 92.8 92.1 95.0 91.0 92.5 92.1 93.1
1hpx 95.4 92.8 94.0 96.5 93.1 94.8 95.2 98.0 98.4 98.1 90.1 95.3 95.8 92.7
1htf 91.4 91.0 88.3 93.9 92.9 97.8 96.3 87.7 94.9 95.0 89.3 94.0 94.5 90.7
1htg 92.8 86.1 83.3 96.3 96.1 92.3 92.1 91.2 87.0 94.8 87.4 95.7 95.3 91.0
1hvi 82.2 82.4 92.1 88.7 93.6 84.2 88.2 94.1 94.4 95.1 79.0 90.2 88.2 84.5
1hvj 91.2 87.1 88.4 91.1 94.7 86.5 92.1 93.7 98.3 98.3 84.0 91.2 87.9 89.0
1hvk 88.0 84.2 89.1 88.7 95.8 85.6 87.5 95.8 96.4 99.3 84.5 84.3 76.1 79.6
1hvl 88.2 89.2 90.4 93.3 97.1 85.0 92.8 96.2 97.8 99.4 85.6 95.5 92.8 91.2
1hvr 91.8 97.3 87.8 94.8 94.3 94.6 94.9 88.3 90.7 89.2 92.5 94.2 92.1 94.6
4phv 93.2 96.1 91.8 97.4 97.1 95.1 98.1 83.5 96.7 96.2 98.2 89.1 99.0 94.6
5hvp 95.6 93.4 93.1 94.7 98.7 94.0 98.9 95.6 92.2 97.1 82.6 83.3 98.3 89.6
9hvp 97.3 97.4 91.7 98.4 97.9 91.3 95.7 94.4 96.8 96.6 96.2 96.8 96.9 96.7
Ave 91.3 91.0 90.0 93.9 95.1 89.1 92.8 92.5 93.8 94.3 94.7 87.9 92.9 91.4 90.4
SD 4.0 4.8 2.9 3.2 2.7 5.8 4.9 3.9 3.3 3.7 4.2 5.1 4.1 6.5 4.5

1516 PROTEINS DOI 10.1002/prot


Prediction of Ligand-Binding Sites

CONCLUSIONS 11. Gunther J, Bergner A, Hendlich M, Klebe G. Utilising structural


knowledge in drug design strategies: applications and using Reli-
AutoLigand identifies ligand-binding pockets based base. J Mol Biol 2003;326:621–636.
strictly on the properties of the receptor, identifying the 12. Campbell SJ, Gold ND, Jackson RM, Westhead DR. Ligand binding:
functional site location, similarity and docking. Curr Opin Struct
optimal ligand volume, shape, and atom types. The Biol 2003;13:389–395.
strength of the contiguous envelope technique comes 13. Laskowski RA, Luscombe NM, Swindells MB, Thornton JM. Protein
from the fact that it allows regions of high affinity to be clefts in molecular recognition and function. Prot Sci 1996;5:2438–2452.
linked through the regions of low affinity as long as the 14. Kleywegt GJ, Jones TA. Detection, delineation, measurement and
total affinity of the volume is optimized. By running a display of cavities in macromolecular structures. Acta Cryst 1994;
D50:178–185.
series of different volume sizes, the method is able to 15. Peters KP, Fauck J, Frommel C. The automatic search for ligand
identify the ligand site without any prior knowledge of binding sites in protein of known three-dimensional structure using
the ligand-receptor system for a majority of cases. The only geometric criteria. J Mol Biol 1996;256:201–213.
use of an atomic volume representation of the envelope, 16. Friedman JM. Fourier-filtered van der Waals contact surfaces: accurate
instead of a simple sum of points, is essential for ruling ligand shapes from protein structures. Prot Eng 1997;10:851–863.
17. Brady GP, Stouten PFW. Fast prediction and visualization of pro-
out long snake-like or spiderweb-like envelopes that are tein binding pockets with PASS. J Comp-Aided Mol Des 2000;
unlikely to represent real ligands. 14:383–401.
The next goal in this work will be to develop methods 18. Morris RJ, Najmanovich RJ, Kahraman A, Thornton JM. Real spheri-
to identify specific ligands that correspond to the enve- cal harmonic expansion coefficients as 3D shape descriptors for pro-
lopes determined by the program. This will be essential tein binding pocket and ligand comparisons. Bioinformatics 2005;
21:2347–2355.
for use of AutoLigand for the prediction of putative 19. Coleman RG, Sharp KA. Travel depth, a new shape descriptor for
ligands in proteins from high-throughput crystallization macromolecules: application to ligand binding. J Mol Biol 2006;
projects. It will also be essential for use of the method in 362:441–458.
computer-aided drug design and for use in drug develop- 20. Laskowski RA, Watson JD, Thornton JM. From protein structure to
ment of known ligand-binding sites. biochemical function? J Struct Funct Genom 2003;4:167–177.
21. Shulman-Peleg A, Nussinov R, Wolfson HJ. Recognition of func-
tional sites in protein structures. J Mol Biol 2004;339:607–633.
ACKNOWLEDGMENTS 22. Bate P, Warwicker J. Enzyme/non-enzyme discrimination and pre-
diction of enzyme active site location using charged-based methods.
This work was performed with support from grants J Mol Biol 2004;340:263–276.
R24-CA095830 and R01-GM069832 from the NIH. This 23. An J, Torov M, Abagyan R. Pocketome via comprehensive identifi-
is manuscript 18559 from the Scripps Research Institute. cation and classification of ligand binding envelopes. Mol Cell Pro-
teom 2005;4:752–761.
24. Lauri ATR, Jackson RM. Q-SiteFinder: an energy-based method for
REFERENCES the prediction of protein-ligand binding sites. Bioinformatics
2005;21:1908–1916.
1. Burley SK, Bonanno JB. Structural genomics of proteins from con- 25. Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM.
served biochemical pathways and processes. Curr Opin Struct Biol A method for localizing ligand binding pockets in protein struc-
2002;12:383–391. tures. Proteins 2006;62:479–488.
2. Wild DL, Saqi MAS. Structural proteomics: inferring function from 26. Goodford PJ. Computational procedure for determining energeti-
protein structure. Curr Proteom 2004;1:59–65. cally favorable binding sites on biologically important molecules.
3. Teichmann SA, Murzin AG, Chothia C. Determination of protein J Med Chem 1985;28:849–857.
function, evolution and interactions by structural genomics. Curr 27. Laskowski RA. SURFNET: a program for visualizing molecular sur-
Opin Struct Biol 2001;11:354–363. faces, cavities, and intermolecular interactions. J Mol Graph Model
4. Honig B, Goldsmith-Fischman S. Structural genomics: compu- 1995;13:323–330.
tational methods for structural analysis. Prot Sci 2006;12:1813– 28. Glaser F, Rosenberg Y, Kessel A, Pupko T, Ben Tal N. The ConSurf-
1821. HSSP database: the mapping of evolutionary conservation among
5. Aloy P, Querol E, Aviles FX, Sternberg MJE. Automated structure- homologs onto PDB structures. Proteins 2004;58:610–617.
based prediction of functional sites in proteins: applications to 29. Beuscher A, Olson AJ, Goodsell DS. Identifying protein binding
assessing the validity of inheriting protein function from homology sites and optimal ligands. Lett Drug Des Discov 2005;2:255–259.
in genome annotation and to protein docking. J Mol Biol 2001;311: 30. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew
395–408. RK, Olson AJ. Automated docking using a Lamarckian genetic algo-
6. Stark A, Shkumatov A, Russell RB. Finding functional sites in struc- rithm and an empirical binding free energy function. J Comp
tural genomics proteins. Structure 2004;12:1405–1412. Chem 1998;19:1639–1662.
7. Shapiro L, Harris T. Finding functional sites in structural genomics 31. Gunasekaran K, Nussinov R. How different are structurally flexible
proteins. Curr Opin Biotechnol 2000;11:31–35. and rigid binding sites? Sequence and structural features discrimi-
8. Kuntz ID. Structure-based strategies for drug design and discovery. nating proteins that do and do not undergo conformational change
Science 1992;257:1078–1082. upon ligand binding. J Mol Biol 2006;365:257–273.
9. Schmitt S, Kuhn D, Klebe G. A new method to detect related func- 32. Sanner MF. A component-based software environment for visualiz-
tion among proteins independent of sequence and fold homology. ing large macromolecular assemblies. Structure 2005;13:447–462.
J Mol Biol 2002;323:387–406. 33. Kuntz ID, Chen K, Sharp KA, Kollman PA. The maximal affinity of
10. Exner TE, Keil M, Brickmann J. Pattern recognition strategies for ligands. Proc Natl Acad Sci USA 1999;96:9997–10002.
molecular surfaces. I. Pattern generation using fuzzy set theory. 34. Sanner MF. Python: a programming language for software integra-
J Comp Chem 2002;23:1176–1187. tion and development. J Mol Graph Mod 1999;17(February):57–61.

DOI 10.1002/prot PROTEINS 1517

You might also like