Computational Biology

Nurit Haspel
Filip Jagodzinski
Kevin Molloy Editors

Algorithms
and Methods
in Structural
Bioinformatics
Computational Biology

Advisory Editors
Gordon Crippen, University of Michigan, Ann Arbor, MI, USA
Joseph Felsenstein, University of Washington, Seattle, WA, USA
Dan Gusfield, University of California, Davis, CA, USA
Sorin Istrail, Brown University, Providence, RI, USA
Thomas Lengauer, Max Planck Institute for Computer Science, Saarbrücken,
Germany
Marcella McClure, Montana State University, Bozeman, MT, USA
Martin Nowak, Harvard University, Cambridge, MA, USA
David Sankoff, University of Ottawa, Ottawa, ON, Canada
Ron Shamir, Tel Aviv University, Tel Aviv, Israel
Mike Steel, University of Canterbury, Christchurch, New Zealand
Gary Stormo, Washington University in St. Louis, St. Louis, MO, USA
Simon Tavaré, University of Cambridge, Cambridge, UK
Tandy Warnow, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Lonnie Welch, Ohio University, Athens, OH, USA

Editors-in-Chief
Andreas Dress, CAS-MPG Partner Institute for Computational Biology, Shanghai,
China
Michal Linial, Hebrew University of Jerusalem, Jerusalem, Israel
Olga Troyanskaya, Princeton University, Princeton, NJ, USA
Martin Vingron, Max Planck Institute for Molecular Genetics, Berlin, Germany

Editorial Board Members


Robert Giegerich, University of Bielefeld, Bielefeld, Germany
Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Leipzig,
Germany
Gene Myers, Max Planck Institute of Molecular Cell Biology and Genetics,
Dresden, Germany
Pavel Pevzner, University of California, San Diego, CA, USA
Endorsed by the International Society for Computational Biology, the Computational
Biology series publishes the very latest, high-quality research devoted to
specific issues in computer-assisted analysis of biological data. The main emphasis
is on current scientific developments and innovative techniques in computational
biology (bioinformatics), bringing to light methods from mathematics, statistics
and computer science that directly address biological problems currently under
investigation.
The series offers publications that present the state-of-the-art regarding the
problems in question; show computational biology/bioinformatics methods at work;
and finally discuss anticipated demands regarding developments in future methodology.
Titles can range from focused monographs, to undergraduate and graduate
textbooks, and professional text/reference works.
Nurit Haspel • Filip Jagodzinski • Kevin Molloy
Editors

Algorithms and Methods


in Structural Bioinformatics
Editors
Nurit Haspel
Department of Computer Science
University of Massachusetts Boston
Boston, MA, USA

Filip Jagodzinski
Computer Science
Western Washington University
Bellingham, WA, USA

Kevin Molloy
ISAT/CS Building Room 216
James Madison University
Harrisonburg, VA, USA

ISSN 1568-2684 ISSN 2662-2432 (electronic)


Computational Biology
ISBN 978-3-031-05913-1 ISBN 978-3-031-05914-8 (eBook)
https://doi.org/10.1007/978-3-031-05914-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The three-dimensional structure and function of molecules present many challenges
and opportunities for developing an understanding of biological systems. With the
increasing availability of molecular structures and the advancing accuracy of structure
predictions and molecular simulations, the space for algorithmic advancement
on many analytical and predictive problems is both broad and deep. To support this
field, a rich set of methods and algorithms is available, addressing a variety of
important problems such as protein-protein interactions, the effect of mutations on
protein structure and function, and protein structure determination.
Recently, a deep learning-based algorithm, AlphaFold, made tremendous
progress in predicting the three-dimensional structures of proteins, known as
the protein folding problem. However, many problems still remain unsolved.
In particular, the experimental resolution of protein structures, especially large
macromolecules, still lags behind the availability of protein sequences. Modeling
protein-protein interactions and protein binding still remain a challenge. Even
when the three-dimensional structure of two interacting proteins is known, it is still
difficult to determine the complex formed by the two proteins. It is also challenging
to model and analyze conformational transitions in proteins, due to the transient
nature of intermediate structures.
Chapter “Protein-Ligand Binding with Applications in Molecular Docking”
presents a multi-dimensional analysis approach to protein-ligand binding.
Understanding protein-ligand binding from different perspectives (energetics, structure,
homology, etc.) can provide insights to further drug design and protein conformation
studies. This chapter reviews basic principles and recent advances in protein-ligand
binding, including the underlying thermodynamic basis, computational
methodologies, and freely available databases. End-point free energy methods,
which have been shown to be significantly more efficient than the popular alchemical
and transition path sampling methods, are discussed in more detail. The chapter ends
with a brief review of molecular docking and its applications to high-throughput
screening in early-stage drug discovery.
Chapter “Explaining Small Molecule Binding Specificity with Volumetric
Representations of Protein Binding Sites” presents advances in algorithmic approaches
for studying the volumetric properties of molecular surfaces and electrostatic
isopotentials. Elucidating the surface properties of molecules is needed for advances
in drug design and protein-ligand binding.
Chapter “Machine Learning-Based Approaches for Protein Conformational
Exploration” surveys computational methods for conformational exploration of
proteins. The chapter provides a detailed discussion of the challenges of using
these methods, their strengths, and their shortcomings. The surveyed topics include
physics-based methods such as molecular dynamics as well as geometry-based
methods and a focus on new machine learning-based strategies that have been a
research hot spot for computational biologists in recent years.
Chapter “Low Rank Approximation Methods for Identifying Impactful Pairwise
Protein Mutations” describes recent advances in the use of machine learning-based
approaches, including low rank sampling, for efficiently identifying similarities
among proteins and studying the effects of pairwise mutations. Identifying protein
classes and similarities among sets of proteins has relevance to homology modeling
and computational experiments that aim to better understand new protein structures
based on their similarity to other biomolecules.
Chapter “Detection and Analysis of Amino Acid Insertions and Deletions”
showcases ongoing work that relies on robotics and coarse-grained combinatorial
approaches for predicting the effects of insertions and deletions (indels) on protein
structural stability. Even a single amino acid substitution can cause significant
changes to a protein’s shape and function. A variety of approaches have been
developed in the past decade for inferring the effects of substitution mutations, but
very few address the effects of indel mutations.
Chapter “DeepTracer Web Service for Fast and Accurate De Novo Protein
Complex Structure Prediction from Cryo-EM” introduces DeepTracer—a Web
service for deep learning-based de novo protein complex structure prediction from
Cryo-EM. Cryo-EM is increasingly being used to resolve protein structures. The
resolution of most Cryo-EM resolved entries in the protein data bank (PDB) is
medium or low, in which fine-level details are obscured, but new technology and
improved computational methods allow for more accurate structure prediction.

Boston, MA, USA Nurit Haspel


Bellingham, WA, USA Filip Jagodzinski
Harrisonburg, VA, USA Kevin Molloy
January 2022
Contents

Protein-Ligand Binding with Applications in Molecular Docking
Nikita Mishra and Negin Forouzesh
1 Introduction
1.1 Thermodynamic Basis of Protein-Ligand Interaction
2 Computational Methods for Estimating Binding Free Energy
2.1 Alchemical Methods
2.2 Transition Path Sampling Methods
2.3 End-Point Methods
3 Protein-Ligand Binding Databases
4 Molecular Docking
5 Conclusion
References

Explaining Small Molecule Binding Specificity with Volumetric Representations of Protein Binding Sites
Ziyi Guo and Brian Y. Chen
1 Introduction
1.1 Comparison Algorithms for Examining Specificity
2 Specificity Assignment
2.1 Binding Site Representations
2.2 Metrics for Binding Site Comparison
2.3 Comparison Algorithms
2.4 Statistical Models for Binding Site Comparison
3 Component Localization
3.1 Foundations of Structure-Based Component Localization
3.2 Using CSG for Component Localization
3.3 Statistical Models for Component Localization
3.4 Volumetric Alignment
3.5 Flexible Representations for Component Localization
3.6 Solid Representations of Electrostatic Isopotentials
4 Discussion
4.1 Future Directions
References

Machine Learning-Based Approaches for Protein Conformational Exploration
Fatemeh Afrasiabi, Ramin Dehghanpoor, and Nurit Haspel
1 Introduction
2 Biophysical and Empirical Methods
3 Physics-Based Computational Methods
3.1 Molecular Dynamics
3.2 Monte Carlo Based Search Method
4 Geometric and Robotics-Inspired Methods
4.1 Motion Planning Methods
5 Machine Learning-Based Methods
5.1 Dimensionality Reduction Techniques
5.2 Autoencoders
6 Toolkits for Applying Machine Learning
6.1 Topology and Clustering
6.2 Using a priori Knowledge
7 Conclusions
References

Low Rank Approximation Methods for Identifying Impactful Pairwise Protein Mutations
Chris Daw, Brian Barragan-Cruz, Nicholas Majeske, Filip Jagodzinski, Tanzima Islam, and Brian Hutchinson
1 Introduction
2 Related Work
3 Methods
3.1 Phase 1: Generate Exhaustive Pairwise Data
3.2 Phase 2: Sampling Methods
3.3 Phase 3: Smooth Approximation Methods
3.4 Evaluation Metrics
4 Results: SVD Smoothing
4.1 SVD Approximation and Sampling Error
5 Results: Case Study on 2LZM
6 Conclusions and Future Work
References

Detection and Analysis of Amino Acid Insertions and Deletions
Muneeba Jilani, Nurit Haspel, and Filip Jagodzinski
1 Introduction
2 Computational Methods of InDel Detection
3 Computational Methods of InDel Analysis
3.1 Machine Learning Based Methods
3.2 Detecting Functional and Fitness Effects of InDels on Protein Structure
3.3 Plasticity of Proteins to InDels
4 Conclusion
References

DeepTracer Web Service for Fast and Accurate De Novo Protein Complex Structure Prediction from Cryo-EM
Dong Si, Hanze Meng, Jonas Pfab, Yinrui Deng, Yutong Xie, Jackson Tan, Sheung Him Martin Chow, Jason Chen, and Aditi Jain
1 Introduction
2 Procedures
3 Results
3.1 Architecture
3.2 Prediction Evaluation
3.3 UI/UX Features
4 Discussion
4.1 Design Philosophy
4.2 Future Features
References

Protein-Ligand Binding with
Applications in Molecular Docking

Nikita Mishra and Negin Forouzesh

N. Mishra
Department of Chemistry and Biochemistry, California State University, Los Angeles,
Los Angeles, CA, USA
e-mail: nmishra2@calstatela.edu
N. Forouzesh
Department of Computer Science, California State University, Los Angeles, Los Angeles,
CA, USA
e-mail: neginf@calstatela.edu

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Haspel et al. (eds.), Algorithms and Methods in Structural Bioinformatics,
Computational Biology, https://doi.org/10.1007/978-3-031-05914-8_1

1 Introduction

Protein-ligand binding is one of the most fundamental processes in various organisms.
There are several examples of this process occurring in nature: enzymes bind
to substrates to allow for energy production via catabolic and anabolic pathways,
viruses attach their proteins onto receptors of the host cell to transfer their genetic
material, and signaling ligands bind to intercellular receptors to implement a signal
transduction process. Protein-ligand binding also plays a key role in the molecular
recognition that is central to drug design and discovery [1]. Therefore, it is vital
to understand the protein-ligand binding mechanism at the molecular level in
biological systems.
One of the key features of protein-ligand interactions is the binding free energy
change that occurs between the protein and the ligand upon the ligand’s attachment.
Binding free energy heavily dictates how strongly a protein and ligand interact.
This is a particularly useful physicochemical feature to understand for drug design,
the study of infectious diseases, and signal transduction pathways in the cell [2].
There are various ways to determine the binding free energy. Some common
experimental techniques include isothermal titration calorimetry [3], surface plasmon
resonance [4], fluorescence polarization [5], and MicroScale thermophoresis [6],
but these types of methods are often costly and labor intensive. Thus, efficient
computational methods have emerged for the quantitative determination of binding
free energy, e.g., alchemical methods, transition path sampling, and end-point
methods [7, 8].
The intersection between computational and biochemical fields has led to new
technologies that can be used for computer-aided drug design [9]. One such novel
method is molecular docking, which can be used to screen various compounds as
they attach to their target proteins [10]. By examining many different protein-ligand
binding conformations and orientations, evaluating these poses, and ranking them
according to a score, docking methods determine the most likely conformation
of a ligand [11]. Molecular docking is a powerful tool that can significantly expedite
and cut costs of drug discovery.
This chapter discusses basic principles and recent advances in the computational
study of protein-ligand binding. First, we review binding mechanisms from a
thermodynamic point of view. Next, popular computational methods are introduced
followed by a thorough description of three relevant binding databases. Finally,
molecular docking algorithms, scoring functions, and software are explained.

1.1 Thermodynamic Basis of Protein-Ligand Interaction

Enthalpy, H, is a property of a thermodynamic system defined as the heat content
of the system at constant pressure. Enthalpy is not measured directly, so analyses of
physical and chemical systems are generally concerned with the net change
in enthalpy, ΔH. In a protein-ligand system, enthalpy measures total energy:
both the internal energy of the system and the energy required to create the system
due to the volume it displaces in its environment [12]. Binding interactions, such
as covalent and noncovalent bonds, in biological systems impact the enthalpy
value. Covalent bonds involve the direct sharing of electrons between two atoms
and are very strong interactions. Noncovalent interactions are weak bonds that
do not involve sharing electrons; although weak, they are vital in determining
physiological macromolecule behavior. There are four basic types of noncovalent
interactions in biological systems: hydrogen bonds, ionic interactions, van der
Waals interactions, and hydrophobic effects [13]. The formation of noncovalent
interactions is energetically favorable and releases heat, meaning that it is
exothermic (ΔH < 0); breaking such interactions absorbs heat and is
endothermic (ΔH > 0).
There are examples of chemical interactions that occur even though they are
energetically unfavorable, indicating that factors other than enthalpy affect
the progression and favorability of an interaction. One such factor is
entropy, S. Entropy is a measure of the dispersal of energy in a system, often
referred to as its disorder. On a molecular level, entropy is strongly related to the
degrees of freedom (d.o.f.) of a molecule, which describe the molecule’s movement
in space. There are three types of d.o.f. to consider: translational d.o.f.
(specifying the center of mass), rotational d.o.f. (specifying orientation),
and vibrational d.o.f. (specifying the relative positions of the nuclei) [14]. With
more d.o.f., the atoms are freer to move around and the disorder therefore
increases. More disordered systems are favored, but as we will see later in this
section, the favorability of a reaction is a compromise between enthalpy and entropy.
Given the two factors that contribute to energetic favorability, the Gibbs free energy,
G, is an essential thermodynamic quantity that describes the spontaneity
of a reaction under isobaric and isothermal conditions [14]. The change in G is
defined as:

$$\Delta G = \Delta H - T\,\Delta S, \tag{1}$$

where T is the temperature of the system in kelvin and ΔH and ΔS are the changes in
enthalpy and entropy, respectively. Processes are spontaneous when ΔG is negative
and non-spontaneous when ΔG is positive. This equation captures the trade-off
between enthalpy and entropy in determining the favorability of a reaction: the aim
is to find a balance between decreasing energy and increasing disorder to minimize
ΔG [13, 14].
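
The sign rule attached to Eq. 1 can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the numerical values of ΔH, ΔS, and T are hypothetical and are not taken from the chapter.

```python
def gibbs_free_energy_change(delta_h, delta_s, temperature):
    """Return dG = dH - T*dS (Eq. 1).

    delta_h: enthalpy change in kJ/mol
    delta_s: entropy change in kJ/(mol*K)
    temperature: absolute temperature in kelvin
    """
    return delta_h - temperature * delta_s


def is_spontaneous(delta_g):
    """A process is spontaneous when dG is negative."""
    return delta_g < 0


# Hypothetical binding event: exothermic (dH < 0) but entropically
# unfavorable (dS < 0), so the outcome depends on temperature.
DELTA_H = -40.0   # kJ/mol (assumed value)
DELTA_S = -0.10   # kJ/(mol*K) (assumed value)

dg_room = gibbs_free_energy_change(DELTA_H, DELTA_S, 298.15)
dg_hot = gibbs_free_energy_change(DELTA_H, DELTA_S, 450.0)

print(f"dG at 298.15 K: {dg_room:.3f} kJ/mol, spontaneous: {is_spontaneous(dg_room)}")
print(f"dG at 450.00 K: {dg_hot:.3f} kJ/mol, spontaneous: {is_spontaneous(dg_hot)}")
```

With these assumed numbers the enthalpy term dominates at room temperature, while the entropy penalty takes over at the higher temperature, which is the compromise between enthalpy and entropy described above.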
Thermodynamic factors are important to consider when determining ΔG, but
kinetic factors play an important role as well. In the kinetic context, protein-ligand
binding depends on the concentrations of the protein and the ligand (denoted
as [P] and [L], respectively), as these determine the speed with which protein-ligand
complexes can form (the concentration of the complex is denoted as [PL]).
Michaelis-Menten kinetics [13] introduces the equilibrium association constant, K_a,
which reflects how readily the protein-ligand complex forms. It is inversely
proportional to the dissociation constant, K_d, which reflects how readily a protein
and ligand dissociate from one another.

$$K_a = \frac{[PL]}{[P][L]} = \frac{1}{K_d} \tag{2}$$

Higher K_a values and lower K_d values are associated with higher binding affinity
and therefore a lower ΔG°, which is the standard free energy change. Binding free
energy (BFE), ΔG, is the change in free energy when a protein and ligand bind to one
another. It is defined as

$$\Delta G = \Delta G^{\circ} + RT \ln Q \tag{3}$$

where R is the gas constant and Q is the reaction quotient, which is equal to K_a
at equilibrium. The extent of the protein-ligand interaction is determined by how
negative the change in free energy is upon binding [7].
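
As a worked illustration of Eqs. 2 and 3, the sketch below converts equilibrium concentrations into K_a and K_d and then into a standard binding free energy. It relies on one step the text leaves implicit: at equilibrium ΔG = 0 and Q = K_a, so Eq. 3 gives ΔG° = −RT ln K_a = RT ln K_d. The function names, the concentrations, the 1 M reference state, and the temperature are assumptions chosen for the example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def association_constant(conc_p, conc_l, conc_pl):
    """K_a = [PL] / ([P][L]) (Eq. 2); concentrations in mol/L."""
    return conc_pl / (conc_p * conc_l)


def standard_binding_free_energy(k_d, temperature=298.15):
    """dG° = RT ln(K_d), from Eq. 3 with dG = 0 and Q = K_a = 1/K_d.

    k_d is taken relative to a 1 M standard state (an assumption here).
    """
    return R_KCAL * temperature * math.log(k_d)


# Hypothetical equilibrium concentrations (mol/L).
k_a = association_constant(conc_p=1e-6, conc_l=1e-6, conc_pl=1e-6)
k_d = 1.0 / k_a

dg_micromolar = standard_binding_free_energy(k_d)   # K_d = 1 micromolar
dg_nanomolar = standard_binding_free_energy(1e-9)   # K_d = 1 nanomolar

print(f"K_a = {k_a:.3e} 1/M, K_d = {k_d:.3e} M")
print(f"dG° (1 uM binder): {dg_micromolar:.2f} kcal/mol")
print(f"dG° (1 nM binder): {dg_nanomolar:.2f} kcal/mol")
```

The tighter (nanomolar) binder comes out with the more negative ΔG°, consistent with the statement that lower K_d values correspond to higher affinity.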
4 N. Mishra and N. Forouzesh

2 Computational Methods for Estimating Binding Free Energy

There are several methods by which binding free energy can be calculated based
on computational analysis of the protein and the ligand, including three principal
methods: alchemical methods, transition path sampling, and end-point methods.
Each of these techniques and recent advancements will be discussed in greater detail
below.

2.1 Alchemical Methods

Free energy is a state function, meaning that the ΔG value does not depend on
the path taken from the starting conformation to the bound conformation [15].
Alchemical methods take advantage of this fact and introduce “dummy” molecules,
or other chemical species, to bridge the high-probability region between the
unbound and bound states of the protein and the ligand [7, 16]. These
chemical species are considered non-physical (alchemical) intermediate states,
and their use allows for a robust computational method that can calculate large
ΔG values [16]. The general idea of alchemical free energy calculations can be
summarized by a thermodynamic cycle, shown in Fig. 1. An important parameter in
alchemical free energy calculations is the alchemical reaction coordinate, λ. It
defines the thermodynamic path connecting the two states and varies from 0 to 1
along that path.
Fig. 1 The vertical arrows indicate a biochemical reaction that shifts the ligand
and receptor from being free in solution to bound at the binding site. The horizontal
arrows indicate a transmutation that alchemically alters the ligand.
ΔΔG = ΔG_A − ΔG_B = ΔG_C − ΔG_D

In the alchemical category, there are two primary methodologies used for free
energy calculations. The first is free energy perturbation (FEP). This method relates
the free energy difference between two states, here the unbound and bound
protein-ligand complex, to an ensemble average of a function of their energy
difference [17]. Potential energies based on FEP
theory can be averaged over ensembles generated via molecular dynamics (MD)
or Monte Carlo (MC) simulations [17–19]. FEP simulations drive the perturbation
of one molecule into another by using alchemical intermediate states, also known
as windows, that lead one state into the other [7, 19]. Once the calculations are
complete, the converged free energy result of each window is estimated.
One caveat with FEP is that convergence is only straightforward when ΔU (the
change in potential energy) between the initial and final states is small.
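The averaging step in FEP is commonly written as the exponential-averaging (Zwanzig) relation, ΔG = −kT ln⟨exp(−ΔU/kT)⟩, where the average runs over configurations sampled in the reference state. The sketch below applies it to synthetic Gaussian ΔU samples, for which the exact answer is mean − σ²/(2kT). The sample generator, the value of kT, and all function names are illustrative assumptions; real ΔU values would come from MD or MC windows.

```python
import math
import random

KT = 0.593  # kcal/mol, roughly k_B*T near 298 K (assumed temperature)


def fep_free_energy(delta_u, kt=KT):
    """Exponential-averaging estimate: dG = -kT ln < exp(-dU/kT) >."""
    shift = min(delta_u)  # subtract the minimum for numerical stability
    acc = sum(math.exp(-(du - shift) / kt) for du in delta_u)
    return shift - kt * math.log(acc / len(delta_u))


def synthetic_delta_u(n, mean, sigma, seed=0):
    """Hypothetical stand-in for energy differences sampled in one window."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]


if __name__ == "__main__":
    samples = synthetic_delta_u(20000, mean=1.0, sigma=0.3)
    print("FEP estimate:      ", fep_free_energy(samples))
    print("Gaussian reference:", 1.0 - 0.3 ** 2 / (2 * KT))
```

With several windows, the per-window estimates would be summed along λ. Widening the ΔU distribution in the example makes the estimate noisier, which mirrors the convergence caveat noted above.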
The second method is known as thermodynamic integration (TI). It calculates the
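change in free energy as follows:

\[ \Delta G = \int_{\lambda=0}^{\lambda=1} \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda} \, d\lambda \tag{4} \]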

As shown in Fig. 1, the ΔΔG is the difference between the ΔG of the bound and
unbound states calculated from Eq. 4. The important characteristics of TI are that it
relies on the average force exerted in a system undergoing an alchemical transition
and integrates it along the alchemical coordinate. This method is often considered
more straightforward than other alchemical approaches to binding free energy [20].
A few techniques have been developed to incorporate TI into simulations.
One method is known as slow-growth TI, which progresses through the transition
in an extremely gradual manner and keeps the system very close to equilibrium.
In this limit, the work done through the transition is equivalent to the free energy
change [21]. Another method is discrete TI, which functions very similarly to FEP
and divides the thermodynamic path into separate windows. The average ⟨∂H/∂λ⟩ is
calculated in each window, followed by the numerical integration necessary to
obtain ΔG [20, 22, 23].
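Discrete TI can be illustrated on a toy system where every quantity is known in closed form. The sketch below assumes a one-dimensional harmonic oscillator whose force constant is switched linearly from k0 to k1; the window averages ⟨∂H/∂λ⟩ then follow from the classical result ⟨x²⟩ = kT/k, and the exact free energy change is (kT/2) ln(k1/k0). The toy Hamiltonian, the trapezoid rule, and the function names are choices made for this example; in practice the window averages would come from simulation.

```python
import math

KT = 0.593  # kcal/mol, roughly k_B*T near 298 K (assumed temperature)


def mean_dh_dlambda(lam, k0, k1, kt=KT):
    """<dH/dlambda> for H = 0.5*k(lam)*x**2 with k(lam) = k0 + lam*(k1 - k0).

    Uses the classical harmonic-oscillator average <x**2> = kT / k.
    """
    k = k0 + lam * (k1 - k0)
    return 0.5 * (k1 - k0) * kt / k


def ti_trapezoid(lambdas, means):
    """Integrate the window averages over lambda with the trapezoid rule."""
    total = 0.0
    for i in range(1, len(lambdas)):
        total += 0.5 * (means[i] + means[i - 1]) * (lambdas[i] - lambdas[i - 1])
    return total


def run_discrete_ti(k0=1.0, k1=4.0, n_windows=21):
    """Evaluate Eq. 4 on evenly spaced lambda windows from 0 to 1."""
    lambdas = [i / (n_windows - 1) for i in range(n_windows)]
    means = [mean_dh_dlambda(lam, k0, k1) for lam in lambdas]
    return ti_trapezoid(lambdas, means)


if __name__ == "__main__":
    print("TI estimate:", run_discrete_ti())
    print("Exact value:", 0.5 * KT * math.log(4.0))
```

Reducing `n_windows` in this toy case increases the integration error, which is one reason the choice of integration scheme matters for TI.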

2.2 Transition Path Sampling Methods

Proteins and ligands pass through several stable conformational states
before reaching the ideal and most stable bound conformation. Graphically, this
is represented by a curve with several local minima leading to the global minimum at
the end of the curve (i.e., the overall transition path). The transitions between
these minima create a free energy map along the progression of the reaction
pathway (i.e., the reaction coordinate) [7]. This map is known as the potential
of mean force (PMF) [24]. To analyze the free energy change using these minima and
transition states, MD simulations generate trajectories between the stable states of
the protein and ligand to analyze the nature of the binding mechanism [25]. The
beauty of the transition path sampling (TPS) method is the unbiased nature of
the transition trajectories: a trial trajectory is randomly generated from the initial
trajectory and then analyzed and subjected to the Metropolis rule [26]. This rule
accepts a trial trajectory if it connects two minima and rejects it if it
does not. The ensemble of paths from this analysis can be used to create the reaction
coordinate and the free energy landscape, i.e., the PMF [25, 26]. In the context of
protein-ligand binding, the PMF-based method restrains the ligand into the
conformation it has while bound, translates it into the binding site, and then
removes the restraints; this technique is particularly useful for systems with
large solvation free energies [2]. The more traditional double decoupling
methods pull the ligand from the binding site back into the solvent. There are also
non-equilibrium methods based on the Jarzynski relationship that can be used to
obtain the BFE [27, 28].
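The link between sampled configurations and the PMF can be sketched with the standard relation F(x) = −kT ln P(x), where P(x) is the probability of observing the reaction coordinate at x. The example below builds a histogram from samples and converts it to a PMF shifted so that its minimum is zero. The binning scheme, the value of kT, and the function name are assumptions for illustration; this is not the TPS procedure itself, only the final conversion from populations to free energies.

```python
import math
import random

KT = 0.593  # kcal/mol, roughly k_B*T near 298 K (assumed temperature)


def pmf_from_samples(samples, lo, hi, n_bins, kt=KT):
    """Histogram a reaction coordinate and return (bin centers, PMF).

    PMF values are -kT ln P, shifted so the lowest bin is zero.
    Empty bins are reported as infinity because they were never visited.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            index = min(int((x - lo) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    pmf = [(-kt * math.log(c / total)) if c > 0 else math.inf for c in counts]
    f_min = min(pmf)
    return centers, [f - f_min for f in pmf]


if __name__ == "__main__":
    # Hypothetical data: a Gaussian stands in for sampling of a single well.
    rng = random.Random(0)
    coords = [rng.gauss(0.0, 1.0) for _ in range(50000)]
    for center, f in zip(*pmf_from_samples(coords, -3.0, 3.0, 12)):
        print(f"{center:6.2f}  {f:6.3f}")
```

Bins that are rarely visited carry large statistical error, which is the practical motivation for the enhanced sampling techniques used in PMF calculations.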
While TPS has many advantages, especially regarding its speed and ability
to analyze rare dynamical events that ordinary MD simulations cannot, it
also has disadvantages. One of the primary issues
is the delicacy of defining the stable states. If states are not defined precisely,
conformations can get “stuck” in a specific transition state or in an intermediate
state, which leads to insufficient sampling of the conformations and transition
pathways [7, 26]. Additionally, although the unbiased nature of TPS makes it
powerful, the method is also more difficult to implement because of the bookkeeping
necessary to keep track of both the forward and backward time integrations it
employs [26]. Some techniques based on TPS have
enhanced sampling and led to more accurate and faster PMF calculations. For
example, the temperature acceleration method uses high temperatures to accelerate
calculations [29]. Other enhanced sampling techniques include metadynamics [30],
blue-moon sampling [31], umbrella sampling [32], and replica exchange [33].

2.3 End-Point Methods

End-point methods are the final and most efficient ΔG calculation technique
discussed in this chapter. The low expense of these methods stems from the fact that
they sample only the end states of a system (unlike alchemical and path sampling,
which sample non-physical and physical intermediates, respectively) [34].
End-point methods aim to serve as an intermediate between the speed of scoring
approaches and the accuracy of computationally expensive alchemical and path
sampling methods [35, 36]. Molecular mechanics Poisson–Boltzmann surface area
(MM/PBSA) and molecular mechanics generalized Born surface area (MM/GBSA)
are among the most commonly used end-point methods and are closely related [37]. The
basis of the MM/PB(GB)SA approach relies on some of the ideas discussed in the
introduction to this chapter. The change in binding free energy is:

\[ \Delta G = \Delta H - T\Delta S = G_{PL} - G_P - G_L \tag{5} \]

where P refers to the protein, L to the ligand, and PL to the complex. This equation
can be broken down into more specific contributions based on different interactions
within the protein-ligand system [7, 36, 38]. Specifically, the enthalpic contribution
is the sum of the changes in gas-phase molecular mechanics energy and the solvation
free energy:

\[ \Delta G = \Delta E_{MM} + \Delta G_{solv} - T\Delta S \tag{6} \]



ΔE_MM is the gas-phase energy of the solute and consists of the changes in internal
energy (i.e., the changes from bond, angle, and dihedral energies), van der Waals
energy (such as the formation and breaking of hydrogen bonds), and the change in
electrostatic energy [36, 38, 39]. ΔG_solv is the solvation free energy, which consists
of both polar and nonpolar components. The polar component is calculated via
PB (or GB) [40–44] and the nonpolar component is calculated using the
solvent-accessible surface area [45, 46]. The −TΔS term refers to the absolute
temperature multiplied by the change in entropy, which is estimated from normal mode
analysis (NMA). This analysis is quite computationally expensive and is often
omitted in PB/GB calculations. Recently, a few computational and theoretical
techniques have been introduced to overcome the intractability of NMA calculations,
such as system truncation [39], consideration of a distance-dependent dielectric
constant [47], and reduction in translational and rotational freedom of the ligand
upon protein-ligand binding [48].
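The bookkeeping behind Eqs. 5 and 6 can be sketched as follows. The example assumes that the gas-phase energy and the solvation free energy have already been computed for the complex, the receptor, and the ligand in each snapshot; it forms the per-snapshot difference, averages it, and adds an optional entropy term. The function name, the tuple layout, and the numbers used to exercise it are hypothetical.

```python
import math
from statistics import mean, stdev


def mmgbsa_binding_energy(complex_terms, receptor_terms, ligand_terms,
                          minus_t_delta_s=0.0):
    """Average binding free energy estimate over snapshots (Eq. 6).

    Each *_terms argument is a list of (E_MM, G_solv) tuples, one per
    snapshot, in consistent energy units. Returns (estimate, standard error);
    the standard error covers only the snapshot-to-snapshot spread.
    """
    per_snapshot = []
    for (e_c, g_c), (e_r, g_r), (e_l, g_l) in zip(
            complex_terms, receptor_terms, ligand_terms):
        delta_e_mm = e_c - e_r - e_l      # change in gas-phase energy
        delta_g_solv = g_c - g_r - g_l    # change in solvation free energy
        per_snapshot.append(delta_e_mm + delta_g_solv)
    n = len(per_snapshot)
    sem = stdev(per_snapshot) / math.sqrt(n) if n > 1 else 0.0
    return mean(per_snapshot) + minus_t_delta_s, sem


if __name__ == "__main__":
    # Made-up energies for two snapshots, for illustration only.
    cpx = [(-500.0, -300.0), (-510.0, -296.0)]
    rec = [(-400.0, -280.0), (-405.0, -279.0)]
    lig = [(-50.0, -40.0), (-52.0, -39.0)]
    print(mmgbsa_binding_energy(cpx, rec, lig))
```

Leaving `minus_t_delta_s` at zero corresponds to the common practice of omitting the entropy term, in which case the result is better read as a relative score than as an absolute ΔG.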
MD simulations can be employed for MM/PB(GB)SA calculations in one of
three ways [49]. The first way, the separate-trajectory method (3A), uses three
different trajectories to analyze the protein, ligand, and bound complex separately.
Another method is known as the single-trajectory (1A) method and uses one
trajectory to generate the ensemble of snapshots [36, 38, 39]. The performance of
the separate versus the single method was found to vary between systems and test
models, but the separate-trajectory approach has been shown to have greater standard
errors and uncertainties, leading the 1A method to be generally deemed more
accurate [50–52]. Another benefit of the single-trajectory method is that the change
in bond energy is no longer relevant to ΔE_MM as it cancels out [49]. One of the
drawbacks of the single-trajectory approach is that it disregards structural changes
to the ligand upon binding to the protein/receptor that are potentially important in
getting an accurate characterization of the protein-ligand interaction [7, 53]. To
remedy this, a “compromise” has been suggested: the 2A approach, which
employs trajectories to sample both the complex and the ligand in order
to account for energetic changes due to restructuring of the ligand [49, 53]. The 2A
method has been found to improve the results of binding affinity prediction [54].
The computational protocol for MM/PB(GB)SA can be generally described as
follows: first, an explicit solvent model is used to perform an MD simulation on the
protein-ligand complex to produce the necessary ensemble of snapshots, from which
the free energy contributions of the protein, ligand, and complex are obtained. Then,
the solvent molecules and charged ions are deleted from each snapshot to prepare for
the use of the implicit solvent model [55], and the MM/PB(GB)SA method is used to
analyze and compute the total ΔG_solv for each snapshot. The overall ΔG_solv is the
sum of the individual contributions [7, 34, 39]. This thermodynamic cycle is
shown in Fig. 2. The use of an explicit solvent model in the MD simulation followed
by the implicit solvent model for calculation might seem inconsistent since different
energy functions are used, but studies have found that using an implicit
solvent model for the MD simulation as well leads to less accurate results [56].
The unreliable double implicit model has sometimes led to a dissociation of the
protein or ligand, rendering it less reliable than an explicit simulation followed
by an implicit calculation [57].
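The snapshot preparation step in the protocol above, removing explicit solvent and ions before the implicit-solvent calculation, can be sketched for a snapshot stored as PDB-format text. The residue names treated as solvent or ions are an assumed, non-exhaustive list that depends on the force field and software used, and the function name is hypothetical. The residue name is read from the fixed columns that the PDB format reserves for it, with one extra column to tolerate four-letter names.

```python
# Assumed residue names for water and counter-ions; adjust for the force field in use.
SOLVENT_AND_IONS = {"HOH", "WAT", "TIP3", "NA", "CL", "K", "Na+", "Cl-", "K+"}


def strip_solvent_and_ions(pdb_lines, excluded=SOLVENT_AND_IONS):
    """Return the lines of a PDB snapshot without solvent and ion atoms.

    Only ATOM and HETATM records are filtered; every other record
    (for example TER or END) is passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            res_name = line[17:21].strip()  # residue-name columns
            if res_name in excluded:
                continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    snapshot = [
        "ATOM      1 CA   ALA A   1",
        "HETATM    2 O    HOH A   2",
        "HETATM    3 Na+  Na+ A   3",
        "END",
    ]
    for record in strip_solvent_and_ions(snapshot):
        print(record)
```

The filtered snapshots would then be passed to the PB or GB solver. Keeping selected explicit water molecules is a variation some protocols explore, which is why the excluded set is a parameter here.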

Fig. 2 The thermodynamic cycle for calculating the solvation component of
binding free energy, ΔG_solv, using implicit solvent modeling. The water
environment is shown in blue and vacuum is in white

3 Protein-Ligand Binding Databases

Several freely available protein-ligand binding databases have been organized to
examine the affinities of ligands to various proteins, with some databases containing
several thousands of protein targets and over one million test compounds. While
these expansive databases are useful for many types of analysis, they are particularly
helpful in the process of drug discovery; similar to molecular docking (discussed
later in the chapter), they allow for pre-screening of candidate drugs to target
proteins. This section will briefly discuss a few of the protein-ligand binding
databases available as web services.
First introduced in 1995, BindingDB [58] primarily focuses on proteins discovered
to be drug targets; one of the goals of the database is to serve as a resource to
help design self-assembling systems in vitro [59, 60]. The database has an expansive
query system, allowing for search by structure, name, sequence, pathways, journal
articles, affinity range, and more. One of the interesting features of BindingDB is
that it provides virtual screening on the web service, wherein the user gives a ligand
training set alongside candidate ligands; BindingDB will then use the training set to
rank the candidates [59–61]. As of September 2021, BindingDB has 8,625 protein
targets and 1,006,573 test compounds. This database is a vast and incredibly useful
resource, especially considering its accessibility and embedded features.
Another popular protein-ligand binding database is PDBbind, which aims to
align itself with biomolecular complexes in the Protein Data Bank (PDB). PDBbind
considers only experimentally determined complexes and reports binding data
obtained via experiment [62–65]. Each year, automated programs organize complexes
from the PDB database into four categories: protein-small ligand complexes,
nucleic acid-small ligand complexes, protein-nucleic acid complexes, and
protein-protein complexes. Once this categorization is complete, three sets are
created from this information: the general set, refined set, and core set. The
general set is the broadest, containing the biomolecular complexes for which binding
affinity data are provided. The refined set is more specific, containing complexes
from the general set that pass several filters selecting for higher quality data.
Finally, the core set is further filtered to provide complexes for validating docking
and scoring methods [62–65]. The 2020 update of PDBbind found 78,460 valid complexes
and placed 24,496 in the general set, 5,316 in the refined set, and 285 in the
core set.

The final popular database is Binding Mother of All Databases, better known as
Binding MOAD [66]. Similar to PDBbind, Binding MOAD has also generated sets
based on high-quality data from the PDB. Binding MOAD finds proteins with
X-ray crystal structures, and the subset is said to fall between PDBbind’s general
and refined sets [66, 67]. Binding MOAD provides a wide range of information and data.
The webpage for each protein shows ligand information (both valid and invalid
reports), any available binding data, the chemical structures of the ligands, and
proteins within the 90%, 70%, and 50% homology levels. These tools are all extremely
useful because they allow for an easy way to search for similar structures [66, 67].
As of the 2019 release, there are 38,702 protein-ligand structures, 14,324 instances
of binding data, 18,939 ligands, and 10,500 protein families. Overall, Binding
MOAD is one of the most robust protein-ligand databases available, providing large
amounts of information across many types of complexes.

4 Molecular Docking

Molecular docking is a computational method used to determine the binding mode
of ligands when they interact with proteins of known structure. By examining
many protein-ligand binding conformations and orientations—known as poses—
evaluating these poses, and ranking them via a score, docking methods determine
the most likely conformation of a ligand [68]. Molecular docking has shown
applicability in a variety of research areas, particularly computer-aided drug design
(CADD). Docking can be used to screen various compounds as they attach to
their target proteins. It is a powerful tool that can significantly expedite and
cut costs of drug discovery [7]. This section will discuss the main principles
of protein-ligand docking as well as how protein-ligand (PL) docking relates to
concepts discussed earlier in this chapter. The end-goal of PL docking is to achieve
high-throughput screening of various compounds to select the most viable drug
candidates; determining the general nature of in silico binding between a drug and a
human receptor will allow for more efficient drug discovery methods [68–70]. There
are two main parts of molecular docking: sampling to find putative ligand binding
modes (poses), and scoring to rank the poses from the sampling phase [69, 71].
The sampling step of PL docking accounts for both the nature of the ligand and
the way it binds to the receptor as well as the flexibility of the protein [69]. Protein-
ligand binding is generally considered to occur through the induced fit model of
protein-ligand interaction. In this model, the protein is considered flexible enough
to slightly change conformation upon a ligand reaching its binding site to better
accommodate the ligand [13, 72, 73]. This model allows for a tighter overall fit and
accounts for proteins and ligands that bind but do not have matching shapes prior
to interaction. Accounting for protein flexibility has proven to be one of the biggest
challenges in PL docking because of the large number of states possible for the
protein as well as the computational cost of analyzing the full conformational
space around the protein [74]. Sampling is done via search algorithms, which
have been developed using energy functions to identify the putative binding poses
between a protein and ligand [10]. There are many algorithms by which these poses
can be found, and one goal of algorithms currently being developed is to expedite
the search process [10, 75].
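The sample-then-rank structure of docking can be made concrete with a deliberately simplified sketch. Here a pose is reduced to the position of the ligand center, sampling is uniform random search inside a box, and the score is simply the distance to an assumed pocket center. None of this reflects a real search algorithm or scoring function; the names, the box size, and the scoring rule are placeholders that only show how the two phases fit together.

```python
import math
import random


def toy_score(pose, pocket=(0.0, 0.0, 0.0)):
    """Placeholder score: distance of the ligand center from the pocket center.

    Lower is better, mimicking the convention of energy-like scores.
    """
    return math.dist(pose, pocket)


def sample_poses(n_poses, box=10.0, seed=0):
    """Sampling phase: draw candidate ligand positions uniformly in a cube."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(-box, box) for _ in range(3)) for _ in range(n_poses)]


def dock(n_poses=1000, top_k=5, seed=0):
    """Scoring phase: rank the sampled poses and keep the best few."""
    poses = sample_poses(n_poses, seed=seed)
    return sorted(poses, key=toy_score)[:top_k]


if __name__ == "__main__":
    for pose in dock():
        print(tuple(round(c, 2) for c in pose), round(toy_score(pose), 3))
```

A real docking program would also sample ligand orientation and internal torsions, and possibly receptor flexibility, which is where the computational cost discussed above arises.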
The other part of the docking process is the scoring phase, wherein the various
poses are ranked from most to least likely. The scoring function is used to assess the
binding affinity between the protein and ligand after docking; ideally, the scoring
function should both predict the binding free energy and allow for high-throughput
screening of various drugs [7]. Different scoring functions have been
developed for docking purposes, and they can be broken down into three principal
groups: empirically-based functions, knowledge-based functions, and force-field
based functions.
Empirically-based scoring functions are perhaps the simplest of those
listed above. This model involves a set of descriptors for the binding, a training
set containing a large number and variety of protein-ligand complexes with
experimental binding affinities, and a regression, classification, or machine
learning algorithm to form a relationship between the descriptor(s) and the
experimental affinity values [76, 77]. Empirical scoring functions rely on many
different energy terms to determine favorability; these types of functions are
simpler in that they do not consider the underlying physical interactions of the
protein and ligand. Some areas of research within empirically-based functions
include finding accurate energy terms to incorporate into the function as well as
avoiding overfitting, which is an issue that stems from the large number of energy
terms used in empirical functions [78].
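The fitting step of an empirical scoring function can be sketched as a linear regression from descriptors to experimental affinities. The example below solves the ordinary least-squares normal equations with a small hand-written Gaussian elimination so that it needs no external libraries. The descriptors (a constant term, a hydrogen-bond count, and a contact count) and the training values are invented for illustration; published scoring functions use different terms and far larger training sets.

```python
def fit_weights(rows, targets):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def empirical_score(weights, descriptors):
    """Score a pose as a weighted sum of its descriptor values."""
    return sum(w * d for w, d in zip(weights, descriptors))


if __name__ == "__main__":
    # Hypothetical training set: [1, n_hbonds, n_contacts] -> affinity (kcal/mol)
    x = [[1, 2, 100], [1, 0, 150], [1, 4, 80], [1, 1, 200], [1, 3, 120]]
    y = [-4.0, -4.0, -4.6, -5.5, -4.9]
    w = fit_weights(x, y)
    print("weights:", w)
    print("score for a new pose:", empirical_score(w, [1, 2, 140]))
```

With only five complexes and three weights this fit is close to memorizing the data, a small-scale version of the overfitting concern raised above.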
Knowledge-based scoring functions are statistical functions based on
the assumption that interactions between the protein and ligand that occur
more frequently than expected by random chance contribute more favorably to the
binding affinity [79]. These types of scoring functions also rely on training sets
to find the statistical potentials of various interactions. One of the biggest issues
with knowledge-based scoring functions is their computational implementation.
However, they serve as a “happy medium” between empirical and force-field based
scoring functions due to their relative swiftness and simplicity, as well as their ability
to avoid some of the challenges posed by empirical functions [80].
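The assumption behind knowledge-based functions is often expressed through an inverse Boltzmann relation, in which the potential for an interaction is −kT ln(observed frequency / reference frequency). The sketch below applies that relation to counts of contacts collected in distance bins. The counts, the choice of reference state, the handling of empty bins, and the function name are all assumptions made for illustration; actual knowledge-based functions differ considerably in how they define the reference state.

```python
import math

KT = 0.593  # kcal/mol, roughly k_B*T near 298 K (assumed temperature)


def statistical_potential(observed_counts, reference_counts, kt=KT):
    """Inverse Boltzmann potential per bin from observed and reference counts.

    Bins seen more often than the reference predicts get a negative
    (favorable) value. Bins with no data are set to zero, which is a
    simplifying assumption rather than a standard treatment.
    """
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            potential.append(0.0)
            continue
        ratio = (obs / total_obs) / (ref / total_ref)
        potential.append(-kt * math.log(ratio))
    return potential


if __name__ == "__main__":
    # Made-up contact counts for one atom-type pair in four distance bins.
    observed = [5, 60, 25, 10]
    reference = [25, 25, 25, 25]
    print(statistical_potential(observed, reference))
```

Scoring a pose then amounts to summing the tabulated potentials over all protein-ligand atom pairs, which helps explain the speed of this class of functions.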
Force-field based approaches account for the underlying physical interactions
occurring between binding partners. As mentioned previously, entropic calculations
are computationally expensive; thus, they are often ignored. However, it is important
to acknowledge that entropic effects can play a major role in the binding affinity,
generating a need for a more efficient method to find the entropic contribution.
Force-field based methods are of particular relevance because they can employ
the implicit solvent models within PBSA and GBSA to account for the physical
interactions of the solvent, improving the accuracy with which the binding affinity
can be predicted [81, 82]. One of the benefits of using MM/PB(GB)SA is that such
methods can explore structure-activity relationships within ligands as well as their
selectivity profiles, and are less computationally demanding than FEP techniques.
Additionally, computation time can be decreased by running MM/PB(GB)SA with
shorter MD simulations or other parameter changes, allowing for a more flexible
method of calculation that can evaluate multiple complexes (i.e., multiple binding
poses) at once [80, 83]. Although this method is perhaps the most complex and
computationally expensive of the three types of functions discussed, the
consideration of the physical interactions involved in the binding mechanisms leads to
more specific binding affinity values and therefore more accurate rankings within
the scoring phase of docking.
The versatile nature of molecular docking and the different algorithms that have
been developed to maximize the accuracy and efficiency of this technique
have led to a vast number of available online web services and downloadable
software for docking. One of the older free programs developed for docking is Docking
wIth eVolutionary AlgorIthms, also known as DIVALI. It was developed in 1995
and uses AMBER-type potential functions within the search algorithm [84]. More
modern docking resources include the Automated Active Site Detection, Docking,
and Scoring (AADS) protocol, which uses an MC-based method to construct a
robust web service with accurate results for protein-ligand binding [85]. AutoDock
and AutoDock Vina from the Scripps Research Institute are perhaps two of the
most commonly used docking programs; Vina was developed as a faster and more
accurate version of AutoDock. Vina and AutoDock each have their own advantages;
comparative studies between Vina and AutoDock 4 (the most recent version of
AutoDock) have found that Vina is better at predicting the experimental
protein-ligand binding poses, but AutoDock calculates more accurate and precise
binding energies [86, 87]. The variation among available docking software allows
for a lot of versatility within protein-ligand calculations.

5 Conclusion

Computational study of protein-ligand binding is useful in several fields, including
drug design and discovery. In this chapter, we presented a general background to
protein-ligand binding with emphasis on binding free energy calculation. Leading
computational methods, binding databases, and relevant applications to molecular
docking have been introduced and discussed thoroughly.

Acknowledgments This work has been partially supported by a California State University
Program for Education and Research in Biotechnology (CSUPERB) New Investigator funding
awarded to N.F.

References

1. W. L. Jorgensen, “The many roles of computation in drug discovery,” Science, vol. 303,
no. 5665, pp. 1813–1818, 2004.

2. H.-J. Woo and B. Roux, “Calculation of absolute protein–ligand binding free energy from
computer simulations,” Proceedings of the National Academy of Sciences, vol. 102, no. 19,
pp. 6825–6830, 2005.
3. M. M. Pierce, C. Raman, and B. T. Nall, “Isothermal titration calorimetry of protein–protein
interactions,” Methods, vol. 19, no. 2, pp. 213–221, 1999.
4. R. Karlsson and A. Fält, “Experimental design for kinetic analysis of protein-protein
interactions with surface plasmon resonance biosensors,” Journal of Immunological Methods,
vol. 200, no. 1-2, pp. 121–133, 1997.
5. A. M. Rossi and C. W. Taylor, “Analysis of protein-ligand interactions by fluorescence
polarization,” Nature Protocols, vol. 6, no. 3, pp. 365–387, 2011.
6. M. Jerabek-Willemsen, T. André, R. Wanner, H. M. Roth, S. Duhr, P. Baaske, and
D. Breitsprecher, “Microscale thermophoresis: Interaction analysis and beyond,” Journal
of Molecular Structure, vol. 1077, pp. 101–113, 2014.
7. X. Du, Y. Li, Y.-L. Xia, S.-M. Ai, J. Liang, P. Sang, X.-L. Ji, and S.-Q. Liu, “Insights
into protein–ligand interactions: mechanisms, models, and methods,” International Journal
of Molecular Sciences, vol. 17, no. 2, p. 144, 2016.
8. M. K. Gilson and H.-X. Zhou, “Calculation of protein-ligand binding affinities,” Annual Review
of Biophysics and Biomolecular Structure, vol. 36, pp. 21–42, 2007.
9. S. J. Y. Macalino, V. Gosu, S. Hong, and S. Choi, “Role of computer-aided drug design in
modern drug discovery,” Archives of Pharmacal Research, vol. 38, no. 9, pp. 1686–1701, 2015.
10. P. H. Torres, A. C. Sodero, P. Jofily, and F. P. Silva-Jr, “Key topics in molecular docking for
drug design,” International Journal of Molecular Sciences, vol. 20, no. 18, p. 4574, 2019.
11. J. Li, A. Fu, and L. Zhang, “An overview of scoring functions used for protein–ligand
interactions in molecular docking,” Interdisciplinary Sciences: Computational Life Sciences,
vol. 11, no. 2, pp. 320–328, 2019.
12. H. Li, Y. Xie, C. Liu, and S. Liu, “Physicochemical bases for protein folding, dynamics, and
protein-ligand binding,” Science China Life Sciences, vol. 57, no. 3, pp. 287–302, 2014.
13. R. Miesfeld and M. McEvoy, Biochemistry. W.W. Norton, 2017.
14. D. A. McQuarrie and J. D. Simon, Physical Chemistry: a Molecular Approach, vol. 1.
University Science Books, Sausalito, CA, 1997.
15. J. D. Chodera, D. L. Mobley, M. R. Shirts, R. W. Dixon, K. Branson, and V. S. Pande,
“Alchemical free energy methods for drug discovery: progress and challenges,” Current
Opinion in Structural Biology, vol. 21, no. 2, pp. 150–160, 2011.
16. A. S. Mey, B. Allen, H. E. B. Macdonald, J. D. Chodera, M. Kuhn, J. Michel, D. L. Mobley,
L. N. Naden, S. Prasad, A. Rizzi, et al., “Best practices for alchemical free energy calculations,”
arXiv preprint arXiv:2008.03067, 2020.
17. W. L. Jorgensen and L. L. Thomas, “Perspective on free-energy perturbation calculations for
chemical equilibria,” Journal of Chemical Theory and Computation, vol. 4, no. 6, pp. 869–876,
2008.
18. Y. Meng, D. Sabri Dashti, and A. E. Roitberg, “Computing alchemical free energy differences
with Hamiltonian replica exchange molecular dynamics (H-REMD) simulations,” Journal of
Chemical Theory and Computation, vol. 7, no. 9, pp. 2721–2727, 2011.
19. W. Jespers, M. Esguerra, J. Åqvist, and H. Gutiérrez-de Terán, “QligFEP: an automated
workflow for small molecule free energy calculations in Q,” Journal of Cheminformatics,
vol. 11, no. 1, pp. 1–16, 2019.
20. V. Gapsys, S. Michielssens, J. H. Peters, B. L. de Groot, and H. Leonov, “Calculation of binding
free energies,” in Molecular Modeling of Proteins, pp. 173–209, Springer, 2015.
21. M. J. Mitchell and J. A. McCammon, “Free energy difference calculations by thermodynamic
integration: difficulties in obtaining a precise value,” Journal of Computational Chemistry,
vol. 12, no. 2, pp. 271–275, 1991.
22. M. Jorge, N. M. Garrido, A. J. Queimada, I. G. Economou, and E. A. Macedo, “Effect of the
integration method on the accuracy and computational efficiency of free energy calculations
using thermodynamic integration,” Journal of Chemical Theory and Computation, vol. 6, no. 4,
pp. 1018–1027, 2010.

23. S. Bruckner and S. Boresch, “Efficiency of alchemical free energy simulations. I. A practical
comparison of the exponential formula, thermodynamic integration, and Bennett’s acceptance
ratio method,” Journal of Computational Chemistry, vol. 32, no. 7, pp. 1303–1319, 2011.
24. W. You, Z. Tang, and C.-e. A. Chang, “Potential mean force from umbrella sampling
simulations: What can we learn and what is missed?,” Journal of Chemical Theory and
Computation, vol. 15, no. 4, pp. 2433–2443, 2019.
25. S. Wan, A. P. Bhati, S. J. Zasada, and P. V. Coveney, “Rapid, accurate, precise and reproducible
ligand–protein binding free energy prediction,” Interface Focus, vol. 10, no. 6, p. 20200007,
2020.
26. P. Bolhuis and C. Dellago, “Practical and conceptual path sampling issues,” The European
Physical Journal Special Topics, vol. 224, no. 12, pp. 2409–2427, 2015.
27. C. Jarzynski, “Nonequilibrium equality for free energy differences,” Physical Review Letters,
vol. 78, no. 14, p. 2690, 1997.
28. C. F. Narambuena, D. M. Beltramo, and E. P. Leiva, “Polyelectrolyte adsorption on a charged
surface. Free energy calculation from Monte Carlo simulations using Jarzynski equality,”
Macromolecules, vol. 41, no. 21, pp. 8267–8274, 2008.
29. L. Maragliano and E. Vanden-Eijnden, “A temperature accelerated method for sampling free
energy and determining reaction pathways in rare events simulations,” Chemical Physics
Letters, vol. 426, no. 1-3, pp. 168–175, 2006.
30. A. Laio and M. Parrinello, “Escaping free-energy minima,” Proceedings of the National
Academy of Sciences, vol. 99, no. 20, pp. 12562–12566, 2002.
31. G. Ciccotti, R. Kapral, and E. Vanden-Eijnden, “Blue moon sampling, vectorial reaction
coordinates, and unbiased constrained dynamics,” ChemPhysChem, vol. 6, no. 9, pp. 1809–
1814, 2005.
32. J.-F. St-Pierre, M. Karttunen, N. Mousseau, T. Rog, and A. Bunker, “Use of umbrella sampling
to calculate the entrance/exit pathway for Z-pro-prolinal inhibitor in prolyl oligopeptidase,”
Journal of Chemical Theory and Computation, vol. 7, no. 6, pp. 1583–1594, 2011.
33. M. Fajer, D. Hamelberg, and J. A. McCammon, “Replica-exchange accelerated molecular
dynamics (REXAMD) applied to thermodynamic integration,” Journal of Chemical Theory
and Computation, vol. 4, no. 10, pp. 1565–1569, 2008.
34. E. Wang, H. Sun, J. Wang, Z. Wang, H. Liu, J. Z. Zhang, and T. Hou, “End-point binding
free energy calculation with MM/PBSA and MM/GBSA: strategies and applications in drug
design,” Chemical Reviews, vol. 119, no. 16, pp. 9478–9508, 2019.
35. E. A. Rifai, M. van Dijk, and D. P. Geerke, “Recent developments in linear interaction energy
based binding free energy calculations,” Frontiers in Molecular Biosciences, vol. 7, p. 114,
2020.
36. S. Genheden and U. Ryde, “Comparison of the efficiency of the LIE and MM/GBSA methods
to calculate ligand-binding energies,” Journal of Chemical Theory and Computation, vol. 7,
no. 11, pp. 3768–3778, 2011.
37. H. Gohlke and D. A. Case, “Converging free energy estimates: MM-PB(GB)SA studies on the
protein–protein complex Ras–Raf,” Journal of Computational Chemistry, vol. 25, no. 2,
pp. 238–250, 2004.
38. J. Srinivasan, T. E. Cheatham, P. Cieplak, P. A. Kollman, and D. A. Case, “Continuum solvent
studies of the stability of DNA, RNA, and phosphoramidate–DNA helices,” Journal of the
American Chemical Society, vol. 120, no. 37, pp. 9401–9409, 1998.
39. N. Forouzesh and N. Mishra, “An effective MM/GBSA protocol for absolute binding free
energy calculations: A case study on SARS-CoV-2 spike protein and the human ACE2 receptor,”
Molecules, vol. 26, no. 8, p. 2383, 2021.
40. A. Onufriev, D. Bashford, and D. A. Case, “Modification of the generalized Born model suitable
for macromolecules,” Journal of Physical Chemistry B, vol. 104, no. 15, pp. 3712–3720, 2000.
41. N. Forouzesh, S. Izadi, and A. V. Onufriev, “Grid-based surface generalized Born model
for calculation of electrostatic binding free energies,” Journal of Chemical Information and
Modeling, vol. 57, no. 10, pp. 2505–2513, 2017.

42. N. Forouzesh, A. Mukhopadhyay, L. T. Watson, and A. V. Onufriev, “Multidimensional global
optimization and robustness analysis in the context of protein–ligand binding,” Journal of
Chemical Theory and Computation, vol. 16, no. 7, pp. 4669–4684, 2020.
43. C. Tan, L. Yang, and R. Luo, “How well does Poisson–Boltzmann implicit solvent agree with
explicit solvent? A quantitative analysis,” Journal of Physical Chemistry B, vol. 110, no. 37,
pp. 18680–18687, 2006.
44. D. Chen, Z. Chen, C. Chen, W. Geng, and G.-W. Wei, “MIBPB: a software package for
electrostatic analysis,” Journal of Computational Chemistry, vol. 32, no. 4, pp. 756–770, 2011.
45. M. K. Gilson and B. Honig, “Calculation of the total electrostatic energy of a macromolecular
system: solvation energies, binding energies, and conformational analysis,” Proteins: Structure,
Function, and Bioinformatics, vol. 4, no. 1, pp. 7–18, 1988.
46. J. Wang, T. Hou, and X. Xu, “Recent advances in free energy calculations with a combination
of molecular mechanics and continuum models,” Current Computer-Aided Drug Design, vol. 2,
no. 3, pp. 287–306, 2006.
47. S. Genheden, O. Kuhn, P. Mikulskis, D. Hoffmann, and U. Ryde, “The normal-mode entropy
in the MM/GBSA method: effect of system truncation, buffer region, and dielectric constant,”
Journal of Chemical Information and Modeling, vol. 52, no. 8, pp. 2079–2088, 2012.
48. I. Y. Ben-Shalom, S. Pfeiffer-Marek, K.-H. Baringhaus, and H. Gohlke, “Efficient approxima-
tion of ligand rotational and translational entropy changes upon binding for use in mm-pbsa
calculations,” Journal of Chemical Information and Modeling, vol. 57, no. 2, pp. 170–189,
2017.
49. S. Genheden and U. Ryde, “The MM/PBSA and MM/GBSA methods to estimate ligand-
binding affinities,” Expert Opinion on Drug Discovery, vol. 10, no. 5, pp. 449–461, 2015.
50. S. Genheden and U. Ryde, “Comparison of end-point continuum-solvation methods for
the calculation of protein–ligand binding free energies,” Proteins: Structure, Function, and
Bioinformatics, vol. 80, no. 5, pp. 1326–1342, 2012.
51. P. Mikulskis, S. Genheden, and U. Ryde, “Effect of explicit water molecules on ligand-binding
affinities calculated with the MM/GBSA approach,” Journal of Molecular Modeling, vol. 20,
no. 6, pp. 1–11, 2014.
52. D. A. Pearlman, “Evaluating the molecular mechanics Poisson-Boltzmann surface area free
energy method using a congeneric series of ligands to p38 map kinase,” Journal of Medicinal
Chemistry, vol. 48, no. 24, pp. 7796–7807, 2005.
53. J. M. Swanson, R. H. Henchman, and J. A. McCammon, “Revisiting free energy calculations:
a theoretical connection to MM/PBSA and direct calculation of the association free energy,”
Biophysical Journal, vol. 86, no. 1, pp. 67–74, 2004.
54. C.-Y. Yang, H. Sun, J. Chen, Z. Nikolovska-Coleska, and S. Wang, “Importance of ligand
reorganization free energy in protein-ligand binding-affinity prediction,” Journal of the
American Chemical Society, vol. 131, no. 38, pp. 13709–13721, 2009.
55. A. Onufriev, “Implicit solvent models in molecular dynamics simulations: A brief overview,”
Annual Reports in Computational Chemistry, vol. 4, pp. 125–137, 2008.
56. A. Weis, K. Katebzadeh, P. Söderhjelm, I. Nilsson, and U. Ryde, “Ligand affinities predicted
with the MM/PBSA method: dependence on the simulation method and the force field,”
Journal of Medicinal Chemistry, vol. 49, no. 22, pp. 6596–6606, 2006.
57. F. Godschalk, S. Genheden, P. Söderhjelm, and U. Ryde, “Comparison of MM/GBSA
calculations based on explicit and implicit solvent simulations,” Physical Chemistry Chemical
Physics, vol. 15, no. 20, pp. 7731–7739, 2013.
58. T. Liu, Y. Lin, X. Wen, R. N. Jorissen, and M. K. Gilson, “BindingDB: a web-accessible
database of experimentally determined protein–ligand binding affinities,” Nucleic Acids
Research, vol. 35, no. suppl_1, pp. D198–D201, 2007.
59. X. Chen, Y. Lin, M. Liu, and M. K. Gilson, “The binding database: data management and
interface design,” Bioinformatics, vol. 18, no. 1, pp. 130–139, 2002.
60. X. Chen, Y. Lin, and M. K. Gilson, “The binding database: overview and user’s guide,”
Biopolymers: Original Research on Biomolecules, vol. 61, no. 2, pp. 127–141, 2001.
61. M. K. Gilson, T. Liu, M. Baitaluk, G. Nicola, L. Hwang, and J. Chong, “BindingDB in 2015: a
public database for medicinal chemistry, computational chemistry and systems pharmacology,”
Nucleic Acids Research, vol. 44, no. D1, pp. D1045–D1053, 2016.
62. R. Wang, X. Fang, Y. Lu, and S. Wang, “The PDBbind database: Collection of binding affinities
for protein-ligand complexes with known three-dimensional structures,” Journal of Medicinal
Chemistry, vol. 47, no. 12, pp. 2977–2980, 2004.
63. R. Wang, X. Fang, Y. Lu, C.-Y. Yang, and S. Wang, “The PDBbind database: methodologies
and updates,” Journal of Medicinal Chemistry, vol. 48, no. 12, pp. 4111–4119, 2005.
64. M. Su, Q. Yang, Y. Du, G. Feng, Z. Liu, Y. Li, and R. Wang, “Comparative assessment of
scoring functions: the casf-2016 update,” Journal of Chemical Information and Modeling,
vol. 59, no. 2, pp. 895–913, 2018.
65. Z. Liu, M. Su, L. Han, J. Liu, Q. Yang, Y. Li, and R. Wang, “Forging the basis for developing
protein–ligand interaction scoring functions,” Accounts of Chemical Research, vol. 50, no. 2,
pp. 302–309, 2017.
66. L. Hu, M. L. Benson, R. D. Smith, M. G. Lerner, and H. A. Carlson, “Binding moad (mother of
all databases),” Proteins: Structure, Function, and Bioinformatics, vol. 60, no. 3, pp. 333–340,
2005.
67. R. D. Smith, J. J. Clark, A. Ahmed, Z. J. Orban, J. B. Dunbar Jr, and H. A. Carlson, “Updates
to binding moad (mother of all databases): polypharmacology tools and their utility in drug
repurposing,” Journal of Molecular Biology, vol. 431, no. 13, pp. 2423–2433, 2019.
68. X. Zhang, H. Perez-Sanchez, and F. C Lightstone, “A comprehensive docking and MM/GBSA
rescoring study of ligand recognition upon binding antithrombin,” Current Topics in Medicinal
Chemistry, vol. 17, no. 14, pp. 1631–1639, 2017.
69. S.-Y. Huang and X. Zou, “Advances and challenges in protein-ligand docking,” International
Journal of Molecular Sciences, vol. 11, no. 8, pp. 3016–3034, 2010.
70. S. F Sousa, N. MFSA Cerqueira, P. A Fernandes, and M. Joao Ramos, “Virtual screening
in drug design and development,” Combinatorial Chemistry & High Throughput Screening,
vol. 13, no. 5, pp. 442–453, 2010.
71. R. G. Coleman, M. Carchia, T. Sterling, J. J. Irwin, and B. K. Shoichet, “Ligand pose and
orientational sampling in molecular docking,” PloS One, vol. 8, no. 10, p. e75992, 2013.
72. K. A. Johnson, “Role of induced fit in enzyme specificity: a molecular forward/reverse switch,”
Journal of Biological Chemistry, vol. 283, no. 39, pp. 26297–26301, 2008.
73. N. Forouzesh, M. R. Kazemi, and A. Mohades, “Structure-based analysis of protein binding
pockets using von Neumann entropy,” in International Symposium on Bioinformatics Research
and Applications, pp. 301–309, Springer, 2014.
74. S. F. Sousa, A. J. Ribeiro, J. Coimbra, R. Neves, S. Martins, N. Moorthy, P. Fernandes, and
M. Ramos, “Protein-ligand docking in the new millennium–a retrospective of 10 years in the
field,” Current Medicinal Chemistry, vol. 20, no. 18, pp. 2296–2314, 2013.
75. I. Halperin, B. Ma, H. Wolfson, and R. Nussinov, “Principles of docking: An overview
of search algorithms and a guide to scoring functions,” Proteins: Structure, Function, and
Bioinformatics, vol. 47, no. 4, pp. 409–443, 2002.
76. I. A. Guedes, F. S. Pereira, and L. E. Dardenne, “Empirical scoring functions for structure-
based virtual screening: applications, critical aspects, and challenges,” Frontiers in Pharma-
cology, vol. 9, p. 1089, 2018.
77. L. P. Pason and C. A. Sotriffer, “Empirical scoring functions for affinity prediction of protein-
ligand complexes,” Molecular Informatics, vol. 35, no. 11-12, pp. 541–548, 2016.
78. S.-Y. Huang, S. Z. Grinter, and X. Zou, “Scoring functions and their evaluation methods for
protein–ligand docking: recent advances and future directions,” Physical Chemistry Chemical
Physics, vol. 12, no. 40, pp. 12899–12908, 2010.
79. I. Muegge, “PMF scoring revisited,” Journal of Medicinal Chemistry, vol. 49, no. 20, pp. 5895–
5902, 2006.
80. S. Z. Grinter and X. Zou, “Challenges, applications, and recent advances of protein-ligand
docking in structure-based drug design,” Molecules, vol. 19, no. 7, pp. 10150–10176, 2014.
81. W. C. Still, A. Tempczyk, R. C. Hawley, and T. Hendrickson, “Semianalytical treatment
of solvation for molecular mechanics and dynamics,” J. Am. Chem. Soc., vol. 112, no. 16,
pp. 6127–6129, 1990.
82. G. D. Hawkins, C. J. Cramer, and D. G. Truhlar, “Pairwise solute descreening of solute charges
from a dielectric medium,” Chemical Physics Letters, vol. 246, no. 1-2, pp. 122–129, 1995.
83. C. Granchi, M. Lapillo, S. Glasmacher, G. Bononi, C. Licari, G. Poli, M. El Boustani,
I. Caligiuri, F. Rizzolio, J. Gertsch, et al., “Optimization of a benzoylpiperidine class identifies
a highly potent and selective reversible monoacylglycerol lipase (magl) inhibitor,” Journal of
Medicinal Chemistry, vol. 62, no. 4, pp. 1932–1958, 2019.
84. K. P. Clark, “Flexible ligand docking without parameter adjustment across four ligand–receptor
complexes,” Journal of Computational Chemistry, vol. 16, no. 10, pp. 1210–1226, 1995.
85. T. Singh, D. Biswas, and B. Jayaram, “Aads-an automated active site identification, docking,
and scoring protocol for protein targets based on physicochemical descriptors,” Journal of
Chemical Information and Modeling, vol. 51, no. 10, pp. 2515–2527, 2011.
86. N. T. Nguyen, T. H. Nguyen, T. N. H. Pham, N. T. Huy, M. V. Bay, M. Q. Pham, P. C. Nam,
V. V. Vu, and S. T. Ngo, “Autodock vina adopts more accurate binding poses but autodock4
forms better binding affinity,” Journal of Chemical Information and Modeling, vol. 60, no. 1,
pp. 204–211, 2019.
87. S. Forli, R. Huey, M. E. Pique, M. F. Sanner, D. S. Goodsell, and A. J. Olson, “Computational
protein–ligand docking and virtual drug screening with the autodock suite,” Nature Protocols,
vol. 11, no. 5, pp. 905–919, 2016.

Explaining Small Molecule Binding Specificity with Volumetric Representations of Protein Binding Sites

Ziyi Guo and Brian Y. Chen

1 Introduction

Biological systems depend on proteins to perform almost every chemical function
in the cell. The catalysis of essential reactions, the transport of critical molecules,
the mechanical integrity of cells and many other functions rely on these diverse
and specialized worker molecules. Specialization is apparent in the way proteins
interact with other molecules: While there are tens of thousands of unique molecules
in the cell, most proteins only bind a narrow range of partners. This property, of
preferentially forming interactions with select molecules, is called specificity and it
organizes proteins into teams that function robustly even though they share crowded
cellular spaces with many unrelated molecules. Building an understanding of the
mechanisms that achieve specificity is a common goal in many areas of molecular
biology because it could reveal how teams of molecules function and how they
might be manipulated or reengineered for medical or industrial purposes. This
chapter aims to describe current computational methods by which specificity, and
the mechanisms that govern it, may be discovered.
The challenge of uncovering the function and specificity of all proteins is
staggering. In the human body alone there exist tens of thousands of unique proteins
and each is a complex machine that acts through multiple biophysical phenomena
to achieve particular functions. Some proteins achieve specificity using precise
structural complementarity. Others use the hydrophobic effect, the attraction and
repulsion of electric fields, or combinations of these and other phenomena. This
mechanistic complexity, coupled with the sheer number of proteins, is the reason
why the mechanisms controlling the binding preferences of most proteins remain
uncertain: The space of hypothetical interactions and interaction mechanisms is
enormous, and exhaustive testing is out of the question. Despite this difficulty, the
expert insights of investigators in structural and molecular biology steadily reveal
the mechanisms of action employed by several more families of proteins every year.

Z. Guo · B. Y. Chen
Lehigh University, Bethlehem, PA, USA
e-mail: zig312@lehigh.edu; byc210@lehigh.edu

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Haspel et al. (eds.), Algorithms and Methods in Structural Bioinformatics,
Computational Biology, https://doi.org/10.1007/978-3-031-05914-8_2

We make progress in part because what we learn from the mechanisms of past
proteins has always informed the way we study the next protein. Crystallography
has revealed many interactions where close steric or electrostatic complementarity
occurs between molecules that bind tightly (e.g. [77]). Evolutionary conservation
exposes the presence of functional regions on molecular surfaces (e.g. [55]). These
lessons alert us to ways in which new interactions might occur. By systematizing
and reapplying these concepts, computational methods can be tools for hypothesis
generation, thereby narrowing the set of experiments necessary to discover binding
preferences and mechanisms. For example, algorithms for function annotation
automate the identification of geometrically similar functional sites in an effort to
identify proteins that catalyze the same reaction. This chapter reviews an emerging
subset of function annotation methods that specialize in specificity annotation,
the algorithmic prediction, deconstruction and functional analysis of the structural
mechanisms that achieve binding specificity. We maintain a focus on specificity
annotation for interactions between proteins and small molecules (ligands), for
which a variety of recent techniques have been developed.

1.1 Comparison Algorithms for Examining Specificity

Specificity is a property that is fundamentally defined by comparison: A protein
prefers one ligand because it binds with less affinity to other ligands, not because of
absolute affinity. These preferences arise because, for example, the preferred ligand
has a shape that is more complementary than other ligands, not because of absolute
complementarity. The comparative nature of specificity, as a phenomenon, makes
protein sequence and structure comparison algorithms an ideal class of methods for
examining specificity.
At least two prediction problems in the field of specificity annotation relate to
protein-ligand binding specificity. Here, we refer to these problems using descriptive
names to clarify the aims of existing methods, even though consensus surrounding
the actual terminology in this emerging field has yet to be reached. The most
commonly studied problem is the specificity assignment problem of predicting the
binding preferences of a protein with unknown specificity. A second and emerging
category of problems is the component localization problem, the challenge of
identifying amino acids and other elements of protein sequence and structure
that influence specificity. Using structure comparison algorithms, some approaches
achieve objectives in multiple categories.
All protein structure comparison techniques share four integrated components,
which are the focus of specific methodological advancements in the field. First,
they employ a digital representation of protein structure that explicitly represents
the geometric and chemical aspects of the proteins being compared. Second,
the digital representation is paired with a comparison metric, which is used to
evaluate similarity between representations of multiple protein structures. Third,
a comparison algorithm is developed to identify the fairest comparison of two or
more proteins based on the metric. Finally, a statistical model is used to evaluate
the significance of the final measurement relative to baseline structural noise.
Enhancements in representations, metrics, comparison algorithms and statistical
models, while sometimes presented together as a singular method, represent
progress in the field towards the solution of problems in specificity annotation.
Due to their integrated nature, novel designs for some components, especially new
representations of protein structure, can represent major advancements to the field,
because they can permit new analytical capabilities, as we will discuss later in this
chapter.

2 Specificity Assignment

The prediction of small molecule binding specificity is founded on the principle
that ligand binding sites that are sufficiently similar to a site on another protein
will exhibit the same binding preferences. Therefore, if a geometric comparison
algorithm can find proteins that have binding sites with identical atoms in nearly
identical places, then we predict that those proteins prefer to catalyze the same
reaction on the same binding partners. It should be noted that some proteins catalyze
reactions with the same substrates using different molecular mechanisms. That
situation does not preclude the prediction of specificity based on similarity: it simply
means that the lack of similarity is not an indicator for different binding preferences.
This overall approach to structure comparison, with its focus on finding similar
binding sites, originates from and overlaps with earlier work intended to find
proteins with similar functions based on similar binding sites (e.g. function pre-
diction). Below, we review many methods developed for binding site comparison,
even if some have not been explicitly tested for identifying proteins with identical
specificity. In many cases, such methods could be modified or refined for specificity
assignment. Excluding them would paint an incomplete picture of the space of
technologies that can identify proteins with similar binding preferences. This section
organizes advancements in specificity assignment around the four common compo-
nents of structure comparison algorithms: First, we describe several computational
representations of binding sites (Sect. 2.1). We then describe the standard metric of
structural comparison, LRMSD, used to measure how similar two binding sites are
(Sect. 2.2). Algorithms in Sect. 2.3 describe how matching binding sites are found.
Finally, we describe several types of statistical models used to interpret whether
the matches indicate that matched binding sites exhibit similar binding preferences
(Sect. 2.4).
The prediction accuracy of most specificity assignment algorithms can be
evaluated in several ways. One way is to benchmark against a nomenclature of
enzymes developed by the Nomenclature Committee of the International Union of
Biochemistry and Molecular Biology. This classification of an individual protein
is generally called the Enzyme Commission number (EC number), and it is written
as four period-delimited integers, w.x.y.z. The integer w denotes the broadest class
of chemical reactions catalyzed by proteins. The integers x and y denote narrower
categories of distinct reactions. The integer z is used to classify proteins with identical
function and generally different specificity [84]. A specificity assignment method
thus operates correctly when it predicts the four-digit EC number of a given protein
structure.
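
As a small illustration of this evaluation criterion, the sketch below parses EC numbers and counts how many leading fields two of them share. The function names are hypothetical, and the assumption that a prediction is scored by exact agreement of all four fields simply restates the criterion above; it is not taken from any particular benchmark.

```python
def parse_ec(ec):
    """Split an EC number such as '3.4.21.4' into four integers (w, x, y, z)."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four period-delimited fields: {ec!r}")
    return tuple(int(p) for p in parts)


def shared_levels(predicted, actual):
    """Count how many leading EC fields agree (0 to 4)."""
    count = 0
    for a, b in zip(parse_ec(predicted), parse_ec(actual)):
        if a != b:
            break
        count += 1
    return count


def is_correct_assignment(predicted, actual):
    """A prediction counts as correct only if all four fields match."""
    return shared_levels(predicted, actual) == 4
```

For example, two EC numbers that differ only in the last field share three levels: the same reaction class with a different specificity, which a specificity assignment method must be able to tell apart.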

2.1 Binding Site Representations

The comparison of protein-ligand binding sites depends on a digital representation
called a motif that describes the atoms critical for binding and catalysis. Some
methods refer to this representation as a template (e.g. [2]). These atoms or amino
acids must first be selected for comparison, most frequently through literature
search [9], manual selection by biochemical experts [10, 41], or from databases
of condensed active site information [28, 69]. Other motif selections are based
on the analysis of structure or sequence data, such as the largest cavity [80], or
using evolutionarily significant amino acids close to known ligand binding sites
[10, 29, 47]. The correct selection of atoms is essential for prediction accuracy in
specificity and function annotation, as even small variations can lead to considerable
variations in prediction accuracy [22]. Fortunately, several methods have been
developed to automatically refine motif designs [7, 18, 20–22, 66]. Once the atoms
or amino acids are selected, several binding site representations have been developed
to enable the comparison of the site.
The earliest specificity assignment and function prediction algorithms represent
binding sites with groups of points in three dimensions that describe the positions of
individual atoms. Labels have been used to identify points in space as representing
alpha carbons [10, 33], sidechain atoms [1, 60], or conserved binding patterns [52,
53]. Labeled points have also been used to represent molecular surfaces [51, 67, 68]
as well as electrostatic potentials on the molecular surface [43].
To reduce matches between unrelated binding sites, motifs of labeled points
were enhanced with additional information. These enhancements began with more
sophisticated labels, such as label sets that represent substitutions that may occur
in major evolutionary divergences [2, 10] (Fig. 1a). Other enhancements include
the incorporation of hinges that could partially represent the flexibility of protein
structures [54], pairs of points to describe sidechain orientation [31] (Fig. 1b), and
alpha shapes [30, 38, 80, 81] or spheres [19, 20] (Fig. 1c) that describe the empty
interior of a binding site. When the atoms of a protein match the motif, spheres
can be used to detect if other atoms occupy the region that must remain empty
to accommodate the ligand (Fig. 1c). If the ligand binding regions are not empty,
then the match can be eliminated. This strategy removes 80% of statistically
significant false positives while retaining 87% of true positives (some
true positives are also lost in the process) [19].

Fig. 1 Examples of point-based motif designs. (a) Multiple amino acid labels describe variations
that occur in major evolutionary divergences. (b) Vectors at each motif point encode sidechain
orientation. (c) Cavity spheres denote regions of the binding cavity that must remain empty in the
matching structure, to accommodate the ligand

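
The cavity-sphere filter described above can be sketched as follows. This is a minimal illustration, not the implementation of [19]: it assumes the match has already been superposed onto the motif, treats atoms as points (ignoring atomic radii), and uses hypothetical names such as `cavity_is_empty` and a hypothetical `"atoms"` field for each match.

```python
import math


def cavity_is_empty(cavity_spheres, atoms):
    """Return True if no atom center lies inside any cavity sphere.

    cavity_spheres: list of ((x, y, z), radius) in the frame of the aligned motif.
    atoms: list of (x, y, z) atom centers from the matched protein, same frame.
    """
    for center, radius in cavity_spheres:
        for atom in atoms:
            if math.dist(center, atom) < radius:
                return False
    return True


def filter_matches(matches, cavity_spheres):
    """Keep only matches whose atoms leave the cavity spheres unoccupied."""
    return [m for m in matches if cavity_is_empty(cavity_spheres, m["atoms"])]
```

A match whose atoms intrude into any sphere is discarded, on the reasoning that the ligand could not occupy that region of the matched structure.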
Beyond the enhancement of motifs with additional information, algorithms have
also been developed to better select the points in the motif, to achieve superior
prediction accuracy. By repeatedly comparing variations of potential motifs against
a nonredundant subset of the PDB, Geometric Sieving [22] identifies motifs that
exhibit greater sensitivity than if an arbitrary set of points near the binding site were
used for the motif. Composite motifs use multiple structures of the same binding
site atoms to generate averaged structures that can exhibit greater similarity to proteins
with similar binding preferences [18]. These composite motifs can be significantly
extended to represent the same protein in different conformations and variations of
the same family of proteins with different binding specificities [8].
Spherical harmonics have also been developed as an alternative representation
of protein-ligand binding pockets [58]. These infinitely differentiable single-value
functions on two polar coordinates can represent some three dimensional regions,
as long as they fulfill the star-convexity property: A region r is star-convex if there
exists a point p0 ∈ r such that for every point p ∈ r, all points on the segment
from p0 to p are inside r. Whereas the comparison of point-based binding sites requires
many pairs of corresponding points to be found and aligned, spherical harmonics
can be compared by finding optimal rotational superpositions. Spherical harmonics
can also be used to represent ligands and other small molecules.

2.2 Metrics for Binding Site Comparison

Structural similarity between point-based motifs is almost always measured using
least root mean squared distance (LRMSD). This measure is evaluated between n
pairs of corresponding points {m_1, p_1}, {m_2, p_2}, ..., {m_n, p_n}, where m_i is a motif
point and p_i is a point from the matching protein. RMSD is evaluated with the
expression:

\[
\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d(m_i, p_i)^2},
\]

where d(m_i, p_i) is the Euclidean distance between two points. LRMSD is the
minimal value of RMSD over all rigid transformations of the motif, which
can be rapidly determined in linear time using eigenvector [44, 82] or quaternion
[27] methods. Because of the minimal nature of LRMSD, measuring LRMSD yields
both a measurement, in Angstroms, and a rigid superposition of one structure onto
the other. Many methods refer to LRMSD as RMSD, because there is no value in
using RMSD as a comparison metric when it has not been minimized.
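
To make the difference between RMSD and LRMSD concrete, the following sketch computes both for paired 3D points using only the Python standard library. It assumes a quaternion-style formulation in which the minimal squared error follows from the largest eigenvalue of a symmetric 4 x 4 matrix built from the cross-covariance of the centered point sets; the Jacobi eigenvalue routine and all function names are illustrative choices, not the implementations cited above, and only the LRMSD value (not the superposition itself) is returned.

```python
import math


def rmsd(motif, target):
    """Root mean squared distance between paired points, without superposition."""
    n = len(motif)
    return math.sqrt(sum(math.dist(m, p) ** 2 for m, p in zip(motif, target)) / n)


def _centered(points):
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def _largest_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def lrmsd(motif, target):
    """Minimal RMSD over all rigid transformations of the motif."""
    n = len(motif)
    m, p = _centered(motif), _centered(target)
    # Cross-covariance S[a][b] = sum_i m_i[a] * p_i[b] of the centered point sets.
    S = [[sum(mi[a] * pi[b] for mi, pi in zip(m, p)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(x * x for pt in m + p for x in pt)  # total squared norm of both sets
    residual = max(0.0, g - 2.0 * _largest_eigenvalue(N))
    return math.sqrt(residual / n)
```

For a target that is an exact rotated and translated copy of the motif, `rmsd` is large while `lrmsd` is zero up to rounding error, which is why unminimized RMSD carries little information as a comparison metric.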
Many methods treat this core element of protein structure comparison as a black
box, focusing instead on the selection of atoms to align and on the algorithm for
rapidly finding atoms that yield low LRMSD alignments. Since LRMSD only measures
the minimized root mean square of the distances between corresponding points, newer
methods have developed additional geometric criteria to exclude other biologically
irrelevant matches. For example, Cavity Aware Match Augmentation [19] uses
spheres to detect and eliminate proteins that do not maintain an empty binding cavity
within the motif. pvSOAR maps points around a pocket onto the unit sphere, and
measures their LRMSD to evaluate structural similarity [81].
An important constraint on LRMSD is the fact that it depends on bijections
between two equally sized pointsets. This constraint has serious implications for
specificity assignment in two ways: First, the structure of different amino acids
cannot be fully compared, because different amino acids have different numbers of
atoms. Many methods partially sidestep this problem by representing amino acids
with only a single point. Unfortunately, that approach partially ignores variations
in sidechain geometry, causing important differences in shape to be overlooked.
Second, depending on a bijection requires that any amino acid that occupies
different positions in two binding sites, thereby altering specificity, must also be
part of the motif, otherwise the difference may not be detected. As a result, effective
motif design, whether by expert design (e.g. [41]) or algorithmic refinement (e.g.
[7, 18, 20, 22]) is mandatory for practical specificity assignment.

2.3 Comparison Algorithms

Comparison algorithms for function annotation and specificity assignment are
search algorithms that seek to identify bijective correspondences between the
motif points and the points of the protein that satisfy all matching criteria and
minimize LRMSD. An exhaustive approach is impractical, especially in cases where
evolutionary labels can cause the algorithm to consider comparing a single motif
point to many target points. Nonexhaustive approaches are practical, and generally
identify correct matches except when the only possible matches are so poor as to be
biologically irrelevant. Even though they are biologically irrelevant, however, very
poor matches must be accounted for in any statistical models, to prevent algorithmic
bias [32].
Geometric Hashing. Point-based motifs can be rapidly compared using Geometric
Hashing [33, 48], a technique that represents points in a rotationally and translation-
ally invariant vector. Vectors are generated for every triplet in the motif and stored
in a range search structure. When searching for matches for a given motif inside a
protein structure, the triplets of points in the structure are encoded into invariants and
compared against the invariants of the motif. Similar invariants represent a triplet of
corresponding points that can be used to generate initial alignments [10] or used in
voting systems to generate final correspondences of points between the protein and
the motif.
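
The idea of triplet invariants can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheme of the cited methods: the invariant is taken to be the sorted side lengths of each triangle, quantized into bins of a hypothetical `bin_size`, and a Python dictionary stands in for the range search structure. Exact bin collisions can miss triplets that fall just across a bin boundary, and sorted side lengths do not distinguish mirror images.

```python
import math
from collections import defaultdict
from itertools import combinations


def triplet_invariant(a, b, c, bin_size=0.5):
    """Quantized, sorted triangle side lengths: unchanged by rotation or translation."""
    sides = sorted((math.dist(a, b), math.dist(b, c), math.dist(a, c)))
    return tuple(int(s // bin_size) for s in sides)


def build_table(motif, bin_size=0.5):
    """Index every motif triplet by its invariant key."""
    table = defaultdict(list)
    for i, j, k in combinations(range(len(motif)), 3):
        key = triplet_invariant(motif[i], motif[j], motif[k], bin_size)
        table[key].append((i, j, k))
    return table


def candidate_triplets(table, protein, bin_size=0.5):
    """Yield (motif triplet, protein triplet) index pairs whose invariants collide."""
    for i, j, k in combinations(range(len(protein)), 3):
        key = triplet_invariant(protein[i], protein[j], protein[k], bin_size)
        for motif_triplet in table.get(key, ()):
            yield motif_triplet, (i, j, k)
```

Each candidate pair of triplets could then seed an initial alignment or cast a vote, as described above.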
Depth First Search. A second group of algorithms uses depth first search techniques
to systematically construct bijections between motif points and target points (Fig. 2)
[73]. Algorithms like Match Augmentation [10] and Labelhash [57] begin with
one or a handful of point-point correspondences. From this initial state, depth first
searches superpose the corresponding points and then identify additional correspon-
dences between motif points that are brought into proximity with acceptable target
points. Multiple potential correspondences can be detected for each motif point,
creating a branching nature in the depth first search as different possibilities are
explored. After exploring many possibilities, both geometric hashing and depth
first search matching algorithms return the match with the largest number of
corresponding points.
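
A branching, backtracking search of this kind can be sketched as below. This is a simplified stand-in rather than Match Augmentation or Labelhash: instead of superposing points and testing LRMSD at every step, it assumes a cheaper acceptance test in which all pairwise distances among matched points must agree within a hypothetical tolerance `tol`. That test cannot distinguish mirror images, and the search is exhaustive, so it is only practical for very small inputs.

```python
import math


def dfs_match(motif, target, tol=0.5):
    """Return the largest list of (motif_index, target_index) pairs found."""
    best = []

    def consistent(pairs, mi, ti):
        # Stand-in for the LRMSD test: pairwise distances must agree within tol.
        return all(
            abs(math.dist(motif[mi], motif[mj]) - math.dist(target[ti], target[tj])) <= tol
            for mj, tj in pairs
        )

    def extend(pairs, next_motif):
        nonlocal best
        if len(pairs) > len(best):
            best = list(pairs)
        if next_motif == len(motif):
            return
        used = {tj for _, tj in pairs}
        for ti in range(len(target)):  # branch over candidate target points
            if ti not in used and consistent(pairs, next_motif, ti):
                pairs.append((next_motif, ti))
                extend(pairs, next_motif + 1)
                pairs.pop()  # backtrack and try the next candidate
        extend(pairs, next_motif + 1)  # also allow this motif point to stay unmatched

    extend([], 0)
    return best
```

The `pairs.pop()` call is the backtracking step: when a tentative correspondence leads nowhere, the search returns to the previous state and explores the next candidate.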

Fig. 2 Operation of depth first search comparison algorithms. (a) Three points (black, white) have
correspondences (thin black lines), and we seek to find acceptable correspondences for the next
motif point (grey). (b) Range search (circle) for points that may correspond (white) with the
next motif point. At this point, the depth first search first considers one possibility (c1–e), and
eventually considers the other (c2, d2). (c1) A tentative correspondence is considered with the
first point. (d1) LRMSD superposition between corresponding points is computed. (e) The final
distance between points in LRMSD superposition is acceptable, so we consider the next motif
point (grey). (c2) A tentative correspondence is considered with the second point. (d2) The final
distance between points in LRMSD superposition is not acceptable, because they get too far apart.
This set of correspondences is not used further, and we backtrack to other correspondences

2.3.1 Data Structures

Algorithms that compare points in space always require a data structure that enables
range search. A common structure that achieves this purpose is a three dimensional
kd-tree [4], a space partitioning data structure which is well documented and does
not need to be described further. Because atoms can only pack into limited
densities, however, three dimensional cubic lattice structures, Lattice Hashes, can
also be very useful. Geometric comparisons generally search for atoms near a
given point, which amounts to a range search on a spherical range. Lattice Hashes
can support spherical range search very easily.
Construction. To construct a Lattice Hash, we begin with a given set of points in
three dimensions and a resolution that defines the size of each cube in the lattice.
We first find the bounding box of the input points and determine the smallest lattice
of cubes, based on the resolution parameter, that fully contains the bounding box
(Fig. 3a). Rather than allocating each cube immediately, we refer to each cube using
a unique index:

index = xpos ∗ (ydim ∗ zdim) + ypos ∗ (zdim) + zpos

where xpos, ypos, and zpos are the position of the cube, counting from the low-x,
low-y, low-z corner of the lattice. ydim and zdim are the number of cubes along
the y and z dimensions of the lattice. All cubes that contain points are stored in a
hash table based on this index. To insert each point into the lattice, we associate it
with the cube that contains it, allocating memory only for cubes that contain points
(Fig. 3b).
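
The construction steps above can be sketched as a small class. The class and method names are hypothetical, and a Python dictionary is assumed as the hash table; the index expression follows the formula given above.

```python
import math


class LatticeHash:
    """Sparse cubic lattice over 3D points; only occupied cubes are stored."""

    def __init__(self, points, resolution):
        self.resolution = resolution
        # Low-x, low-y, low-z corner of the bounding box of the input points.
        self.origin = tuple(min(p[k] for p in points) for k in range(3))
        highs = tuple(max(p[k] for p in points) for k in range(3))
        # Smallest number of cubes per axis that covers the bounding box.
        self.dims = tuple(
            int(math.floor((highs[k] - self.origin[k]) / resolution)) + 1 for k in range(3)
        )
        self.cubes = {}  # hash table: cube index -> list of points in that cube
        for p in points:
            self.cubes.setdefault(self.index_of(p), []).append(p)

    def cube_position(self, point):
        """Integer (xpos, ypos, zpos) of the cube containing the point."""
        return tuple(
            int(math.floor((point[k] - self.origin[k]) / self.resolution)) for k in range(3)
        )

    def index_of(self, point):
        xpos, ypos, zpos = self.cube_position(point)
        _, ydim, zdim = self.dims
        return xpos * (ydim * zdim) + ypos * zdim + zpos
```

Because entries are created only when a point is inserted, memory use grows with the number of occupied cubes rather than with the volume of the bounding box.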

Spherical Range Search. We begin with a point and a radius that define the spherical
range. To discover points in the lattice within the range, we first identify the set of
cubes that contain the bounding box of the range, and then identify individual cubes
that overlap the sphere, using the distance of cube corners and the intersection of
cube segments (Fig. 3c). Finally, individual points are tested, and points within the
range are returned.

Fig. 3 Lattice Hash construction and operation. (a) The bounds of the Lattice Hash (edge of
lattice) surround the points to be represented (black circles) in uniform cubes. (b) Only occupied
cubes (black squares) are allocated on a hash table (second row). (c) Spherical range search (large
circle) centered at a point (white with heavy black outline). (d) Range search first identifies cubes
that intersect the range (black squares), then identifies occupied cubes (shaded squares) based on
their presence in the hash table

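
The three stages of the spherical range search can be sketched as follows. Two simplifications are assumed for brevity and are not part of the description above: cubes are keyed directly by their integer coordinates rather than by a single index, and the cube-sphere overlap test uses the closest point of the cube to the sphere center instead of separate corner and segment tests. All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_lattice(points, resolution):
    """Map integer cube coordinates to the points inside each occupied cube."""
    cubes = defaultdict(list)
    for p in points:
        cubes[tuple(math.floor(c / resolution) for c in p)].append(p)
    return cubes


def cube_overlaps_sphere(cube, resolution, center, radius):
    """True if the cube's closest point to the center lies within the radius."""
    d2 = 0.0
    for k in range(3):
        low = cube[k] * resolution
        nearest = min(max(center[k], low), low + resolution)
        d2 += (center[k] - nearest) ** 2
    return d2 <= radius * radius


def range_search(cubes, resolution, center, radius):
    """Return all stored points within `radius` of `center`."""
    # Stage 1: cubes covering the bounding box of the spherical range.
    lo = [math.floor((center[k] - radius) / resolution) for k in range(3)]
    hi = [math.floor((center[k] + radius) / resolution) for k in range(3)]
    found = []
    for x in range(lo[0], hi[0] + 1):
        for y in range(lo[1], hi[1] + 1):
            for z in range(lo[2], hi[2] + 1):
                cube = (x, y, z)
                if cube not in cubes:  # unoccupied cubes were never allocated
                    continue
                # Stage 2: discard cubes that do not overlap the sphere.
                if not cube_overlaps_sphere(cube, resolution, center, radius):
                    continue
                # Stage 3: test the individual points in the remaining cubes.
                found.extend(p for p in cubes[cube] if math.dist(p, center) <= radius)
    return found
```

When the resolution is comparable to the search radius, only a handful of cubes are visited per query, regardless of how many points the lattice holds.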

2.4 Statistical Models for Binding Site Comparison

Protein structure comparison algorithms identify matches that indicate geometric
similarity between the atoms of two protein structures. When only a few atoms are
being compared, as in the situation of motif comparison, matches occur frequently
by random chance. The existence of a match is therefore insufficient to assign
similar binding specificity. Furthermore, matches of motifs with just a few points
generally have lower LRMSDs than matches to motifs with larger numbers of
points, making any specific LRMSD threshold also insufficient to assign specificity.
To address this problem, statistical models can be used to assess the likelihood of
observing matches of a given motif at a particular LRMSD. When the likelihood p
of observing the match by random chance falls below a chosen significance threshold,
the match is called statistically significant.
Statistical significance is often measured with parametric models, which estimate
the probability of observing a match based on given parameters of a motif, such as
the number of atoms and the type of amino acids it represents. These parameters
are used to approximate the probability density function (PDF), which maps any
value of LRMSD l to the relative likelihood of observing matching sites with similarity
l. Parameters are also used to approximate the cumulative distribution function
(CDF), the integral of the PDF, which gives the probability of observing a match with
LRMSD at most l. The shape of the PDF has been approximated
using categories of functions that include extreme value distributions [1], empirical
distributions [6, 81], and mixtures of Gaussian distributions [41]. The advantage
of this approach is that the relationship between motif characteristics and the
coefficients of the approximating function can be rapidly trained for a larger set of
motifs, and that the statistical significance of a given match can be evaluated in
constant time. Unfortunately, in unusual cases where the approximating function
cannot resemble the PDF, parametric models can produce inaccurate probabilities
and thus incorrectly label some matches as statistically significant.
Nonparametric statistical models use density estimation to create a custom PDF
that reflects any motif. Rather than generating the PDF based on motif parameters,
the shape of the PDF is sampled from a set of matches with the motif, before
statistical significance can be computed. This approach can be considerably more
accurate, but it also requires a large amount of computation to generate the PDF,
whereas a parametric model is simply a curve specified by motif parameters.
MASH [22, 32] and LabelHash [57] use nonparametric statistical models that can
eliminate 99% of matches to proteins that have different 4-digit EC numbers [32].
Leveraging parallel computation to train the model as rapidly as possible, MASH
demonstrated that training a nonparametric statistical model was not so
computationally onerous as to be impractical. Enhancements to MASH demonstrated that
nonparametric models could compensate for incomplete matching algorithms that
are not guaranteed to identify matches when only low quality matches exist [32].
This kind of compensation is impossible with parametric models, which cannot alter
the shape of their probability density function based on algorithmic limitations.
Whether a parametric or nonparametric model is used, statistically significant
matches have been found to be strong predictors of proteins with identical binding
specificity (e.g. [1, 10, 32, 41]). As a result, statistical models act as a quantitative
tool for interpreting the output of specificity assignment algorithms, separating
matches that are likely to indicate that two proteins have similar specificity from
other matches that do not support the same inference.
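As a rough illustration of the nonparametric idea, the sketch below estimates p from a sample of background matches using an empirical CDF. This is an assumption made for brevity: the methods cited above use density estimation, and the function names, the sample values, and the threshold `alpha` are hypothetical.

```python
from bisect import bisect_right


def empirical_p_value(sampled_lrmsds, observed_lrmsd):
    """Fraction of sampled background matches at least as good as the
    observed one (lower LRMSD means greater similarity)."""
    ordered = sorted(sampled_lrmsds)
    return bisect_right(ordered, observed_lrmsd) / len(ordered)


def is_significant(sampled_lrmsds, observed_lrmsd, alpha=0.01):
    """alpha is an illustrative threshold, not one taken from the text."""
    return empirical_p_value(sampled_lrmsds, observed_lrmsd) <= alpha


# Toy background sample: LRMSDs of matches between one motif and
# unrelated structures.
background = [1.0, 2.0, 3.0, 4.0]
p = empirical_p_value(background, 1.5)
```

Sorting the sample is the expensive step that has to be repeated for every motif, which mirrors the training cost discussed above; each later evaluation is a single binary search.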

3 Component Localization

Many parts of a protein work together to achieve specificity. Finding these
components within the rest of the protein can be extremely difficult. However, once
the molecular mechanisms that implement specificity are found, they can provide a
crucial platform for applied research: Later studies could mutate these mechanisms
to examine how, for example, the protein could be reengineered to achieve novel
binding preferences for industrial applications, or how the protein might mutate
to achieve inhibitor resistance. Likewise, before the molecular mechanism of
specificity is known, predicting elements of protein structures that might control
specificity can narrow the space of possibilities that must be examined in order
to unravel how specificity is actually achieved. By finding individually influential
components, component localization is a new effort to use computational
precision and speed to suggest hypotheses about binding mechanisms that might have
been overlooked by human investigators.
The approach that component localization algorithms take is to search for subtle
differences in very similar proteins that cause different binding preferences. This
approach differs fundamentally from that of specificity assignment algorithms,
which search for subtle similarities that necessitate similar binding preferences
in very different proteins. Moreover, there are many ways in which differences in
protein structure can cause differences in binding preferences, including differences
in steric hindrance at binding sites, differences in electric fields, and so on. Finding a
difference that necessitates differences in binding preferences requires a biophysical
rationale for different binding preferences, and it also requires the consideration
of many different kinds of components. While amino acids are one kind of
component that influences specificity, empty clefts and cavities also play a role in
accommodating binding partners, as do local patterns of molecular flexibility. Many
components of protein structure and their biophysical roles must be considered for
a comprehensive approach to component localization.
Component localization algorithms originated in sequence-based methods that
analyze alignments of related sequences in different specificity groups, looking for
sequence positions that are conserved within groups and different between groups
[11, 12, 45, 61, 63, 65]. Amino acids with these properties are under evolutionary
pressure to support the different binding preferences of distinct groups. Generally,
these methods require sequences that are categorized into groups with different
binding preferences, though some exceptions exist [34, 62, 70]. This evolutionary
rationale offers one explanation for the role of each amino acid in specificity.
Evolutionary information can be mapped onto protein structures for component
localization. Evolutionary Trace Annotation (ETA) [2] is a specificity assignment
algorithm that uses Paired Distance matching [83] to identify binding sites with
specificity identical to that of motifs generated with the Evolutionary Trace [61].
Since ETA motifs are generated using the Evolutionary Trace, an algorithm that
performs sequence-based component localization, the amino acids that structurally
match the motif are also implied to influence specificity in the same way as
the amino acids in the motif sequences. Recently, this approach was verified
experimentally against carboxylesterases with different binding preferences [2].

3.1 Foundations of Structure-Based Component Localization

While sequence-based approaches can reveal an evolutionary rationale for the
influence of particular structural components on specificity, an analysis of protein
structures can add biophysical rationales as well. In such cases, the components
localized are not simply correlated with specificity, as in sequence-based methods,
but actually implicated in binding mechanisms. Here we describe VASP (Volumetric
Analysis of Surface Properties) [23], a purely structural method for identifying
amino acids for which mutations alter steric influences on specificity. VASP
uses solid representations of ligand binding sites and algorithms from constructive
solid geometry (CSG) to identify regions where patterns of steric hindrance are
different and thus implicated in different binding preferences. We explain this
method in Sect. 3.2 after introducing fundamental concepts and methods necessary
for generating and manipulating solid representations of protein structures.

3.1.1 Comparing Solid Representations with CSG

Three-dimensional solids can be defined with a closed surface as a boundary for a
solid region. To represent proteins, the molecular surface (also known as the solvent
excluded surface [49]) is a useful boundary, and practical surfaces, constructed from
thousands of triangles, are provided by many existing methods, including MSMS
[75], GRASP2 [64], VADAR [85], and others [26]. Geometric solids, like spheres,
cubes, and tetrahedra, can also define useful boundaries for solid representations.
Comparisons of these solid representations are not founded on an analysis of
surface geometry, but actual comparisons of the region occupied by the solid. This
kind of comparison is made possible through CSG operations (Fig. 4), also called
boolean operations. Solid comparisons of protein structures, therefore, are direct
comparisons of the patterns of steric hindrance imposed by protein structure onto
interacting ligands.

Fig. 4 A demonstration of CSG operations union, intersection, and difference, illustrating the
borders of two input (dotted) and output regions (solid)

CSG operations can be computed using Marching Cubes [50]. Marching Cubes
is a computer graphics algorithm for extracting the surface of a three-dimensional
solid, and it has been used in the past for visualization [59]. Here, we paraphrase
how Marching Cubes can be used to compute CSG operations, as others have
demonstrated [42]. As input, Marching Cubes begins with two closed regions A
and B, a resolution parameter that defines the precision of the output and one
parameter that defines the nature of the CSG operation: boolean intersection, union
or difference (Fig. 4). The output generated is the boundary surface approximating
the boolean intersection, union or difference defined by a triangular mesh. When
describing sets of serial CSG operations, we will use the operators −, ∪, and ∩ to
denote difference, union, and intersection operations (Fig. 5).
Here, we paraphrase the union operation as an example CSG operation. First,
a cubic lattice is constructed to cover both the regions A and B with the input
resolution. The lattice can be described as a grid of points oriented along the three
primary axes, or as a collection of line segments connecting two adjacent co-axial
lattice points, or as a set of unit cubes formed by multiple lattice segments. Second,
it is decided whether each lattice point p is inside or outside the region of A and
B. This procedure can be achieved using a randomly orientated ray starting from
p and counting the number of intersections with A and with B: an even number
of intersections indicates that p is outside, while an odd number of intersections
indicates that p is inside. If p is inside either A or B, then p is inside the union
region. Third, a series of lattice segments are selected where one point is inside the
union region and the other is outside the union region. Since A and B are closed
regions, their union must also be closed. Therefore, on the selected segment there
must be a crossing point that intersects the boundary of A, or the boundary of B,
or both. In the end, all the crossing points are connected to form triangles that
approximate the boundary of the union region [50]. Variations on this procedure
can be used to compute intersection and difference operations.
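The first three steps of the union procedure can be sketched as follows. To keep the example short, two things are assumed rather than taken from the text: each region is given as an inside/outside predicate (a sphere here) instead of a triangle mesh tested by ray casting, and crossing points are located by bisection along each selected segment. The final triangulation step is omitted. All names are hypothetical.

```python
import math
from itertools import product


def sphere(center, radius):
    """Inside test for a solid sphere; stands in for ray casting on a mesh."""
    return lambda p: math.dist(p, center) <= radius


def union_crossings(inside_a, inside_b, low, counts, spacing, iterations=40):
    def inside_union(p):
        return inside_a(p) or inside_b(p)

    def position(ijk):
        return tuple(low[d] + ijk[d] * spacing for d in range(3))

    # Classify every lattice point as inside or outside the union.
    flags = {
        ijk: inside_union(position(ijk))
        for ijk in product(*(range(n) for n in counts))
    }

    crossings = []
    for ijk, is_inside in flags.items():
        for axis in range(3):
            neighbor = tuple(ijk[d] + (d == axis) for d in range(3))
            # Keep lattice segments with one endpoint inside, one outside.
            if neighbor not in flags or flags[neighbor] == is_inside:
                continue
            p, q = position(ijk), position(neighbor)
            if not is_inside:
                p, q = q, p  # p is always the interior endpoint
            # Bisect the segment to locate the boundary crossing.
            for _ in range(iterations):
                mid = tuple((a + b) / 2 for a, b in zip(p, q))
                if inside_union(mid):
                    p = mid
                else:
                    q = mid
            crossings.append(p)
    return crossings


center_a, center_b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
crossings = union_crossings(
    sphere(center_a, 1.0), sphere(center_b, 1.0),
    low=(-1.5, -1.5, -1.5), counts=(17, 13, 13), spacing=0.25,
)
```

Replacing `or` with `and` in `inside_union` gives the intersection, and `inside_a(p) and not inside_b(p)` gives the difference A − B, which is one way to read the "variations on this procedure" mentioned above.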


Fig. 5 The union operation computed with Marching Cubes. (a) The input solid regions: A and
B (gray). (b) The regions in the lattice. (c) Interior points (dark gray) and exterior points (light
gray). (d) The lines connecting one interior point and one exterior point. (e) The crossing points
that intersect the boundary of the union. (f) Output triangles (solid lines) for union approximation


Fig. 6 Computing the volume of a given closed region. (a) Input region A and its geometric
centroid (black dot). (b) One triangle and its normal vector. (c) Tetrahedra (triangles enclosed in
solid black lines) based on triangles (thick black lines) with normals (white arrows) facing away
from the centroid. (d) Tetrahedra (triangles enclosed in solid black lines) based on triangles (thick
black lines) with normals (black arrows) facing towards the centroid. The volume of the region is
the difference between volume sum in (c) and volume sum in (d)

Computing the volume within a closed region A is also a crucial aspect of VASP,
and we summarize it here: First, the centroid of the corners of all triangles is
determined. In A, the three points of each triangle and the centroid together define
a tetrahedron. Since the triangle either faces towards or away from the centroid,
by the Surveyor’s Formula [76], we can evaluate the volume within A by adding
the volumes of all tetrahedra that face away from the centroid, and subtracting
the volumes of all tetrahedra that face towards the centroid. The volume of each
tetrahedron can be accurately computed using Tartaglia’s Rule [5] (Fig. 6).
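The volume computation can be sketched as a sum of signed tetrahedron volumes. The sketch assumes a closed mesh whose triangles are consistently oriented with outward-facing normals, and it uses the scalar triple product, which is equivalent to the determinant form of the tetrahedron volume. The sign of each term handles the add/subtract rule automatically: a triangle facing away from the centroid contributes a positive volume and one facing towards it contributes a negative volume. Function names are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mesh_volume(vertices, triangles):
    """Volume enclosed by a closed, outward-oriented triangle mesh."""
    # Centroid of the corners of all triangles.
    corners = [vertices[i] for tri in triangles for i in tri]
    centroid = tuple(sum(c[d] for c in corners) / len(corners) for d in range(3))
    volume = 0.0
    for i, j, k in triangles:
        a, b, c = (sub(vertices[n], centroid) for n in (i, j, k))
        # Signed volume of the tetrahedron (centroid, a, b, c).
        volume += dot(a, cross(b, c)) / 6.0
    return volume


# A tetrahedron with three unit edges along the axes; its volume is 1/6.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
triangles = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
volume = mesh_volume(vertices, triangles)
```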

3.1.2 Solid Representations of Binding Cavities

Variations in structure that cause differences in binding specificity are most likely
to occur at binding sites. To compare binding site geometry, solid representations
can be used to describe the shape of the empty region that accommodates ligands
using a series of CSG operations. As an instructive example, the following method
describes one simple way to represent a binding cavity. We begin with the whole
structure of one protein A and the ligand l that binds at its functional site. For
each atom in l, a sphere is generated with radius 5.0 Å, and the union of all the
spheres, Sl, is calculated with VASP using CSG operations. Sl defines the vicinity
of the ligand binding cavity. GRASP2 [64] can be used to compute the molecular
surface m(A) using the classic rolling probe method [71]. The molecular surface
is generated using a 1.4 Å probe. A second “envelope” surface, e(A), is generated
with a 5.0 Å probe using the same algorithm. The binding cavity a is generated using
the following CSG expression:

a = (Sl − m(A)) ∩ e(A)
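To make the expression concrete, the sketch below evaluates it on solids that have been voxelized into sets of integer grid cells, so that CSG difference and intersection become set operations. This representation is an assumption chosen for brevity; the method described here operates on triangle meshes. The three toy solids are stand-ins, not real protein surfaces.

```python
from itertools import product


def voxelize(inside, low, high):
    """Set of integer grid cells in [low, high]^3 that satisfy `inside`."""
    return {v for v in product(range(low, high + 1), repeat=3) if inside(v)}


def cavity(ligand_spheres, molecular_solid, envelope_solid):
    """a = (S_l - m(A)) ∩ e(A)"""
    return (ligand_spheres - molecular_solid) & envelope_solid


# Toy stand-ins: S_l is a ball around the ligand, m(A) is a protein
# occupying z < 0, and e(A) is a slightly larger envelope occupying z < 2.
s_l = voxelize(lambda v: sum(c * c for c in v) <= 9, -4, 4)
m_a = voxelize(lambda v: v[2] < 0, -4, 4)
e_a = voxelize(lambda v: v[2] < 2, -4, 4)
a = cavity(s_l, m_a, e_a)
```

The result keeps only the part of the ligand neighborhood that lies outside the molecular surface but inside the envelope, here the slab 0 ≤ z < 2 of the ball.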

Because it is only an instructive example, this cavity definition has some shortcomings. A
bound ligand may not be available, or the available ligand might only occupy part
of the binding site. In such cases, atoms of l can be substituted with waters in the
binding site, or points in space positioned in the binding site by an expert. Unlike in
point-based methods, the precise position of the point is not significant, because its
only purpose is to describe a solid sphere that occupies the binding region: Regions
outside the binding region but inside the molecular surface (e.g. Fig. 7d), or outside
the envelope surface (e.g. Fig. 7e), are eliminated. In other cases, perhaps due to
small internal voids, the final cavity might exhibit multiple disconnected regions.
These can be eliminated with a simple depth-first traversal of the triangles on the
surface, identifying parts of the surface that are disconnected, and removing all but
the largest or most biologically relevant region. Solid representations are compatible
with many approaches for accurately defining ligand binding cavities.
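The traversal that removes disconnected regions can be sketched as below. Two assumptions are made for the example: triangles count as connected when they share an edge, and "largest" is measured by triangle count (enclosed volume or biological relevance could be used instead). Function names are hypothetical.

```python
from collections import defaultdict


def triangle_edges(triangle):
    a, b, c = triangle
    return (frozenset((a, b)), frozenset((b, c)), frozenset((c, a)))


def connected_components(triangles):
    """Group triangle indices into edge-connected components."""
    edge_to_triangles = defaultdict(list)
    for t, triangle in enumerate(triangles):
        for edge in triangle_edges(triangle):
            edge_to_triangles[edge].append(t)

    seen = set()
    components = []
    for start in range(len(triangles)):
        if start in seen:
            continue
        # Iterative depth-first traversal from an unvisited triangle.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            t = stack.pop()
            component.append(t)
            for edge in triangle_edges(triangles[t]):
                for neighbor in edge_to_triangles[edge]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        components.append(component)
    return components


def keep_largest(triangles):
    """Drop every component except the one with the most triangles."""
    largest = max(connected_components(triangles), key=len)
    return [triangles[t] for t in sorted(largest)]


# A closed tetrahedron (vertices 0-3) plus a stray two-triangle patch.
triangles = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3), (4, 5, 6), (5, 7, 6)]
cleaned = keep_largest(triangles)
```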

3.2 Using CSG for Component Localization

To find components of protein structures that cause different binding preferences,
we compare proteins that perform the same function but exhibit different binding
preferences. This starting point ensures that any differences identified are not
involved in performing different functions, but rather in binding different ligands.
For example, a protrusion in one binding site that does not exist in another could
prevent certain ligands from binding. This localization process is achieved using
CSG differences, which isolate amino acids that make one cavity different from
another, or regions in one cavity that are not inside another cavity [23]. In these
"He won't come," he sighed. "You will see, he won't come."

A telegram arrived from Genoa announcing that Rodrigo was on his way aboard the
Preussen. Three days later the worldly doctor, having first paid an insolently
condescending visit to his colleagues Tinaharkko and Aasinleuka, appeared at the
palace.

He was younger and handsomer than Doctor Tinaharkko, and his bearing was prouder
and nobler. Out of respect for nature, which he obeyed in all things, he let his hair
and beard grow freely, and in appearance he resembled those philosophers of
antiquity whom the Greeks carved in marble.

Having examined the king, he said:

"Sire, the physicians, who speak of diseases as the blind speak of colours, say
that you suffer from neurasthenia, or weakness of the nerves. But having thus
identified your illness, they are nonetheless unable to cure it, for organic tissue
cannot be renewed by any means other than those nature used in building it. What,
then, are nature's means and methods? Nature has neither hand nor special tool; she
is subtle, she is spiritual; for even her most massive constructions she uses
infinitely small particles, the atom, the protyle. From an invisible mist she creates
rocks, metals, plants, animals, men. How? By attraction, gravitation, exhalation,
penetration, dissolution, absorption, capillarity, affinity and sympathy. She forms
even a grain of sand just as she formed the Milky Way: the harmony of the spheres
reigns in the one as in the other; both exist only through the motion of the
particles that compose them, particles that are their sounding, loving and
ever-moving soul. There is no structural difference between the stars of heaven and
those specks of dust dancing before us in the sunbeam that enters this room, and the
smallest of those specks is as wonderful as Sirius, for in every being in creation
the same infinitely small miracle takes place, giving form and sustaining life. That
is how nature works. Out of the invisible, the imperceptible and the imponderable
she has drawn this whole vast world that our senses perceive and our minds now
measure and weigh, and what she has made us ourselves from is less than a breath.
Let us then work as she does, by means of the imponderable, the invisible and the
imperceptible, making use of the same sympathetic attraction and the same subtle
penetration. That is the principle. How is it to be applied to the case before us?
How are exhausted nerves to be revived? That is what we have yet to determine.

"And first of all, what are nerves? If we ask for a definition from even the
humblest physiologist, even from an Aasinleuka or a Tinaharkko, we shall have our
answer at once. What are nerves? Threads, fibres, which run out from the brain and
the spinal cord and branch into every part of the body, carrying sensory stimuli to
them and setting the organs of movement in motion. Nerves are therefore sensation
and movement. That is enough to reveal to us their inner constitution, their very
essence: whatever name it is given, it is the same thing that in the realm of
sensation we call joy and in the moral realm happiness. Wherever there are atoms of
joy and happiness, there too is the substance that renews nerves. And when I say an
atom of joy, I really do mean a material thing, a definite substance, a sentient
body that can exist in all four states, solid, liquid, gaseous and radiant, a body
whose atomic weight can be determined. Joy and sorrow, whose effects have been
familiar to men, animals and plants since the beginning of time, are real
substances: they are matter because they are spirit and because nature, in all three
of her manifestations, motion, matter and intelligence, is one. It is therefore only
necessary to obtain a sufficient quantity of atoms of joy and to introduce them into
the organism through the pores of the skin and through respiration. That is why I
prescribe that you wear the shirt of a happy man."

"What!" cried the king. "You want me to wear a happy man's shirt!"

"Yes, next to the skin, sire, so that your parched hide may absorb the particles of
happiness that the happy man's sweat glands have exuded through their excretory
ducts from his blissful tissue. For you know how the skin works: it breathes in and
out, maintaining a ceaseless exchange with its surroundings."

"And that is the remedy you prescribe for me, Mr. Rodrigo?"

"Sire, nothing more effective can be prescribed. There is nothing in the
pharmacopoeia that can even be compared with it in its effects. Not knowing nature,
and being quite unable to imitate her, our pill-rollers can produce in their
factories only a small number of medicines, which are always dangerous and only
rarely effective. The remedies we cannot make must therefore be taken ready-made
from nature, such as leeches, mountain climates, sea air, hot natural springs,
mare's milk, wildcat skin and the humours exuded by a happy man… Do you not know,
then, that a raw potato carried in the pocket takes away rheumatic pain? You despise
the natural remedy; you must have artificial and chemical drugs; you must have drops
and powders. They must have done you a great deal of good, then, those drops and
powders of yours?"

The king apologized and promised to obey.

Doctor Rodrigo, who was already at the door on his way out, turned round.

"Have it warmed a little," he said, "before you put it on."

CHAPTER 3.

Messrs. Nelilehti and Pyhä-Sylvanus look for a happy man in the king's palace.

Wishing to put on the healing shirt as soon as possible, Kristoffer V summoned
Nelilehti, his chief equerry, and Pyhä-Sylvanus, his private secretary, and charged
them with procuring it for him in the shortest possible time.

It was agreed that they would keep the object of their search completely secret. For
it was to be feared that, if the public learned what kind of remedy the king needed,
great numbers of unhappy people, and above all the unhappiest, those most weighed
down by misery, would offer their shirts in the hope of a reward. It was also feared
that anarchists might send poisoned shirts.

The two gentlemen believed they could lay hands on Doctor Rodrigo's remedy without
going outside the palace, and so they posted themselves at a small round window from
which the passing courtiers could be seen. All who came into view were long-faced
and sour-looking; their troubles were plainly written on their foreheads; they were
consumed by a gnawing hankering after some office, some rank, some privilege or some
order of knighthood. But on going down to the great state apartments, Nelilehti and
Pyhä-Sylvanus saw Mr. Lehdonrauha asleep in an armchair, his mouth open from ear to
ear, his nostrils flared, his cheeks red and shining like two suns, his chest rising
and falling harmoniously, his belly equally rhythmic and peaceful, his face wreathed
in laughter, radiating joy from the gleaming dome of his skull to the toes of his
splayed feet, which stretched comfortably in their light shoes.

On seeing this, Nelilehti said:

"Let us look no further. When he wakes, we will ask him for his shirt."

At that very moment the sleeper woke, rubbed his eyes, stretched his limbs and
looked miserably about him. The corners of his mouth drooped, his cheeks sagged; his
eyelids hung sadly like the rags in poor people's windows; a plaintive sigh escaped
his chest, and his whole being expressed boredom, longing and disappointment.

Noticing the private secretary and the chief equerry, he said:

"Ah, gentlemen, I was just having such a beautiful dream. I dreamed that the king
raised the Lehdonrauha lands to a marquisate. Alas, it was only a dream, and I know
all too well that the king's intentions are quite the opposite."

"Let us move on," said Pyhä-Sylvanus. "It is late, and we must not waste time."

In the corridor they met a peer of the realm who astonished the world by the
firmness of his character and the depth of his mind. Even his enemies acknowledged
his disinterestedness, his candour and his courage. He was known to be writing his
memoirs, and everyone flattered him in the hope of securing an honourable place in
them in the eyes of posterity.

"Perhaps he is happy," said Pyhä-Sylvanus.

"Let us ask him," said Nelilehti.

They approached him, exchanged a few words with him and then, steering the
conversation towards happiness, put the question that was on their minds.

"Wealth and honours do not move me," he replied, "and my heart is free even of the
most legitimate and natural affections, such as family ties and the joys of
friendship. I care for nothing but the public good, and that is the unhappiest of
passions, the most thankless of loves.

"I have been in power. I refused then to maintain, with the state's money and our
soldiers' blood, the expeditions that pirates and merchants had organized to enrich
themselves and ruin the state; I did not hand over the navy or the army as prey to
contractors, and so I earned the hatred of all those rogues, who, echoed by the
foolish crowd, slandered and reproached me for betraying the most sacred interests
of my country's honour. No one defended me against these great robbers. Having seen
how stupid and base popular feeling is, I have come to long for absolute rule. The
king's weakness drives me to despair; the smallness of the great is a most painful
sight to me; the incompetence and dishonesty of ministers, and the ignorance,
baseness and venality of the deputies, throw me by turns into dull indifference and
into rage. To relieve somehow the torments I suffer by day, I write at night, and so
pour out the bile I am constantly made to swallow."

Nelilehti and Pyhä-Sylvanus doffed their hats to the noble peer and went on. A few
steps further along the corridor they came face to face with a very small man,
evidently a hunchback, for his back rose above his head as he swayed his body in a
mincing, ingratiating manner.

"It is useless to approach that one," said Nelilehti.

"Who knows?" said Pyhä-Sylvanus.

"Believe me, I know him," the equerry went on. "I am his confidant. He is pleased
with himself and in every respect delighted with his own person, and he has reason
to be. This little hunchback is the darling of women. Ladies of the court and ladies
of the town, actresses, bourgeois women, courtesans, the coquettish and the prudish
as well as the devout, even the proudest and most beautiful, all are at his feet. To
satisfy them he is ruining his health and his life; he has grown melancholy, and the
office of lucky charm is beginning to weigh on him."

The sun was already setting, and on hearing that the king would not appear at all
that day, the last courtiers left, leaving the state apartments deserted.

"I would gladly give my own shirt," said Nelilehti. "I can assure you that I have a
very happy disposition. I am always content; I eat and drink well. I am told I look
the picture of health; my face is considered pleasing. And indeed I have no
complaint about my face. But my bladder burns and presses so that it spoils my joy
in life. This morning I passed a stone as big as a pigeon's egg. I am afraid my
shirt would be of no use to the king."

"I would give mine too," said Pyhä-Sylvanus. "But I also have my stone, and that
stone is my wife. I am married to the ugliest and most spiteful creature that has
ever existed, and although I know that the future is in God's hands, I add with firm
conviction: the most spiteful and ugliest that ever will exist on the face of the
earth, for the recurrence of such an original is so improbable that one may simply
regard it as impossible…"

Then, dropping this painful subject, he continued:

"Nelilehti, my friend, we have been looking in the wrong direction. A happy man is
not to be sought at court or among the mighty of this world."

"You talk like a philosopher," retorted Nelilehti. "You sound just like that beggar
Jean-Jacques [meaning Jean-Jacques Rousseau]. You do yourself an injustice. There
are just as many happy people, and people worthy of happiness, in the king's palace
and the salons of the aristocracy as in the literary cafés and the workers' taverns.
If we have not found one within these walls today, it is because the hour is late
and luck was not with us. Let us go this evening to the queen's card room, and we
shall surely do better."

"Look for a happy person around a gaming table!" cried Pyhä-Sylvanus. "One might as
well look for a pearl necklace in a turnip field or for the truth on a statesman's
lips!… But the Spanish ambassador is giving a ball tonight, and the whole town will
be there. Let us go, and we shall easily lay hands on a good and suitable shirt."

"It has happened to me now and then," said Nelilehti, "to hold a happy woman's shirt
in my hands. It was always most agreeable. But our happiness lasted only a moment.
If I speak of this to you, it is not to boast of it (it is really not worth it), nor
to recall past joys, which can be had again at any time, for quite contrary to what
the proverb says, every age has the same pleasure. No, my purpose is quite
different; it is more serious and more virtuous, and bears directly on the lofty
task entrusted to us both: I wish to put to you an idea that has just occurred to
me. Do you not think, Pyhä-Sylvanus, that in prescribing the shirt of a happy man,
Doctor Rodrigo used the word 'man' in the general sense, meaning the whole human
race regardless of sex, and therefore a woman's shirt just as well as a man's? For
my part I am inclined to think so, and if your view tended the same way, we could
widen the field of our search and more than double our chances, for in a refined and
civilized society such as ours women are happier than men: we do more for them than
they do for us. Listen, Pyhä-Sylvanus: since our field of work is thus enlarged, we
might perhaps divide it between us. For example, from this evening until tomorrow
morning I would look for a happy woman while you look for a happy man. Admit, my
friend, that a woman's shirt is a thing without equal. Once, long ago, I handled one
so fine that it could be drawn through a ring: its fabric was finer than a spider's
web. And what do you say, my friend, to the shirt that a lady at the French court in
Marie Antoinette's time wore to a ball as a headdress? I rather think we should be
most welcome if we could present our lord the king with such a beautiful shirt, made
of the finest linen, with its lace insertions, its Valenciennes edging and its
shoulder rosettes tied from rose-coloured ribbon, a shirt lighter than a breath and
scented with iris and love."

But Pyhä-Sylvanus opposed with all his might such an interpretation of the doctor's prescription.

"Has it not occurred to you, Nelilehti," he exclaimed, "that a woman's shirt would bring our king only a woman's happiness, which would turn to his misery and shame! I shall not examine here whether a woman may perhaps find happiness more easily than a man. This is neither the time nor the place for that; we had best go to dinner. Physiologists claim that women have a more delicate sensibility than we have, but such talk consists of groundless generalities that pass over people's heads and convince no one. Nor do I know whether, as you seem to think, our cultivated society is better suited to making women happy than men. I have observed only that in our circle they do not bring up their children, do not manage their households, know nothing, do nothing, and yet are ready to die of fatigue; they spend their strength in shining, which is the fate of a candle: I hardly know whether it is an enviable one. But that is not the question. Perhaps a time will yet come when there is only one sex, or perhaps there will be three or more. In that case sexual morality too will be richer, more varied, and in every way more abundant. But until then we must make do with two sexes; they have much in common, for there is much of the man in the woman and much of the woman in the man. Nevertheless they are also different; each has its own nature, its own customs and laws, its own joys and sorrows. If you feminize the king's idea of happiness, with what an icy gaze will he henceforth look upon Madame Kananheimo… And who knows, sunk as he is in melancholy and soft living, what he may yet do; perhaps he will endanger the honor of our whole glorious fatherland. Is that what you are aiming at, Nelilehti?

"Cast a glance at the tapestries in the gallery of the royal palace, which depict the story of Hercules; there you will see what befell that hero, who was remarkable for having singularly bad luck in matters of shirts; in jest he put on Omphale's shirt and thereafter could do nothing but spin wool. Such is the fate that you would, by your imprudence, prepare for our glorious sovereign as well."

"Oh, oh, oh!" lamented the first equerry. "I take back my words; let us speak no more of the matter."

CHAPTER 4.

Hieronymus.

The Spanish embassy blazed in the darkness of the night. With the reflection of its lights it gilded even the wisps of cloud. Garlands of lanterns bordering the paths of the park lent the nearby foliage the brilliance and translucence of emerald. Bengal lights reddened the sky above the great black trees. An unseen orchestra sent voluptuous melodies floating on the gentle breeze. The splendid throng of guests covered the lawn; tailcoats flitted in the dusk; the ribbons and crosses of military uniforms glittered; pale female figures glided gracefully and sweetly over the grass, each leaving her own fragrance behind her.

Nelilehti noticed near him two famous statesmen, the president of the council and his predecessor, who were conversing together beside the statue of Fortune, and he was just about to join them. But Pyhä-Sylvanus stopped him.

"They are both unhappy," he whispered; "one still smarts at having lost power, and the other fears at every moment to lose it. This ambition of theirs is all the more pitiable because both of them are far freer and more influential in private life than at the head of the state, where they remain only so long as they humbly and shamefully submit to every whim of the parliament and the blind passions of the people, and keep an eye on the interests of the financiers. It is this pompous abasement, then, that they pursue so eagerly. Ah, Nelilehti, it is better to stay quietly among one's horses and dogs than to aspire to govern men!"

They went on. Scarcely had they taken two steps when merry bursts of laughter rang in their ears from a nearby arbor. They went in and saw there, in the shade of the beech hedge, a fat, lewd-looking man who, seated on four chairs at once, was telling stories in a warm, honeyed voice to a large audience. The listeners' eyes were fixed on his lips, which recalled those of an antique satyr, and on his superhumanly swollen face, which seemed covered with the dregs of some kind of intoxication. This was the most famous man in the kingdom and the only true favorite of the people, Hieronymus. He spoke at length, amusingly and copiously, tossed ideas into the air, and spun tales, some of them excellent, others less good, but all of them provoking laughter. He related, among other things, that one day a revolution took place in Athens, in which all property was divided and the women were made common to all; but soon the ugly and the old began to complain that they were being neglected, and so a law was passed in their favor requiring every man to visit the old and the ugly first before he might go to the young and the beautiful. And with broad good humor he described the ludicrous love affairs, the gross embraces, and the horror of the young men, armed with the courage of despair, at the sight of their bleary-eyed, runny-nosed mistresses, whose noses and chins curved together like nutcrackers. Then he told thick and spicy anecdotes about German Jews, parish priests, and peasants, one yarn after another, droll quips and tomfooleries.

Hieronymus was, the whole of him, nothing but an immense speaking machine. When he spoke, his whole being spoke from head to heel, and never yet has any orator so completely mastered the play of words as he. He was by turns grave, playful, sublime, and comic, making use of every form of eloquence, and this same man who, in the shade of the beech hedge, was playing all manner of amusing pranks like a full-blooded comedian for his own delight and that of his idle audience, had only the day before in the parliament conjured up, by the power of his mighty voice, a storm of hatred and enthusiasm and made the ministers tremble and the leaders of the people quake, and the echo of his speech was travelling round the country and stirring minds. Skilful in his attacks and calculating even in the flight of his highest enthusiasm, he had become leader of the opposition without falling out with the government, and although his real field of work was among the people, he also moved in the circles of high society. It was said of him that he was truly a man of his time. He was a man of the hour: his spirit adapted itself to time and place. His thought always worked according to the demands of the moment; his broad and general intelligence matched exactly the general commonalty of the citizens; his enormous mediocrity eclipsed all surrounding smallness and greatness: nothing was seen but him. His health alone could have guaranteed his happiness; it was of strong and sturdy make, like his mind. A great drinker and a great lover of flesh, roasted and raw, he kept himself in a state of perpetual joyous abundance and seized for himself the best share of this world's pleasures. As they listened to his marvelous stories, Nelilehti and Pyhä-Sylvanus laughed like the rest, and, nudging each other with their shoulders, they stole glances at his shirt, which Hieronymus had liberally doused with wines and sauces in the course of his merry meal.

The ambassador of a proud nation, who was hawking his self-interested friendship to King Kristoffer, happened just then to pass across the lawn. He approached the great man and made him a slight bow. At once Hieronymus was transformed; a lofty and gentle gravity, a solemn calm spread over his face, and the subdued tones of his voice at once began to flatter the ambassador's ears with the noblest caresses of oratory. His whole bearing expressed a deep understanding of foreign affairs, the spirit of congresses and conferences: everything about him, down to his tied cravat, his bulging shirt, and his elephantine trousers, took on all at once, as if by a miracle, the same statesmanlike, dignified, and ambassadorial appearance.

The other guests withdrew, and the two illustrious personages talked together for a long time in a friendly tone and appeared to be on excellent terms, a circumstance that was at once noticed and carefully remarked among the men interested in affairs of state and the "career" ladies.

"Hieronymus," said one, "will be minister of foreign affairs whenever he wishes."

"When he is," said another, "he will put the king in his pocket."

The Austrian ambassador's wife examined him through her lorgnette and said:

"That fellow is intelligent; he will get on in the world."

When the conversation was over, Hieronymus went for a short stroll in the garden with his faithful Visulin, who resembled a wading bird with an owl's head and who never left his side.

The private secretary and the first equerry followed him.

"His is the very shirt we must have," said Nelilehti in a very low voice. "But will he give it up? He is a socialist and fights against the king's power."

"There is no fear of that! He is not a bad man," replied Pyhä-Sylvanus, "and besides, he has his head screwed on right. He cannot wish for a change, since his place is once and for all in the opposition. He is free of responsibility; his position is excellent: naturally he does not want to lose it. A good member of the opposition is always a force that preserves society. Unless I am much mistaken, that rouser of the rabble would be very sorry if he should in any way come to harm the king. If only we bargain skilfully, we shall get the shirt. He is glad to have dealings with the court, like Mirabeau. He need only be assured that the matter will remain secret."

While they were talking thus, Hieronymus walked ahead of them, hat over one ear, twirling his cane in the air and pouring out unceasingly, as a vent for his jovial nature, jests, pleasantries, chuckles, exclamations, risky puns, ribald allusions, obscenities, and snatches of song. Then, some twenty paces from him, the Duke of Aulnes, the acknowledged arbiter of taste and manners and the idol of youth, was seen to meet a lady of his acquaintance; he greeted her very simply with a small, dry gesture that was nevertheless full of enchanting grace. The tribune watched him attentively; then, suddenly growing gloomy and thoughtful, he laid his heavy hand on the wading bird's shoulder:

"Visulin," he said to him, "I would gladly give up all my popularity and ten years of my life to be able to wear a tailcoat and speak to women as that dolt over there does."

He had lost his gaiety. He now walked in bleak silence, head bowed, gazing mournfully at his own shadow, which the moon mockingly cast at his feet like a shapeless, bulging, ball-shaped doll.

"What did he say?… Is he joking?" asked Nelilehti uneasily.

"He has never been more sincere or more serious," replied Pyhä-Sylvanus. "He has revealed to us the inner worm that gnaws at him. Hieronymus grieves inconsolably that he lacks distinction and refinement. He is not happy. I would not give four pennies for his shirt."

Time passed, and the search began to look arduous. The private secretary and the first equerry decided to pursue their inquiries separately, each on his own side, and agreed to meet during supper in the little yellow drawing room to report to each other the results of their investigations.

Nelilehti questioned chiefly military men, great lords, and large landowners, and did not neglect to inquire about their condition from the women as well. Pyhä-Sylvanus, who was more sharp-sighted, tried to read the eyes of the financiers and to probe the innermost depths of the statesmen.

They met at the appointed hour, both weary and long-faced.

"I have seen none but happy people," said Nelilehti, "but everyone's happiness has been spoiled in one way or another. The military men are pining for some cross, title, or endowment. The advantages and honors won by their rivals make their blood run cold. On hearing the news that General Kalske had been made Duke of Comores, they turned yellow as coconuts and green as lizards. One of them turned dark red: he had a stroke. Our noblemen languish at once from boredom and from the troubles they have on their estates; forever at law with their neighbors, forever with lawyers on their backs, they drag the heavy burden of their idleness behind them under constant worry."

"I have succeeded no better," said Pyhä-Sylvanus. "And what astonishes me above all is that the causes and grounds of people's sufferings can be exactly opposite. The Prince of Estelle, for instance, is unhappy because his wife deceives him, yet not because he loves his wife, but because his vanity suffers from it. The Prince of Malva, on the other hand, is unhappy because his wife does not deceive him and by her conduct denies him the means of restoring the fortunes of his ruined house. One is burdened by his children; another is in despair at having none. I have met townsmen who want nothing but to go and live in the country, and countrymen whose only dream is to become town dwellers. I also had the honor of being made the confidant of two worthy men: one was utterly broken at having killed in a duel the man who had taken his mistress; and the other grieved that he had not managed to kill his rival."

"I should never have believed," sighed Nelilehti, "that it would be so hard to find a happy man."

"Perhaps we are also going about it clumsily," remarked Pyhä-Sylvanus; "we are searching at random, without any systematic method; we do not even know for certain what we are looking for. We have not defined happiness at all. That we must do."

"It would be a waste of time," replied Nelilehti.

"Pardon me, I beg leave to stand by my claim," insisted Pyhä-Sylvanus. "Once we have defined happiness, that is, once we have delimited it, made it precise, and fixed it to some definite time and place, we shall have a better chance of finding it."

"I do not believe it," said Nelilehti.

Nevertheless they agreed to seek advice in this matter from the wisest man in the kingdom, the director of the royal library, Mr. Tuliaivo.

The sun had already risen when they returned to the palace. Kristoffer V had spent a sleepless night and was impatiently demanding the healing shirt. They apologized for their delay and went straight up to the third floor; there Mr. Tuliaivo received them
