2016 J Cheminform 8 14
Open Access
SOFTWARE
Abstract
Background: Due to exorbitant costs of high-throughput screening, many drug discovery projects commonly
employ inexpensive virtual screening to support experimental efforts. However, the vast majority of compounds in
widely used screening libraries, such as the ZINC database, will have a very low probability to exhibit the desired bioactivity for a given protein. Although combinatorial chemistry methods can be used to augment existing compound
libraries with novel drug-like compounds, the broad chemical space is often too large to be explored. Consequently,
the trend in library design has shifted to produce screening collections specifically tailored to modulate the function
of a particular target or a protein family.
Methods: Assuming that organic compounds are composed of sets of rigid fragments connected by flexible linkers, a molecule can be decomposed into its building blocks tracking their atomic connectivity. On this account, we
developed eSynth, an exhaustive graph-based search algorithm to computationally synthesize new compounds by
reconnecting these building blocks following their connectivity patterns.
Results: We conducted a series of benchmarking calculations against the Directory of Useful Decoys, Enhanced
database. First, in a self-benchmarking test, the correctness of the algorithm is validated with the objective to recover
a molecule from its building blocks. Encouragingly, eSynth can efficiently rebuild more than 80% of active molecules
from their fragment components. Next, the capability to discover novel scaffolds is assessed in a cross-benchmarking
test, where eSynth successfully reconstructed 40% of the target molecules using fragments extracted from chemically distinct compounds. Despite an enormous chemical space to be explored, eSynth is computationally efficient;
half of the molecules are rebuilt in less than a second, whereas 90% take only about a minute to be generated.
Conclusions: eSynth can successfully reconstruct chemically feasible molecules from molecular fragments. Furthermore, in a procedure mimicking the real application, where one expects to discover novel compounds based on a
small set of already developed bioactives, eSynth is capable of generating diverse collections of molecules with the
desired activity profiles. Thus, we are very optimistic that our effort will contribute to targeted drug discovery. eSynth
is freely available to the academic community at www.brylinski.org/content/molecular-synthesis.
Keywords: Molecular synthesis, Virtual screening, Target-focused libraries, eSynth, Chemical space, Targeted drug
discovery
Background
Due to extreme costs of high-throughput screening,
many drug discovery projects commonly employ inexpensive computations to support experimental efforts. In
*Correspondence: michal@brylinski.org
2016 Naderi et al. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium,
provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license,
and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/
publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
compounds and 41 kinases from five different families, demonstrating a 6.7-fold higher overall hit enrichment compared to a generic compound collection [14].
Furthermore, a structure-based modeling was used to
create a small, focused library against Chlamydophila
pneumoniae, a common pathogen recently linked to atherosclerosis and the risk of myocardial infarction [15].
Encouragingly, the experimentally determined hit rate
for the targeted library was 24.2%, which is considerably
higher than that expected for a generic library. Similar
to structure-based approaches, ligand-based techniques
can also be used in the focused library design, as shown
for the GPCR family [10]. Compared to large and diverse
screening libraries, using relatively small and targeted
collections significantly improves the odds of finding
potential drug candidates, thus further reducing the costs
of drug discovery.
Target-focused libraries are either designed or assembled upon some understanding of a specific protein
target or a protein family. These collections are often
compiled from larger, more diverse libraries using either
molecular docking (structure-based approach) or ligand
fingerprint similarity (ligand-based approach). The former employs structural, sequence and mutagenesis data,
whereas the latter is based on the biomolecular properties derived from known ligands, offering a useful way of
scaffold hopping from one ligand class to another [16].
Target-focused libraries are often constructed around a
single scaffold with one or more positions used to attach
various chemical moieties or side chains. Although this
approach can result in millions of different compounds
[17], the chemical space remains largely unexplored;
therefore, truly novel compounds will not be discovered.
On the other hand, combinatorial chemistry methods
can produce a vast collection of diverse compounds, so
vast that only a tiny fraction of them could be explored,
even using supercomputers. One can hardly imagine
screening the chemical universe containing from 10^12 to
10^180 drug-like compounds [18]. Therefore, techniques
to design chemical libraries covering pharmacologically
relevant regions are needed [19]. These methods hold
a promise to advance our knowledge of biological processes leading to new strategies to treat diseases.
Compiling focused libraries by molecular synthesis is
essentially a combinatorial problem that can be addressed
using graph theory. These techniques have already been
extensively used in computer science and artificial intelligence for the synthesis of plans [20], problems and solutions in geometry [21], hardware from specifications
[22], and communication protocols [23]. Graph-based
approaches also have a wide range of applications in drug
discovery including the analysis of chemical structures to
better understand the common features of drug molecules
a small set of already developed bioactives. Equally important, eSynth allows adding active subunits to an existing
compound in order to generate a large library of prototypes of the modified ligand. Such libraries can be examined by molecular docking to explore those modifications
yielding the highest binding affinity to the protein target.
Implementation
Molecular fragments
We developed a procedure for the automatic identification and extraction of molecular fragments from chemical compounds. An example decomposition procedure
is shown in Fig. 1. The extraction process utilizes the
PDBQT file format containing a central rigid fragment,
labeled as ROOT, from which zero or more rotatable
bonds protrude. The sets of atoms connected through
rotatable bonds are organized as BRANCHes, and at the
beginning and end of each BRANCH section, the serial
numbers of the two atoms forming a rotatable bond are
recorded. First, we identify all rigid moieties (Fig. 1b),
where a rigid fragment is defined as a set of at least four
non-hydrogen atoms connected by non-rotatable bonds
(Fig. 1c). The remaining parts are extracted as flexible
linkers (Fig. 1d). If two linker fragments are attached to
each other, these will be connected to form a longer linker
(Fig. 1e). Failing to construct longer linkers from shorter
fragments would limit the library to contain only very
short linkers. Furthermore, we track the connectivity
between individual fragments, so that chemically feasible
Fig. 1 Decomposing organic compounds into molecular fragments. Assuming that organic compounds are composed of sets of rigid fragments
connected by flexible linkers, a molecule can be decomposed into its building blocks tracking the atomic connectivity. a A stick representation of
afatinib. Extracting rigid fragments: b all rigid fragments are shown as thick lines, c only those rigid fragments composed of four or more atoms are
retained. Extracting flexible linkers: d small fragments connected by rotatable bonds, e small linkers are merged to form longer fragments, a single
atom can act as a linker as well. The following colors are used for atom types: carbon, green; nitrogen, blue; oxygen, red; fluorine, yellow; and
chlorine, pink
compounds can be synthesized using a graph-based algorithm. Every fragment is stored in the Structure Data Format (SDF) containing the 3D coordinates of all atoms and
the corresponding atomic types as well as the connectivity
information. The following SYBYL chemical types [29] are
used for ligand atoms: carbon (C.1, C.2, C.3, C.ar and C.cat),
nitrogen (N.1, N.2, N.3, N.4, N.am, N.ar and N.pl3),
oxygen (O.2, O.3 and O.co2), phosphorus (P.3), sulfur
(S.2, S.3, S.O and S.O2), and halogens (Br, Cl, F, I).
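
The rigid/linker split described above can be illustrated with a minimal Python sketch. It assumes a molecule is supplied as a list of non-hydrogen atom identifiers plus a list of bonds flagged as rotatable or not; this input format and the function names are hypothetical and do not reflect eSynth's actual interface, which reads PDBQT files. Rigid pieces are the groups of atoms held together by non-rotatable bonds, pieces with fewer than four atoms are treated as linker material, and adjacent linker pieces are merged.

```python
from collections import defaultdict

def connected_components(atoms, bonds):
    """Group atoms into connected components using only the given bonds."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(adjacency[atom] - component)
        seen |= component
        components.append(component)
    return components

def decompose(atoms, bonds, min_rigid_size=4):
    """Split a molecule into rigid fragments and merged linkers.

    atoms: list of heavy-atom identifiers.
    bonds: list of (atom_i, atom_j, is_rotatable) tuples.
    """
    # Step 1: atoms joined by non-rotatable bonds form candidate rigid pieces.
    fixed_bonds = [(a, b) for a, b, rotatable in bonds if not rotatable]
    pieces = connected_components(atoms, fixed_bonds)
    # Step 2: keep only pieces with at least min_rigid_size heavy atoms.
    rigids = [p for p in pieces if len(p) >= min_rigid_size]
    # Step 3: everything else is linker material; merge neighbouring pieces.
    small_atoms = {a for p in pieces if len(p) < min_rigid_size for a in p}
    linker_bonds = [(a, b) for a, b, _ in bonds
                    if a in small_atoms and b in small_atoms]
    linkers = connected_components(sorted(small_atoms), linker_bonds)
    return rigids, linkers

# Toy molecule: a six-membered ring (atoms 1-6), two single-atom linkers
# (7, 8) and a four-membered ring (9-12), joined by rotatable bonds.
toy_atoms = list(range(1, 13))
toy_bonds = (
    [(i, i % 6 + 1, False) for i in range(1, 7)]
    + [(6, 7, True), (7, 8, True), (8, 9, True)]
    + [(9, 10, False), (10, 11, False), (11, 12, False), (12, 9, False)]
)
toy_rigids, toy_linkers = decompose(toy_atoms, toy_bonds)
```

In this toy case the two single-atom pieces 7 and 8 are attached to each other, so they end up as one two-atom linker, mirroring the merging step in Fig. 1e.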
Connectivity information
allowed for each atom in the linker file, e.g. the N.3 atom shown
in Fig. 2b can bind at most two atoms that belong to rigid
fragments accepting N.3. Notably, long linkers with
extensive connectivity pose a risk of expanding the molecular search space to an unmanageable size. Therefore, unsaturated linkers can also be built to store only the number of
original connections, regardless of the maximum capacity
of their atoms to create covalent bonds. In contrast to saturated linkers, using unsaturated linkers with substantially
less connectivity considerably restricts the search space.
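
The difference between saturated and unsaturated linkers can be made concrete with a short Python sketch. It assumes single bonds only and a small, illustrative valence table; the function names and the choice of original connection sites are hypothetical. For the three-atom linker of Fig. 2b, the saturated count reproduces the capacities quoted in the caption (3, 2 and 2).

```python
# Illustrative valences for a few SYBYL types (assumption: single bonds only).
MAX_VALENCE = {"C.3": 4, "N.3": 3, "O.3": 2}

def saturated_slots(atom_types, internal_bonds):
    """Connection slots per atom: valence minus bonds inside the linker."""
    used = [0] * len(atom_types)
    for i, j in internal_bonds:
        used[i] += 1
        used[j] += 1
    return [MAX_VALENCE[t] - u for t, u in zip(atom_types, used)]

def unsaturated_slots(atom_types, original_connections):
    """Connection slots per atom: only those seen in the parent molecule."""
    slots = [0] * len(atom_types)
    for atom_index in original_connections:
        slots[atom_index] += 1
    return slots

# Linker from Fig. 2b: C.3 - C.3 - N.3
linker_types = ["C.3", "C.3", "N.3"]
linker_bonds = [(0, 1), (1, 2)]
saturated = saturated_slots(linker_types, linker_bonds)
# Hypothetical parent molecule in which only the two terminal atoms
# were attached to rigid fragments.
unsaturated = unsaturated_slots(linker_types, [0, 2])
```

Here the saturated linker exposes seven connection slots in total, whereas the unsaturated one exposes two, which is why the latter shrinks the search space so strongly.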
Fragment consolidation and pruning
Fig. 2 Graph representation of rigids and linkers. Sample molecular fragments: a a rigid fragment, pyridine, with six constituent atoms in the bold
outline and two possible connections to C.3 and C.ar in the dashed outline, b a three-atom linking fragment containing C.3 carbon with up to 3
connections, C.3 carbon with up to 2 connections, and N.3 nitrogen with up to 2 connections. Examples of 2-molecules: c two identical rigids connected to each other, d, e two possible ways of connecting rigid and linker fragments shown in a, b
b = 9.585 × 10^8 bits ≈ 120 megabytes with h = 7 hash functions. This means each addition of a molecule to F and
each query on F is subject to the worst case of O(h) = 7
hashings.
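
The quoted filter size and number of hash functions are consistent with the standard Bloom filter sizing formulas, bits = -n ln(p) / (ln 2)^2 and hashes = (bits / n) ln 2. The sketch below is a sanity check only: the capacity of 10^8 molecules and the 1% false-positive rate are assumptions chosen because they reproduce the quoted figures, since the passage stating the actual parameters is not part of this excerpt.

```python
import math

def bloom_parameters(n_items, false_positive_rate):
    """Return (number of bits, optimal number of hash functions)."""
    bits = -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    hashes = bits / n_items * math.log(2)
    return bits, hashes

# Assumed values: 1e8 stored molecules, 1% false-positive rate.
bits, hashes = bloom_parameters(1e8, 0.01)
megabytes = bits / 8 / 1e6
print(f"{bits:.4g} bits, {megabytes:.0f} MB, {math.ceil(hashes)} hash functions")
```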
Molecular synthesis requires a string representation
of molecules. In particular, a molecule M is represented
using the Simplified Molecular-Input Line-Entry System (SMILES) specification [32] in the Bloom filter.
We can modify the Compose function in Algorithm 2
by including several Bloom filters: a single, overall filter F and a filter F_ℓ for each level ℓ. When a level-ℓ molecule M is synthesized, we first check whether it has
been previously synthesized by querying F. If the molecule has not been synthesized (M ∉ F), we add M to
F and query F_ℓ. If M ∉ F_ℓ, we add M to F_ℓ and proceed
as in Algorithm 2 by adding M to the level-ℓ queue to
be processed into level-(ℓ + 1) molecules. Clearly, the
global Bloom filter F requires the most memory, but
ensures that molecules containing different numbers of
fragments with the same SMILES representation are
filtered as redundant.
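
The two-tier duplicate check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not eSynth's C++ implementation: the `BloomFilter` class, the SHA-256 double-hashing scheme and the `accept` helper are all hypothetical, and SMILES strings are assumed to be canonical so that identical molecules yield identical keys.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter keyed on strings (illustrative only)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key):
        # Derive n_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def __contains__(self, key):
        return all((self.bits[p // 8] >> (p % 8)) & 1
                   for p in self._positions(key))

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

def accept(smiles, level, global_filter, level_filters, queues):
    """Queue a newly composed molecule unless it was seen before."""
    if smiles in global_filter:          # seen at any level: redundant
        return False
    global_filter.add(smiles)
    if smiles in level_filters[level]:   # seen at this level: redundant
        return False
    level_filters[level].add(smiles)
    queues[level].append(smiles)         # to be extended to level + 1
    return True

global_filter = BloomFilter(1 << 16, 7)
level_filters = {2: BloomFilter(1 << 16, 7), 3: BloomFilter(1 << 16, 7)}
queues = {2: [], 3: []}
first = accept("c1ccncc1", 2, global_filter, level_filters, queues)
repeat = accept("c1ccncc1", 3, global_filter, level_filters, queues)
```

Because a Bloom filter never produces false negatives, the repeated SMILES is always rejected; the price is a small probability that a genuinely new molecule is discarded as a false positive.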
Implementation of eSynth
Fig. 3 Implementation of eSynth. Input rigid and linker fragments in SDF format (a) are parsed (b) into the graph-based representation (c). Synthesizer (d) is the main engine to generate new molecules, which are subsequently passed to the Writer (e) component and output in SDF format (f)
important drug targets. First, we validated the correctness of the search algorithm using a self-benchmarking
test. Subsequently, we performed a cross-validation test
to evaluate the capability of eSynth to generate bioactive
molecules with novel chemical structures.
In the self-benchmarking test, each active compound
in the DUD-E library was decomposed into fragments
and the molecular synthesis was performed. Parent compounds are compared to those constructed by eSynth
using molecular fingerprint matching with the chemical
similarity assessed by the Tanimoto coefficient (TC) [34,
35]. The cross-validation test mimics a real application,
where novel compounds are expected to be discovered
based on a small set of known bioactive molecules. Here,
bioactive compounds for each DUD-E target were first
clustered into a collection of chemically dissimilar groups
using SUBSET [36] and a TC similarity threshold of 0.7.
Subsequently, each cluster was selected as a validation
set and molecular fragments from the remaining clusters
were used by eSynth to build new molecules. The performance of eSynth is evaluated using the fraction of successfully reconstructed validation compounds using fragments
extracted from chemically different molecules. Due to the
large size of compound datasets generated by eSynth, we
first used Open Babel [37] to filter out those molecules
that are dissimilar to the validation compounds with
TC < 0.5. Next, 3D atomic coordinates were generated
for the synthesized molecules using obgen from the Open
Babel package. kcombu [38], a build-up algorithm that finds the atomic correspondence between chemical structures and calculates
2D-TC based on the identified maximum common
substructure, was then applied to measure
the topological similarity between the filtered subset of
synthesized molecules and the validation compounds.
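
For reference, the Tanimoto coefficient between two fingerprints A and B is the number of shared on-bits divided by the number of on-bits present in either, |A ∩ B| / |A ∪ B|. The Python sketch below applies it to the pre-filtering step described above. Fingerprints are represented as plain sets of on-bit indices; it does not reproduce the fingerprints of Open Babel or the substructure matching of kcombu, and the function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

def filter_candidates(candidates, references, threshold=0.5):
    """Keep candidates whose best TC to any reference is >= threshold."""
    kept = {}
    for name, fingerprint in candidates.items():
        best = max(tanimoto(fingerprint, ref) for ref in references)
        if best >= threshold:
            kept[name] = best
    return kept

# Toy fingerprints, for illustration only.
validation_set = [{2, 3, 4}]
synthesized = {"mol_a": {1, 2, 3}, "mol_b": {7, 8}}
survivors = filter_candidates(synthesized, validation_set)
```

In the toy data, `mol_a` shares two of four distinct bits with the validation compound and is kept, while `mol_b` shares none and is discarded.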
Results
Search algorithm and the computational efficiency
those molecules that have already been synthesized. Furthermore, long, saturated linkers pose a considerable risk
of expanding the molecular search space to an unmanageable size; therefore, we introduced unsaturated linkers accepting only those connections that were originally
present in their parent molecules. In contrast to saturated linkers that can form all chemically possible bonds
with rigid fragments, unsaturated linkers significantly
restrict the search space, dramatically improving the
computational efficiency. Finally, using the Rule-of-Five
ensures that the synthesized compounds have drug-like
properties. However, in order to test the drug-likeness
of a molecule prior to its synthesis, Lipinski's descriptors
need to be estimated from molecular fragments, which is
discussed in the following section.
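
As a rough illustration of testing drug-likeness before a molecule is fully assembled, the Python sketch below sums per-fragment descriptors and counts Rule-of-Five violations (molecular weight above 500 Da, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, logP above 5). The purely additive estimate is an assumption made for this sketch; how eSynth actually estimates these descriptors is not shown in this excerpt, and the fragment values below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptors:
    mol_weight: float   # Da
    h_donors: int
    h_acceptors: int
    logp: float

def estimate_from_fragments(fragments):
    """Additive estimate (assumption): sum each descriptor over fragments."""
    return Descriptors(
        mol_weight=sum(f.mol_weight for f in fragments),
        h_donors=sum(f.h_donors for f in fragments),
        h_acceptors=sum(f.h_acceptors for f in fragments),
        logp=sum(f.logp for f in fragments),
    )

def rule_of_five_violations(d):
    """Number of Rule-of-Five thresholds exceeded."""
    return sum([
        d.mol_weight > 500,
        d.h_donors > 5,
        d.h_acceptors > 10,
        d.logp > 5,
    ])

# Hypothetical fragment descriptors, for illustration only.
ring = Descriptors(mol_weight=78.0, h_donors=0, h_acceptors=1, logp=0.7)
linker = Descriptors(mol_weight=44.0, h_donors=1, h_acceptors=2, logp=-0.3)

small = estimate_from_fragments([ring, linker, ring])
large = estimate_from_fragments([ring, linker] * 5)
```

Molecular weight and the donor and acceptor counts can only grow as fragments are added, so under this additive assumption a partial molecule that already exceeds those thresholds could be pruned without being extended further.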
Self-benchmarking test
We validate our search algorithm using a self-benchmarking test in which active molecules are reconstructed from their fragments. An example of a
successful case is shown in Fig. 7, where the parent
compound is decomposed into two rigid and two
linker fragments (Fig. 7a). Connecting these fragments
through the locations marked by asterisks following the
connectivity patterns of the parent molecule produces
a series of 2-, 3- and 4-molecules shown in Fig. 7b-d,
respectively. The target molecule is correctly reconstructed at level 4 (Fig. 7d).
In Fig. 8, we assess the results obtained for the entire
set of 20,408 active compounds from the DUD-E dataset
using the highest TC between the synthesized and parent molecules. Using saturated linkers, 61.6% of actives
are reconstructed at a TC of 1.0, whereas 83.1% have
a TC of ≥0.8. Moreover, the fraction of actives generated by eSynth that match parent compounds increases
to 70.9% when unsaturated linkers are used. Note that
Open Babel calculates TC for a pair of ligands using
their hashed fingerprints; therefore, a TC of 1.0 denotes
identical fingerprints, but not necessarily identical
chemical structures. The inset in Fig. 8 shows the computational efficiency of eSynth. Here, over 60% of actives
are reconstructed in less than a second using a single
processor thread, whereas 90% of compounds are generated within a minute. Note that the synthesis time is
fairly similar when only successful cases at a TC of ≥0.8
are considered.
Despite these encouraging results, eSynth fails to
reconstruct certain molecules. To clarify why some
compounds are not correctly generated, Fig. 9 presents
the main scenarios leading to unsatisfactory results. The first
example shown in Fig. 9a is a bioactive compound made
up of a single fragment that cannot be decomposed into
smaller parts. Since the molecular synthesis is not executed, eSynth generates no output. Molecules shown in
Fig. 9b, c contain a long linker with a high degree of connectivity. In such cases, rigid fragments can potentially
connect to multiple linker locations, leading to a combinatorial explosion. In principle, the parent molecule
will be reconstructed at some point; however, we limit
the wall time for molecular synthesis to 1 h by default.
During that time, about 10% of actives will not be reconstructed as previously shown in Fig. 8 (inset). Using
unsaturated linkers, whose connectivity is limited to the
original connections in the parent compounds, helps
address this issue; nevertheless, those targets containing
long and highly flexible linkers are still not generated in
a reasonably short computing time. Finally, some compounds are actually correctly reconstructed, yet they are
not recognized as similar to their parent molecules. This
Fig. 7 Self-benchmarking example. An example of the successful reconstruction of a molecule from its fragments. a The parent molecule is first
decomposed into two rigids, thiophene (C4H4S) and 2,5-dimethylfuran [(CH3)2C4H2O], and two linkers, sulfonamide (SO2N) and carboxylic acid [C(O)OH].
Examples of constructed b 2-molecules, c 3-molecules, and d 4-molecules including the parent compound
Cross-validation test
A cross-validation test was performed in order to evaluate the capability of eSynth to generate novel bioactive
molecules. Here, we attempt to reconstruct molecules
highly similar to target compounds using fragments
extracted from chemically different molecules. A set of
fragments obtained from clusters other than the target
cluster may lack rigid fragment(s) necessary to rebuild
some of the active compounds. Since the molecular
synthesis algorithm builds on the provided set of fragments, reconstructing molecules without all necessary
parts is impossible. Encouragingly, 76.1% of 23,964
active DUD-E compounds for 101 target proteins are, in
principle, reconstructible. Moreover, we examined individual clusters of similar ligands and found that, of
9406 clusters, as many as 4100 (43.6%) contain at least one compound that is non-reconstructible
because of missing rigid fragments. These numbers are
likely underestimated, considering the fact that linker
fragments can also be missing and the connectivity patterns may not allow for the correct reconstruction of
the topology of target actives, leading to non-reconstructible cases. Interestingly, Fig. 10a indicates that
for the majority of DUD-E targets, non-reconstructible
actives are typically distributed across clusters of similar
molecules.
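
The notion of an "in principle reconstructible" compound can be expressed as a simple set test: a held-out compound qualifies only if every one of its rigid fragments also occurs in some compound from a different cluster. The Python sketch below illustrates this leave-one-cluster-out check. The data layout, the use of fragment names as keys and the function name are assumptions for illustration; as noted above, the check ignores linkers and connectivity patterns.

```python
def reconstructible_compounds(clusters):
    """Flag compounds whose rigid fragments all occur in other clusters.

    clusters: {cluster_id: {compound_id: set of rigid-fragment keys}}
    Returns {compound_id: bool}.
    """
    result = {}
    for held_out, compounds in clusters.items():
        # Pool of rigid fragments available from all remaining clusters.
        pool = set()
        for cluster_id, others in clusters.items():
            if cluster_id != held_out:
                for fragments in others.values():
                    pool |= fragments
        for compound_id, fragments in compounds.items():
            result[compound_id] = fragments <= pool
    return result

# Toy data: fragment keys are illustrative names, not real DUD-E content.
toy_clusters = {
    "A": {"a1": {"pyridine", "thiophene"}},
    "B": {"b1": {"pyridine", "furan"}},
    "C": {"c1": {"thiophene"}},
}
flags = reconstructible_compounds(toy_clusters)
fraction = sum(flags.values()) / len(flags)
```

In the toy data, `b1` is non-reconstructible because furan appears in no other cluster, so two of the three compounds pass the test.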
Figure 10b presents the results obtained for 9406
chemically distinct groups of compounds compiled using
active DUD-E ligands. Encouragingly, in 45.1% (99.3%)
of the cases, the active ligand is reconstructed at a TC
Fig. 9 Examples of molecules not reconstructed by eSynth. Unsuccessful cases in the self-benchmarking test: a A molecule composed of only one
rigid fragment, b, c examples of molecules containing long linkers that exponentially increase the search space
Fig. 10 Performance of eSynth in the cross-validation test. a DUD-E targets depicted as gray triangles are positioned in the plot according to the
fraction of reconstructible active compounds and the fraction of chemically similar clusters containing only reconstructible actives. Non-reconstructible actives are more uniformly distributed across clusters for those targets lying closer to the solid black diagonal line. b Cumulative fraction of
compounds reconstructed with the Tanimoto coefficient (TC) shown on the x-axis. TC is calculated using Open Babel (dashed gray line) and kcombu
(solid black line). The vertical dashed line delineates a TC threshold of 0.6. The inset shows a direct comparison between TC values computed by Open
Babel (1D-TC) and kcombu (2D-TC) with a solid black regression line
of ≥0.6 (≥0.5) using fragments extracted from different clusters associated with the same receptor protein.
Here, we employ an ultra-fast implementation of hashed
fingerprint-based chemical similarity using Open Babel
to compute 1D-TC because of the large number of pairwise similarity calculations. Subsequently, the relatively
small fraction of compound pairs whose 1D-TC is ≥0.5
is subjected to a more accurate comparison using 2D-TC
by kcombu. In contrast to fingerprint-based techniques,
kcombu detects one-to-one chemical matching between
two structures that can be used to assess the similarity of their biological activities. When the similarity is
evaluated by kcombu, 34.9% (58.2%) are reconstructed
at a 2D-TC of ≥0.6 (≥0.5). It has been shown that two
ligands whose 2D-TC is ≥0.6 typically have similar binding modes with a root-mean-square deviation (RMSD)
below 2.0 Å [39, 40]. Moreover, as depicted in the inset
in Fig. 10b, 2D-TC of 0.6 reported by kcombu roughly
the synthetic accessibility depends on a particular biological target and the associated set of bioactive compounds, which are used to extract molecular fragments
for eSynth.
We also compare the distribution of SAscore values for
compounds generated by eSynth to those collected for
several other datasets. Figure 11b shows that the median
SAscore value is 2.75 for decoy and 2.87 for active compounds from the DUD-E dataset (catalogue molecules).
Moreover, the median SAscore values for FDA-approved drugs
obtained from DrugBank [45] and compounds constructed by eSynth are 2.95 and 3.66, respectively. For
comparison, another study reported that the majority of bioactive molecules collected from the Derwent
World Drug Index [46] and the MDL Drug Data Report
[47] databases have SAscore between 2.5 and 5 [44]. In
contrast, natural products are generally more difficult to
synthesize than typical organic molecules. Encouragingly,
the median SAscore for molecules constructed by eSynth
is lower than the medians for natural products, which are 3.82 for
the Nuclei of Bioassays, Biosynthesis and Ecophysiology
(NuBBE) database of secondary metabolites and derivatives from the biodiversity of Brazil [48], and 4.30 for the
Universal Natural Product Database (UNPD) [49]. Compounds from the Dictionary of Natural Products [50]
were previously reported to have a broad distribution of
SAscore values between 2 and 8. On that account, the
synthetic accessibility of molecules generated by eSynth
is fairly high. The resulting datasets can be further filtered
using existing tools, such as SAscore, in order to exclude
those compounds containing synthetically unfeasible elements, e.g. chiral centers, large rings and non-standard
ring fusions.
Discussion
Exploring the chemical space to produce pharmacologically applicable compounds is a daunting task because of
an enormous size of the search space and numerous biochemical criteria restricting compound generation, i.e.
synthetic feasibility, drug-likeness, and the effective binding to the biological target. Using atom-based methods
may create an enormous chemical space that can easily
surpass the available computing resources. For instance,
the largest library generated by an atom-based approach
is the GDB-17 dataset comprising 166 billion small molecules [51]. On that account, fragment-based methods
can be used as an alternative. Here, reference molecules
are used as a source of building blocks, which can be subsequently combined to produce new compounds that are
to some extent related to the initial molecules [52]. Fragment-based algorithms typically employ certain rules
for combining various moieties, e.g. linker-linker bonds
are prohibited, while ring-linker-ring connections are
allowed. In contrast to atom-based methods, fragment-based techniques have capabilities to explore much larger
molecules.
To facilitate the construction of target-focused libraries
for virtual screening, we developed eSynth, a new fragment-based approach to molecular synthesis that follows
simple combinatorial chemistry steps using an optimized,
graph-based algorithm. eSynth rapidly generates series of
compounds with diverse chemical scaffolds complying
with criteria for drug-likeness. Although these molecules
may have different physicochemical properties, the initial
fragments are procured from biologically active and synthetically feasible compounds. Consequently, we demonstrated that the constructed libraries are enriched with
pharmacologically relevant molecules synthesized under
loose biochemical constraints.
Our effort simplifies the synthesis process by avoiding
techniques such as click chemistry, e.g. AutoClickChem
[53], and those relying on statistical restraints, e.g. Fragment Optimized Growth (FOG) [54]. Moreover, in contrast to other methods designed for certain classes of
compounds such as peptides generated from amino acid
fragments, e.g. GrowMol [55] and LUDI [56], eSynth
can construct any class of organic, drug-like molecules.
Several methods employ the binding site information
in order to generate molecules with a binding affinity
toward a given target protein, e.g. Multiple Copy Simultaneous Search (MCSS) [57], SPROUT: structure generating software using template [58], and SMall Molecule
Growth (SMoG) [59]. eSynth does not require protein
structures, yet the cross-validation test clearly demonstrates that molecules highly similar to those compounds known to bind to the target protein are effectively
generated.
Evolutionary algorithms that break fragments and
make crossovers allow for an exhaustive exploration of
the chemical space [60, 61]; however, using these techniques also requires applying chemical stability and
synthetic feasibility rules, which, in turn, consumes extra
computational resources. For instance, the Algorithm
for Chemical Space Exploration with Stochastic Search
(ACSESS) was designed to construct representative universal libraries in an arbitrary chemical space [61]. This
approach implements convergent evolutionary operations through bond and/or atom modifications on an initial library of molecules to acquire a maximally diverse
subset of molecules. Although using evolutionary techniques does not guarantee a completeness of the space
search, ACSESS systematically explores the small molecule universe, providing a near-infinite source of novel
compounds. Unlike other techniques employing
generic combinatorial algorithms, chemical rules and filters, eSynth was not designed to explore a broad chemical
space; rather, it is purposely confined to a chemical subspace around a particular drug target.
eSynth relies solely on fragments and their connectivity
patterns extracted from parent molecules to generate a
series of drug-like compounds. Thus, it is essential to use
synthetically feasible bioactive compounds as the source
in order to generate molecules with similar chemical
and pharmacological profiles. Importantly, eSynth is not
restricted to a particular hypothesis, e.g. a pre-defined
pharmacophore often used by synthesis algorithms. For
example, a pharmacophore-based de novo design method
of drug-like molecules (PhDD) ensures that molecules
constructed from linker and rigid fragments fit a given
pharmacophore model [60]. The search space in PhDD is
not only confined to the fragment and linker libraries, but
also it is limited to a user-defined template molecule in
the form of a pharmacophore hypothesis. eSynth avoids
such hypotheses in order to generate target-focused compound datasets, yet without any bias toward a specific
scaffold.
Molecular synthesis methods often use knowledge-based rules to connect fragments. For example, combining the amine with the carbonyl to form the amide
changes the preference of the nitrogen atom toward
those moieties that might be more likely attached to
an amide rather than an amine nitrogen [54]. On that
account, FOG uses the statistical knowledge to create new branches and decide which branch to grow as
an effective way to generate novel molecules. Similar to
eSynth, FOG employs a construction algorithm using
molecular fragments to generate synthetically tractable
molecules; however, it grows molecules using a Markov
Chain according to statistics on the frequency of specific connections in the database of chemicals. Moreover,
the Topology Classifier algorithm is used to classify the
constructed molecules as drugs or non-drugs. Given a
set of fragments, the chemical search space in FOG may
be somewhat limited to those molecules having similar
characteristics as the training compounds. In contrast,
eSynth creates new molecules by reusing fragments and
following their connectivity patterns in the parent compounds. Therefore, it covers a larger chemical space
and does not require constructing statistical databases of
fragment connections.
Conclusions
eSynth is a new algorithm to generate large datasets of
chemical compounds by connecting small molecular
fragments. It first establishes the width of a search space
with a diverse foundation of initial small molecules followed by the stochastic exploration of the depth of the
chemical space by constructing multi-fragment molecules. This hybrid approach ensures a deeper exploration
Availability and requirements
Project name: eSynth
Project home page: www.brylinski.org/content/molecular-synthesis
Operating system(s): Platform independent, preferably
Linux
Programming language: C++
Other requirements: Open Babel, GSL, Zlib
License: GNU GPL
Any restrictions to use by non-academics: license needed
Authors' contributions
MN implemented scripts for creating fragments, compiled datasets, and analyzed results. CA implemented eSynth. MN and CA wrote the manuscript. YD
conducted the synthetic accessibility analysis. SM assisted in the development
of synthesis algorithms implemented in eSynth and provided insights into the
theoretical foundations of probabilistic data structures. MB provided original
insights into the fragmentation process and the use of heuristics as a mechanism to prune the chemical space, and made final edits to the manuscript. All
authors read and approved the final manuscript.
Author details
1 Department of Biological Sciences, Louisiana State University, Baton Rouge,
LA, USA. 2 Department of Computer Science and Information Systems, Bradley
University, Peoria, IL, USA. 3 Department of Physics and Astronomy, Louisiana
State University, Baton Rouge, LA, USA. 4 Department of Computer Science
and Engineering, Louisiana State University, Baton Rouge, LA, USA. 5 Center
for Computation and Technology, Louisiana State University, Baton Rouge, LA,
USA.
Acknowledgements
The authors are grateful to Dr. Peter Ertl at Novartis for sharing his SAscore
code to calculate the synthetic accessibility. Portions of this research were
conducted with high performance computational resources provided by Louisiana State University (HPC@LSU, http://www.hpc.lsu.edu).
Competing interests
The authors declare that the research was conducted in the absence of any
commercial or financial relationships that could be construed as a potential
conflict of interest.
Received: 13 October 2015 Accepted: 3 March 2016
References
1. Leung CH, Ma DL (2015) Recent advances in virtual screening for drug
discovery. Methods 71:13