Abstract
Query expansion is the process of reformulating an original query to improve retrieval performance, and relevance feedback is one of the most useful query modification techniques in information retrieval systems. In this paper, we introduce query expansion into ligand-based virtual screening (LBVS) using the relevance feedback technique. In this approach, a single ligand molecule is first searched with a Bayesian inference network, and a few high-ranking molecules of unknown activity are selected from its output to form a set of ligand molecules. This set is then used to form a new, expanded ligand molecule. Simulated virtual screening experiments with the MDL Drug Data Report (MDDR) and Maximum Unbiased Validation (MUV) data sets show that ligand expansion provides a very simple way of improving LBVS, especially when the active molecules being sought have a high degree of structural heterogeneity. However, the effectiveness of ligand expansion is slightly lower when structurally homogeneous sets of actives are being sought.
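The ligand expansion loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Bayesian inference network is replaced here by plain Tanimoto similarity, fingerprints are modelled as sets of integer feature identifiers, and the expanded ligand is assumed to be the union of the query's features with those of the top-ranked molecules. The names `tanimoto`, `rank`, `expand_ligand` and the toy `library` are hypothetical.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # assumption: a fingerprint is a set of feature ids


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient of two feature sets (stand-in for the Bayesian network score)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank(query: Fingerprint, library: Dict[str, Fingerprint]) -> List[str]:
    """Return molecule ids sorted by decreasing similarity to the query (ties broken by id)."""
    return sorted(library, key=lambda mol_id: (-tanimoto(query, library[mol_id]), mol_id))


def expand_ligand(query: Fingerprint, library: Dict[str, Fingerprint], top_k: int = 2) -> Fingerprint:
    """Form an expanded ligand from the query and its top_k nearest neighbours.

    The neighbours have unknown activity; they are simply assumed to be relevant.
    Merging by set union is an assumption made for this sketch.
    """
    feedback = rank(query, library)[:top_k]
    expanded = set(query)
    for mol_id in feedback:
        expanded |= library[mol_id]
    return frozenset(expanded)


# Toy screening library (hypothetical data).
library = {
    "m1": frozenset({1, 2, 3, 4}),
    "m2": frozenset({1, 2, 3, 5}),
    "m3": frozenset({4, 5, 6}),
    "m4": frozenset({7, 8, 9}),
}
query = frozenset({1, 2, 3})

first_pass = rank(query, library)                  # initial search with the single ligand
expanded_query = expand_ligand(query, library, 2)  # feedback from the two top-ranked molecules
second_pass = rank(expanded_query, library)        # re-search with the expanded ligand
```

In this toy example the molecule `m3` shares no features with the original query, but it does share features with the expanded one, which is the kind of gain the abstract reports for structurally heterogeneous actives.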
Acknowledgments
This work is supported by the Ministry of Higher Education (MOHE) and the Research Management Centre (RMC) at Universiti Teknologi Malaysia (UTM) under the Research University Grant Category (VOT Q.J130000.7128.00H72).
Cite this article
Abdo, A., Saeed, F., Hamza, H. et al. Ligand expansion in ligand-based virtual screening using relevance feedback. J Comput Aided Mol Des 26, 279–287 (2012). https://doi.org/10.1007/s10822-012-9543-4
DOI: https://doi.org/10.1007/s10822-012-9543-4