Alternative algorithm for the search of an optimal set of descriptors in QSAR-QSPR studies

PR Duchowicz, EA Castro… - MATCH Commun. Math …, 2006 - match.pmf.kg.ac.rs
MATCH Commun. Math. Comput. Chem., 2006
Over the last decades there has been great interest in the development of Quantitative Structure-Property/Activity Relationships (QSPR/QSAR) for the reliable prediction of physicochemical, biological and pharmacological properties of chemical compounds solely from knowledge of their molecular structure. Such relationships are most welcome when the experimental values have not been determined in the laboratory for reasons of cost or time, or because of technical difficulties [1-4]. In studies of this kind one looks for a relationship of the form $P = f(\mathbf{d})$, where $P$ is the property being studied and $\mathbf{d}$ is a set of mathematical or empirical molecular descriptors that quantify the molecular structure and carry information about it, represented by simple numerical quantities. The simplest descriptors are, for instance, the numbers and types of chosen atoms or bonds in the structure of the molecule. More elaborate descriptors can be derived from various theories, such as Chemical Graph Theory, Quantum Mechanics, Information Theory, etc. [5-7]. The function $f(\mathbf{d})$ is commonly unknown and depends on the property $P$, the set of descriptors $\mathbf{d}$, and the number and type of compounds under study. Typically, this function is chosen so that it generates the best predictions for the property being modeled.
Nowadays there are thousands of descriptors available in the literature [7-9], and one is faced with the problem of selecting the best set of $d$ descriptors out of a much larger set of $D$, according to some criterion such as the smallest total standard deviation $S$ [10-13]. A full search (FS) for such an optimal set requires $D!/[(D-d)!\,d!]$ linear regressions, a number that grows so rapidly with $D$ that the search soon becomes impracticable. Moreover, if $D$ is smaller than the number of molecules $M$, then one may look for the global optimal set of descriptors, and the necessary comparison of the best sets of $d = 1, 2, \ldots, D$ descriptors requires a total of $2^D - 1$ linear regressions.
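To make the cost of a full search concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. It counts the regressions given by the two formulas above and runs a brute-force search over a small descriptor pool. The function names, the use of NumPy least squares with an intercept, and the definition of $S$ as $\sqrt{\mathrm{RSS}/(M-d-1)}$ are all assumptions made for illustration; the text does not specify how $S$ is computed.

```python
from itertools import combinations
from math import comb, sqrt

import numpy as np


def n_regressions(D, d):
    """Number of d-descriptor subsets of a pool of D: D!/[(D-d)! d!]."""
    return comb(D, d)


def n_regressions_all_sizes(D):
    """Total number of subsets over d = 1..D, which equals 2**D - 1."""
    return sum(comb(D, d) for d in range(1, D + 1))


def standard_deviation(X, y):
    """S of a linear least-squares fit with intercept.

    Assumed definition (hypothetical, not taken from the source):
    S = sqrt(RSS / (M - d - 1)) for M molecules and d descriptors.
    """
    M, d = X.shape
    A = np.column_stack([np.ones(M), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return sqrt(float(residuals @ residuals) / (M - d - 1))


def full_search(X, y, d):
    """Fit every d-column subset of X; return (best_subset, best_S)."""
    best = None
    for subset in combinations(range(X.shape[1]), d):
        S = standard_deviation(X[:, list(subset)], y)
        if best is None or S < best[1]:
            best = (subset, S)
    return best
```

For a pool of $D = 1000$ descriptors and $d = 5$, `n_regressions(1000, 5)` is about $8.25 \times 10^{12}$, which illustrates why an exhaustive search quickly becomes impracticable and why alternative search strategies are of interest.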