Comparison of descriptor spaces for chemical compound retrieval and classification
In recent years the development of computational techniques that build models to correctly assign chemical compounds to various classes or to retrieve potential drug-like compounds has been an active area of research. Many of the best-performing ...
The importance of generalizability for anomaly detection
In security-related areas there is concern over novel “zero-day” attacks that penetrate system defenses and wreak havoc. The best methods for countering these threats are recognizing “nonself” as in an Artificial Immune System or recognizing “self” ...
Robust projected clustering
Projected clustering partitions a data set into several disjoint clusters, plus outliers, so that each cluster exists in a subspace. Subspace clustering enumerates clusters of objects in all subspaces of a data set, and it tends to produce many ...
Ensembles of relational classifiers
Relational classification aims at including relations among entities into the classification process, for example taking relations among documents such as common authors or citations into account. However, considering more than one relation can further ...
Random walk with restart: fast solutions and applications
How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used ...
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond
Much work on skewed, stochastic, high dimensional, and biased datasets usually implicitly solve each problem separately. Recently, we have been approached by Texas Commission on Environmental Quality (TCEQ) to help them build highly accurate ozone level ...