Statistical evaluation of the Predictive Toxicology Challenge 2000-2001

Bioinformatics. 2003 Jul 1;19(10):1183-93. doi: 10.1093/bioinformatics/btg130.

Authors

Hannu Toivonen¹, Ashwin Srinivasan, Ross D King, Stefan Kramer, Christoph Helma

Affiliation

¹ Department of Computer Science, PO Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. hannu.toivonen@cs.helsinki.fi

PMID: 12835260
DOI: 10.1093/bioinformatics/btg130

Abstract

Motivation: The development of in silico models to predict chemical carcinogenesis from molecular structure would help greatly to prevent environmentally caused cancers. The Predictive Toxicology Challenge (PTC) competition was organized to test the state of the art in applying machine learning to form such predictive models.

Results: Fourteen machine learning groups generated 111 models. The use of Receiver Operating Characteristic (ROC) space allowed the models to be compared uniformly, regardless of the error cost function. We developed a statistical method to test whether a model performs significantly better than random in ROC space. Using this test as the criterion, five models performed better than random guessing at a significance level p of 0.05 (not corrected for multiple testing). Statistically, the best predictor was the Viniti model for female mice, with a p-value below 0.002. The toxicologically most interesting models were Leuven2 for male mice and Kwansei for female rats. These models performed well in the statistical analysis and lie in the middle of ROC space, i.e. distant from extreme cost assumptions. These predictive models were also independently judged by domain experts to be among the three most interesting, and are believed to include a small but significant amount of empirically learned toxicological knowledge.
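The abstract does not spell out the statistical test, so the following is only a minimal sketch of the general idea, assuming a one-sided Fisher exact test as a stand-in rather than the authors' actual method. It places a binary classifier in ROC space and asks how likely it is that a random guesser making the same number of positive predictions would find at least as many true carcinogens. The function names and the confusion-matrix counts are hypothetical and chosen for illustration only.

```python
from math import comb


def roc_point(tp, fn, fp, tn):
    """Return (false positive rate, true positive rate) for one model."""
    return fp / (fp + tn), tp / (tp + fn)


def p_better_than_random(tp, fn, fp, tn):
    """One-sided Fisher exact test (assumed stand-in, not the paper's test).

    Probability that a random guesser issuing the same number of positive
    predictions would obtain at least `tp` true positives.
    """
    pos = tp + fn            # actual positives (e.g. carcinogens)
    neg = fp + tn            # actual negatives
    predicted_pos = tp + fp  # compounds the model labels positive
    upper = min(pos, predicted_pos)
    # Upper tail of the hypergeometric distribution.
    tail = sum(
        comb(pos, k) * comb(neg, predicted_pos - k)
        for k in range(tp, upper + 1)
    )
    return tail / comb(pos + neg, predicted_pos)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the PTC results.
    counts = dict(tp=30, fn=20, fp=40, tn=95)
    fpr, tpr = roc_point(**counts)
    p = p_better_than_random(**counts)
    print(f"ROC point: FPR={fpr:.3f}, TPR={tpr:.3f}, p={p:.4f}")
```

A model on the ROC diagonal (true positive rate equal to false positive rate) yields a large p-value under this sketch, while a point well above the diagonal yields a small one. With 111 models tested, some p-values below 0.05 are expected by chance, which is why the abstract notes that its threshold is not corrected for multiple testing.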

Availability: PTC details and data can be found at: http://www.predictive-toxicology.org/ptc/.

Publication types

Comparative Study
Evaluation Study
Validation Study

MeSH terms

Algorithms
Animals
Artificial Intelligence*
Carcinogenicity Tests / methods*
Carcinogens / chemistry*
Carcinogens / toxicity*
Data Collection
Databases, Factual
Environmental Exposure / adverse effects
Female
Government Programs / organization & administration
Male
Mice
Models, Biological*
Models, Statistical*
Neoplasms / chemically induced*
Rats
Reproducibility of Results
Risk Assessment / methods*
Sensitivity and Specificity
Sex Factors
Species Specificity
Structure-Activity Relationship
Toxicology / methods
United States

Substances

Carcinogens