Abstract
Microarray data usually contain missing values, so estimating them is an important preprocessing step. This paper proposes a method for estimating missing values based on Partial Least Squares (PLS) regression. The characteristics of PLS regression make the method feasible for microarray data. We compared our method with three others, ROWaverage, KNNimpute, and LLSimpute, on different data sets and at various missing probabilities. The experimental results show that the proposed method is accurate and robust for estimating missing values.
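The comparison setup mentioned in the abstract (hide known entries at a given missing probability, impute them, score the estimates) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the function names are hypothetical, the row-average baseline is our reading of ROWaverage, and the NRMSE definition (RMSE over the hidden entries divided by the standard deviation of their true values) is an assumption, since the abstract does not state the formula. The PLS-based estimator itself is not reproduced here.

```python
import math
import random


def mask_entries(matrix, missing_prob, seed=0):
    """Return a copy of matrix where each entry is replaced by None
    independently with probability missing_prob (hypothetical helper)."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_prob else v for v in row]
            for row in matrix]


def row_average_impute(masked):
    """Baseline: fill each missing entry with the mean of the observed
    entries in the same row (assumed meaning of ROWaverage)."""
    filled = []
    for row in masked:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def nrmse(truth, masked, imputed):
    """Assumed definition: RMSE over the hidden entries divided by the
    standard deviation of the true values at those entries."""
    pairs = [(t, g)
             for t_row, m_row, g_row in zip(truth, masked, imputed)
             for t, m, g in zip(t_row, m_row, g_row)
             if m is None]
    if not pairs:
        return 0.0
    mse = sum((g - t) ** 2 for t, g in pairs) / len(pairs)
    mean_t = sum(t for t, _ in pairs) / len(pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs) / len(pairs)
    return math.sqrt(mse / var_t) if var_t > 0 else float("nan")


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are experiments.
    truth = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0], [0.5, 1.5, 2.5]]
    masked = mask_entries(truth, missing_prob=0.3, seed=1)
    imputed = row_average_impute(masked)
    print("NRMSE of row-average baseline:", nrmse(truth, masked, imputed))
```

Any other estimator, including a PLS-based one, could be dropped in place of `row_average_impute` and scored with the same `nrmse` call, which is what makes a comparison across missing probabilities straightforward.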
This work was supported by the 863 Research Plan of China under Grant No. 2004AA231071 and the NSF of China under Grant No. 60533110.
Keywords
- Microarray Data
- Partial Least Squares
- Similar Gene
- Normalized Root Mean Square Error
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, K., Li, J., Wang, C. (2006). Missing Values Estimation in Microarray Data with Partial Least Squares Regression. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758525_90
DOI: https://doi.org/10.1007/11758525_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34381-3
Online ISBN: 978-3-540-34382-0
eBook Packages: Computer Science, Computer Science (R0)