DOI: 10.1145/2649387.2649439
short-paper

omniClassifier: a desktop grid computing system for big data prediction modeling

Published: 20 September 2014

Abstract

Robust prediction models are important for numerous science, engineering, and biomedical applications. However, best-practice procedures for optimizing prediction models can be computationally complex, especially when choosing models from among hundreds or thousands of parameter choices. This complexity has increased further with the growth of data in these fields in the era of "Big Data". Grid computing is a potential solution to the computational challenges of Big Data. Desktop grid computing, which uses the idle CPU cycles of commodity desktop machines, can be coupled with commercial cloud computing resources to give research labs easier and more cost-effective access to vast computing resources. We have developed omniClassifier, a multi-purpose prediction modeling application that provides researchers with a tool for conducting machine learning research within the guidelines of recommended best practices. omniClassifier is implemented as a desktop grid computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) middleware. In addition to describing implementation details, we use various gene expression datasets to demonstrate the potential scalability of omniClassifier for efficient and robust Big Data prediction modeling. A prototype of omniClassifier can be accessed at http://omniclassifier.bme.gatech.edu/.
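To make the computational cost concrete, the following is a minimal sketch of nested cross-validation, which the author tags list as a keyword for this paper. It is not omniClassifier's implementation. The synthetic data, the k-nearest-neighbor classifier, the fold counts, and every function name (`make_data`, `knn_predict`, `folds`, `work_units`, `nested_cv`) are assumptions chosen for illustration, and the code uses only the Python standard library.

```python
import random
from itertools import product

def make_data(n=60, seed=0):
    """Hypothetical synthetic two-class data: one informative feature, one noise feature."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        x = (rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0))
        data.append((x, label))
    rng.shuffle(data)
    return data

def knn_predict(train, point, k):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], point))
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def folds(samples, n_folds):
    """Yield (train, test) splits; fold i holds out every n_folds-th sample."""
    for i in range(n_folds):
        test = samples[i::n_folds]
        train = [s for j, s in enumerate(samples) if j % n_folds != i]
        yield train, test

def work_units(n_outer, n_inner, param_grid):
    """Each (outer fold, inner fold, parameter) triple can be fitted independently."""
    return list(product(range(n_outer), range(n_inner), param_grid))

def nested_cv(samples, param_grid, n_outer=5, n_inner=4):
    """Estimate accuracy with parameter selection confined to the inner loop."""
    outer_scores = []
    for outer_train, outer_test in folds(samples, n_outer):
        # Inner loop: choose k using the outer-training data only.
        def inner_score(k):
            scores = [accuracy(tr, te, k) for tr, te in folds(outer_train, n_inner)]
            return sum(scores) / len(scores)

        best_k = max(param_grid, key=inner_score)
        # Outer loop: score the selected model on samples never used for selection.
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return sum(outer_scores) / len(outer_scores)

if __name__ == "__main__":
    grid = [1, 3, 5, 7, 9]  # odd values of k avoid tied votes
    data = make_data()
    print("nested CV accuracy estimate:", round(nested_cv(data, grid), 3))
    print("independent inner tasks:", len(work_units(5, 4, grid)))
```

With 5 outer folds, 4 inner folds, and 5 candidate parameter values, the inner loop alone spans 100 model fits, and none of them depends on another. That independence is what makes this kind of workload a plausible fit for a desktop grid, where each fit could be dispatched as a separate job.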

Published In

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data
  2. desktop grid computing
  3. nested cross validation
  4. prediction modeling

Qualifiers

  • Short-paper

Conference

BCB '14: ACM-BCB '14
September 20-23, 2014
Newport Beach, California

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
