Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Network-based methods to identify highly discriminating subsets of biomarkers

Published: 01 November 2014 Publication History

Abstract

Complex diseases such as various types of cancer and diabetes are conjectured to be triggered and influenced by a combination of genetic and environmental factors. To integrate potential effects from interplay among underlying candidate factors, we propose a new network-based framework to identify effective biomarkers by searching for groups of synergistic risk factors with high predictive power to disease outcome. An interaction network is constructed with node weights representing individual predictive power of candidate factors and edge weights capturing pairwise synergistic interactions among factors. We then formulate this network-based biomarker identification problem as a novel graph optimization model to search for multiple cliques with maximum overall weight, which we denote as the Maximum Weighted Multiple Clique Problem (MWMCP). To achieve optimal or near optimal solutions, both an analytical algorithm based on column generation method and a fast heuristic for large-scale networks have been derived. Our algorithms for MWMCP have been implemented to analyze two biomedical data sets: a Type 1 Diabetes (T1D) data set from the Diabetes Prevention Trial-Type 1 (DPT-1) study, and a breast cancer genomics data set for metastasis prognosis. The results demonstrate that our network-based methods can identify important biomarkers with better prediction accuracy compared to the conventional feature selection that only considers individual effects.

References

[1]
D. Thomas, "Gene-environment-wide association studies: Emerging approaches," Nat. Rev. Genetics, vol. 11, no. 4, pp. 259-272, 2010.
[2]
U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine, "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proc. Nat. Acad. Sci. USA, vol. 96, no. 12, pp. 6745-6750, 1999.
[3]
Y. Saeys, I. Inza, and P. Larrañaga, "A review of feature selection techniques in bioinformatics," Bioinformatics, vol. 23, no. 19, pp. 2507-2517, 2007.
[4]
Z. Q. Tang, L. Y. Han, H. H. Lin, J. Cui, J. Jia, B. C. Low, B. W. Li, and Y. Z. Chen, "Derivation of stable microarray cancer-differentiating signatures using consensus scoring of multiple random sampling and gene-ranking consistency evaluation," Cancer Res., vol. 67, no. 20, pp. 9996-10 003, 2007.
[5]
R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Royal Statist. Soc. Ser. B (Methodol.), vol. 58, no. 1, pp. 267-288, 1996.
[6]
S. Ma, and J. Huang, "Penalized feature selection and classification in bioinformatics," Briefings Bioinformat., vol. 9, no. 5, pp. 392-403, 2008.
[7]
J. Watkinson, X. Wang, T. Zheng, and D. Anastassiou, "Identification of gene interactions associated with disease from gene expression data using synergy networks," BMC Syst. Biol., vol. 2, no. 1, p. 10, 2008.
[8]
H.-Y. Chuang, E. Lee, Y.-T. Liu, D. Lee, and T. Ideker, "Network-based classification of breast cancer metastasis," Mol. Syst. Biol., vol. 3, no. 1, pp. 140-149, 2007.
[9]
J. Su, B.-J. Yoon, and E. R. Dougherty, "Accurate and reliable cancer classification based on probabilistic inference of pathway activity," PLoS One, vol. 4, no. 12, p. e8161, 2009.
[10]
S. J. Sajjadi, A. A. Adl, B. Zeng, and X. Qian, "Finding the most discriminating sets of biomarkers by maximum weighted clique," presented at the 6th INFORMS Workshop Data Mining Health Informatics, Charlotte, NC, USA, vol. 1500, 2011.
[11]
W. F. Symmans, J. Liu, D. M. Knowles, and G. Inghirami, "Breast cancer heterogeneity: Evaluation of clonality in primary and metastatic lesions," Human Pathol., vol. 26, no. 2, pp. 210-216, 1995.
[12]
L. Ein-Dor, I. Kela, G. Getz, D. Givol, and E. Domany, "Outcome signature genes in breast cancer: Is there a unique set?" Bioinformatics, vol. 21, no. 2, pp. 171-178, 2005.
[13]
A. H. Bild, G. Yao, J. T. Chang, Q. Wang, A. Potti, D. Chasse, M.-B. Joshi, D. Harpole, J. M. Lancaster, A. Berchuck, J. A. Olson, J. R. Marks, H. K. Dressman, M. West, and J. R. Nevins, "Oncogenic pathway signatures in human cancers as a guide to targeted therapies," Nature, vol. 439, no. 7074, pp. 353-357, 2005.
[14]
M. R. Gary, and D. S. Johnson, Computers and intractability: A Guide to the Theory of Np-Completeness. San Francisco, CA, USA, Freeman, 1979.
[15]
I. M. Bomze, M. Budinich, P. M. Pardalos, and M. Pelillo, "The maximum clique problem," in Handbook of Combinatorial Optimization. New York, NY, USA: Springer, 1999, pp. 1-74.
[16]
F. Vanderbeck and L. A. Wolsey, "An exact algorithm for ip column generation," Oper. Res. Lett., vol. 19, no. 4, pp. 151-159, 1996.
[17]
C. Barnhart, E. L. Johnson, G. L. Nemhauser, M. W. Savelsbergh, and P. H. Vance, "Branch-and-price: Column generation for solving huge integer programs," Oper. Res., vol. 46, no. 3, pp. 316-329, 1998.
[18]
M. E. Lübbecke and J. Desrosiers, "Selected topics in column generation," Oper. Res., vol. 53, no. 6, pp. 1007-1023, 2005.
[19]
H. Saigo, T. Uno, and K. Tsuda, "Mining complex genotypic features for predicting hiv-1 drug resistance," Bioinformatics, vol. 23, no. 18, pp. 2455-2462, 2007.
[20]
Y. Ying, K. Huang, and C. Campbell, "Enhanced protein fold recognition through a novel data integration approach," BMC Bioinformatics, vol. 10, no. 1, p. 267, 2009.
[21]
J. Håstad, "Clique is hard to approximate within n1-ε"," Acta Math., vol. 182, no. 1, pp. 105-142, 1999.
[22]
N. Sloane, "Unsolved problems in graph theory arising from the study of codes," Graph Theory Notes New York, vol. 18, pp. 11-20, 1989.
[23]
R. Horaud and T. Skordas, "Stereo correspondence through feature grouping and maximal cliques," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 11, pp. 1168-1180, Nov. 1989.
[24]
K. Corrádi and S. Szabó, "A combinatorial approach for keller's conjecture," Periodica Math. Hungarica, vol. 21, no. 2, pp. 95-100, 1990.
[25]
P. Berman and A. Pelc, "Distributed probabilistic fault diagnosis for multiprocessor systems," in Proc. 20th Int. Symp. Fault-Tolerant Comput., 1990, pp. 340-346.
[26]
F. S. Kuhl, G. M. Crippen, and D. K. Friesen, "A combinatorial algorithm for calculating ligand binding," J. Comput. Chem., vol. 5, no. 1, pp. 24-34, 1984.
[27]
A. Ben-Dor, R. Shamir, and Z. Yakhini, "Clustering gene expression patterns," J. Comput. Biol., vol. 6, no. 3-4, pp. 281-297, 1999.
[28]
R. Carraghan and P. M. Pardalos, "An exact algorithm for the maximum clique problem," Oper. Res. Lett., vol. 9, no. 6, pp. 375-382, 1990.
[29]
P. R. Östergård, "A new algorithm for the maximum-weight clique problem," Nordic J. Comput., vol. 8, no. 4, pp. 424-436, 2001.
[30]
D. Kumlander, "A new exact algorithm for the maximum-weight clique problem based on a heuristic vertex-coloring and a backtrack search," in Proc. 5th Int. Conf. Model., Comput. Optim. Inf. Syst. Manage. Sci., 2004, pp. 202-208.
[31]
D. Brélaz, "New methods to color the vertices of a graph," Commun. ACM, vol. 22, no. 4, pp. 251-256, 1979.
[32]
A. Mehrotra, and M. A. Trick, "Cliques and clustering: A combinatorial approach," Oper. Res. Lett., vol. 22, no. 1, pp. 1-12, 1998.
[33]
R. Chen, G. I. Mias, J. Li-Pook-Than, L. Jiang, H. Y. Lam, R. Chen, E. Miriami, K. J. Karczewski, M. Hariharan, F. E. Dewey, Y. Cheng, M. J. Clark, H. Im, L. Habegger, S. Balasubramanian, M. O'Huallachain, J. T. Dudley, S. Hillenmeyer, R. Haraksingh, D. Sharon, G. Euskirchen, P. Lacroute, K. Bettinger, A. P. Boyle, M. Kasowski, F. Grubert, S. Seki, M. Garcia, M. Whirl-Carrillo, M. Gallardo, M. A. Blasco, P. L. Greenberg, P. Snyder, T. E. Klein, R. B. Altman, A. J. Butte, E. A. Ashley, M. Gerstein, K. C. Nadeau, H. Tang, and M. Snyder, "Personal omics profiling reveals dynamic molecular and medical phenotypes," Cell, vol. 148, no. 6, pp. 1293-1307, 2012.
[34]
P. Erdös, and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hungarian Acad. Sci., vol. 5, pp. 17-61, 1960.
[35]
C.-C. Chang and C.-J. Lin, "Libsvm: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, p. 27, 2011.
[36]
P. Xu, Y. Wu, Y. Zhu, G. Dagne, G. Johnson, D. Cuthbertson, J. P. Krischer, J. M. Sosenko, J. S. Skyler, and on behalf of the Diabetes Prevention TrialType 1 (DPT-1) Study Group, "Prognostic performance of metabolic indexes in predicting onset of type 1 diabetes," Diabetes Care, vol. 33, no. 12, pp. 2508-2513, 2010.
[37]
J. P. Krischer, D. D. Cuthbertson, L. Yu, T. Orban, N. Maclaren, R. Jackson, W. E. Winter, D. A. Schatz, J. P. Palmer, G. S. Eisenbarth, and the Diabetes Prevention TrialType 1 Study Group, "Screening strategies for the identification of multiple antibody-positive relatives of individuals with type 1 diabetes," J. Clin. Endocrinol. Metabolism, vol. 88, no. 1, pp. 103-108, 2003.
[38]
J. M. Sosenko, J. P. Palmer, C. J. Greenbaum, J. Mahon, C. Cowie, J. P. Krischer, H. P. Chase, N. H. White, B. Buckingham, K. C. Herold, D. Cuthbertson, J. S. Skyler, and the Diabetes Prevention Trial-Type 1 Study Group, "Increasing the accuracy of oral glucose tolerance testing and extending its application to individuals with normal glucose tolerance for the prediction of type 1 diabetes the diabetes prevention trial-type 1," Diabetes Care, vol. 30, no. 1, pp. 38-42, 2007.
[39]
American Diabetes Association, "Diagnosis and classification of diabetes mellitus," Diabetes Care, vol. 36, no. Suppl 1, pp. S67-S74, 2013.
[40]
Y. Wang, J. G. Klijn, Y. Zhang, A. M. Sieuwerts, M. P. Look, F. Yang, D. Talantov, M. Timmermans, M. E. Meijer-van Gelder, J. Yu, T. Jatkoe, E. M. Berns, D. Atkins, and J. A. Foekens, "Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer," The Lancet, vol. 365, no. 9460, pp. 671-679, 2005.
[41]
B. Bozóky, A. Savchenko, P. Csermely, T. Korcsmáros, Z. Dúl, F. Pontén, L. Székely, and G. Klein, "Novel signatures of cancer-associated fibroblasts," Int. J. Cancer, vol. 133, pp. 286-293, 2013.

Cited By

View all
  • (2017)Detecting Pairwise Interactive Effects of Continuous Random Variables for Biomarker Identification with Small Sample SizeIEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2016.258604214:6(1265-1275)Online publication date: 1-Nov-2017
  • (2014)Selected articles from the 2012 IEEE international workshop on genomic signal processing and statistics (GENSIPS 2012)IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2014.235321811:6(981-983)Online publication date: 1-Nov-2014

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE/ACM Transactions on Computational Biology and Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 11, Issue 6
November/December 2014
290 pages
ISSN:1545-5963
  • Editor:
  • Ying Xu
Issue’s Table of Contents

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 November 2014
Accepted: 10 May 2014
Revised: 16 April 2014
Received: 14 March 2013
Published in TCBB Volume 11, Issue 6

Author Tags

  1. column generation
  2. discriminating biomarkers
  3. maximum weighted multiple clique problem

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2017)Detecting Pairwise Interactive Effects of Continuous Random Variables for Biomarker Identification with Small Sample SizeIEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2016.258604214:6(1265-1275)Online publication date: 1-Nov-2017
  • (2014)Selected articles from the 2012 IEEE international workshop on genomic signal processing and statistics (GENSIPS 2012)IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2014.235321811:6(981-983)Online publication date: 1-Nov-2014

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media