On Efficient Feature Ranking Methods for High-Throughput Data Analysis

Published: 01 November 2015

Abstract

Efficient mining of high-throughput data has become a popular theme in the big data era. Existing biology-related feature ranking methods focus mainly on statistical and annotation information. This study presents two efficient feature ranking methods. Multi-target regression and graph embedding are incorporated into a single optimization framework, and feature ranking is achieved by introducing a structured sparsity norm. Compared with existing methods, the presented methods have two advantages: (1) The selected feature subset simultaneously accounts for global margin information and local manifold information, so both global and local structure are considered. (2) Features are selected in batches rather than individually within the algorithmic framework, so interactions between features are considered and the optimal feature subset can be guaranteed. In addition, this study presents a theoretical justification. Experiments on a set of real-world gene expression data sets demonstrate the effectiveness and efficiency of the two algorithms in comparison with several state-of-the-art feature ranking methods.
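
The abstract does not give the objective function, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes a least-squares regression onto one-hot class targets (the multi-target part), a k-nearest-neighbour graph Laplacian regularizer (the graph-embedding part), and an l2,1-norm penalty on the weight matrix (one common structured sparsity norm), solved with a standard iteratively reweighted scheme. The function names `knn_laplacian` and `rank_features`, the parameters `alpha`, `beta`, and `k`, and the centering step are all assumptions introduced for illustration.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbour graph (assumed graph)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # a sample is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]              # k nearest neighbours per sample
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                             # symmetrize the adjacency matrix
    return np.diag(A.sum(axis=1)) - A

def rank_features(X, y, alpha=0.1, beta=1.0, k=5, n_iter=30, eps=1e-8):
    """Rank features by the row norms of W, where W approximately minimizes
    ||XW - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}  (assumed objective)."""
    X = X - X.mean(axis=0)                             # centering stands in for an intercept
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float) # one-hot multi-target matrix
    L = knn_laplacian(X, k)
    G = X.T @ X + alpha * (X.T @ L @ X)                # smooth part of the objective
    XtY = X.T @ Y
    D = np.eye(X.shape[1])                             # reweighting matrix for the l2,1 term
    for _ in range(n_iter):
        W = np.linalg.solve(G + beta * D, XtY)         # closed-form update for fixed D
        row_norms = np.sqrt(np.sum(W ** 2, axis=1))
        D = np.diag(1.0 / (2.0 * row_norms + eps))     # small rows get penalized harder
    scores = np.sqrt(np.sum(W ** 2, axis=1))           # one score per feature (row of W)
    return np.argsort(-scores), scores
```

Because the l2,1 penalty acts on whole rows of W, each feature's weights across all targets shrink together, which is one way to read the abstract's point that features are selected in batches rather than scored one at a time.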

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 12, Issue 6
November 2015
268 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States