DOI: 10.1109/WI-IAT.2009.32
Article

Rank Aggregation Based Text Feature Selection

Published: 15 September 2009

Abstract

The filtering feature selection method (filtering method, for short) is a well-known feature selection strategy in pattern recognition and data mining. Filtering methods outperform other feature selection methods in many cases when the feature dimension is large. So many filtering methods have been proposed in previous work that they create a “selection trouble”: how to select an appropriate filtering method for a given text data set. Since finding the best filtering method is usually intractable in real applications, this paper takes an alternative path. We propose a feature selection framework that fuses the results obtained by different filtering methods. Deriving a better rank list from several rank lists, known as rank aggregation, is an active topic in many disciplines. Based on the proposed framework and Markov chain rank aggregation techniques, we present two new feature selection methods: FR-MC1 and FR-MC4. We also introduce a perturbation algorithm to alleviate the drawbacks of Markov chain rank aggregation techniques. Empirical evaluation on two public text data sets shows that the two new feature selection methods achieve results better than or comparable to those of classical filtering methods, which also demonstrates the effectiveness of our framework.
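
To make the idea of fusing several filter rankings concrete, here is a minimal sketch of an MC4-style Markov chain rank aggregation, following the general description of MC4 in the rank aggregation literature rather than the paper's own FR-MC4 procedure, which the abstract does not spell out. The function name `mc4_aggregate`, the `damping` parameter, and the three example rankings are assumptions made for illustration. In particular, the uniform teleportation step is a generic way to keep the chain ergodic and is not the paper's perturbation algorithm.

```python
def mc4_aggregate(rankings, damping=0.05, max_iters=1000, tol=1e-12):
    """Fuse full rankings (best first) of the same items into one ranking.

    MC4-style chain: from the current item, pick a candidate uniformly at
    random and move to it if a strict majority of the input rankings place
    the candidate above the current item; otherwise stay put.
    `damping` is an illustrative teleportation probability (an assumption).
    """
    items = sorted(rankings[0])
    n = len(items)
    # Position of every item in every input ranking (0 = best).
    pos = [{item: p for p, item in enumerate(r)} for r in rankings]
    half = len(rankings) / 2

    # Row-stochastic transition matrix of the chain.
    P = [[0.0] * n for _ in range(n)]
    for i, cur in enumerate(items):
        for j, cand in enumerate(items):
            if i != j and sum(r[cand] < r[cur] for r in pos) > half:
                P[i][j] = 1.0 / n
        P[i][i] = 1.0 - sum(P[i])  # remaining mass: stay at the current item

    # Power iteration for the stationary distribution, with teleportation.
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        new = [
            damping / n
            + (1.0 - damping) * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break

    # Higher stationary probability means a better aggregated rank.
    order = sorted(range(n), key=lambda i: (-pi[i], items[i]))
    return [items[i] for i in order]


# Hypothetical term rankings produced by three different filtering methods.
rankings = [
    ["price", "stock", "market", "the"],
    ["price", "market", "stock", "the"],
    ["stock", "price", "market", "the"],
]
fused = mc4_aggregate(rankings)
print(fused)  # ['price', 'stock', 'market', 'the']
```

In a feature selection setting, the top-k entries of the fused list would be kept as the selected features; how k is chosen is outside the scope of this sketch.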

Cited By

  • (2021) A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction. Computational Intelligence and Neuroscience, 2021. DOI: 10.1155/2021/5069016. Online publication date: 1-Jan-2021
  • (2013) Similarity Aggregation a New Version of Rank Aggregation Applied to Credit Scoring Case. Proceedings of the First International Conference on Mining Intelligence and Knowledge Exploration - Volume 8284, pp. 618-628. DOI: 10.1007/978-3-319-03844-5_61. Online publication date: 18-Dec-2013
  • (2013) Rank Aggregation for Filter Feature Selection in Credit Scoring. Proceedings of the First International Conference on Mining Intelligence and Knowledge Exploration - Volume 8284, pp. 7-15. DOI: 10.1007/978-3-319-03844-5_2. Online publication date: 18-Dec-2013

Published In

WI-IAT '09: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
September 2009
726 pages
ISBN:9780769538013

Publisher

IEEE Computer Society

United States

Author Tags

  1. Markov chains
  2. Rank aggregation
  3. Text feature selection
