Classification of source code archives

Published: 28 July 2003

Abstract

The World Wide Web contains a number of source code archives. Programs are usually classified into various categories within the archive by hand. We report on experiments for automatic classification of source code into these categories. We examined a number of factors that affect classification accuracy. Weighting features by expected entropy loss significantly improves classification accuracy. We show that a Support Vector Machine can be trained to classify source code with a high degree of accuracy. We feel these results show promise for software reuse.
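
The abstract names expected entropy loss as the feature weighting but does not define it. The following is a minimal sketch of one common formulation (the information gain of a binary feature with respect to a binary category), not necessarily the authors' exact procedure. The function names and the four count arguments are hypothetical, introduced only for illustration.

```python
import math


def entropy(p):
    """Binary entropy, in bits, of a class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def expected_entropy_loss(pos_with, pos_without, neg_with, neg_without):
    """Expected entropy loss of a binary feature for one category.

    Hypothetical argument names (assumptions, not from the paper):
      pos_with / pos_without: in-category programs with / without the feature
      neg_with / neg_without: out-of-category programs with / without it
    """
    total = pos_with + pos_without + neg_with + neg_without
    if total == 0:
        return 0.0

    # Entropy of the category before observing the feature.
    prior = entropy((pos_with + pos_without) / total)

    # Entropy after observing the feature, averaged over presence/absence.
    n_with = pos_with + neg_with
    n_without = pos_without + neg_without
    posterior = 0.0
    if n_with:
        posterior += (n_with / total) * entropy(pos_with / n_with)
    if n_without:
        posterior += (n_without / total) * entropy(pos_without / n_without)

    return prior - posterior
```

Under this formulation, a feature that appears in every program of a category and in no others scores highest, while a feature distributed identically inside and outside the category scores zero. Scores like these could then be used to rank or weight features before training a classifier.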

Cited By

  • (2009) Effectively Searching Maps in Web Documents. Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval, 162-176. DOI: 10.1007/978-3-642-00958-7_17. Online publication date: 18-Apr-2009.
  • (2008) Classifying Software Changes. IEEE Transactions on Software Engineering 34:2, 181-196. DOI: 10.1109/TSE.2007.70773. Online publication date: 1-Mar-2008.

Published In

SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
July 2003
490 pages
ISBN:1581136463
DOI:10.1145/860435

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SVM
  2. classification
  3. software reuse
  4. source code archive

Qualifiers

  • Article

Acceptance Rates

SIGIR '03 Paper Acceptance Rate 46 of 266 submissions, 17%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%
