DOI: 10.1145/1137983.1137990

TA-RE: an exchange language for mining software repositories

Published: 22 May 2006

Abstract

Software repositories have been getting a lot of attention from researchers in recent years. In order to analyze software repositories, it is necessary to first extract raw data from the version control and problem tracking systems. This poses two challenges: (1) extraction requires a non-trivial effort, and (2) the results depend on the heuristics used during extraction. These challenges burden researchers who are new to the community and make it difficult to benchmark software repository mining, since it is almost impossible to reproduce experiments done by another team. In this paper we present the TA-RE corpus. TA-RE collects extracted data from software repositories in order to build a collection of projects that will simplify the extraction process. Additionally, the collection can be used for benchmarking. As the first step, we propose an exchange language capable of making sharing and reusing data as simple as possible.

Published In

MSR '06: Proceedings of the 2006 international workshop on Mining software repositories
May 2006
191 pages
ISBN:1595933972
DOI:10.1145/1137983
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. analysis
  2. corpus
  3. prediction
  4. software repository mining

Qualifiers

  • Article

Conference

ICSE06
Cited By

  • (2023) A Robust Pedestrian Re-Identification and Out-Of-Distribution Detection Framework. Drones 7:6 (352). DOI: 10.3390/drones7060352. Online publication date: 27-May-2023
  • (2021) World of code: enabling a research workflow for mining and analyzing the universe of open source VCS data. Empirical Software Engineering 26:2. DOI: 10.1007/s10664-020-09905-9. Online publication date: 25-Feb-2021
  • (2020) Standing on shoulders or feet? An extended study on the usage of the MSR data papers. Empirical Software Engineering. DOI: 10.1007/s10664-020-09834-7. Online publication date: 18-Jul-2020
  • (2019) Progressive processing of system-behavioral query. Proceedings of the 35th Annual Computer Security Applications Conference (378-389). DOI: 10.1145/3359789.3359818. Online publication date: 9-Dec-2019
  • (2019) Standing on shoulders or feet? Proceedings of the 16th International Conference on Mining Software Repositories (565-576). DOI: 10.1109/MSR.2019.00085. Online publication date: 26-May-2019
  • (2019) World of code. Proceedings of the 16th International Conference on Mining Software Repositories (143-154). DOI: 10.1109/MSR.2019.00031. Online publication date: 26-May-2019
  • (2015) RepMine. Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW) (78-81). DOI: 10.1109/ASEW.2015.20. Online publication date: 9-Nov-2015
  • (2014) Querying sequential software engineering data. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (700-710). DOI: 10.1145/2635868.2635902. Online publication date: 11-Nov-2014
  • (2013) Orion. Proceedings of the 2013 18th International Conference on Engineering of Complex Computer Systems (242-245). DOI: 10.1109/ICECCS.2013.42. Online publication date: 17-Jul-2013
  • (2012) Online sharing and integration of results from mining software repositories. Proceedings of the 34th International Conference on Software Engineering (1644-1646). DOI: 10.5555/2337223.2337510. Online publication date: 2-Jun-2012
