A comparison of identity merge algorithms for software repositories

Published: 01 August 2013

Abstract

Software repository mining research extracts and analyses data originating from multiple software repositories to understand the historical development of software systems, and to propose better ways to evolve such systems in the future. Of particular interest is the study of the activities and interactions between the persons involved in the software development process. The main challenge with such studies lies in the ability to determine the identities (e.g., logins or e-mail accounts) in software repositories that represent the same physical person. To achieve this, different identity merge algorithms have been proposed in the past. This article provides an objective comparison of identity merge algorithms, including some improvements over existing algorithms. The results are validated on a selection of large ongoing open source software projects.
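
To make the notion of identity merging concrete, here is a minimal sketch of one simple heuristic: two identities are grouped when their normalized name or e-mail prefix coincide. This is an illustrative assumption, not one of the algorithms compared in the article; the function names `normalize` and `merge_identities` and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def merge_identities(identities):
    """Group (name, email) pairs that share a normalized name or e-mail prefix.

    Hypothetical heuristic for illustration only; it is not taken from
    the article.
    """
    # Union-find structure: parent[i] points towards the representative of i.
    parent = list(range(len(identities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # normalized label -> index of first identity carrying it
    for index, (name, email) in enumerate(identities):
        labels = {normalize(name), normalize(email.split("@")[0])}
        for label in labels:
            if not label:
                continue  # skip empty names or missing e-mail addresses
            if label in first_seen:
                union(index, first_seen[label])
            else:
                first_seen[label] = index

    groups = defaultdict(list)
    for index, identity in enumerate(identities):
        groups[find(index)].append(identity)
    return sorted(groups.values())


identities = [
    ("John Doe", "jdoe@example.org"),
    ("jdoe", ""),  # a bare login without an e-mail address
    ("John Doe", "john.doe@example.com"),
    ("Alice Smith", "alice@example.org"),
]
merged = merge_identities(identities)
for group in merged:
    print(group)
```

A heuristic this simple can merge distinct persons who happen to share a name or login, and it can miss identities of the same person that use unrelated labels.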



Published In

Science of Computer Programming, Volume 78, Issue 8
August 2013
236 pages

Publisher

Elsevier North-Holland, Inc.

United States

Author Tags

  1. Comparison
  2. Empirical software engineering
  3. Identity merging
  4. Open source
  5. Software evolution
  6. Software repository mining

Qualifiers

  • Article

