An extended assessment of type-3 clones as detected by state-of-the-art tools

Published: 01 June 2011

Abstract

Code reuse through copying and pasting leads to so-called software clones. These clones can be roughly categorized into identical fragments (type-1 clones), fragments with parameter substitution (type-2 clones), and similar fragments that differ through modified, deleted, or added statements (type-3 clones). Although there has been extensive research on detecting clones, detection of type-3 clones is still an open research issue due to the inherent vagueness in their definition. In this paper, we analyze type-3 clones detected by state-of-the-art tools and investigate type-3 clones in terms of their syntactic differences. Then, we derive their underlying semantic abstractions from their syntactic differences. Finally, we investigate whether there are code characteristics that indicate that a tool-suggested clone candidate is a real type-3 clone from a human's perspective. Our findings can help developers of clone detectors and clone refactoring tools to improve their tools.
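To make the three clone categories concrete, the following Python sketch classifies a pair of fragments by comparing their token sequences. It is an illustration only and not the method of the paper or of any tool it evaluates: the helper names (`tokens`, `classify`), the sample fragments, and the 0.7 similarity threshold are all assumptions chosen for this example. The need for such an arbitrary threshold is one symptom of the vagueness in the type-3 definition that the abstract mentions.

```python
import difflib
import io
import keyword
import token
import tokenize


def tokens(src, normalize=False):
    """Return the token strings of src, ignoring comments and blank lines.

    With normalize=True, identifiers become "ID" and literals become "LIT",
    which hides the parameter substitution that defines type-2 clones.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            continue  # layout and comments do not count as differences
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            out.append(token.tok_name[tok.type])  # keep block structure
        elif normalize and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif normalize and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a, b, threshold=0.7):
    """Classify a fragment pair; the threshold is a hypothetical cut-off."""
    if tokens(a) == tokens(b):
        return "type-1"  # identical up to layout and comments
    norm_a, norm_b = tokens(a, True), tokens(b, True)
    if norm_a == norm_b:
        return "type-2"  # identical after renaming identifiers and literals
    similarity = difflib.SequenceMatcher(None, norm_a, norm_b).ratio()
    return "type-3" if similarity >= threshold else "no clone"


ORIGINAL = """
def total(items):
    result = 0
    for item in items:
        result += item.price
    return result
"""

# Same code, only a comment added.
TYPE1 = """
def total(items):
    result = 0   # running sum
    for item in items:
        result += item.price
    return result
"""

# Identifiers renamed, structure unchanged.
TYPE2 = """
def total(orders):
    acc = 0
    for o in orders:
        acc += o.cost
    return acc
"""

# Renamed and with an added statement.
TYPE3 = """
def total(orders):
    acc = 0
    for o in orders:
        if o.cancelled:
            continue
        acc += o.cost
    return acc
"""

UNRELATED = """
print("hello", "world")
"""

if __name__ == "__main__":
    for name, candidate in [("TYPE1", TYPE1), ("TYPE2", TYPE2),
                            ("TYPE3", TYPE3), ("UNRELATED", UNRELATED)]:
        print(name, "->", classify(ORIGINAL, candidate))
```

With these samples the sketch labels the four candidates type-1, type-2, type-3, and no clone, in that order. Moving the threshold changes which pairs count as type-3, which is why a purely syntactic similarity score cannot settle whether a candidate is a real clone from a human's perspective.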

Published In

Software Quality Journal  Volume 19, Issue 2
June 2011
248 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Clone categorization
  2. Software clones
  3. Type-3 clones

Qualifiers

  • Article

Cited By

  • (2024) Behind the Intent of Extract Method Refactoring: A Systematic Literature Review. IEEE Transactions on Software Engineering 50:4 (668-694). DOI: 10.1109/TSE.2023.3345800. Online publication date: 4-Jan-2024
  • (2020) An automated approach to assess the similarity of GitHub repositories. Software Quality Journal 28:2 (595-631). DOI: 10.1007/s11219-019-09483-0. Online publication date: 1-Jun-2020
  • (2018) An empirical study on how project context impacts on code cloning. Journal of Software: Evolution and Process 30:12. DOI: 10.1002/smr.2115. Online publication date: 12-Dec-2018
  • (2017) Reengineering legacy applications into software product lines. Empirical Software Engineering 22:6 (2972-3016). DOI: 10.1007/s10664-017-9499-z. Online publication date: 1-Dec-2017
  • (2015) Classification model for code clones based on machine learning. Empirical Software Engineering 20:4 (1095-1125). DOI: 10.1007/s10664-014-9316-x. Online publication date: 1-Aug-2015
  • (2014) Feature location for software product line migration. Proceedings of the 18th International Software Product Line Conference: Companion Volume for Workshops, Demonstrations and Tools - Volume 2 (52-59). DOI: 10.1145/2647908.2655967. Online publication date: 15-Sep-2014
  • (2014) How should we measure functional sameness from program source code? an exploratory study on Java methods. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (294-305). DOI: 10.1145/2635868.2635886. Online publication date: 11-Nov-2014
  • (2013) Towards a curated collection of code clones. Proceedings of the 7th International Workshop on Software Clones (53-59). DOI: 10.5555/2662708.2662720. Online publication date: 19-May-2013
  • (2013) Understanding the evolution of type-3 clones: an exploratory study. Proceedings of the 10th Working Conference on Mining Software Repositories (139-148). DOI: 10.5555/2487085.2487117. Online publication date: 18-May-2013
