DOI: 10.1145/2684200.2684290
An Empirical Study on Retrieving Structural Clones Using Sequence Pattern Mining Algorithms

Published: 04 December 2014

Abstract

Many clone detection techniques focus on fragments of duplicated code, i.e., simple clones. Structural clones are simple clones that fall within a syntactic boundary and are good candidates for refactoring. This paper presents a new approach for detecting structural clones in source code. The approach is parse-tree-based and enhanced by frequent subsequence mining. It comprises three stages: preprocessing, mining frequent statement sequences, and fine-matching for structural clones using a modified longest common subsequence (LCS) algorithm. Because control statements and method identifiers differ in length, a conventional LCS algorithm does not return the expected length of matched identifiers. We therefore propose an encoding algorithm for control statements and method identifiers. Retrieval experiments were conducted on the Java SWING source code. The results show that the proposed data mining algorithm detects clones comprising 51 extracted statements, and that our modified LCS algorithm retrieves a number of structural clones with arbitrary statement gaps.
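
The length problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' encoding or their modified LCS algorithm, whose details the abstract does not give. It only assumes that each control statement or method identifier is mapped to a single integer code, so that a textbook LCS counts matched statements rather than matched characters. The function names `encode` and `lcs_length` and the two sample statement sequences are hypothetical.

```python
from typing import Dict, List, Sequence


def encode(tokens: Sequence[str], table: Dict[str, int]) -> List[int]:
    """Map each token (control statement or method identifier) to one integer code.

    Assumption: one code per distinct token; the paper's actual encoding may differ.
    """
    codes = []
    for tok in tokens:
        if tok not in table:
            table[tok] = len(table)  # assign the next unused code
        codes.append(table[tok])
    return codes


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Textbook dynamic-programming LCS length over any two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical statement sequences extracted from two methods.
seq_a = ["if", "getText", "setText", "for", "repaint"]
seq_b = ["if", "getText", "validate", "setText", "repaint"]

table: Dict[str, int] = {}
# LCS over encoded tokens: every matched statement counts exactly once.
token_lcs = lcs_length(encode(seq_a, table), encode(seq_b, table))
# LCS over raw characters: long identifiers dominate the result.
char_lcs = lcs_length("".join(seq_a), "".join(seq_b))

print(token_lcs, char_lcs)
```

On the encoded sequences the LCS length is the number of shared statements (four here, with `validate` and `for` treated as gaps), whereas the character-level result is inflated by identifier length and does not correspond to a statement count.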

Published In

iiWAS '14: Proceedings of the 16th International Conference on Information Integration and Web-based Applications & Services
December 2014
587 pages

In-Cooperation

  • @WAS: International Organization of Information Integration and Web-based Applications and Services
  • Johannes Kepler Univ Linz: Johannes Kepler Universität Linz

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Control statement
  2. Frequent subsequence mining
  3. Java source code
  4. Method identifier
  5. Structural clone

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

iiWAS '14
