Scalable clone detection using description logic

Published: 23 May 2011

Abstract

The semantic web is slowly transforming the web as we know it into a machine-understandable pool of information that can be consumed and reasoned about by various clients. Source code is no exception to this trend, and various communities have proposed standards for sharing code as linked data. With large amounts of open source code available in publicly accessible repositories, and with the introduction of massively horizontally scaling frameworks and cloud computing infrastructure, a new era of software mining across information silos is reshaping the software engineering landscape. The previously unreachable goal of analyzing code at a global level, and therefore of detecting global software clones, has become manageable. Description logic and semantic web reasoners have so far played only a minor role in this transformation and are mainly used to model source code data. In this paper, we introduce a clone detection algorithm that uses a semantic web reasoner and is based on the Hadoop map-reduce framework, which allows it to scale horizontally to large amounts of data. We also define a novel and compact clone model that considers only control blocks and used data types while still yielding clone detection results similar to those of more complex representations. To validate our approach, we compared our algorithm with some of the leading clone detection tools (CCFinder, JCD and Simian) and show differences in performance and detection precision.
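
As a rough illustration of the idea in the abstract, the following minimal sketch groups code fragments by a compact signature built only from their control blocks and used data types, using a map step and a reduce step. It is not the paper's algorithm: it runs in plain Python rather than on Hadoop, it replaces the semantic web reasoner with simple signature equality, and the fragment data, the signature layout and the function names (`map_fragment`, `reduce_candidates`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: (fragment_id, control_blocks, used_types).
# In a real pipeline these would be extracted from parsed source code.
FRAGMENTS = [
    ("A.java:10", ["for", "if"], {"int", "String"}),
    ("B.java:42", ["for", "if"], {"String", "int"}),
    ("C.java:7", ["while"], {"List"}),
]


def map_fragment(fragment):
    """Map step: emit (signature, fragment_id) for one fragment.

    The signature keeps the order of control blocks but ignores the
    order of used types (an assumption made for this sketch).
    """
    fragment_id, control_blocks, used_types = fragment
    signature = (tuple(control_blocks), tuple(sorted(used_types)))
    return signature, fragment_id


def reduce_candidates(pairs):
    """Reduce step: collect fragments sharing a signature.

    Only groups with at least two members are clone candidates.
    """
    groups = defaultdict(list)
    for signature, fragment_id in pairs:
        groups[signature].append(fragment_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


clone_groups = reduce_candidates(map(map_fragment, FRAGMENTS))
print(clone_groups)
```

Because the map step depends only on a single fragment, it could in principle be distributed across many workers, with the framework grouping the emitted keys before the reduce step; that is the property a map-reduce framework such as Hadoop exploits.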

Published In

IWSC '11: Proceedings of the 5th International Workshop on Software Clones
May 2011
92 pages
ISBN: 9781450305884
DOI: 10.1145/1985404

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code clone detection
  2. semantic web

Qualifiers

  • Research-article

Conference

ICSE11: International Conference on Software Engineering
May 23, 2011
Waikiki, Honolulu, HI, USA
