

Clones: what is that smell?

Published: 01 August 2012

Abstract

Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell (Fowler et al. 1999) and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we empirically test this conventional wisdom by asking whether cloning makes code more defect-prone; this paper analyses the relationship between cloning and defect proneness. In the four medium-to-large open source projects we studied, we find, first, that the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect-prone than non-cloned code. Third, we find little evidence that clones with more copies are actually more error-prone. Fourth, we find little evidence that clone groups spanning more than one file or directory are more defect-prone than collocated clones. Finally, we find that fixing clone-dense bugs does not require disproportionately more developer effort. Our findings do not support the claim that clones are really a "bad smell" (Fowler et al. 1999). Perhaps we can clone, and breathe easily, at the same time.
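The abstract refers to several measurable properties of clones: the number of copies in a clone group, whether a group is collocated or spans files and directories, and how "clone dense" a bug is. The paper's exact operationalization is not given on this page, so the following is only a minimal Python sketch under stated assumptions: a clone instance is modelled as a file path plus an inclusive line range, a group's locality is read off its file paths and their parent directories, and a bug's clone density is taken to be the fraction of lines touched by its fix that fall inside any clone instance. The names `CloneInstance`, `group_locality`, and `clone_density`, and the sample paths, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


# Hypothetical model: one copy in a clone group, identified by a file path
# and an inclusive line range. Not the paper's actual data model.
@dataclass(frozen=True)
class CloneInstance:
    path: str
    start: int
    end: int

    def covers(self, path: str, line: int) -> bool:
        """True if (path, line) lies inside this clone instance."""
        return self.path == path and self.start <= line <= self.end


def group_locality(group: list[CloneInstance]) -> str:
    """Assumed classification: 'same-file', 'same-directory', or 'cross-directory'."""
    files = {inst.path for inst in group}
    if len(files) == 1:
        return "same-file"
    dirs = {str(PurePosixPath(p).parent) for p in files}
    return "same-directory" if len(dirs) == 1 else "cross-directory"


def clone_density(fix_lines: list[tuple[str, int]],
                  groups: list[list[CloneInstance]]) -> float:
    """Assumed definition: fraction of fix-touched lines inside any clone instance."""
    if not fix_lines:
        return 0.0
    instances = [inst for group in groups for inst in group]
    cloned = sum(
        1 for path, line in fix_lines
        if any(inst.covers(path, line) for inst in instances)
    )
    return cloned / len(fix_lines)


# Usage with made-up clone groups and a made-up bug fix.
g1 = [CloneInstance("src/net/http.c", 10, 30), CloneInstance("src/net/ftp.c", 5, 25)]
g2 = [CloneInstance("src/net/http.c", 100, 120), CloneInstance("lib/util.c", 40, 60),
      CloneInstance("lib/util.c", 200, 220)]
print(len(g1), group_locality(g1))  # 2 same-directory
print(len(g2), group_locality(g2))  # 3 cross-directory

fix = [("src/net/http.c", 12), ("src/net/http.c", 50),
       ("lib/util.c", 45), ("lib/util.c", 300)]
print(clone_density(fix, [g1, g2]))  # 0.5
```

With attributes like these attached to each clone group and bug fix, comparing defect rates by group size or locality becomes a grouping exercise; a real analysis would first need clone detection and bug-fix linking, both outside the scope of this sketch.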

References

[1]
Alkhatib G (1992) The maintenance problem of application software: an empirical analysis. J Softw Maint: Res Pract 4(2):83-104.
[2]
Bachmann A, Bernstein A (2009) Data retrieval, processing and linking for software process data analysis. Technical report, University of Zurich. http://www.ifi.uzh.ch/ddis/people/adrianbachmann/pdq/. Accessed May 2009.
[3]
Baker BS (1995) On finding duplication and near-duplication in large software systems. In: WCRE '95: proceedings of the 2nd working conference on reverse engineering. IEEE Computer Society, Washington, pp 86-95. http://portal.acm.org/citation.cfm?id=836911
[4]
Balazinska M, Merlo E, Dagenais M, Lague B, Kontogiannis K (1999) Partial redesign of Java software systems based on clone analysis. In: WCRE '99: proceedings of the 6th working conference on reverse engineering. IEEE Computer Society, Washington, pp 326-336. http://portal.acm.org/citation.cfm?id=837061
[5]
Barbour L, Khomh F, Zou Y (2011) Late propagation in software clones.
[6]
Baxter ID, Yahin A, Moura L, Sant'Anna M, Bier L (1998) Clone detection using abstract syntax trees. In: Proceedings of the international conference on software maintenance, pp 368-377.
[7]
Berkus J (2007) The 5 types of open source projects. http://www.powerpostgresql.com/5_types. Accessed 20 March 2007.
[8]
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: ESEC/FSE '09: proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, New York, pp 121-130.
[9]
Bruntink M, van Deursen A, van Engelen R, Tourwé T (2005) On the use of clone detection for identifying crosscutting concern code. IEEE Trans Softw Eng 31(10):804-818.
[10]
Cai D, Kim M (2011) An empirical study of long-lived code clones. Fundamental approaches to software engineering, pp 432-446.
[11]
Čubranić D, Murphy GC (2003) Hipikat: recommending pertinent software development artifacts. In: ICSE '03: proceedings of the 25th international conference on software engineering. IEEE Computer Society, Washington, pp 408-418. http://portal.acm.org/citation.cfm?id=776816.776866
[12]
Ducasse S, Rieger M, Demeyer S (1999) A language independent approach for detecting duplicated code. In: Proc. IEEE int. conf. on software maintenance 1999 ('99). Oxford, UK, pp 109-118.
[13]
Ekoko ED, Robillard MP (2007) Tracking code clones in evolving software. In: ICSE '07: proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, pp 158-167.
[14]
Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: ICSM '03: proceedings of the international conference on software maintenance. IEEE Computer Society, Washington, pp 23-32. http://portal.acm.org/citation.cfm?id=943568
[15]
Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code, 1st edn. Addison-Wesley Professional. http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0201485672
[16]
Gabel M, Jiang L, Su Z (2008) Scalable detection of semantic clones. In: ICSE '08: proceedings of the 30th international conference on Software engineering. ACM, New York, pp 321-330.
[17]
Geiger R, Fluri B, Gall H, Pinzger M (2006) Relation of code clones and change couplings. In: Baresi L, Heckel R (eds) Fundamental approaches to software engineering. Lecture notes in computer science, vol 3922, chap 31. Springer, Berlin/Heidelberg, pp 411-425.
[18]
Göde N, Koschke R (2011) Frequency and risks of changes to clones. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 311-320.
[19]
Higo Y, Kamiya T, Kusumoto S, Inoue K (2005) Aries: refactoring support tool for code clone. SIGSOFT Softw Eng Notes 30(4):1-4.
[20]
Jiang L, Misherghi G, Su Z, Glondu S (2007a) Deckard: scalable and accurate tree-based detection of code clones. In: ICSE '07: proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, pp 96-105.
[21]
Jiang L, Su Z, Chiu E (2007b) Context-based detection of clone-related bugs. In: ESEC-FSE '07: proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, New York, pp 55-64.
[22]
Juergens E, Deissenboeck F, Hummel B, Wagner S (2009) Do code clones matter? In: ICSE '09: proceedings of the 2009 IEEE 31st international conference on software engineering. IEEE Computer Society, Washington, pp 485-495.
[23]
Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 28(7):654-670.
[24]
Kan S (2002) Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co., Inc., Boston.
[25]
Kapser C, Godfrey M (2008) Cloning considered harmful considered harmful: patterns of cloning in software. Empir Software Eng 13(6):645-692.
[26]
Kapser C, Godfrey MW (2006) "Cloning considered harmful" considered harmful. In: Working conference on reverse engineering, pp 19-28.
[27]
Kawaguchi S, Yamashina T, Uwano H, Fushida K, Kamei Y, Nagura M, Iida H (2009) Shinobi: a tool for automatic code clone detection in the IDE. In: Working conference on reverse engineering, pp 313-314.
[28]
Kim M, Bergman L, Lau T, Notkin D (2004) An ethnographic study of copy and paste programming practices in OOPL. In: International symposium on empirical software engineering, pp 83-92.
[29]
Kim M, Sazawal V, Notkin D, Murphy G (2005) An empirical study of code clone genealogies. SIGSOFT Softw Eng Notes 30(5):187-196.
[30]
Kim S, Zimmermann T, Pan K, Jr J (2006) Automatic identification of bug-introducing changes. In: ASE '06: proceedings of the 21st IEEE/ACM international conference on automated software engineering. IEEE Computer Society, Washington, pp 81-90.
[31]
Kim S, Whitehead E, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Trans Softw Eng 34(2):181-196.
[32]
Komondoor R, Horwitz S (2001) Using slicing to identify duplication in source code. In: Cousot P (ed) Static analysis, lecture notes in computer science, vol 2126, chap 3. Springer, Berlin, pp 40-56.
[33]
Komondoor R, Horwitz S (2003) Effective, automatic procedure extraction. In: IWPC '03: proceedings of the 11th IEEE international workshop on program comprehension. IEEE Computer Society, Washington, pp 33-42. http://portal.acm.org/citation.cfm?id=857023
[34]
Krinke J (2007) A study of consistent and inconsistent changes to code clones. In: WCRE '07: proceedings of the 14th working conference on reverse engineering. IEEE Computer Society, Washington, pp 170-178.
[35]
Krinke J (2008) Is cloned code more stable than non-cloned code? In: 2008 8th IEEE international working conference on source code analysis and manipulation, pp 57-66. SCAM.2008.14
[36]
Li Z, Lu S, Myagmar S, Zhou Y (2004) CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In: OSDI'04: proceedings of the 6th conference on symposium on operating systems design & implementation. USENIX Association, Berkeley, p 20. http://portal.acm.org/citation.cfm?id=1251274
[37]
Mäntylä M, Lassenius C (2006) Subjective evaluation of software evolvability using code smells: an empirical study. Empir Software Eng 11(3):395-431.
[38]
Mockus A, Votta LG (2000) Identifying reasons for software changes using historic databases. In: Proceedings international conference on software maintenance, 2000. IEEE Computer Society, Los Alamitos, pp 120-130.
[39]
Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi JM, Nguyen TN (2009) Clone-aware configuration management. In: ASE '09: proceedings of the 2009 IEEE/ACM international conference on automated software engineering. IEEE Computer Society, Washington, pp 123-134.
[40]
Rahman F, Bird C, Devanbu P (2010) Clones: what is that smell? In: Proceedings of the 7th working conference on mining software repositories. IEEE Computer Society.
[41]
Roy C, Cordy J (2007) A survey on software clone detection research. Queen's School of Computing TR 541:115.
[42]
Selim G, Barbour L, Shang W, Adams B, Hassan A, Zou Y (2010) Studying the impact of clones on software defects. In: 2010 17th working conference on reverse engineering (WCRE). IEEE, pp 13-21.
[43]
Sliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: MSR '05: proceedings of the 2005 international workshop on mining software repositories. ACM, New York, pp 1-5.
[44]
Thummalapenta S, Cerulo L, Aversano L, Di Penta M (2009) An empirical study on the maintenance of source code clones. Empir Software Eng 15(1):1-34.
[45]
Toomim M, Begel A, Graham SL (2004) Managing duplicated code with linked editing. In: VLHCC '04: proceedings of the 2004 IEEE symposium on visual languages--human centric computing. IEEE Computer Society, Washington, pp 173-180.


Information

Published In

Empirical Software Engineering  Volume 17, Issue 4-5
August 2012
264 pages

Publisher

Kluwer Academic Publishers

United States


Author Tags

  1. Empirical software engineering
  2. Software clone
  3. Software evolution
  4. Software maintenance
  5. Software quality



Cited By

  • (2024) CloneRipples: predicting change propagation between code clone instances by graph-based deep learning. Empirical Software Engineering 30:1. DOI: 10.1007/s10664-024-10567-0. Online publication date: 30-Oct-2024
  • (2023) Workflow analysis of data science code in public GitHub repositories. Empirical Software Engineering 28:1. DOI: 10.1007/s10664-022-10229-z. Online publication date: 1-Jan-2023
  • (2022) Predicting change propagation between code clone instances by graph-based deep learning. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp 425-436. DOI: 10.1145/3524610.3527912. Online publication date: 16-May-2022
  • (2022) Clones in deep learning code: what, where, and why? Empirical Software Engineering 27:4. DOI: 10.1007/s10664-021-10099-x. Online publication date: 1-Jul-2022
  • (2021) A Survey On Log Research Of AIOps: Methods and Trends. Mobile Networks and Applications 26:6, pp 2353-2364. DOI: 10.1007/s11036-021-01832-3. Online publication date: 1-Dec-2021
  • (2021) Evolution of technical debt remediation in Python. Journal of Software: Evolution and Process 33:4. DOI: 10.1002/smr.2319. Online publication date: 1-Apr-2021
  • (2019) Why aren’t regular expressions a lingua franca? an empirical study on the re-use and portability of regular expressions. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 443-454. DOI: 10.1145/3338906.3338909. Online publication date: 12-Aug-2019
  • (2019) Measuring the effect of clone refactoring on the size of unit test cases in object-oriented software. Innovations in Systems and Software Engineering 15:2, pp 117-137. DOI: 10.1007/s11334-019-00334-6. Online publication date: 1-Jun-2019
  • (2018) Understanding metric-based detectable smells in Python software. Information and Software Technology 94:C, pp 14-29. DOI: 10.5555/3163583.3163670. Online publication date: 1-Feb-2018
  • (2018) Are code smells the root cause of faults? Proceedings of the 19th International Conference on Agile Software Development: Companion, pp 1-3. DOI: 10.1145/3234152.3234153. Online publication date: 21-May-2018
