Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3379597.3387467acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

On the Prevalence, Impact, and Evolution of SQL Code Smells in Data-Intensive Systems

Published: 18 September 2020 Publication History

Abstract

Code smells indicate software design problems that harm software quality. Data-intensive systems that frequently access databases often suffer from SQL code smells besides the traditional smells. While there have been extensive studies on traditional code smells, recently, there has been a growing interest in SQL code smells. In this paper, we conduct an empirical study to investigate the prevalence and evolution of SQL code smells in open-source, data-intensive systems. We collected 150 projects and examined both traditional and SQL code smells in these projects. Our investigation delivers several important findings. First, SQL code smells are indeed prevalent in data-intensive software systems. Second, SQL code smells have a weak co-occurrence with traditional code smells. Third, SQL code smells have a weaker association with bugs than that of traditional code smells. Fourth, SQL code smells are more likely to be introduced at the beginning of the project lifetime and likely to be left in the code without a fix, compared to traditional code smells. Overall, our results show that SQL code smells are indeed prevalent and persistent in the studied data-intensive software systems. Developers should be aware of these smells and consider detecting and refactoring SQL code smells and traditional code smells separately, using dedicated tools.

References

[1]
R Agarwal. 2017. Decision making with association rule mining and clustering in supply chains. International Journal of Data and Network Science 1, 1 (2017), 11--18.
[2]
R Agrawal, T Imielinski, and A Swami. 1993. Mining associations between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 207--216.
[3]
Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, Vol. 1215. 487--499.
[4]
Shadi AlZu'bi, Bilal Hawashin, Mohammad EIBes, and Mahmoud Al-Ayyoub. 2018. A novel recommender system based on apriori algorithm for requirements engineering. In 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE, 323--327.
[5]
Maurício Aniche, Gabriele Bavota, Christoph Treude, Marco Aurélio Gerosa, and Arie van Deursen. 2018. Code smells for model-view-controller architectures. Empirical Software Engineering 23, 4 (2018), 2121--2157.
[6]
Giuliano Antoniol, Kamel Ayari, Massimiliano Di Penta, Foutse Khomh, and Yann-Gaël Guéhéneuc. 2008. Is it a bug or an enhancement? A text-based approach to classify change requests. In CASCON, Vol. 8. 304--318.
[7]
N. Arzamasova, M. Schäler, and K. Böhm. 2018. Cleaning Antipatterns in an SQL Query Log. IEEE Transactions on Knowledge and Data Engineering 30, 3 (March 2018), 421--434.
[8]
S. Brass and C. Goldberg. 2004. Semantic errors in SQL queries: a quite complete list. In Fourth International Conference on Quality Software, 2004. QSIC 2004. Proceedings. 250--257.
[9]
Sergey Brin, Rajeev Motwani, Jeffrey D Ullman, and Shalom Tsur. 1997. Dynamic itemset counting and implication rules for market basket data. Acm Sigmod Record 26, 2 (1997), 255--264.
[10]
Marta Burzańska and Piotr Wiśniewski. 2018. How Poor Is the "Poor Man's Search Engine"?. In Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety, Stanisław Kozielski, Dariusz Mrozek, Paweł Kasprowski, Bożena Małysiak-Mrozek, and Daniel Kostrzewa (Eds.). Springer International Publishing, Cham, 294--305.
[11]
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16 (2002), 321--357.
[12]
Harald Cramer. 1946. Mathematical methods of statistics. Princeton U. Press, Princeton (1946), 500.
[13]
Francisco Gonçalves de Almeida Filho, Antônio Diogo Forte Martins, Tiago da Silva Vinuto, José Maria Monteiro, Ítalo Pereira de Sousa, Javam de Castro Machado, and Lincoln Souza Rocha. 2019. Prevalence of bad smells in PL/SQL projects. In Proceedings of the 27th International Conference on Program Comprehension. IEEE Press, 116--121.
[14]
F. A. Fontana, V. Ferme, A. Marino, B. Walter, and P. Martenka. 2013. Investigating the Impact of Code Smells on System's Quality: An Empirical Study on Systems of Different Application Domains. In 2013 IEEE International Conference on Software Maintenance. 260--269.
[15]
Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts, and Erich Gamma. 1999. Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., USA.
[16]
Christian Goldberg. 2009. Do You Know SQL? About Semantic Errors in Database Queries. In In 7th Workshop on Teaching, Learning and Assessment in Databases. 13--19.
[17]
Yann-Gaël Guéhéneuc. 2007. Ptidej: A flexible reverse engineering tool suite. In 2007 IEEE International Conference on Software Maintenance. IEEE, 529--530.
[18]
Latifa Guerrouj, Zeinab Kermansaravi, Venera Arnaoudova, Benjamin CM Fung, Foutse Khomh, Giuliano Antoniol, and Yann-Gaël Guéhéneuc. 2017. Investigating the relation between lexical smells and change-and fault-proneness: an empirical study. Software Quality Journal 25, 3 (2017), 641--670.
[19]
Salima Hassaine, Foutse Khomh, Yann-Gaël Guéhéneuc, and Sylvie Hamel. 2010. IDS: An immune-inspired approach for the detection of software design smells. In 2010 Seventh International Conference on the Quality of Information and Communications Technology. IEEE, 343--348.
[20]
Geoffrey Hecht, Naouel Moha, and Romain Rouvoy. 2016. An empirical study of the performance impacts of android code smells. In Proceedings of the International Conference on Mobile Software Engineering and Systems. ACM, 59--69.
[21]
GitHub Inc. 2019. Search. Retrieved December 28, 2019 from https://developer.github.com/v3/search/
[22]
Zichuan Jin, Yanpeng Cui, and Zheng Yan. 2019. Survey of Intrusion Detection Methods Based on Data Mining Algorithms. In Proceedings of the 2019 International Conference on Big Data Engineering. ACM, 98--106.
[23]
David Johannes, Foutse Khomh, and Giuliano Antoniol. 2019. A large-scale empirical study of code smells in JavaScript projects. Software Quality Journal (2019), 1--44.
[24]
Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E Hassan, Audris Mockus, Anand Sinha, and Naoyasu Ubayashi. 2012. A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39, 6 (2012), 757--773.
[25]
Edward L Kaplan and Paul Meier. 1958. Nonparametric estimation from incomplete observations. Journal of the American statistical association 53, 282 (1958), 457--481.
[26]
Bill Karwin. 2010. SQL Antipatterns: Avoiding the pitfalls of database programming. Pragmatic Bookshelf.
[27]
Manpreet Kaur and Shivani Kang. 2016. Market Basket Analysis: Identify the changing trends of market data using association rule mining. Procedia computer science 85 (2016), 78--85.
[28]
Marouane Kessentini and Ali Ouni. 2017. Detecting android smells using multi-objective genetic programming. In Proceedings of the 4th International Conference on Mobile Software Engineering and Systems. IEEE Press, 122--132.
[29]
Foutse Khomh, Massimiliano Di Penta, and Yann-Gael Gueheneuc. 2009. An exploratory study of the impact of code smells on software change-proneness. In 2009 16th Working Conference on Reverse Engineering. IEEE, 75--84.
[30]
Foutse Khomh, Stephane Vaucher, Yann-Gaël Guéhéneuc, and Houari Sahraoui. 2011. BDTEX: A GQM-based Bayesian approach for the detection of antipatterns. Journal of Systems and Software 84, 4 (2011), 559--572.
[31]
P. Khumnin and T. Senivongse. 2017. SQL antipatterns detection and database refactoring process. In 2017 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). 199--205.
[32]
Sunghun Kim, E James Whitehead Jr, and Yi Zhang. 2008. Classifying software changes: Clean or buggy? IEEE Transactions on Software Engineering 34, 2 (2008), 181--196.
[33]
Wei Li and Raed Shatnawi. 2007. An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution. Journal of Systems and Software 80, 7 (2007), 1120--1128.
[34]
Red Gate Software Ltd. 2014. 119 SQL Code Smells.
[35]
Y. Lyu, A. Alotaibi, and W. G. J. Halfond. 2019. Quantifying the Performance Impact of SQL Antipatterns on Mobile Applications. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). 53--64.
[36]
I. Macia, R. Arcoverde, A. Garcia, C. Chavez, and A. von Staa. 2012. On the Relevance of Code Anomalies for Identifying Architecture Degradation Symptoms. In 2012 16th European Conference on Software Maintenance and Reengineering. 277--286.
[37]
Mika V. Mäntylä and Casper Lassenius. 2006. Subjective evaluation of software evolvability using code smells: An empirical study. Empirical Software Engineering 11, 3 (01 Sep 2006), 395--431.
[38]
Loup Meurice, Csaba Nagy, and Anthony Cleve. 2016. Static analysis of dynamic database usage in Java systems. In International Conference on Advanced Information Systems Engineering. Springer, 491--506.
[39]
Rupert G Miller Jr. 2011. Survival analysis. Vol. 66. John Wiley & Sons.
[40]
Audris Mockus and Lawrence G Votta. 2000. Identifying Reasons for Software Changes using Historic Databases. In Proc. of the 2000 International Conference on Software Maintenance. 120--130.
[41]
Naouel Moha, Yann-Gaël Guéhéneuc, Anne-Françoise Le Meur, Laurence Duchien, and Alban Tiberghien. 2010. From a domain analysis to the specification and detection of code and design smells. Formal Aspects of Computing 22, 3-4 (2010), 345--361.
[42]
Biruk Asmare Muse. 2020. Replication package. https://github.com/Biruk-Asmare/MSR_2020_SQLSmells_Prevalence
[43]
Csaba Nagy and Anthony Cleve. 2017. A static code smell detector for SQL queries embedded in Java code. In 2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 147--152.
[44]
Csaba Nagy and Anthony Cleve. 2018. SQLInspect: A static analyzer to inspect database usage in Java applications. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings. ACM, 93--96.
[45]
Steffen Olbrich, Daniela S Cruzes, Victor Basili, and Nico Zazworka. 2009. The evolution and impact of code smells: A case study of two open source systems. In Proc. of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement. IEEE, 390--400.
[46]
S. M. Olbrich, D. S. Cruzes, and D. I. K. Sjøberg. 2010. Are all code smells harmful? A study of God Classes and Brain Classes in the evolution of three open source systems. In Proc. of the 2010 IEEE International Conferenceon Software Maintenance. 1--10.
[47]
Fabio Palomba. 2015. Textual analysis for code smell detection. In Proceedings of the 37th International Conference on Software Engineering-Volume 2. IEEE Press, 769--771.
[48]
Fabio Palomba, Gabriele Bavota, Massimiliano Di Penta, Fausto Fasano, Rocco Oliveto, and Andrea De Lucia. 2018. On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empirical Software Engineering 23, 3 (2018), 1188--1221.
[49]
F. Palomba, G. Bavota, M. D. Penta, R. Oliveto, and A. D. Lucia. 2014. Do They Really Smell Bad? A Study on Developers' Perception of Bad Code Smells. In 2014 IEEE International Conference on Software Maintenance and Evolution. 101--110.
[50]
Ralph Peters and Andy Zaidman. 2012. Evaluating the lifespan of code smells using software repository mining. In 2012 16th European Conference on Software Maintenance and Reengineering. IEEE, 411--416.
[51]
Richard Peto and Julian Peto. 1972. Asymptotically efficient rank invariant test procedures. Journal of the Royal Statistical Society: Series A (General) 135, 2(1972), 185--198.
[52]
Gregory Piatetsky-Shapiro. 1991. Discovery, analysis, and presentation of strong rules. Knowledge discovery in databases (1991), 229--238.
[53]
Arthur J. Riel. 1996. Object-Oriented Design Heuristics (1st ed.). Addison-Wesley Longman Publishing Co., Inc., USA.
[54]
Gema Rodríguez-Pérez, Gregorio Robles, and Jesús M González-Barahona. 2018. Reproducibility and credibility in empirical software engineering: A case study based on a systematic literature review of the use of the szz algorithm. Information and Software Technology 99 (2018), 164--176.
[55]
José Amancio M. Santos, João B. Rocha-Junior, Luciana Carla Lins Prates, Rogeres Santos do Nascimento, Mydiã Falcão Freitas, and Manoel Gomes de Mendonça. 2018. A systematic review on the code smell effect. Journal of Systems and Software 144 (2018), 450 - 477.
[56]
T. Sharma, M. Fragkoulis, S. Rizou, M. Bruntink, and D. Spinellis. 2018. Smelly Relations: Measuring and Understanding Database Schema Quality. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP). 55--64.
[57]
Raed Shatnawi and Wei Li. 2006. An investigation of bad smells in object-oriented design. In Third International Conference on Information Technology: New Generations (ITNG'06). IEEE, 161--165.
[58]
D. I. K. Sjøberg, A. Yamashita, B. C. D. Anda, A. Mockus, and T. Dybå. 2013. Quantifying the Effect of Code Smells on Maintenance Effort. IEEE Transactions on Software Engineering 39, 8 (Aug 2013), 1144--1156.
[59]
Jacek Śliwerski, Thomas Zimmermann, and Andreas Zeller. 2005. When do changes induce fixes?. In ACM sigsoft software engineering notes, Vol. 30. ACM, 1--5.
[60]
Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proc of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). 908--911.
[61]
Nikolaos Tsantalis. 2010. Evaluation and improvement of software architecture: Identification of design problems in object-oriented systems and resolution through refactorings. Diss. Ph. D. dissertation, Univ. of Macedonia (2010).
[62]
Aiko Yamashita and Leon Moonen. 2013. Exploring the Impact of Inter-Smell Relations on Software Maintainability: An Empirical Study. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). IEEE Press, 682--691.
[63]
Nico Zazworka, Michele A Shaw, Forrest Shull, and Carolyn Seaman. 2011. Investigating the impact of design debt on software quality. In Proceedings of the 2nd Workshop on Managing Technical Debt. ACM, 17--23.

Cited By

View all
  • (2024)An Empirical Study on the Characteristics of Database Access Bugs in Java ApplicationsACM Transactions on Software Engineering and Methodology10.1145/367244933:7(1-25)Online publication date: 13-Jun-2024
  • (2024)MineCPP: Mining Bug Fix Pairs and Their StructuresCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering10.1145/3663529.3663797(552-556)Online publication date: 10-Jul-2024
  • (2024)A Multivocal Mapping Study of MongoDB Smells2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER60148.2024.00086(792-803)Online publication date: 12-Mar-2024
  • Show More Cited By

Index Terms

  1. On the Prevalence, Impact, and Evolution of SQL Code Smells in Data-Intensive Systems

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories
    June 2020
    675 pages
    ISBN:9781450375177
    DOI:10.1145/3379597
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 September 2020

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Code smells
    2. SQL code smells
    3. data-intensive systems
    4. database access

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MSR '20
    Sponsor:

    Upcoming Conference

    ICSE 2025

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)59
    • Downloads (Last 6 weeks)11
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)An Empirical Study on the Characteristics of Database Access Bugs in Java ApplicationsACM Transactions on Software Engineering and Methodology10.1145/367244933:7(1-25)Online publication date: 13-Jun-2024
    • (2024)MineCPP: Mining Bug Fix Pairs and Their StructuresCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering10.1145/3663529.3663797(552-556)Online publication date: 10-Jul-2024
    • (2024)A Multivocal Mapping Study of MongoDB Smells2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER60148.2024.00086(792-803)Online publication date: 12-Mar-2024
    • (2024)On the Prevalence, Co-occurrence, and Impact of Infrastructure-as-Code Smells2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER60148.2024.00009(23-34)Online publication date: 12-Mar-2024
    • (2024)Toward a novel taxonomy to capture code smells caused by refactoringScience of Computer Programming10.1016/j.scico.2024.103120236:COnline publication date: 1-Sep-2024
    • (2024)Prevalence and severity of design anti-patterns in open source programs—A large-scale studyInformation and Software Technology10.1016/j.infsof.2024.107429170:COnline publication date: 1-Jun-2024
    • (2024)Software design analysis and technical debt management based on design rule theoryInformation and Software Technology10.1016/j.infsof.2023.107322164:COnline publication date: 10-Jan-2024
    • (2024)Data-access performance anti-patterns in data-intensive systemsEmpirical Software Engineering10.1007/s10664-024-10535-829:6Online publication date: 29-Aug-2024
    • (2024)Design smells in multi-language systems and bug-proneness: a survival analysisEmpirical Software Engineering10.1007/s10664-024-10476-229:5Online publication date: 3-Jul-2024
    • (2023)GIRT-Data: Sampling GitHub Issue Report Templates2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)10.1109/MSR59073.2023.00026(104-108)Online publication date: May-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media