Research article
DOI:10.1145/2884781.2884803

Using (bio)metrics to predict code quality online

Published: 14 May 2016

Abstract

Finding and fixing code quality concerns, such as defects or poor understandability of code, decreases software development and evolution costs. Code reviews are a common industrial practice for identifying code quality concerns early on. While code reviews help to identify problems early, they also impose costs on development and take place only after a code change has been completed. The goal of our research is to automatically identify code quality concerns while a developer is making a change to the code. By using biometrics, such as heart rate variability, we aim to determine the difficulty a developer experiences while working on a part of the code, and to identify and help fix code quality concerns before they are committed to the repository.
In a field study with ten professional developers over a two-week period, we investigated the use of biometrics to determine code quality concerns. Our results show that biometrics are indeed able to predict quality concerns in parts of the code while a developer is working on them, improving upon a naive classifier by more than 26% and outperforming classifiers based on more traditional metrics. In a second study with five professional developers from a different country and company, we found evidence that some of the findings from our initial study can be replicated. Overall, the results of the presented studies suggest that biometrics have the potential to predict code quality concerns online and thus lower development and evolution costs.

Information & Contributors

Information

Published In

ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016
1235 pages
ISBN:9781450339001
DOI:10.1145/2884781
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ICSE '16
Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 62
  • Downloads (last 6 weeks): 16
Reflects downloads up to 20 Nov 2024

Citations

Cited By

  • (2024) Prediction of Residual Defects after Code Review Based on Reviewer Confidence. IEICE Transactions on Information and Systems, E107.D:3 (273-276). DOI:10.1587/transinf.2023MPL0002. Online publication date: 1-Mar-2024
  • (2024) The Influence of Future Perspective on Job Satisfaction and Turnover Intention of Software Engineers. IEICE Transactions on Information and Systems, E107.D:3 (268-272). DOI:10.1587/transinf.2023MPL0001. Online publication date: 1-Mar-2024
  • (2024) EEG as a potential ground truth for the assessment of cognitive state in software development activities: A multimodal imaging study. PLOS ONE, 19:3 (e0299108). DOI:10.1371/journal.pone.0299108. Online publication date: 7-Mar-2024
  • (2024) NeuroJIT: Improving Just-In-Time Defect Prediction Using Neurophysiological and Empirical Perceptions of Modern Developers. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (594-605). DOI:10.1145/3691620.3695056. Online publication date: 27-Oct-2024
  • (2024) Predicting Code Comprehension: A Novel Approach to Align Human Gaze with Code using Deep Neural Networks. Proceedings of the ACM on Software Engineering, 1:FSE (1982-2004). DOI:10.1145/3660795. Online publication date: 12-Jul-2024
  • (2024) How Do First-Year Engineering Students’ Emotions Change while Working on Programming Problems? ACM Transactions on Computing Education, 24:2 (1-30). DOI:10.1145/3643865. Online publication date: 9-Feb-2024
  • (2024) Investigating the Impact of Emotions on the Quality of Novice Programmers’ Code. Information Systems and Neuroscience (67-78). DOI:10.1007/978-3-031-58396-4_7. Online publication date: 26-Jul-2024
  • (2024) Neurophysiological Measurements in the Research Field of Interruption Science: Insights into Applied Methods for Different Interruption Types Based on an Umbrella Review. Information Systems and Neuroscience (123-152). DOI:10.1007/978-3-031-58396-4_11. Online publication date: 26-Jul-2024
  • (2023) A Comprehensive Taxonomy for Prediction Models in Software Engineering. Information, 14:2 (111). DOI:10.3390/info14020111. Online publication date: 10-Feb-2023
  • (2023) On the accuracy of code complexity metrics: A neuroscience-based guideline for improvement. Frontiers in Neuroscience, 16. DOI:10.3389/fnins.2022.1065366. Online publication date: 7-Feb-2023
