DOI: 10.5555/3155562.3155649

Predicting relevance of change recommendations

Published: 30 October 2017

Abstract

Software change recommendation seeks to suggest artifacts (e.g., files or methods) that are related to changes made by a developer, and thus identifies possible omissions or next steps. While one obvious challenge for recommender systems is to produce accurate recommendations, a complementary challenge is to rank recommendations based on their relevance. In this paper, we address this challenge for recommendation systems that are based on evolutionary coupling. Such systems use targeted association-rule mining to identify relevant patterns in a software system's change history. Traditionally, this process involves ranking artifacts using interestingness measures such as confidence and support. However, these measures often fall short when used to assess recommendation relevance.
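To make the traditional ranking concrete, the sketch below computes support and confidence for co-change rules over a toy change history. It is a simplified illustration under stated assumptions, not the Co-Change or Tarmaq algorithm from the paper: the function name `rank_by_confidence`, the file names, and the four-commit history are all hypothetical.

```python
from collections import Counter


def rank_by_confidence(history, query):
    """Rank candidate artifacts for `query` by confidence, then support.

    history: list of sets of artifact names changed together (toy transactions).
    query:   set of artifacts the developer has already changed.
    Returns a list of (artifact, support, confidence) tuples.
    """
    query = frozenset(query)
    # Transactions containing the whole query (the rule antecedent).
    matching = [t for t in history if query <= t]
    if not matching:
        return []
    # Count how often each other artifact changed together with the query.
    co_changes = Counter(a for t in matching for a in t - query)
    total = len(history)
    ranked = [
        # support    = share of all transactions containing query + artifact
        # confidence = share of query transactions that also contain artifact
        (artifact, count / total, count / len(matching))
        for artifact, count in co_changes.items()
    ]
    # Highest confidence first; ties broken by support, then by name.
    ranked.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return ranked


# Hypothetical change history: each set is one commit.
history = [
    {"a.c", "a.h"},
    {"a.c", "a.h", "main.c"},
    {"a.c", "util.c"},
    {"main.c", "util.c"},
]
recommendations = rank_by_confidence(history, {"a.c"})
for artifact, support, confidence in recommendations:
    print(f"{artifact}: support={support:.2f} confidence={confidence:.2f}")
```

In this toy history `a.h` ranks first for the query `a.c`, because it accompanies two of the three commits that touch `a.c`. The ranking uses only co-occurrence counts, which is the kind of measure the abstract says can fall short as a relevance signal.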
We propose the use of random forest classification models to assess recommendation relevance. This approach improves on past use of various interestingness measures by learning from previous change recommendations. We empirically evaluate our approach on fourteen open source systems and two systems from our industry partners. Furthermore, we consider complementing two mining algorithms: Co-Change and Tarmaq. The results show that random forest classification significantly outperforms previous approaches, achieves lower Brier scores, and has a superior trade-off between precision and recall. The results are consistent across software systems and mining algorithms.
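The Brier score mentioned above is the mean squared difference between a predicted probability and the binary outcome, so lower values indicate better-calibrated relevance predictions. The sketch below shows the calculation on made-up numbers; the function name `brier_score` and the sample predictions are assumptions for illustration and do not reproduce the paper's evaluation setup.

```python
def brier_score(predicted, actual):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    return sum((p - o) ** 2 for p, o in zip(predicted, actual)) / len(predicted)


# Hypothetical data: 1 means the recommendation turned out to be relevant.
outcomes = [1, 0, 1, 0]
sharp = [0.9, 0.1, 0.8, 0.2]  # confident and mostly right
vague = [0.6, 0.5, 0.5, 0.4]  # hovers around 0.5

print(f"sharp predictor: {brier_score(sharp, outcomes):.3f}")
print(f"vague predictor: {brier_score(vague, outcomes):.3f}")
```

The sharper predictor receives the lower score here, which is the sense in which a relevance model with lower Brier scores is preferred.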


Cited By

  • (2022) Revisiting the effect of branch handling strategies on change recommendation. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension (162-172). DOI: 10.1145/3524610.3527870. Online publication date: 16-May-2022
  • (2021) An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions. ACM Transactions on Software Engineering and Methodology 30:4 (1-38). DOI: 10.1145/3447876. Online publication date: 23-Jul-2021

Published In

ASE '17: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering
October 2017
1033 pages
ISBN: 9781538626849

Publisher

IEEE Press

Author Tags

  1. evolutionary coupling
  2. random forests
  3. recommendation confidence
  4. targeted association rule mining

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
