Action-based Recommendation in Pull-request Development

Authors: Muhammad Ilyas Azeem, Sebastiano Panichella, Andrea Di Sorbo, Alexander Serebrenik, Qing Wang

ICSSP '20: Proceedings of the International Conference on Software and System Processes

Pages 115 - 124

https://doi.org/10.1145/3379177.3388904

Published: 16 September 2020

Abstract

Pull request (PR) selection is a challenging task faced by integrators in pull-based development (PbD), with hundreds of PRs submitted daily to large open-source projects. Managing these PRs manually consumes integrators' time and resources and may delay the acceptance, response, or rejection of PRs that propose bug fixes or feature enhancements. On the one hand, well-known PbD platforms, like GitHub, do not provide built-in recommendation mechanisms for facilitating the management of PRs. On the other hand, prior research on PR recommendation has focused on the likelihood of a PR either being accepted or receiving a response from the integrator. In this paper, we consider both likelihoods to help integrators in the PR selection process by suggesting the appropriate action to take on each specific PR. To this aim, we propose an approach called CARTESIAN (aCceptance And Response classificaTion-based requESt IdentificAtioN), which models PR recommendation according to PR actions. In particular, CARTESIAN can recommend three types of PR actions: accept, respond, and reject. We evaluated CARTESIAN on the PRs of 19 popular GitHub projects. The results of our study demonstrate that our approach can identify PR actions with an average precision and recall of about 86%. Moreover, our findings highlight that CARTESIAN outperforms two baseline approaches in the task of PR selection.
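To make the evaluation metric concrete, the following minimal sketch shows one way precision and recall could be computed for the three PR actions and then averaged. It is not the authors' implementation: the function names, the toy labels, and the choice of an unweighted (macro) average are assumptions made for illustration, and the abstract does not state which averaging scheme the study used.

```python
from collections import Counter

# The three PR actions named in the abstract.
ACTIONS = ("accept", "respond", "reject")


def precision_recall(y_true, y_pred, labels=ACTIONS):
    """Return {label: (precision, recall)} for each action label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this action, but it was wrong
            fn[truth] += 1  # missed the true action
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision and recall (an assumed scheme)."""
    n = len(scores)
    avg_p = sum(p for p, _ in scores.values()) / n
    avg_r = sum(r for _, r in scores.values()) / n
    return avg_p, avg_r


# Hypothetical toy labels, not data from the study.
y_true = ["accept", "accept", "respond", "reject", "reject", "respond"]
y_pred = ["accept", "respond", "respond", "reject", "accept", "respond"]

scores = precision_recall(y_true, y_pred)
avg_precision, avg_recall = macro_average(scores)
print(scores)
print(avg_precision, avg_recall)
```

A macro average weights every action equally regardless of how many PRs fall under it; a support-weighted average would be a reasonable alternative when the classes are imbalanced.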

Index Terms

Action-based Recommendation in Pull-request Development
1. Software and its engineering
  1. Software creation and management
    1. Collaboration in software development
    2. Software development process management
  2. Software notations and tools
    1. Software maintenance tools

Published In

ICSSP '20: Proceedings of the International Conference on Software and System Processes

June 2020

208 pages

ISBN:9781450375122

DOI:10.1145/3379177

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Key Research and Development Program of China

Conference

ICSSP '20

Sponsor:

SIGSOFT

ICSSP '20: International Conference on Software and System Processes

June 26 - 28, 2020

Seoul, Republic of Korea
