Detect Hidden Dependency to Untangle Commits

Authors:

Zhi Jin

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

Pages 179 - 190

https://doi.org/10.1145/3691620.3694996

Published: 27 October 2024

Abstract

In collaborative software development, developers make code changes and commit them to shared repositories. Making small, single-purpose commits is widely considered a best practice because it lets the team understand code changes quickly. In practice, however, developers often make tangled commits, which bundle code changes that serve different purposes. Such commits make it difficult for other developers to understand the changes during subsequent development. Early work on untangling code changes relies on human-specified heuristic rules or features, does not consider context, and is labor intensive. Recent work models the local context of code changes as a graph at the statement level, with statements as nodes and code dependencies as edges, and then clusters the changed statements. However, recent work ignores the hidden dependencies in the global context: for example, a pair of tangled code changes may have no code dependency, and a pair of untangled code changes may have an obvious code dependency. To solve this problem, we focus on detecting hidden dependencies among code changes. We model the global context of code changes as graphs at finer-grained, hierarchical levels, i.e., at both the entity and statement levels. We then propose a Heterogeneous Directed Graph Neural Network (HD-GNN) that detects hidden dependencies among code changes by aggregating the global context in both connected and disconnected entity-level subgraphs that intersect with the code changes. Evaluation on common C# and Java datasets (1,612 and 14k tangled commits, respectively) and on manually validated datasets (MVD) with 600 commits shows that HD-GNN achieves average effectiveness improvements of 25% and 19.2% over existing approaches and far outperforms them on the MVD, without sacrificing time efficiency.
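To make the limitation described in the abstract concrete, the following is a minimal sketch of a purely dependency-based baseline, not the paper's HD-GNN. It assumes changed statements are nodes, explicit code dependencies are undirected edges, and clustering is done by connected components (a deliberately simple stand-in for the clustering step of prior work). The function name `untangle_by_dependency`, the statement identifiers, and the edges are all hypothetical.

```python
from collections import defaultdict


def untangle_by_dependency(changed, edges):
    """Group changed statements into connected components of the
    explicit dependency graph (simplified baseline, union-find based)."""
    parent = {s: s for s in changed}

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        # Only edges between changed statements affect the clustering.
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for s in changed:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical commit: s1, s2, and s4 serve one purpose, s3 another.
# s4 has no explicit code dependency on s1 or s2 (a hidden dependency),
# while s3 happens to have a code dependency on s2.
changed = ["s1", "s2", "s3", "s4"]
edges = [("s1", "s2"), ("s2", "s3")]

clusters = untangle_by_dependency(changed, edges)
print(clusters)
```

Under these assumptions the baseline places s3 with s1 and s2 and isolates s4, which is the opposite of the intended grouping. This is the kind of case the abstract attributes to hidden dependencies that explicit edges alone do not capture.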
