TOSEM: Vol 34, No 1

editorial

Free

Editorial: TOSEM Journal in 2025 and Beyond

Abhik Roychoudhury

Article No.: 1, Pages 1–3, https://doi.org/10.1145/3708477

TOSEM is ACM’s flagship journal for publishing software engineering (SE) research. TOSEM stays true to the foundations of the discipline while meaningfully engaging with the wave of disruptive innovations in the field. In this light, we discuss the plans ...

SECTION: Regular Papers

research-article

Automating TODO-missed Methods Detection and Patching

Article No.: 2, Pages 1–28, https://doi.org/10.1145/3700793

TODO comments are widely used by developers to remind themselves or others about incomplete tasks. In other words, TODO comments are usually associated with temporary or suboptimal solutions. In practice, all the equivalent suboptimal implementations ...

research-article

Open Access

On Process Discovery Experimentation: Addressing the Need for Research Methodology in Process Discovery

Article No.: 3, Pages 1–29, https://doi.org/10.1145/3672447

Process mining aims to derive insights into business processes from event logs recorded from information systems. Process discovery algorithms construct process models that describe the executed process. With the increasing availability of large-scale ...

research-article

Understanding Test Convention Consistency as a Dimension of Test Quality

Article No.: 4, Pages 1–39, https://doi.org/10.1145/3672448

Unit tests must be readable to help developers understand and evolve production code. Most existing test quality metrics assess test code’s ability to detect bugs. Few metrics focus on test code’s readability. One standard approach to improve readability ...

research-article

Test Case Minimization with Quantum Annealers

Article No.: 5, Pages 1–24, https://doi.org/10.1145/3680467

Quantum annealers are specialized quantum computers for solving combinatorial optimization problems with special quantum computing characteristics, e.g., superposition and entanglement. Theoretically, quantum annealers can outperform classical computers. ...

research-article

Open Access

On the Understandability of Design-Level Security Practices in Infrastructure-as-Code Scripts and Deployment Architectures

Article No.: 6, Pages 1–37, https://doi.org/10.1145/3691630

Infrastructure as Code (IaC) automates IT infrastructure deployment, which is particularly beneficial for continuous releases, for instance, in the context of microservices and cloud systems. Despite its flexibility in application architecture, ...

research-article

An Empirical Study of Testing Machine Learning in the Wild

Article No.: 7, Pages 1–63, https://doi.org/10.1145/3680463

Background: Recently, machine and deep learning (ML/DL) algorithms have been increasingly adopted in many software systems. Due to their inductive nature, ensuring the quality of these systems remains a significant challenge for the research community. ...

research-article

Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models

Article No.: 8, Pages 1–68, https://doi.org/10.1145/3680464

In the testing-retraining pipeline for enhancing the robustness property of deep learning (DL) models, many state-of-the-art robustness-oriented fuzzing techniques are metric-oriented. The pipeline generates adversarial examples as test cases via such a ...

research-article

MalSensor: Fast and Robust Windows Malware Classification

Article No.: 9, Pages 1–28, https://doi.org/10.1145/3688833

Driven by substantial profits, the evolution of Portable Executable (PE) malware has posed persistent threats. PE malware classification has been an important research field, and numerous classification methods have been proposed. With the development ...

research-article

Open Access

Reputation Gaming in Crowd Technical Knowledge Sharing

Article No.: 10, Pages 1–41, https://doi.org/10.1145/3691627

Stack Overflow’s incentive system awards users with reputation scores to ensure quality. The decentralized nature of the forum may make the incentive system prone to manipulation. This article offers, for the first time, a comprehensive study of the ...

research-article

Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java

Article No.: 11, Pages 1–35, https://doi.org/10.1145/3688834

Recent years have seen a rise in Neural Program Repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. Having a comprehensive understanding of existing systems can facilitate ...

research-article

Measuring and Mining Community Evolution in Developer Social Networks with Entropy-Based Indices

Article No.: 12, Pages 1–43, https://doi.org/10.1145/3688832

This work presents four novel entropy-based indices for measuring the community evolution of developer social networks (DSNs) in open source software (OSS) projects. The proposed indices offer a quantitative measure of community split, shrink, merge, and ...

research-article

Solving the t-Wise Coverage Maximum Problem via Effective and Efficient Local Search-Based Sampling

Article No.: 13, Pages 1–64, https://doi.org/10.1145/3688836

To meet the increasing demand for customized software, highly configurable systems have become essential in practice. Such systems offer many options to configure, and ensuring the reliability of these systems is critical. A widely used evaluation metric for ...

research-article

Fine-Tuning Large Language Models to Improve Accuracy and Comprehensibility of Automated Code Review

Article No.: 14, Pages 1–26, https://doi.org/10.1145/3695993

As code review is a tedious and costly software quality practice, researchers have proposed several machine learning-based methods to automate the process. The primary focus has been on accuracy, that is, how accurately the algorithms are able to detect ...

research-article

Neuron Semantic-Guided Test Generation for Deep Neural Networks Fuzzing

Article No.: 15, Pages 1–38, https://doi.org/10.1145/3688835

In recent years, significant progress has been made in testing methods for deep neural networks (DNNs) to ensure their correctness and robustness. Coverage-guided criteria, such as neuron-wise, layer-wise, and path-/trace-wise, have been proposed for DNN ...

research-article

An Exploratory Study on Machine Learning Model Management

Article No.: 16, Pages 1–31, https://doi.org/10.1145/3688841

Effective model management is crucial for ensuring performance and reliability in Machine Learning (ML) systems, given the dynamic nature of data and operational environments. However, standard practices are lacking, often resulting in ad hoc approaches. ...

SECTION: Continuous Special Section: AI and SE

research-article

SimClone: Detecting Tabular Data Clones Using Value Similarity

Article No.: 17, Pages 1–27, https://doi.org/10.1145/3676961

Data clones are defined as multiple copies of the same data among datasets. The presence of data clones between datasets can cause issues such as difficulties in managing data assets and data license violations when using datasets with clones to build AI ...

research-article

MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems

Article No.: 18, Pages 1–35, https://doi.org/10.1145/3678171

Autonomous driving systems (ADSs) are complex cyber-physical systems (CPSs) that must ensure safety even in uncertain conditions. Modern ADSs often employ deep neural networks (DNNs), which may not produce correct results in every possible driving ...

research-article

Open Access

History-Driven Fuzzing for Deep Learning Libraries

Article No.: 19, Pages 1–29, https://doi.org/10.1145/3688838

Recently, many Deep Learning (DL) fuzzers have been proposed for API-level testing of DL libraries. However, they either perform unguided input generation (e.g., not considering the relationship between API arguments when generating inputs) or only ...

research-article

Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality

Article No.: 20, Pages 1–31, https://doi.org/10.1145/3678168

Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework’s functionality using a programming language different from the framework’s default language (usually Python). In this article, we study ...

research-article

Don’t Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems

Article No.: 21, Pages 1–22, https://doi.org/10.1145/3688831

Currently, large pre-trained language models are widely applied in neural code completion systems. Though large code models significantly outperform their smaller counterparts, around 70% of displayed code completions from GitHub Copilot are not accepted ...

research-article

CARL: Unsupervised Code-Based Adversarial Attacks for Programming Language Models via Reinforcement Learning

Article No.: 22, Pages 1–32, https://doi.org/10.1145/3688839

Code-based adversarial attacks play a crucial role in revealing vulnerabilities of software systems. Recently, pre-trained programming language models (PLMs) have demonstrated remarkable success in various significant software engineering tasks, ...

SECTION: Continuous Special Section: Security and SE

research-article

Open Access

Decision Support Model for Selecting the Optimal Blockchain Oracle Platform: An Evaluation of Key Factors

Article No.: 23, Pages 1–35, https://doi.org/10.1145/3697011

Smart contract-based applications are executed in a blockchain environment, and they cannot directly access data from external systems, which is required for the service provision of these applications. Instead, smart contracts use agents known as ...

SECTION: Continuous Special Section: Human-Centric SE

research-article

Diversity’s Double-Edged Sword: Analyzing Race’s Effect on Remote Pair Programming Interactions

Article No.: 24, Pages 1–45, https://doi.org/10.1145/3699601

Remote pair programming is widely used in software development, but no research has examined how race affects these interactions between developers. We embarked on this study due to the historical underrepresentation of Black developers in the tech ...

SECTION: Survey

survey

Open Access

Deep Configuration Performance Learning: A Systematic Survey and Taxonomy

Article No.: 25, Pages 1–62, https://doi.org/10.1145/3702986

Performance is arguably the most crucial attribute that reflects the quality of a configurable software system. However, given the increasing scale and complexity of modern software, modeling and predicting how various configurations can impact ...

SECTION: Registered Paper

research-article

DiPri: Distance-Based Seed Prioritization for Greybox Fuzzing

Article No.: 26, Pages 1–39, https://doi.org/10.1145/3654440

Greybox fuzzing is a powerful testing technique. Given a set of initial seeds, greybox fuzzing continuously generates new test inputs to execute the program under test and drives executions with code coverage as feedback. Seed prioritization is an ...

SECTION: Replicated Computational Results (RCR) Report

research-article

DiPri: Distance-Based Seed Prioritization for Greybox Fuzzing—RCR Report

Article No.: 27, Pages 1–13, https://doi.org/10.1145/3701298

This replicated computational results (RCR) report describes how to (1) set up DiPri and (2) replicate the experimental results. The primary artifact is the C/C++ prototype of DiPri, which is essentially an extension of the state-of-the-art greybox fuzzer ...

ACM Transactions on Software Engineering and Methodology
