Keyword: benchmark : Search

survey

Open Access

JUST ACCEPTED

A Comprehensive Survey of Benchmarks for Improvement of Software's Non-Functional Properties

ACM Computing Surveys (CSUR), Just Accepted https://doi.org/10.1145/3711119

Despite recent increase in research on improvement of non-functional properties of software, such as energy usage or program size, there is a lack of standard benchmarks for such work. This absence hinders progress in the field, and raises questions about ...

research-article

Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java

ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 34, Issue 1, Article No.: 11, Pages 1–35 https://doi.org/10.1145/3688834

Recent years have seen a rise in Neural Program Repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. Having a comprehensive understanding of existing systems can facilitate ...

poster

ASAG2024: A Combined Benchmark for Short Answer Grading

SIGCSE Virtual 2024: Proceedings of the 2024 on ACM Virtual Global Computing Education Conference V. 2, Pages 322–323 https://doi.org/10.1145/3649409.3691083

Open-ended questions test a more thorough understanding compared to closed-ended questions and are often a preferred assessment method. However, open-ended questions are tedious to grade and subject to personal bias. Therefore, there have been efforts to ...

research-article

Open Access

A Survey of General-purpose Polyhedral Compilers

ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 4, Article No.: 72, Pages 1–26 https://doi.org/10.1145/3674735

Since the 1990s, many implementations of polyhedral compilers have been written and distributed, either as source-to-source translating compilers or integrated into wider-purpose compilers. This article provides a survey on those various available ...

research-article

Open Access

The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition

MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 1–9 https://doi.org/10.1145/3689062.3689088

The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of ...

research-article

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11198–11201 https://doi.org/10.1145/3664647.3685520

We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models ...

research-article

Open Access

SCREEN: A Benchmark for Situated Conversational Recommendation

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 9591–9600 https://doi.org/10.1145/3664647.3681651

Engaging in conversational recommendations within a specific scenario represents a promising paradigm in the real world. Scenario-relevant situations often affect conversations and recommendations from two closely related aspects: varying the ...

research-article

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 5742–5751 https://doi.org/10.1145/3664647.3680982

With the increased attention to model efficiency, post-training sparsity (PTS) has become more and more prevalent because of its effectiveness and efficiency. However, there remain questions on better practice of PTS algorithms and the sparsification ...

research-article

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3257–3265 https://doi.org/10.1145/3664647.3680896

A plethora of text-guided image editing methods has recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models especially Stable Diffusion. Despite the success of diffusion models in producing high-...

research-article

ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, Pages 1895–1906 https://doi.org/10.1145/3691620.3695552

In recent years, with the widespread attention of academia and industry on the application of large language models (LLMs) to code-related tasks, an increasing number of large code models (LCMs) have been proposed and corresponding evaluation benchmarks ...

research-article

RepoSim: Evaluating Prompt Strategies for Code Completion via User Behavior Simulation

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, Pages 2279–2283 https://doi.org/10.1145/3691620.3695299

Large language models (LLMs) have revolutionized code completion tasks. IDE plugins such as MarsCode can generate code recommendations, saving developers significant time and effort. However, current evaluation methods for code completion are limited by ...

research-article

Open Access

LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Proportions

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 4374–4381 https://doi.org/10.1145/3627673.3680032

With large neural models becoming increasingly accurate and powerful, they have raised privacy and transparency concerns on data usage. Therefore, data platforms, regulations and user expectations are rapidly evolving leading to enforcing privacy via ...

short-paper

Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 5220–5224 https://doi.org/10.1145/3627673.3679218

While witnessing the exceptional success of machine learning (ML) technologies in many applications, users are starting to notice a critical shortcoming of ML: correlation is a poor substitute for causation. The conventional way to discover causal ...

short-paper

Advancing Multivariate Time Series Anomaly Detection: A Comprehensive Benchmark with Real-World Data from Alibaba Cloud

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 5410–5414 https://doi.org/10.1145/3627673.3679128

Time series anomaly detection is of significant importance in many real-world applications, including finance, healthcare, network security, industrial equipment, complex computing systems, and space probes. Most of these applications involve multi-...

survey

Open Access

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

ACM Computing Surveys (CSUR), Volume 56, Issue 12, Article No.: 309, Pages 1–41 https://doi.org/10.1145/3664597

Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. Currently, there is already a thriving ...

Article

AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale

Computer Vision – ECCV 2024, Pages 490–508 https://doi.org/10.1007/978-3-031-73223-2_27

While reconstructing human poses in 3D from inexpensive sensors has advanced significantly in recent years, quantifying the dynamics of human motion, including the muscle-generated joint torques and external forces, remains a challenge. Prior ...

Article

MMVR: Millimeter-Wave Multi-view Radar Dataset and Benchmark for Indoor Perception

Computer Vision – ECCV 2024, Pages 306–322 https://doi.org/10.1007/978-3-031-72986-7_18

Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room ...

Article

OpenPSS: An Open Page Stream Segmentation Benchmark

Linking Theory and Practice of Digital Libraries, Pages 413–429 https://doi.org/10.1007/978-3-031-72437-4_24

In recent years, an increasing number of companies and institutions have begun the process of digitizing their physical records to promote digital access and searchability of their collections. For cost-efficiency, documents are often scanned in ...

research-article

Open Access

A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Pages 223–234 https://doi.org/10.1145/3650212.3652123

Log data have facilitated various tasks of software development and maintenance, such as testing, debugging and diagnosing. Due to the unstructured nature of logs, log parsing is typically required to transform log messages into structured data for ...

research-article

Open Access

Generating Feature Models with UVL's Full Expressiveness

SPLC '24: Proceedings of the 28th ACM International Systems and Software Product Line Conference, Pages 61–65 https://doi.org/10.1145/3646548.3676602

The Universal Variability Language (UVL) is a textual format for specifying feature models. UVL has optional language levels (i.e., extensions) that add more expressive functionality to the base language, such as numerical constraints over attributes. ...
