- survey, January 2025, JUST ACCEPTED
A Comprehensive Survey of Benchmarks for Improvement of Software's Non-Functional Properties
Despite recent increase in research on improvement of non-functional properties of software, such as energy usage or program size, there is a lack of standard benchmarks for such work. This absence hinders progress in the field, and raises questions about ...
- research-article, December 2024
Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java
ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 34, Issue 1, Article No.: 11, Pages 1–35
https://doi.org/10.1145/3688834
Recent years have seen a rise in Neural Program Repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. Having a comprehensive understanding of existing systems can facilitate ...
- poster, December 2024
ASAG2024: A Combined Benchmark for Short Answer Grading
SIGCSE Virtual 2024: Proceedings of the 2024 on ACM Virtual Global Computing Education Conference V. 2, Pages 322–323
https://doi.org/10.1145/3649409.3691083
Open-ended questions test a more thorough understanding compared to closed-ended questions and are often a preferred assessment method. However, open-ended questions are tedious to grade and subject to personal bias. Therefore, there have been efforts to ...
- research-article, November 2024
A Survey of General-purpose Polyhedral Compilers
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 4, Article No.: 72, Pages 1–26
https://doi.org/10.1145/3674735
Since the 1990s, many implementations of polyhedral compilers have been written and distributed, either as source-to-source translating compilers or integrated into wider-purpose compilers. This article provides a survey on those various available ...
- research-article, October 2024
The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition
- Shahin Amiriparian,
- Lukas Christ,
- Alexander Kathan,
- Maurice Gerczuk,
- Niklas Müller,
- Steffen Klug,
- Lukas Stappen,
- Andreas König,
- Erik Cambria,
- Björn W. Schuller,
- Simone Eulitz
MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 1–9
https://doi.org/10.1145/3689062.3689088
The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of ...
- research-article, October 2024
VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models
- Haodong Duan,
- Junming Yang,
- Yuxuan Qiao,
- Xinyu Fang,
- Lin Chen,
- Yuan Liu,
- Xiaoyi Dong,
- Yuhang Zang,
- Pan Zhang,
- Jiaqi Wang,
- Dahua Lin,
- Kai Chen
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11198–11201
https://doi.org/10.1145/3664647.3685520
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models ...
- research-article, October 2024
SCREEN: A Benchmark for Situated Conversational Recommendation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 9591–9600
https://doi.org/10.1145/3664647.3681651
Engaging in conversational recommendations within a specific scenario represents a promising paradigm in the real world. Scenario-relevant situations often affect conversations and recommendations from two closely related aspects: varying the ...
- research-article, October 2024
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 5742–5751
https://doi.org/10.1145/3664647.3680982
With the increased attention to model efficiency, post-training sparsity (PTS) has become more and more prevalent because of its effectiveness and efficiency. However, there remain questions on better practice of PTS algorithms and the sparsification ...
- research-article, October 2024
Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3257–3265
https://doi.org/10.1145/3664647.3680896
A plethora of text-guided image editing methods has recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models especially Stable Diffusion. Despite the success of diffusion models in producing high-...
- research-article, October 2024
ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code
ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, Pages 1895–1906
https://doi.org/10.1145/3691620.3695552
In recent years, with the widespread attention of academia and industry on the application of large language models (LLMs) to code-related tasks, an increasing number of large code models (LCMs) have been proposed and corresponding evaluation benchmarks ...
- research-article, October 2024
RepoSim: Evaluating Prompt Strategies for Code Completion via User Behavior Simulation
ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, Pages 2279–2283
https://doi.org/10.1145/3691620.3695299
Large language models (LLMs) have revolutionized code completion tasks. IDE plugins such as MarsCode can generate code recommendations, saving developers significant time and effort. However, current evaluation methods for code completion are limited by ...
- research-article, October 2024
LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Proportions
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 4374–4381
https://doi.org/10.1145/3627673.3680032
With large neural models becoming increasingly accurate and powerful, they have raised privacy and transparency concerns on data usage. Therefore, data platforms, regulations and user expectations are rapidly evolving leading to enforcing privacy via ...
- short-paper, October 2024
Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning
- Ahmet Kapkiç,
- Pratanu Mandal,
- Shu Wan,
- Paras Sheth,
- Abhinav Gorantla,
- Yoonhyuk Choi,
- Huan Liu,
- K. Selçuk Candan
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 5220–5224
https://doi.org/10.1145/3627673.3679218
While witnessing the exceptional success of machine learning (ML) technologies in many applications, users are starting to notice a critical shortcoming of ML: correlation is a poor substitute for causation. The conventional way to discover causal ...
- short-paper, October 2024
Advancing Multivariate Time Series Anomaly Detection: A Comprehensive Benchmark with Real-World Data from Alibaba Cloud
- Chaoli Zhang,
- Yingying Zhang,
- Lanshu Peng,
- Qingsong Wen,
- Yiyuan Yang,
- Chongjiong Fan,
- Minqi Jiang,
- Lunting Fan,
- Liang Sun
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 5410–5414
https://doi.org/10.1145/3627673.3679128
Time series anomaly detection is of significant importance in many real-world applications, including finance, healthcare, network security, industrial equipment, complex computing systems, and space probes. Most of these applications involve multi-...
- survey, October 2024
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
ACM Computing Surveys (CSUR), Volume 56, Issue 12, Article No.: 309, Pages 1–41
https://doi.org/10.1145/3664597
Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. Currently, there is already a thriving ...
- Article, November 2024
AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
- Keenon Werling,
- Janelle Kaneda,
- Tian Tan,
- Rishi Agarwal,
- Six Skov,
- Tom Van Wouwe,
- Scott Uhlrich,
- Nicholas Bianco,
- Carmichael Ong,
- Antoine Falisse,
- Shardul Sapkota,
- Aidan Chandra,
- Joshua Carter,
- Ezio Preatoni,
- Benjamin Fregly,
- Jennifer Hicks,
- Scott Delp,
- C. Karen Liu
Abstract: While reconstructing human poses in 3D from inexpensive sensors has advanced significantly in recent years, quantifying the dynamics of human motion, including the muscle-generated joint torques and external forces, remains a challenge. Prior ...
- Article, November 2024
MMVR: Millimeter-Wave Multi-view Radar Dataset and Benchmark for Indoor Perception
Abstract: Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room ...
- Article, September 2024
OpenPSS: An Open Page Stream Segmentation Benchmark
Linking Theory and Practice of Digital Libraries, Pages 413–429
https://doi.org/10.1007/978-3-031-72437-4_24
Abstract: In recent years, an increasing number of companies and institutions have begun the process of digitizing their physical records to promote digital access and searchability of their collections. For cost-efficiency, documents are often scanned in ...
A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?
- Zhihan Jiang,
- Jinyang Liu,
- Junjie Huang,
- Yichen Li,
- Yintong Huo,
- Jiazhen Gu,
- Zhuangbin Chen,
- Jieming Zhu,
- Michael R. Lyu
ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Pages 223–234
https://doi.org/10.1145/3650212.3652123
Log data have facilitated various tasks of software development and maintenance, such as testing, debugging and diagnosing. Due to the unstructured nature of logs, log parsing is typically required to transform log messages into structured data for ...
- research-article, September 2024
Generating Feature Models with UVL's Full Expressiveness
SPLC '24: Proceedings of the 28th ACM International Systems and Software Product Line Conference, Pages 61–65
https://doi.org/10.1145/3646548.3676602
The Universal Variability Language (UVL) is a textual format for specifying feature models. UVL has optional language levels (i.e., extensions) that add more expressive functionality to the base language, such as numerical constraints over attributes. ...