PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 40, Pages 1–19. https://doi.org/10.1109/SC41406.2024.00046
Inference of Large Language Models (LLMs) across computer clusters has become a focal point of research in recent times, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques reduce bottlenecks associated ...
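PipeInfer's asynchronous pipelining across cluster nodes is not reproduced here, but the speculative-decoding idea it builds on can be sketched. In the toy Python below, the draft and target "models" are hypothetical deterministic stand-ins on integer token ids, and a greedy longest-agreeing-prefix acceptance rule is assumed:

```python
# Toy sketch of speculative decoding; PipeInfer's asynchronous pipelining
# is NOT modeled. Both "models" are hypothetical deterministic stand-ins.

def draft_next(tokens):
    # Cheap draft model: a fixed arithmetic rule.
    return (tokens[-1] * 31 + 7) % 100

def target_next(tokens):
    # Expensive target model: agrees with the draft most of the time.
    if tokens[-1] % 3 == 0:
        return (tokens[-1] + 1) % 100
    return (tokens[-1] * 31 + 7) % 100

def speculative_decode(prompt, n_new, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # 1) Draft k tokens cheaply.
        spec = []
        for _ in range(k):
            spec.append(draft_next(tokens + spec))
        # 2) Verify with the target; keep the longest agreeing prefix,
        #    substituting the target's token at the first mismatch.
        accepted = []
        for tok in spec:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)
                break
            accepted.append(tok)
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new]

print(speculative_decode([5], 10))
```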
- research-article, November 2024
Performance Analysis of CNN Inference/Training with Convolution and Non-Convolution Operations on ASIC Accelerators
- Hadi Esmaeilzadeh,
- Soroush Ghodrati,
- Andrew B. Kahng,
- Sean Kinzer,
- Susmita Dey Manasi,
- Sachin S. Sapatnekar,
- Zhiang Wang
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 30, Issue 1, Article No.: 3, Pages 1–34. https://doi.org/10.1145/3696665
Today’s performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, ...
- research-article, October 2024, Just Accepted
Hierarchical Bayesian data selection
ACM Transactions on Probabilistic Machine Learning (TOPML), Just Accepted. https://doi.org/10.1145/3699721
There are many issues that can cause problems when attempting to infer model parameters from data. Data and models are both imperfect, and as such there are multiple scenarios in which standard methods of inference will lead to misleading conclusions; ...
- short-paper, October 2024
Demo: An Experimental Platform for AI Model Partitioning on Resource-constrained Devices
MobiHoc '24: Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, Pages 375–376. https://doi.org/10.1145/3641512.3690629
Partitioning Artificial Intelligence (AI) models such as Deep Neural Networks (DNNs) or Transformer-based architectures is essential for minimizing latency in resource-constrained edge computing environments, which is critical for applications such as ...
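As a toy illustration of the layer-wise partitioning such platforms target (the demo system and its partition-point search are not shown), the sketch below splits a hypothetical MLP at a fixed layer index; the model, sizes, and split point are all illustrative assumptions:

```python
import numpy as np

# Toy layer-wise partitioning of a hypothetical 4-layer MLP: the first
# `split` layers run on the constrained device, the rest on an edge server.

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((16, 16)), rng.standard_normal(16))
          for _ in range(4)]

def run_layers(x, layer_slice):
    for w, b in layer_slice:
        x = np.maximum(x @ w + b, 0.0)  # dense layer + ReLU
    return x

split = 2
x = rng.standard_normal(16)
device_out = run_layers(x, layers[:split])           # on-device prefix
# ...device_out would be serialized and sent over the network here...
server_out = run_layers(device_out, layers[split:])  # server suffix
assert np.allclose(server_out, run_layers(x, layers))  # same as unsplit
```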
- Article, September 2024
Relational Or Single: A Comparative Analysis of Data Synthesis Approaches for Privacy and Utility on a Use Case from Statistical Office
This paper presents a case study focused on synthesizing relational datasets within Official Statistics for software and technology testing purposes. Specifically, the focus is on generating synthetic data for testing and validating software code. ...
- research-article, September 2024
LUTIN: Efficient Neural Network Inference with Table Lookup
ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design, Pages 1–6. https://doi.org/10.1145/3665314.3670804
DNN models are becoming increasingly large and complex, but they are also being deployed on commodity devices that require low power and latency but lack specialized accelerators. We introduce LUTIN (LUT-based INference), which reduces the amount of ...
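LUTIN's actual table construction and kernels are specific to the paper; the snippet below sketches only the generic LUT idea of trading multiplications for lookups in a quantized dot product, with bit-widths and sizes chosen arbitrarily:

```python
import numpy as np

# Generic LUT idea only (LUTIN's design differs): precompute every
# weight-activation product for a 4-bit activation range, then replace
# multiplications with gathers at inference time.

LEVELS = 16                                    # assumed 4-bit unsigned activations
rng = np.random.default_rng(1)
w = rng.integers(-8, 8, size=64)               # assumed 4-bit signed weights

lut = w[:, None] * np.arange(LEVELS)[None, :]  # table of w[i] * a, shape (64, 16)

a = rng.integers(0, LEVELS, size=64)           # quantized activations
y = lut[np.arange(64), a].sum()                # lookups + adds, no multiplies
assert y == w @ a
```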
- tutorial, June 2024
Demystifying Data Management for Large Language Models
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 547–555. https://doi.org/10.1145/3626246.3654683
Navigating the intricacies of data management in the era of Large Language Models (LLMs) presents both challenges and opportunities for database and data management communities. In this tutorial, we offer a comprehensive exploration into the vital role ...
- research-article, June 2024
Animation and Artificial Intelligence
FAccT '24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, Pages 1663–1671. https://doi.org/10.1145/3630106.3658995
Animation as genre is broadly used across many forms of digital media. In this paper, I argue ChatGPT and similar chatbots powered by Large Language Models (LLMs) can be best understood as animated characters. More than just cartooning, puppetry, or CGI, ...
- research-article, May 2024
"I know even if you don't tell me": Understanding Users' Privacy Preferences Regarding AI-based Inferences of Sensitive Information for Personalization
CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Article No.: 782, Pages 1–21. https://doi.org/10.1145/3613904.3642180
Personalization improves user experience by tailoring interactions relevant to each user’s background and preferences. However, personalization requires information about users that platforms often collect without their awareness or their enthusiastic ...
- research-article, May 2024
Mastering Computer Vision Inference Frameworks
ICPE '24 Companion: Companion of the 15th ACM/SPEC International Conference on Performance Engineering, Pages 28–33. https://doi.org/10.1145/3629527.3651430
In this paper, we present a comprehensive empirical study to evaluate four prominent Computer Vision inference frameworks. Our goal is to shed light on their strengths and weaknesses and provide valuable insights into the challenges of selecting the ...
- poster, February 2024
POSTER: FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 451–453. https://doi.org/10.1145/3627535.3638485
Co-locating multiple DNN servings to share GPU resources is widely used to improve resource utilization while guaranteeing user QoS. Existing GPU sharing mechanisms are restricted to the model level, and fluctuations in kernel-level resource demands highlight a ...
- research-article, January 2024, Just Accepted
Adaptive Offloading of Transformer Inference for Weak Edge Devices with Masked Autoencoders
The Transformer is a popular machine learning model used by many intelligent applications in smart cities. However, its high computational complexity makes it hard to deploy on weak edge devices. This paper presents a novel two-round offloading ...
Inference of Probabilistic Programs with Moment-Matching Gaussian Mixtures
Proceedings of the ACM on Programming Languages (PACMPL), Volume 8, Issue POPL, Article No.: 63, Pages 1882–1912. https://doi.org/10.1145/3632905
Computing the posterior distribution of a probabilistic program is a hard task for which no one-size-fits-all solution exists. We propose Gaussian Semantics, which approximates the exact probabilistic semantics of a bounded program by means of Gaussian ...
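Gaussian Semantics is compositional and covers full program semantics; that machinery is not shown here. The sketch below illustrates only the elementary moment-matching step of collapsing a one-dimensional Gaussian mixture to the single Gaussian sharing its first two moments, with made-up numbers:

```python
import numpy as np

# Elementary moment-matching step only: collapse a 1-D Gaussian mixture
# to the single Gaussian with the same mean and variance.

def moment_match(weights, means, variances):
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = np.sum(w * mu)                         # E[X]
    v = np.sum(w * (var + mu ** 2)) - m ** 2   # E[X^2] - (E[X])^2
    return m, v

# Mixture 0.3 * N(-1, 0.5) + 0.7 * N(2, 1.0)
print(moment_match([0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]))  # -> (1.1, 2.74)
```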
- research-article, September 2024
Envyr: Instant Execution with Smart Inference
Procedia Computer Science (PROCS), Volume 238, Issue C, Pages 1068–1073. https://doi.org/10.1016/j.procs.2024.06.136
This paper introduces a novel framework that eliminates the often cumbersome "build and install" step when running software. Our framework packages a collection of techniques to automatically infer and generate sandboxes, specifically Linux ...
- research-article, December 2023
Frontiers: A Simple Forward Difference-in-Differences Method
I propose a simple forward difference-in-differences method to estimate causal effects from quasi-experimental data, and I also develop its inference theory, which is widely applicable.
The difference-in-differences (DID) method is the most widely used tool for answering causal questions from quasi-experimental data in marketing and the broader social sciences. Because assignment to treatment in quasi-experiments is not random, the selection ...
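The paper's forward DID method adds a forward-selection step for control units plus its own inference theory, neither of which is shown here; the sketch below computes only the classic 2x2 difference-in-differences estimate that it builds on, with made-up numbers:

```python
# Classic 2x2 difference-in-differences estimate, the building block the
# paper generalizes; the "forward" control-selection step is not shown.

def did(treated_pre, treated_post, control_pre, control_post):
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change  # treatment effect estimate

print(did(treated_pre=[10, 11], treated_post=[15, 16],
          control_pre=[9, 10], control_post=[12, 13]))  # -> 2.0
```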
- research-article, December 2023
ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks
- Zachary Susskind,
- Aman Arora,
- Igor D. S. Miranda,
- Alan T. L. Bacellar,
- Luis A. Q. Villon,
- Rafael F. Katopodis,
- Leandro S. de Araújo,
- Diego L. C. Dutra,
- Priscila M. V. Lima,
- Felipe M. G. França,
- Mauricio Breternitz Jr.,
- Lizy K. John
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 4, Article No.: 61, Pages 1–24. https://doi.org/10.1145/3629522
"Extreme edge" devices, such as smart sensors, are a uniquely challenging environment for the deployment of machine learning. The tiny energy budgets of these devices lie beyond what is feasible for conventional deep neural networks, particularly in ...
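ULEEN's full architecture (Bloom filters, submodel ensembles, multi-pass training) is beyond a snippet, but the classic weightless-neural-network primitive it refines, a RAM neuron that memorizes binary input patterns, can be sketched as below; the class and its data are illustrative only:

```python
# A single "weightless" RAM neuron: it observes a few input bit positions
# and memorizes the patterns seen during training. ULEEN's refinements of
# this primitive are NOT reproduced here.

class RAMNeuron:
    def __init__(self, bits):
        self.bits = bits      # input bit positions this neuron observes
        self.seen = set()     # bit patterns memorized during training

    def address(self, x):
        return tuple(x[i] for i in self.bits)

    def train(self, x):
        self.seen.add(self.address(x))

    def respond(self, x):
        return int(self.address(x) in self.seen)

neuron = RAMNeuron(bits=[0, 2, 5])
neuron.train([1, 0, 1, 1, 0, 0, 1, 0])           # memorize one pattern
print(neuron.respond([1, 1, 1, 0, 0, 0, 0, 1]))  # 1: bits 0, 2, 5 match
```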
- research-article, April 2024
Machine Learning Inference on Serverless Platforms Using Model Decomposition
UCC '23: Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, Article No.: 33, Pages 1–6. https://doi.org/10.1145/3603166.3632535
Serverless offers a scalable and cost-effective service model for users to run applications without focusing on the underlying infrastructure or physical servers. While the serverless architecture is not designed to address the unique challenges posed by ...
- research-article, April 2024
Secure Neural Network Inference as a Service with Resource-Constrained Clients
UCC '23: Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, Article No.: 8, Pages 1–10. https://doi.org/10.1145/3603166.3632132
Applying services computing to neural networks, a service provider may provide inference with a pre-trained neural network as a service. Clients use the service to get the neural network's output on their input. To protect sensitive data, secure neural ...
Practical Inference of Nullability Types
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1395–1406. https://doi.org/10.1145/3611643.3616326
NullPointerExceptions (NPEs), caused by dereferencing null, frequently cause crashes in Java programs. Pluggable type checking is highly effective in preventing Java NPEs. However, this approach is difficult to adopt for large, existing code bases, as ...