PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 40, Pages 1–19. https://doi.org/10.1109/SC41406.2024.00046
Inference of Large Language Models (LLMs) across computer clusters has become a focal point of research in recent times, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques reduce bottlenecks associated ...
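PipeInfer's asynchronous pipelining across cluster nodes is not reproduced here, but the speculative-decoding idea it builds on can be sketched. In the toy Python below, the draft and target "models" are hypothetical deterministic stand-ins on integer token ids, and a greedy longest-agreeing-prefix acceptance rule is assumed:

```python
# Toy sketch of speculative decoding; PipeInfer's asynchronous pipelining
# is NOT modeled. Both "models" are hypothetical deterministic stand-ins.

def draft_next(tokens):
    # Cheap draft model: a fixed arithmetic rule.
    return (tokens[-1] * 31 + 7) % 100

def target_next(tokens):
    # Expensive target model: agrees with the draft most of the time.
    if tokens[-1] % 3 == 0:
        return (tokens[-1] + 1) % 100
    return (tokens[-1] * 31 + 7) % 100

def speculative_decode(prompt, n_new, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # 1) Draft k tokens cheaply.
        spec = []
        for _ in range(k):
            spec.append(draft_next(tokens + spec))
        # 2) Verify with the target; keep the longest agreeing prefix,
        #    substituting the target's token at the first mismatch.
        accepted = []
        for tok in spec:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)
                break
            accepted.append(tok)
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new]

print(speculative_decode([5], 10))
```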
- research-article, November 2024
Performance Analysis of CNN Inference/Training with Convolution and Non-Convolution Operations on ASIC Accelerators
- Hadi Esmaeilzadeh,
- Soroush Ghodrati,
- Andrew B. Kahng,
- Sean Kinzer,
- Susmita Dey Manasi,
- Sachin S. Sapatnekar,
- Zhiang Wang
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 30, Issue 1, Article No.: 3, Pages 1–34. https://doi.org/10.1145/3696665
Today’s performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, ...
- research-article, October 2024, Just Accepted
Hierarchical Bayesian data selection
ACM Transactions on Probabilistic Machine Learning (TOPML), Just Accepted. https://doi.org/10.1145/3699721
There are many issues that can cause problems when attempting to infer model parameters from data. Data and models are both imperfect, and as such there are multiple scenarios in which standard methods of inference will lead to misleading conclusions; ...
- short-paper, October 2024
Demo: An Experimental Platform for AI Model Partitioning on Resource-constrained Devices
MobiHoc '24: Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, Pages 375–376. https://doi.org/10.1145/3641512.3690629
Partitioning Artificial Intelligence (AI) models such as Deep Neural Networks (DNNs) or Transformer-based architectures is essential for minimizing latency in resource-constrained edge computing environments, which is critical for applications such as ...
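As a toy illustration of the layer-wise partitioning such platforms target (the demo system and its partition-point search are not shown), the sketch below splits a hypothetical MLP at a fixed layer index; the model, sizes, and split point are all illustrative assumptions:

```python
import numpy as np

# Toy layer-wise partitioning of a hypothetical 4-layer MLP: the first
# `split` layers run on the constrained device, the rest on an edge server.

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((16, 16)), rng.standard_normal(16))
          for _ in range(4)]

def run_layers(x, layer_slice):
    for w, b in layer_slice:
        x = np.maximum(x @ w + b, 0.0)  # dense layer + ReLU
    return x

split = 2
x = rng.standard_normal(16)
device_out = run_layers(x, layers[:split])           # on-device prefix
# ...device_out would be serialized and sent over the network here...
server_out = run_layers(device_out, layers[split:])  # server suffix
assert np.allclose(server_out, run_layers(x, layers))  # same as unsplit
```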
- Article, September 2024
Relational Or Single: A Comparative Analysis of Data Synthesis Approaches for Privacy and Utility on a Use Case from Statistical Office
This paper presents a case study focused on synthesizing relational datasets within Official Statistics for software and technology testing purposes. Specifically, the focus is on generating synthetic data for testing and validating software code. ...
- research-article, September 2024
LUTIN: Efficient Neural Network Inference with Table Lookup
ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design, Pages 1–6. https://doi.org/10.1145/3665314.3670804
DNN models are becoming increasingly large and complex, but they are also being deployed on commodity devices that require low power and latency but lack specialized accelerators. We introduce LUTIN (LUT-based INference), which reduces the amount of ...
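LUTIN's actual table construction and kernels are specific to the paper; the snippet below sketches only the generic LUT idea of trading multiplications for lookups in a quantized dot product, with bit-widths and sizes chosen arbitrarily:

```python
import numpy as np

# Generic LUT idea only (LUTIN's design differs): precompute every
# weight-activation product for a 4-bit activation range, then replace
# multiplications with gathers at inference time.

LEVELS = 16                                    # assumed 4-bit unsigned activations
rng = np.random.default_rng(1)
w = rng.integers(-8, 8, size=64)               # assumed 4-bit signed weights

lut = w[:, None] * np.arange(LEVELS)[None, :]  # table of w[i] * a, shape (64, 16)

a = rng.integers(0, LEVELS, size=64)           # quantized activations
y = lut[np.arange(64), a].sum()                # lookups + adds, no multiplies
assert y == w @ a
```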
- tutorial, June 2024
Demystifying Data Management for Large Language Models
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 547–555. https://doi.org/10.1145/3626246.3654683
Navigating the intricacies of data management in the era of Large Language Models (LLMs) presents both challenges and opportunities for database and data management communities. In this tutorial, we offer a comprehensive exploration into the vital role ...
- research-article, June 2024
Animation and Artificial Intelligence
FAccT '24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, Pages 1663–1671. https://doi.org/10.1145/3630106.3658995
Animation as genre is broadly used across many forms of digital media. In this paper, I argue ChatGPT and similar chatbots powered by Large Language Models (LLMs) can be best understood as animated characters. More than just cartooning, puppetry, or CGI, ...
- research-article, May 2024
"I know even if you don't tell me": Understanding Users' Privacy Preferences Regarding AI-based Inferences of Sensitive Information for Personalization
CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Article No.: 782, Pages 1–21. https://doi.org/10.1145/3613904.3642180
Personalization improves user experience by tailoring interactions relevant to each user’s background and preferences. However, personalization requires information about users that platforms often collect without their awareness or their enthusiastic ...
- research-article, May 2024
Mastering Computer Vision Inference Frameworks
ICPE '24 Companion: Companion of the 15th ACM/SPEC International Conference on Performance Engineering, Pages 28–33. https://doi.org/10.1145/3629527.3651430
In this paper, we present a comprehensive empirical study to evaluate four prominent Computer Vision inference frameworks. Our goal is to shed light on their strengths and weaknesses and provide valuable insights into the challenges of selecting the ...
- poster, February 2024
POSTER: FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 451–453. https://doi.org/10.1145/3627535.3638485
Co-locating multiple DNN servings to share GPU resources is widely used to improve resource utilization while guaranteeing user QoS. Existing GPU sharing mechanisms are restricted to the model level, and fluctuations in kernel-level resource demands highlight a ...
- research-article, January 2024, Just Accepted
Adaptive Offloading of Transformer Inference for Weak Edge Devices with Masked Autoencoders
The Transformer is a popular machine learning model used by many intelligent applications in smart cities. However, its high computational complexity makes it hard to deploy on weak edge devices. This paper presents a novel two-round offloading ...
Inference of Probabilistic Programs with Moment-Matching Gaussian Mixtures
Proceedings of the ACM on Programming Languages (PACMPL), Volume 8, Issue POPL, Article No.: 63, Pages 1882–1912. https://doi.org/10.1145/3632905
Computing the posterior distribution of a probabilistic program is a hard task for which no one-size-fits-all solution exists. We propose Gaussian Semantics, which approximates the exact probabilistic semantics of a bounded program by means of Gaussian ...
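Gaussian Semantics is compositional and covers full program semantics; that machinery is not shown here. The sketch below illustrates only the elementary moment-matching step of collapsing a one-dimensional Gaussian mixture to the single Gaussian sharing its first two moments, with made-up numbers:

```python
import numpy as np

# Elementary moment-matching step only: collapse a 1-D Gaussian mixture
# to the single Gaussian with the same mean and variance.

def moment_match(weights, means, variances):
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = np.sum(w * mu)                         # E[X]
    v = np.sum(w * (var + mu ** 2)) - m ** 2   # E[X^2] - (E[X])^2
    return m, v

# Mixture 0.3 * N(-1, 0.5) + 0.7 * N(2, 1.0)
print(moment_match([0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]))  # -> (1.1, 2.74)
```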
- research-article, September 2024
Envyr: Instant Execution with Smart Inference
Procedia Computer Science (PROCS), Volume 238, Issue C, Pages 1068–1073. https://doi.org/10.1016/j.procs.2024.06.136
This paper introduces a novel framework that eliminates the often cumbersome "build and install" step when running software. Our framework packages a collection of techniques to automatically infer and generate sandboxes, specifically Linux ...
- research-article, December 2023
Frontiers: A Simple Forward Difference-in-Differences Method
I propose a simple forward difference-in-differences method to estimate causal effects from quasi-experimental data, and I also develop its inference theory, which is widely applicable.
The difference-in-differences (DID) method is the most widely used tool for answering causal questions from quasi-experimental data in marketing and the broader social sciences. Because assignment to treatment in quasi-experiments is not random, the selection ...
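The paper's forward DID method adds a forward-selection step for control units plus its own inference theory, neither of which is shown here; the sketch below computes only the classic 2x2 difference-in-differences estimate that it builds on, with made-up numbers:

```python
# Classic 2x2 difference-in-differences estimate, the building block the
# paper generalizes; the "forward" control-selection step is not shown.

def did(treated_pre, treated_post, control_pre, control_post):
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change  # treatment effect estimate

print(did(treated_pre=[10, 11], treated_post=[15, 16],
          control_pre=[9, 10], control_post=[12, 13]))  # -> 2.0
```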
- research-article, December 2023
ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks
- Zachary Susskind,
- Aman Arora,
- Igor D. S. Miranda,
- Alan T. L. Bacellar,
- Luis A. Q. Villon,
- Rafael F. Katopodis,
- Leandro S. de Araújo,
- Diego L. C. Dutra,
- Priscila M. V. Lima,
- Felipe M. G. França,
- Mauricio Breternitz Jr.,
- Lizy K. John
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 4, Article No.: 61, Pages 1–24. https://doi.org/10.1145/3629522
"Extreme edge" devices, such as smart sensors, are a uniquely challenging environment for the deployment of machine learning. The tiny energy budgets of these devices lie beyond what is feasible for conventional deep neural networks, particularly in ...
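ULEEN's full architecture (Bloom filters, submodel ensembles, multi-pass training) is beyond a snippet, but the classic weightless-neural-network primitive it refines, a RAM neuron that memorizes binary input patterns, can be sketched as below; the class and its data are illustrative only:

```python
# A single "weightless" RAM neuron: it observes a few input bit positions
# and memorizes the patterns seen during training. ULEEN's refinements of
# this primitive are NOT reproduced here.

class RAMNeuron:
    def __init__(self, bits):
        self.bits = bits      # input bit positions this neuron observes
        self.seen = set()     # bit patterns memorized during training

    def address(self, x):
        return tuple(x[i] for i in self.bits)

    def train(self, x):
        self.seen.add(self.address(x))

    def respond(self, x):
        return int(self.address(x) in self.seen)

neuron = RAMNeuron(bits=[0, 2, 5])
neuron.train([1, 0, 1, 1, 0, 0, 1, 0])           # memorize one pattern
print(neuron.respond([1, 1, 1, 0, 0, 0, 0, 1]))  # 1: bits 0, 2, 5 match
```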
- research-article, April 2024
Machine Learning Inference on Serverless Platforms Using Model Decomposition
UCC '23: Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, Article No.: 33, Pages 1–6. https://doi.org/10.1145/3603166.3632535
Serverless offers a scalable and cost-effective service model for users to run applications without focusing on the underlying infrastructure or physical servers. While the serverless architecture is not designed to address the unique challenges posed by ...
- research-article, April 2024
Secure Neural Network Inference as a Service with Resource-Constrained Clients
UCC '23: Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, Article No.: 8, Pages 1–10. https://doi.org/10.1145/3603166.3632132
Applying services computing to neural networks, a service provider may provide inference with a pre-trained neural network as a service. Clients use the service to get the neural network's output on their input. To protect sensitive data, secure neural ...
Practical Inference of Nullability Types
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1395–1406. https://doi.org/10.1145/3611643.3616326
NullPointerExceptions (NPEs), caused by dereferencing null, frequently cause crashes in Java programs. Pluggable type checking is highly effective in preventing Java NPEs. However, this approach is difficult to adopt for large, existing code bases, as ...