- research-article, October 2024
Unboxing Virgil ADTs for Fun and Profit
JENSFEST '24: Proceedings of the Workshop Dedicated to Jens Palsberg on the Occasion of His 60th Birthday. Pages 43–52. https://doi.org/10.1145/3694848.3694857
Algebraic Data Types (ADTs) are an increasingly common feature in modern programming languages. In many implementations, values of non-nullary, multi-case ADTs are allocated on the heap, which may reduce performance and increase memory usage. This work ...
- research-article, July 2024
Investigating Data Movement Strategies for Distribution of Repartitioned Data
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing. Article No.: 11, Pages 1–8. https://doi.org/10.1145/3626203.3670534
Repartitioning in a parallel setting can be defined as the task of redistributing data across processes based on a newly imposed grid/layout. Repartitioning is a fundamental problem, with applications in domains that typically involve computation on ...
- research-article, February 2024
Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 1. Article No.: 20, Pages 1–20. https://doi.org/10.1145/3633462
An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and applications. SpTM is a multi-dimensional analog to sparse-matrix-dense-matrix multiplication (SpMM). In this article, we ...
- research-article, February 2024
Extension VM: Interleaved Data Layout in Vector Memory
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 1. Article No.: 18, Pages 1–23. https://doi.org/10.1145/3631528
Vector architecture is widely employed in processors for neural networks, signal processing, and high-performance computing; however, its performance is limited by inefficient column-major memory access. The column-major access limitation originates ...
- research-article, November 2023
The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores
- Maciej Besta,
- Robert Gerstenberger,
- Marc Fischer,
- Michal Podstawski,
- Nils Blach,
- Berke Egeli,
- Georgy Mitenkov,
- Wojciech Chlapek,
- Marek Michalewicz,
- Hubert Niewiadomski,
- Juergen Mueller,
- Torsten Hoefler
SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No.: 22, Pages 1–18. https://doi.org/10.1145/3581784.3607068
Graph databases (GDBs) are crucial in academic and industry applications. The key challenges in developing GDBs are achieving high performance, scalability, programmability, and portability. To tackle these challenges, we harness established practices ...
- research-article, January 2023
Optimizing Data Layout for Racetrack Memory in Embedded Systems
ASPDAC '23: Proceedings of the 28th Asia and South Pacific Design Automation Conference. Pages 110–115. https://doi.org/10.1145/3566097.3567854
Racetrack memory (RTM), which consists of multiple domain block clusters (DBC) and access ports, is a novel non-volatile memory and has potential as scratchpad memory (SPM) in embedded devices due to its high density and low access latency. However, too ...
- research-article, December 2022
Fine-Granular Computation and Data Layout Reorganization for Improving Locality
ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design. Article No.: 5, Pages 1–9. https://doi.org/10.1145/3508352.3549386
While data locality and cache performance have been investigated in great depth by prior research (in the context of both high-end systems and embedded/mobile systems), one of the important characteristics of prior approaches is that they transform loop ...
- tutorial, June 2022
Dissecting, Designing, and Optimizing LSM-based Data Stores
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data. Pages 2489–2497. https://doi.org/10.1145/3514221.3522563
Log-structured merge (LSM) trees have emerged as one of the most commonly used disk-based data structures in modern data systems. LSM-trees employ out-of-place ingestion to support high throughput for writes, while their immutable file structure allows ...
- short-paper, June 2022
Compactionary: A Dictionary for LSM Compactions
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data. Pages 2429–2432. https://doi.org/10.1145/3514221.3520169
Log-structured merge (LSM) trees are widely used as the storage layer of modern NoSQL data stores, as they offer efficient ingestion performance. To enable competitive read performance and reduce space amplification, LSM-trees re-organize data layout on ...
- research-article, August 2022
Optimizing Data Layout for Training Deep Neural Networks
WWW '22: Companion Proceedings of the Web Conference 2022. Pages 548–554. https://doi.org/10.1145/3487553.3524856
The widespread popularity of deep neural networks (DNNs) has made them an important workload in modern datacenters. Training DNNs is both computation-intensive and memory-intensive. While prior works focus on training parallelization (e.g., data ...
Improving communication by optimizing on-node data movement with data layout
PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Pages 304–317. https://doi.org/10.1145/3437801.3441598
We present optimizations to improve communication performance by reducing on-node data movement for a class of distributed memory applications. The primary concept is to eliminate the data movement associated with packing and unpacking subsets of the ...
- research-article, May 2020
Qd-tree: Learning Data Layouts for Big Data Analytics
- Zongheng Yang,
- Badrish Chandramouli,
- Chi Wang,
- Johannes Gehrke,
- Yinan Li,
- Umar Farooq Minhas,
- Per-Åke Larson,
- Donald Kossmann,
- Rajeev Acharya
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Pages 193–208. https://doi.org/10.1145/3318464.3389770
Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice ...
- short-paper, May 2020
Demonstration of Chestnut: An In-memory Data Layout Designer for Database Applications
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Pages 2813–2816. https://doi.org/10.1145/3318464.3384712
This demonstration showcases Chestnut, a data layout generator for in-memory object-oriented database applications. Given an application and a memory budget, Chestnut generates a customized in-memory data layout and the corresponding query plans that ...
- research-article, January 2020
VFS_CS: a light-weight and extensible virtual file system middleware for cloud storage system
International Journal of Computational Science and Engineering (IJCSE), Volume 21, Issue 4. Pages 513–521. https://doi.org/10.1504/ijcse.2020.106865
In cloud environments, data-intensive applications have been widely deployed to solve non-trivial applications, while cloud-based storage systems usually fail to provide desirable performance and efficiency when running those data-intensive applications. ...
- research-article, January 2020
Laius: an energy-efficient FPGA CNN accelerator with the support of a fixed-point training framework
International Journal of Computational Science and Engineering (IJCSE), Volume 21, Issue 3. Pages 418–428. https://doi.org/10.1504/ijcse.2020.106064
With the development of convolutional neural networks (CNNs), their high computational complexity and energy consumption become significant problems. Many CNN inference accelerators are proposed to reduce the consumption. Most of them are based on 32-bit ...
- research-article, February 2018
A Study of Data Layout in Multi-channel Processing-In-Memory Architecture
ICSCA '18: Proceedings of the 2018 7th International Conference on Software and Computer Applications. Pages 134–138. https://doi.org/10.1145/3185089.3185136
In modern computing hardware, the performance gap between processor and memory is one of the most significant factors that limits overall performance improvement of computing systems. Also, with the advent of multicore and manycore systems, memory ...
- research-article, May 2017
Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 31, Issue 3. Pages 246–255. https://doi.org/10.1177/1094342016658109
The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units (GPUs) have become available as high-performance computing resources at large scale. We report on ...
- Article, February 2017
Characterizing data organization effects on heterogeneous memory architectures
CGO '17: Proceedings of the 2017 International Symposium on Code Generation and Optimization. Pages 160–170
Layout and placement of shared data structures is critical to achieving scalable performance on heterogeneous memory architectures. While recent research has established the importance of data organization and developed mechanisms for data layout ...
- research-article, January 2017
Layout Lock: A Scalable Locking Paradigm for Concurrent Data Layout Modifications
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Pages 17–29. https://doi.org/10.1145/3018743.3018753
Data-structures can benefit from dynamic data layout modifications when the size or the shape of the data structure changes during the execution, or when different phases in the program execute different workloads. However, in a modern multi-core ...
Also Published in:
ACM SIGPLAN Notices, Volume 52, Issue 8
- research-article, November 2016
Optimizing memory efficiency for deep convolutional neural networks on GPUs
SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No.: 54, Pages 1–12
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive parallel computing ...