TACO: Vol 20, No 3

Volume 20, Issue 3, September 2023

Editor:

David Kaeli
Northeastern University, USA

Publisher:

Association for Computing Machinery
New York
NY
United States

ISSN: 1544-3566

EISSN: 1544-3973

research-article

Open Access

ASM: An Adaptive Secure Multicore for Co-located Mutually Distrusting Processes

Article No.: 32, Pages 1–24, https://doi.org/10.1145/3587480

With the ever-increasing virtualization of software and hardware, the privacy of user-sensitive data is a fundamental concern in computation outsourcing. Secure processors enable a trusted execution environment to guarantee security properties based on ...

research-article

Open Access

Turn-based Spatiotemporal Coherence for GPUs

Article No.: 33, Pages 1–27, https://doi.org/10.1145/3593054

This article introduces turn-based spatiotemporal coherence. Spatiotemporal coherence is a novel coherence implementation that assigns write permission to epochs (or turns) as opposed to a processor core. This paradigm shift in the assignment of write ...

research-article

Open Access

Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters

Article No.: 34, Pages 1–24, https://doi.org/10.1145/3593055

Colocating multiple jobs on the same server has been widely applied for improving resource utilization in cloud datacenters. However, the colocated jobs would contend for the shared resources, which could lead to significant performance degradation. An ...

research-article

Open Access

TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency

Article No.: 35, Pages 1–25, https://doi.org/10.1145/3597611

The ideal latency for on-chip network traversal would be the delay incurred from wire traversal alone. Unfortunately, in a realistic modular network, the latency for a packet to traverse the network is significantly higher than this wire delay. The main ...

research-article

Open Access

Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs

Article No.: 36, Pages 1–26, https://doi.org/10.1145/3600092

The convolutional neural network (CNN) is an important deep learning method, which is widely used in many fields. However, it is very time consuming to implement the CNN where convolution usually takes most of the time. There are many zero values in ...

research-article

Open Access

GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing

Article No.: 37, Pages 1–24, https://doi.org/10.1145/3600091

With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with ...

research-article

Open Access

The Impact of Page Size and Microarchitecture on Instruction Address Translation Overhead

Article No.: 38, Pages 1–25, https://doi.org/10.1145/3600089

As the volume of data processed by applications has increased, considerable attention has been paid to data address translation overheads, leading to the widespread use of larger page sizes (“superpages”) and multi-level translation lookaside buffers (...

research-article

Open Access

Cache Programming for Scientific Loops Using Leases

Article No.: 39, Pages 1–25, https://doi.org/10.1145/3600090

Cache management is important in exploiting locality and reducing data movement. This article studies a new type of programmable cache called the lease cache. By assigning leases, software exerts the primary control on when and how long data stays in the ...

research-article

Open Access

MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing

Article No.: 40, Pages 1–26, https://doi.org/10.1145/3603113

With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed 3D-stacking near-bank ...

research-article

Open Access

rNdN: Fast Query Compilation for NVIDIA GPUs

Article No.: 41, Pages 1–25, https://doi.org/10.1145/3603503

GPU database systems are an effective solution to query optimization, particularly with compilation and data caching. They fall short, however, in end-to-end workloads, as existing compiler toolchains are too expensive for use with short-running queries. ...

research-article

Open Access

Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure

Article No.: 42, Pages 1–21, https://doi.org/10.1145/3605149

The tremendous success of convolutional neural network (CNN) has made it ubiquitous in many fields of human endeavor. Many applications such as biomedical analysis and scientific data analysis involve analyzing volumetric data. This spawns huge demand for ...

research-article

Open Access

MFFT: A GPU Accelerated Highly Efficient Mixed-Precision Large-Scale FFT Framework

Article No.: 43, Pages 1–23, https://doi.org/10.1145/3605148

Fast Fourier transform (FFT) is widely used in computing applications in large-scale parallel programs, and data communication is the main performance bottleneck of FFT and seriously affects its parallel efficiency. To tackle this problem, we propose a ...

research-article

Open Access

Approx-RM: Reducing Energy on Heterogeneous Multicore Processors under Accuracy and Timing Constraints

Article No.: 44, Pages 1–25, https://doi.org/10.1145/3605214

Reducing energy consumption while providing performance and quality guarantees is crucial for computing systems ranging from battery-powered embedded systems to data centers. This article considers approximate iterative applications executing on ...

research-article

Open Access

SplitZNS: Towards an Efficient LSM-Tree on Zoned Namespace SSDs

Article No.: 45, Pages 1–26, https://doi.org/10.1145/3608476

The Zoned Namespace (ZNS) Solid State Drive (SSD) is a nascent form of storage device that offers novel prospects for the Log Structured Merge Tree (LSM-tree). ZNS exposes erase blocks in SSD as append-only zones, enabling the LSM-tree to gain awareness ...

ACM Transactions on Architecture and Code Optimization
