- poster, September 2020
Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 157–158, https://doi.org/10.1145/3410463.3414673
A critical component of high-throughput processors such as GPGPUs is the network-on-chip (NoC) that interconnects the cores and the memory partitions together. Different NoC architectures for throughput processors have been proposed but they have often ...
- research-article, September 2020
MEPHESTO: Modeling Energy-Performance in Heterogeneous SoCs and Their Trade-Offs
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 413–425, https://doi.org/10.1145/3410463.3414671
Integrated shared memory heterogeneous architectures are pervasive because they satisfy the diverse needs of mobile, autonomous, and edge computing platforms. Although specialized processing units (PUs) that share a unified system memory improve ...
- poster, September 2020
Deep Learning Assisted Resource Partitioning for Improving Performance on Commodity Servers
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 153–154, https://doi.org/10.1145/3410463.3414668
In this paper, we introduce a deep reinforcement learning (DRL) framework for solving the problem of partitioning LLC and memory bandwidth coordinately in an end-to-end manner. To this end, we formulate the problem as a Markov decision process and ...
- poster, September 2020
Decoupled Address Translation for Heterogeneous Memory Systems
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 155–156, https://doi.org/10.1145/3410463.3414662
Support for heterogeneous memory in conventional virtual memory has an inherent problem. To keep translation efficient in the latency-critical translation lookaside buffers (TLBs), page sizes have been growing. However, the heterogeneous memory ...
- poster, September 2020
Collective Affinity Aware Computation Mapping
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 343–344, https://doi.org/10.1145/3410463.3414661
This work defines the concept of collective affinity. It is claimed that collective affinity has more potential than single core-centric affinity for data locality optimization in manycores. The reason is that collective affinity captures the potential ...
- research-article, September 2020
GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 43–54, https://doi.org/10.1145/3410463.3414656
Recent studies have shown promising performance benefits when multiple stages of a pipelined stencil application are mapped to different parts of a GPU to run concurrently. An important factor for the computing efficiency of such pipelines is the ...
- research-article, September 2020
Regional Out-of-Order Writes in Total Store Order
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 205–216, https://doi.org/10.1145/3410463.3414645
The store buffer, an essential component in today's processors, is designed to hide memory latency by moving stores off the processor's critical path. Furthermore, under the Total Store Order (TSO) memory model, the store buffer ensures the in-order ...
- research-article, September 2020
TAFE: Thread Address Footprint Estimation for Capturing Data/Thread Locality in GPU Systems
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 17–29, https://doi.org/10.1145/3410463.3414641
In multi-GPU and multi-chiplet GPU systems exhibiting NUMA behavior, information about addresses accessed by threads is crucial for various optimizations such as data/thread co-location and cache/scratchpad memory management. To make optimal decisions ...
- research-article, September 2020
Enhancing Address Translations in Throughput Processors via Compression
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 191–204, https://doi.org/10.1145/3410463.3414633
Efficient memory sharing among multiple compute engines plays an important role in shaping the overall application performance on CPU-GPU heterogeneous platforms. Unified Virtual Memory (UVM) is a promising feature that allows globally-visible data ...
- research-article, September 2020
PRISM: Architectural Support for Variable-granularity Memory Metadata
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 441–454, https://doi.org/10.1145/3410463.3414630
Modern architectures track memory accesses using page granularity metadata such as access and dirty bits, leading to fundamental tradeoffs for system software that uses this metadata. Larger page sizes reduce address translation overheads and page table ...
- research-article, September 2020
Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration
- Subhankar Pal,
- Siying Feng,
- Dong-hyeon Park,
- Sung Kim,
- Aporva Amarnath,
- Chi-Sheng Yang,
- Xin He,
- Jonathan Beaumont,
- Kyle May,
- Yan Xiong,
- Kuba Kaszyk,
- John Magnus Morton,
- Jiawen Sun,
- Michael O'Boyle,
- Murray Cole,
- Chaitali Chakrabarti,
- David Blaauw,
- Hun-Seok Kim,
- Trevor Mudge,
- Ronald Dreslinski
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 175–190, https://doi.org/10.1145/3410463.3414627
With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build hardware for emerging applications that meets power and performance targets while remaining flexible and programmable for end users. This is particularly ...
- research-article, September 2020
Ribbon: High Performance Cache Line Flushing for Persistent Memory
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 427–439, https://doi.org/10.1145/3410463.3414625
Cache line flushing (CLF) is a fundamental building block for programming persistent memory (PM). CLF is prevalent in PM-aware workloads to ensure crash consistency. It also imposes high overhead. Extensive works have explored persistency semantics and ...
- research-article, September 2020
Analyzing and Leveraging Shared L1 Caches in GPUs
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 161–173, https://doi.org/10.1145/3410463.3414623
Graphics Processing Units (GPUs) concurrently execute thousands of threads, which makes them effective for achieving high throughput for a wide range of applications. However, the memory wall often limits peak throughput. GPUs use caches to address this ...