Keyword: data movement : Search

research-article

Free

Understanding Data Movement Patterns in HPC: A NERSC Case Study

SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 71, Pages 1–17, https://doi.org/10.1109/SC41406.2024.00077

Scientific experiments are producing unprecedented volumes of data with real-time High Performance Computing (HPC) needs. Understanding and ensuring efficient data movement in these emerging data-intensive workloads is becoming critical for successful ...

research-article

Open Access

Investigating Data Movement Strategies for Distribution of Repartitioned Data

PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 11, Pages 1–8, https://doi.org/10.1145/3626203.3670534

Repartitioning in a parallel setting can be defined as the task of redistributing data across processes based on a newly imposed grid/layout. Repartitioning is a fundamental problem, with applications in domains that typically involve computation on ...

research-article

Open Access

SoftCache: A Software Cache for PCIe-Attached Hardware Accelerators

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, Article No.: 3, Pages 1–11, https://doi.org/10.1145/3659914.3659917

Hardware accelerators are used to speed up computationally expensive applications in many scientific fields. However, offloading tasks to accelerator cards requires data to be transferred between the memory of the host and the external memory of the ...

invited-talk

IO-SEA: Storage I/O and Data Management for Exascale Architectures

CF '24 Companion: Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions, Pages 94–100, https://doi.org/10.1145/3637543.3654620

The new emerging scientific workloads to be executed in the upcoming exascale supercomputers face major challenges in terms of storage, given their extreme volume of data. In particular, intelligent data placement, instrumentation, and workflow handling ...

abstract

Architectural Support for Efficient Data Movement in Fully Disaggregated Systems

SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Pages 5–6, https://doi.org/10.1145/3578338.3593533

Traditional data centers include monolithic servers that tightly integrate CPU, memory and disk (Figure 1a). Instead, Disaggregated Systems (DSs) [8, 13, 18, 27] organize multiple compute (CC), memory (MC) and storage devices as independent, failure-...

Also Published in:

ACM SIGMETRICS Performance Evaluation Review: Volume 51 Issue 1

research-article

DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1, Article No.: 16, Pages 1–36, https://doi.org/10.1145/3579445

Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage ...

research-article

Locality-Aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems

PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 304–316, https://doi.org/10.1145/3559009.3569649

With generational gains from transistor scaling, GPUs have been able to accelerate traditional computation-intensive workloads. But with the obsolescence of Moore's Law, single GPU systems are no longer able to satisfy the computational and memory ...

research-article

Open Access

GraphRing: an HMC-ring based graph processing framework with optimized data movement

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference, Pages 1063–1068, https://doi.org/10.1145/3489517.3530571

Due to its irregular memory access and high bandwidth demands, graph processing is usually inefficient on conventional computer architectures. The recent development of processing-in-memory (PIM) techniques such as the hybrid memory cube (HMC) has ...

research-article

Public Access

Beyond time complexity: data movement complexity analysis for matrix multiplication

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 32, Pages 1–12, https://doi.org/10.1145/3524059.3532395

Data movement is becoming the dominant contributor to the time and energy costs of computation across a wide range of application domains. However, time complexity is inadequate to analyze data movement. This work expands upon Data Movement Distance, a ...

research-article

ISKEVA: in-SSD key-value database engine for video analytics applications

LCTES 2022: Proceedings of the 23rd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, Pages 50–60, https://doi.org/10.1145/3519941.3535068

Key-value databases are widely used to store the features or metadata generated from the neural network based video processing platforms. Due to the large volumes of video data, these databases use solid state drives (SSDs) as the primary data storage ...

research-article

Open Access

täkō: a polymorphic cache hierarchy for general-purpose optimization of data movement

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 42–58, https://doi.org/10.1145/3470496.3527379

Current systems hide data movement from software behind the load-store interface. Software's inability to observe and respond to data movement is the root cause of many inefficiencies, including the growing fraction of execution time and energy devoted ...

research-article

BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 155–169, https://doi.org/10.1145/3466752.3480085

Conventional planar video streaming is the most popular application in mobile systems. The rapid growth of 360° video content and virtual reality (VR) devices is accelerating the adoption of VR video streaming. Unfortunately, video streaming consumes ...

research-article

Efficient multi-GPU shared memory via automatic optimization of fine-grained transfers

ISCA '21: Proceedings of the 48th Annual International Symposium on Computer Architecture, Pages 139–152, https://doi.org/10.1109/ISCA52012.2021.00020

Despite continuing research into inter-GPU communication mechanisms, extracting performance from multi-GPU systems remains a significant challenge. Inter-GPU communication via bulk DMA-based transfers exposes data transfer latency on the GPU's critical ...

poster

Approximate Pattern Matching for On-Chip Interconnect Traffic Prediction

PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 357–358, https://doi.org/10.1145/3410463.3414667

Emerging multi-chip module GPUs (MCM-GPUs) expend over 17% of the total power budget on chip interconnects and this fraction is expected to increase as chip size increases. Towards proactively managing the power consumption of these interconnects, we ...

research-article

Open Access

Fireiron: A Data-Movement-Aware Scheduling Language for GPUs

PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 71–82, https://doi.org/10.1145/3410463.3414632

High GPU performance can only be achieved if a kernel efficiently uses the multi-layered compute and memory hierarchies. For example, accelerators such as NVIDIA's Tensor Cores require specific mappings of threads to data that must be considered in ...

research-article

Decentralized Offload-based Execution on Memory-centric Compute Cores

MEMSYS '20: Proceedings of the International Symposium on Memory Systems, Pages 61–76, https://doi.org/10.1145/3422575.3422778

With the end of Dennard scaling, power constraints have led to increasing compute specialization in the form of differently specialized accelerators integrated at various levels of the general-purpose system hierarchy. The result is that the most common ...

keynote

Public Access

High Performance is All about Minimizing Data Movement

Mary Hall

HPDC '20: Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, Pages 3–4, https://doi.org/10.1145/3369583.3393611

High-performance applications running on current and future architectures are mostly performance-limited by the cost of data movement, vertically through the memory hierarchy of a node or between CPU host and accelerator, and horizontally across nodes. ...

research-article

ZnG: architecting GPU multi-processors with new flash for scalable data analysis

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, Pages 1064–1075, https://doi.org/10.1109/ISCA45697.2020.00090

We propose ZnG, a new GPU-SSD integrated architecture, which can maximize the memory capacity in a GPU and address performance penalties imposed by an SSD. Specifically, ZnG replaces all GPU internal DRAMs with an ultra-low-latency SSD to maximize the ...

research-article

Computing with Near Data

ACM SIGMETRICS Performance Evaluation Review (SIGMETRICS), Volume 47, Issue 1, Pages 27–28, https://doi.org/10.1145/3376930.3376948

The cost of moving data between compute elements and storage elements plays a significant role in shaping the overall performance of applications. We present a compiler-driven approach to reducing data movement costs. Our approach, referred to as ...

research-article

Public Access

GraphQ: Scalable PIM-Based Graph Processing

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Pages 712–725, https://doi.org/10.1145/3352460.3358256

Processing-In-Memory (PIM) architectures based on recent technology advances (e.g., Hybrid Memory Cube) demonstrate great potential for graph processing. However, existing solutions did not address the key challenge of graph processing---irregular data ...
