Understanding Data Movement Patterns in HPC: A NERSC Case Study
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 71, Pages 1–17
https://doi.org/10.1109/SC41406.2024.00077
Scientific experiments are producing unprecedented volumes of data with real-time High Performance Computing (HPC) needs. Understanding and ensuring efficient data movement in these emerging data-intensive workloads is becoming critical for successful ...
- research-article, July 2024
Investigating Data Movement Strategies for Distribution of Repartitioned Data
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 11, Pages 1–8
https://doi.org/10.1145/3626203.3670534
Repartitioning in a parallel setting can be defined as the task of redistributing data across processes based on a newly imposed grid/layout. Repartitioning is a fundamental problem, with applications in domains that typically involve computation on ...
- research-article, June 2024
SoftCache: A Software Cache for PCIe-Attached Hardware Accelerators
PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, Article No.: 3, Pages 1–11
https://doi.org/10.1145/3659914.3659917
Hardware accelerators are used to speed up computationally expensive applications in many scientific fields. However, offloading tasks to accelerator cards requires data to be transferred between the memory of the host and the external memory of the ...
- invited-talk, July 2024
IO-SEA: Storage I/O and Data Management for Exascale Architectures
- Daniel Medeiros,
- Eric B. Gregory,
- Philippe Couvee,
- James Hawkes,
- Sebastien Gougeaud,
- Maike Gilliot,
- Olivier Bressand,
- Yoann Valeri,
- Julien Jaeger,
- Damien Chapon,
- Frederic Bournaud,
- Loïc Strafella,
- Daniel Caviedes-Voullième,
- Ghazal Tashakor,
- Jolanta Zjupa,
- Max Holicki,
- Tom Ridley,
- Yanik Müller,
- Filipe Souza Mendes Guimarães,
- Wolfgang Frings,
- Jan-Oliver Mirus,
- Ilya Zhukov,
- Eric Rodrigues Borba,
- Nafiseh Moti,
- Reza Salkhordeh,
- Nadia Derbey,
- Salim Mimouni,
- Simon Derr,
- Buket Benek Gursoy,
- James Grogan,
- Radek Furmánek,
- Martin Golasowski,
- Kateřina Slaninová,
- Jan Martinovič,
- Jan Faltýnek,
- Jenny Wong,
- Metin Cakircali,
- Tiago Quintino,
- Simon Smart,
- Olivier Iffrig,
- Sai Narasimhamurthy,
- Sonja Happ,
- Michael Rauh,
- Stephan Krempel,
- Mark Wiggins,
- Jiří Nováček,
- André Brinkmann,
- Stefano Markidis,
- Philippe Deniel
CF '24 Companion: Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions, Pages 94–100
https://doi.org/10.1145/3637543.3654620
The new emerging scientific workloads to be executed in the upcoming exascale supercomputers face major challenges in terms of storage, given their extreme volume of data. In particular, intelligent data placement, instrumentation, and workflow handling ...
- abstract, June 2023
Architectural Support for Efficient Data Movement in Fully Disaggregated Systems
- Christina Giannoula,
- Kailong Huang,
- Jonathan Tang,
- Nectarios Koziris,
- Georgios Goumas,
- Zeshan Chishti,
- Nandita Vijaykumar
SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Pages 5–6
https://doi.org/10.1145/3578338.3593533
Traditional data centers include monolithic servers that tightly integrate CPU, memory and disk (Figure 1a). Instead, Disaggregated Systems (DSs) [8, 13, 18, 27] organize multiple compute (CC), memory (MC) and storage devices as independent, failure-...
Also Published in:
ACM SIGMETRICS Performance Evaluation Review: Volume 51 Issue 1
- research-article, March 2023
DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems
- Christina Giannoula,
- Kailong Huang,
- Jonathan Tang,
- Nectarios Koziris,
- Georgios Goumas,
- Zeshan Chishti,
- Nandita Vijaykumar
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1, Article No.: 16, Pages 1–36
https://doi.org/10.1145/3579445
Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage ...
- research-article, January 2023
Locality-Aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems
PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 304–316
https://doi.org/10.1145/3559009.3569649
With generational gains from transistor scaling, GPUs have been able to accelerate traditional computation-intensive workloads. But with the obsolescence of Moore's Law, single GPU systems are no longer able to satisfy the computational and memory ...
- research-article, August 2022
GraphRing: an HMC-ring based graph processing framework with optimized data movement
DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference, Pages 1063–1068
https://doi.org/10.1145/3489517.3530571
Due to the irregular memory access and high bandwidth demanding, graph processing is usually inefficient on conventional computer architectures. The recent development of the processing-in-memory (PIM) technique such as hybrid memory cube (HMC) has ...
- research-article, June 2022
Beyond time complexity: data movement complexity analysis for matrix multiplication
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 32, Pages 1–12
https://doi.org/10.1145/3524059.3532395
Data movement is becoming the dominant contributor to the time and energy costs of computation across a wide range of application domains. However, time complexity is inadequate to analyze data movement. This work expands upon Data Movement Distance, a ...
- research-article, June 2022
ISKEVA: in-SSD key-value database engine for video analytics applications
- Yi Zheng,
- Joshua Fixelle,
- Nagadastagiri Challapalle,
- Pingyi Huo,
- Zhaoyan Shen,
- Zili Shao,
- Mircea Stan,
- Vijaykrishnan Narayanan
LCTES 2022: Proceedings of the 23rd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, Pages 50–60
https://doi.org/10.1145/3519941.3535068
Key-value databases are widely used to store the features or metadata generated from the neural network based video processing platforms. Due to the large volumes of video data, these databases use solid state drives (SSDs) as the primary data storage ...
- research-article, June 2022
täkō: a polymorphic cache hierarchy for general-purpose optimization of data movement
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 42–58
https://doi.org/10.1145/3470496.3527379
Current systems hide data movement from software behind the load-store interface. Software's inability to observe and respond to data movement is the root cause of many inefficiencies, including the growing fraction of execution time and energy devoted ...
- research-article, October 2021
BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems
- Jawad Haj-Yahya,
- Jisung Park,
- Rahul Bera,
- Juan Gómez Luna,
- Efraim Rotem,
- Taha Shahroodi,
- Jeremie Kim,
- Onur Mutlu
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 155–169
https://doi.org/10.1145/3466752.3480085
Conventional planar video streaming is the most popular application in mobile systems. The rapid growth of 360° video content and virtual reality (VR) devices is accelerating the adoption of VR video streaming. Unfortunately, video streaming consumes ...
- research-article, November 2021
Efficient multi-GPU shared memory via automatic optimization of fine-grained transfers
ISCA '21: Proceedings of the 48th Annual International Symposium on Computer Architecture, Pages 139–152
https://doi.org/10.1109/ISCA52012.2021.00020
Despite continuing research into inter-GPU communication mechanisms, extracting performance from multi-GPU systems remains a significant challenge. Inter-GPU communication via bulk DMA-based transfers exposes data transfer latency on the GPU's critical ...
- poster, September 2020
Approximate Pattern Matching for On-Chip Interconnect Traffic Prediction
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 357–358
https://doi.org/10.1145/3410463.3414667
Emerging multi-chip module GPUs (MCM-GPUs) expend over 17% of the total power budget on chip interconnects and this fraction is expected to increase as chip size increases. Towards proactively managing the power consumption of these interconnects, we ...
- research-article, September 2020
Fireiron: A Data-Movement-Aware Scheduling Language for GPUs
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 71–82
https://doi.org/10.1145/3410463.3414632
High GPU performance can only be achieved if a kernel efficiently uses the multi-layered compute and memory hierarchies. For example, accelerators such as NVIDIA's Tensor Cores require specific mappings of threads to data that must be considered in ...
- research-article, March 2021
Decentralized Offload-based Execution on Memory-centric Compute Cores
MEMSYS '20: Proceedings of the International Symposium on Memory Systems, Pages 61–76
https://doi.org/10.1145/3422575.3422778
With the end of Dennard scaling, power constraints have led to increasing compute specialization in the form of differently specialized accelerators integrated at various levels of the general-purpose system hierarchy. The result is that the most common ...
- keynote, June 2020
High Performance is All about Minimizing Data Movement
HPDC '20: Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, Pages 3–4
https://doi.org/10.1145/3369583.3393611
High-performance applications running on current and future architectures are mostly performance-limited by the cost of data movement, vertically through the memory hierarchy of a node or between CPU host and accelerator, and horizontally across nodes. ...
- research-article, September 2020
ZnG: architecting GPU multi-processors with new flash for scalable data analysis
ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, Pages 1064–1075
https://doi.org/10.1109/ISCA45697.2020.00090
We propose ZnG, a new GPU-SSD integrated architecture, which can maximize the memory capacity in a GPU and address performance penalties imposed by an SSD. Specifically, ZnG replaces all GPU internal DRAMs with an ultra-low-latency SSD to maximize the ...
- research-article, December 2019
Computing with Near Data
ACM SIGMETRICS Performance Evaluation Review (SIGMETRICS), Volume 47, Issue 1, Pages 27–28
https://doi.org/10.1145/3376930.3376948
The cost of moving data between compute elements and storage elements plays a significant role in shaping the overall performance of applications. We present a compiler-driven approach to reducing data movement costs. Our approach, referred to as ...
- research-article, October 2019
GraphQ: Scalable PIM-Based Graph Processing
MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Pages 712–725
https://doi.org/10.1145/3352460.3358256
Processing-In-Memory (PIM) architectures based on recent technology advances (e.g., Hybrid Memory Cube) demonstrate great potential for graph processing. However, existing solutions did not address the key challenge of graph processing---irregular data ...