- research-article, October 2024
Performant Bounds Checking for 64-Bit WebAssembly
VMIL '24: Proceedings of the 16th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages, Pages 23–31, https://doi.org/10.1145/3689490.3690400
WebAssembly is becoming increasingly popular for various use cases due to its high portability, strict and easily enforceable isolation, and its comparably low run-time overhead. For determinism and security, WebAssembly guarantees that accesses to ...
- research-article, October 2024
Rethinking Page Table Structure for Fast Address Translation in GPUs: A Fixed-Size Hashed Page Table
PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, Pages 325–337, https://doi.org/10.1145/3656019.3676900
GPU memory virtualization has become essential for efficient programming, memory management, and address space sharing among computing devices in heterogeneous systems. Conventional GPU virtual memory systems use multi-level Radix Page Tables (RPTs) to ...
- research-article, June 2024
A Managed Memory System for Micro Controllers with NOR Flash Memory
ISMM 2024: Proceedings of the 2024 ACM SIGPLAN International Symposium on Memory Management, Pages 57–67, https://doi.org/10.1145/3652024.3665511
This paper presents a managed memory system for micro controllers with only a small amount of memory but with NOR flash memory. This system is targeted at a device such as Raspberry Pi Pico, which is equipped with ARM Cortex-M0+, on-chip 264KB SRAM, and ...
- research-article, January 2024
Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 1, Article No.: 14, Pages 1–24, https://doi.org/10.1145/3632953
The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for ease of use provided by system-managed memory with a moderate-to-high performance ...
- research-article, December 2023
Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings
- Konstantinos Kanellopoulos,
- Rahul Bera,
- Kosta Stojiljkovic,
- F. Nisa Bostanci,
- Can Firtina,
- Rachata Ausavarungnirun,
- Rakesh Kumar,
- Nastaran Hajinazar,
- Mohammad Sadrosadati,
- Nandita Vijaykumar,
- Onur Mutlu
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1196–1212, https://doi.org/10.1145/3613424.3623789
Conventional virtual memory (VM) frameworks enable a virtual address to flexibly map to any physical address. This flexibility necessitates large data structures to store virtual-to-physical mappings, which leads to high address translation latency and ...
- research-article, December 2023
Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources
- Konstantinos Kanellopoulos,
- Hong Chul Nam,
- Nisa Bostanci,
- Rahul Bera,
- Mohammad Sadrosadati,
- Rakesh Kumar,
- Davide Basilio Bartolini,
- Onur Mutlu
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1178–1195, https://doi.org/10.1145/3613424.3614276
Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) ...
- research-article, August 2023
Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support
CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers, Pages 84–90, https://doi.org/10.1145/3587135.3592208
The growing demand for deep learning applications has led to the design and development of several hardware accelerators to increase performance and energy efficiency. In particular, convolutional accelerators are among those receiving the most attention ...
- research-article, March 2023
Reconfigurable Virtual Memory for FPGA-Driven I/O
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 556–571, https://doi.org/10.1145/3582016.3582048
FPGAs are increasingly used to accelerate modern applications, and cloud providers offer FPGA platforms on-demand with a variety of FPGAs, I/O peripherals, and memory options. FPGA vendors expose I/O with low-level interfaces that limit application ...
Clio: a hardware-software co-designed disaggregated memory system
ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 417–433, https://doi.org/10.1145/3503222.3507762
Memory disaggregation has attracted great attention recently because of its benefits in efficient memory utilization and ease of management. So far, memory disaggregation research has all taken one of two approaches: building/emulating memory nodes using ...
- research-article, February 2022
Parallel virtualized memory translation with nested elastic cuckoo page tables
ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 84–97, https://doi.org/10.1145/3503222.3507720
A major reason why nested or virtualized address translations are slow is because current systems organize page tables in a multi-level tree that is accessed in a sequential manner. A nested translation may potentially require up to twenty-four ...
- research-article, October 2021
Increasing GPU Translation Reach by Leveraging Under-Utilized On-Chip Resources
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1169–1181, https://doi.org/10.1145/3466752.3480105
Many GPU applications issue irregular memory accesses to a very large memory footprint. We confirm observations from prior work that these irregular access patterns are severely bottlenecked by insufficient Translation Lookaside Buffer (TLB) reach, ...
- research-article, June 2021
Compendia: reducing virtual-memory costs via selective densification
ISMM 2021: Proceedings of the 2021 ACM SIGPLAN International Symposium on Memory Management, Pages 52–65, https://doi.org/10.1145/3459898.3463902
Virtual-to-physical memory translation is becoming an increasingly dominant cost in workload execution; as data sizes scale, up to four memory accesses are required per translation, and 24 in virtualised systems. However, the radix trees in use today to ...
- Article, April 2021
KLOCs: kernel-level object contexts for heterogeneous memory systems
ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 65–78, https://doi.org/10.1145/3445814.3446745
Heterogeneous memory systems promise better performance, energy-efficiency, and cost trade-offs in emerging systems. But delivering on this promise requires efficient OS mechanisms and policies for data tiering and migration. Unfortunately, modern OSes ...
- research-article, August 2018
HeteroOS: OS Design for Heterogeneous Memory Management in Datacenters
ACM SIGOPS Operating Systems Review (SIGOPS), Volume 52, Issue 1, Pages 13–26, https://doi.org/10.1145/3273982.3273985
Heterogeneous memory management combined with server virtualization in datacenters is expected to increase the software and OS management complexity. State-of-the-art solutions rely exclusively on the hypervisor (VMM) for expensive page hotness tracking ...
- research-article, June 2017
HeteroOS: OS Design for Heterogeneous Memory Management in Datacenter
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, Pages 521–534, https://doi.org/10.1145/3079856.3080245
Heterogeneous memory management combined with server virtualization in datacenters is expected to increase the software and OS management complexity. State-of-the-art solutions rely exclusively on the hypervisor (VMM) for expensive page hotness tracking ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 45 Issue 2
- poster, December 2016
Understanding the Behavior of Spark Workloads from Linux Kernel Parameters Perspective
Middleware Posters and Demos '16: Proceedings of the Posters and Demos Session of the 17th International Middleware Conference, Pages 1–2, https://doi.org/10.1145/3007592.3007593
Although a number of innovative computer systems with high-capacity memory have been built, the design principles behind an operating system kernel have remained unchanged for decades. We argue that kernel parameters are a kind of special interface of ...
- article, January 2016
Using DRAM as Cache for Non-Volatile Main Memory Swapping
The performance of mobile devices such as smartphones and tablets has been rapidly improving in recent years. However, these improvements have been seriously affecting power consumption. One of the greatest challenges is to achieve efficient power ...
- research-article, June 2015
SuperMalloc: a super fast multithreaded malloc for 64-bit machines
ISMM '15: Proceedings of the 2015 International Symposium on Memory Management, Pages 41–55, https://doi.org/10.1145/2754169.2754178
SuperMalloc is an implementation of malloc(3) originally designed for X86 Hardware Transactional Memory (HTM). It turns out that the same design decisions also make it fast even without HTM. For the malloc-test benchmark, which is one of the most ...
Also Published in:
ACM SIGPLAN Notices: Volume 50 Issue 11
- Article, April 2011
Cooperating Write Buffer Cache and Virtual Memory Management for Flash Memory Based Systems
RTAS '11: Proceedings of the 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium, Pages 147–156, https://doi.org/10.1109/RTAS.2011.22
Flash memory is becoming the storage media of choice for mobile devices and embedded systems. The performance of flash memory is impacted by the asymmetric speed of read and write operations, limited number of erase times and the absence of in-place ...
- research-article, November 2010
An efficient garbage collection for flash memory-based virtual memory systems
IEEE Transactions on Consumer Electronics (ITOCE), Volume 56, Issue 4, Pages 2355–2363, https://doi.org/10.1109/TCE.2010.5681112
As more consumer electronics adopt monolithic kernels, NAND flash memory is used for the swap space in virtual memory systems. While flash memory has the advantages of low-power consumption, shock-resistance and non-volatility, it requires garbage ...