Charon: Specialized Near-Memory Processing Architecture for Clearing Dead Objects in Memory

Authors:

Jae W. Lee

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture

Pages 726 - 739

https://doi.org/10.1145/3352460.3358297

Published: 12 October 2019

Abstract

Garbage collection (GC) is a standard feature of high-productivity programming, saving programmers from many nasty memory-related bugs. However, these productivity benefits come at a cost in application throughput, worst-case latency, and energy consumption. Since GC was first introduced by the Lisp programming language in the 1950s, a myriad of hardware and software techniques have been proposed to reduce this cost. While the idea of accelerating GC in hardware is appealing, its impact has been very limited due to narrow coverage, lack of flexibility, intrusive system changes, and significant hardware cost. Even with specialized hardware, GC performance is eventually limited by the memory bandwidth bottleneck. Fortunately, emerging 3D stacked DRAM technologies shed new light on this decades-old problem by enabling efficient near-memory processing with ample memory bandwidth. Thus, we propose Charon, the first 3D stacked memory-based GC accelerator. Through a detailed performance analysis of the HotSpot JVM, we derive a set of key algorithmic primitives based on their GC time coverage and implementation complexity in hardware. We then devise a specialized processing unit that substantially improves their memory-level parallelism and throughput at a low hardware cost. Our evaluation of Charon with the full-production HotSpot JVM running two big data analytics frameworks, Spark and GraphChi, demonstrates a 3.29× geomean speedup and 60.7% energy savings for GC over the baseline 8-core out-of-order processor.

Index Terms

Charon: Specialized Near-Memory Processing Architecture for Clearing Dead Objects in Memory
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
      2. Special purpose systems

Published In

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture

October 2019

1104 pages

ISBN:9781450369381

DOI:10.1145/3352460

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Samsung Research Funding Center of Samsung Electronics award
Institute for Information and Communications Technology Promotion (IITP) grant funded by Korea government (MSIT)

Conference

MICRO '52

Sponsor:

SIGMICRO

MICRO '52: The 52nd Annual IEEE/ACM International Symposium on Microarchitecture

October 12 - 16, 2019

Columbus, OH, USA

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
