DOI: 10.1145/3352460.3358297
research-article

Charon: Specialized Near-Memory Processing Architecture for Clearing Dead Objects in Memory

Published: 12 October 2019

Abstract

Garbage collection (GC) is a standard feature for high-productivity programming, saving programmers from many nasty memory-related bugs. However, these productivity benefits come at a cost in application throughput, worst-case latency, and energy consumption. Since GC was first introduced by the Lisp programming language in the 1950s, a myriad of hardware and software techniques have been proposed to reduce this cost. While the idea of accelerating GC in hardware is appealing, its impact has been very limited due to narrow coverage, lack of flexibility, intrusive system changes, and significant hardware cost. Even with specialized hardware, GC performance is ultimately limited by the memory bandwidth bottleneck. Fortunately, emerging 3D-stacked DRAM technologies shed new light on this decades-old problem by enabling efficient near-memory processing with ample memory bandwidth. We therefore propose Charon, the first GC accelerator based on 3D-stacked memory. Through a detailed performance analysis of the HotSpot JVM, we derive a set of key algorithmic primitives based on their GC time coverage and implementation complexity in hardware. We then devise a specialized processing unit that substantially improves their memory-level parallelism and throughput at a low hardware cost. Our evaluation of Charon with the full-production HotSpot JVM running two big data analytics frameworks, Spark and GraphChi, demonstrates a 3.29× geomean speedup and 60.7% energy savings for GC over the baseline 8-core out-of-order processor.

      Published In

      MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
      October 2019
      1104 pages
      ISBN: 9781450369381
      DOI: 10.1145/3352460
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Domain-specific architecture
      2. Garbage collection
      3. Java Virtual Machine
      4. Memory management
      5. Near-memory processing

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • Samsung Research Funding Center of Samsung Electronics award
      • Institute for Information and Communications Technology Promotion (IITP) grant funded by Korea government (MSIT)

      Conference

      MICRO '52
      Acceptance Rates

      Overall Acceptance Rate 484 of 2,242 submissions, 22%

      Article Metrics

      • Downloads (last 12 months): 34
      • Downloads (last 6 weeks): 2
      Reflects downloads up to 13 Feb 2025.

      Cited By

      • (2024) NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 628-643. DOI: 10.1109/ISCA59077.2024.00052. Online publication date: 29-Jun-2024
      • (2024) SmartDIMM: In-Memory Acceleration of Upper Layer Protocols. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 312-329. DOI: 10.1109/HPCA57654.2024.00032. Online publication date: 2-Mar-2024
      • (2023) XFM: Accelerated Software-Defined Far Memory. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 769-783. DOI: 10.1145/3613424.3623776. Online publication date: 28-Oct-2023
      • (2023) MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing. ACM Transactions on Architecture and Code Optimization 20:3, 1-26. DOI: 10.1145/3603113. Online publication date: 19-Jul-2023
      • (2023) Operand-Oriented Virtual Memory Support for Near-Memory Processing. IEEE Transactions on Computers 72:8, 2250-2263. DOI: 10.1109/TC.2023.3243881. Online publication date: 1-Aug-2023
      • (2023) A lightweight distributed processing computer system. 2023 IEEE 3rd International Conference on Electronic Technology, Communication and Information (ICETCI), 1770-1774. DOI: 10.1109/ICETCI57876.2023.10176861. Online publication date: 26-May-2023
      • (2022) HybriDS. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, 321-332. DOI: 10.1145/3490148.3538591. Online publication date: 11-Jul-2022
      • (2022) FFCCD. Proceedings of the 49th Annual International Symposium on Computer Architecture, 274-288. DOI: 10.1145/3470496.3527406. Online publication date: 18-Jun-2022
      • (2022) Executing Data Integration Effectively and Efficiently Near the Memory. IEEE Design & Test 39:2, 65-73. DOI: 10.1109/MDAT.2021.3069957. Online publication date: Apr-2022
      • (2022) Synthesized In-Bram Garbage Collection for Accelerators with Immutable Memory. 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), 47-53. DOI: 10.1109/FPL57034.2022.00019. Online publication date: Aug-2022
