Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3315573.3329985acmconferencesArticle/Chapter ViewAbstractPublication PagesismmConference Proceedingsconference-collections
research-article

Scaling up parallel GC work-stealing in many-core environments

Published: 23 June 2019 Publication History

Abstract

Parallel copying garbage collection (GC) is widely used in the de facto Java virtual machines such as OpenJDK and OpenJ9. OpenJDK uses work-stealing for copying objects in the Parallel GC and Garbage-First (G1) GC policies to balance the copying task among GC threads. When a thread has no task in its own queue, it tries to steal a task from another thread's queue as a thief. When a thief succeeds in stealing a task, it processes the task and enqueues the children of the task into its queue, which is accessible from other thieves.Unfortunately, the overhead of the work-stealing framework becomes non-negligible when we aim to achieve a minimum GC pause time by increasing the number of GC threads. Since the number of tasks processed per thread decreases, thieves frequently try to steal tasks from others at a low success rate. When a thief fails in steals continuously, it needs to wait in a spin loop on the termination protocol of the work-stealing framework. Spinning in a loop frequently results in high CPU utilization, which is not acceptable in a large-scale data center where severe power management is required. This paper proposes two approaches named steal-best-of-many selection and spin-less termination to reduce the overhead in the work-stealing framework. Steal-best-of-many selection reduces steal failures by changing the number of queue selections to steal in accordance with the number of GC threads. Spin-less termination moves a part of the object copies into a spin loop by changing the procedure of copying GC. It reduces part of the GC pause time for the object copy as well as the CPU utilization for the spin loop. We developed a prototype on OpenJDK8 and evaluated it using SPECjbb2015 and SPECjvm2008 benchmarks. Critical-jOPS performance of SPECjbb2015 improved by 18% at maximum and scores of the SPECjvm2008 benchmarks improved by 1-5%.

References

[1]
Changes for spin-less termination with best-of-many selection. http://cr.openjdk.java.net/~mhorie/logs/spinless_bestofMany.diff.
[2]
Implementing an IBM High-Performance Computing Solution on IBM POWER8. http://www.redbooks.ibm.com/redbooks/pdfs/sg248263.pdf.
[3]
Intel Threading Building Blocks. https://software.intel.com/en-us/intel-tbb.
[4]
JDK-8205921 Optimizing best-of-2 work stealing queue selection. https://bugs.openjdk.java.net/browse/JDK-8205921.
[5]
OpenJ9. https://www.eclipse.org/openj9/.
[6]
OpenJDK. http://openjdk.java.net/.
[7]
Oprofile. http://oprofile.sourceforge.net/about/.
[8]
Parker::park function. hotspot/src/os/linux/vm/os_linux.cpp.
[9]
WebSphere Application Server. https://www.ibm.com/cloud/websphere-application-platform.
[10]
Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. 1998. Thread Scheduling for Multiprogrammed Multiprocessors. In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA'98).
[11]
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'95).
[12]
David Chase and Yossi Lev. 2005. Dynamic Circular Work-stealing Deque. In Proceedings of the Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'05).
[13]
Guojing Cong, Sreedhar Kodali, Sriram Krishnamoorthy, Doug Lea, Vijay Saraswat, and Tong Wen. 2008. Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing. In International Conference on Parallel Processing (ICPP'08).
[14]
David Cunningham, David Grove, Benjamin Herta, Arun Iyengar, Kiyokuni Kawachiya, Hiroki Murata, Vijay Saraswat, Mikio Takeuchi, and Olivier Tardieu. 2014. Resilient X10: Efficient Failure-aware Programming. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'14).
[15]
James Dinan, D. Brian Larkins, P. Sadayappan, Sriram Krishnamoorthy, and Jarek Nieplocha. 2009. Scalable Work Stealing. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC'09).
[16]
Toshio Endo, Kenjiro Taura, and Akinori Yonezawa. 1997. A Scalable Mark-sweep Garbage Collector on Large-scale Shared-memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing (SC'97).
[17]
Helin Eric. 2012. Improving load balancing during the marking phase of garbage collection.
[18]
Christine H. Flood, David Detlefs, Nir Shavit, and Xiaolan Zhang. 2001. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1 (JVM'01).
[19]
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. 1998. The Implementation of the Cilk-5 Multithreaded Language. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI'98).
[20]
Wessam Hassanein. 2016. Understanding and Improving JVM GC Work Stealing at the Data Center Scale. In Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management (ISMM 2016).
[21]
Danny Hendler and Nir Shavit. 2002. Non-blocking Steal-half Work Queues. In Proceedings of the Twenty-first Annual Symposium on Principles of Distributed Computing (PODC'02).
[22]
Park Hyunkyu, Lee Changmin, and Hun Kim Seung. 2013. Mark-Sharing: A Parallel Garbage Collection Algorithm for Low Synchronization Overhead. In International Conference on Parallel and Distributed Systems 2013 (ICPADS'13).
[23]
Maged M. Michael, Martin T. Vechev, and Vijay A. Saraswat. 2009. Idempotent Work Stealing. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '09).
[24]
Kun Suo, Jia Rao, Hong Jiang, and Witawas Srisa-an. 2018. Characterizing and Optimizing Hotspot Parallel Garbage Collection on Multicore Systems. In Proceedings of the Thirteenth EuroSys Conference (EuroSys '18).
[25]
Olivier Tardieu, Benjamin Herta, David Cunningham, David Grove, Prabhanjan Kambadur, Vijay Saraswat, Avraham Shinnar, Mikio Takeuchi, and Mandana Vaziri. 2014. X10 and APGAS at Petascale. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'14).
[26]
Tom van Dijk and Jaco van de Pol. 2014. Lace: Non-blocking Split Deque for Work-Stealing. In Euro-Par 2014 International Workshops.
[27]
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI'12). 15-28.
[28]
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10).

Cited By

View all
  • (2021)Work-stealing Strategies That Consider Work Amount and HierarchyJournal of Information Processing10.2197/ipsjjip.29.47829(478-489)Online publication date: 2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ISMM 2019: Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management
June 2019
135 pages
ISBN:9781450367226
DOI:10.1145/3315573
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. OpenJDK
  2. Work-stealing
  3. copying GC

Qualifiers

  • Research-article

Conference

ISMM '19
Sponsor:

Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)2
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Work-stealing Strategies That Consider Work Amount and HierarchyJournal of Information Processing10.2197/ipsjjip.29.47829(478-489)Online publication date: 2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media