research-article

Scalable concurrent and parallel mark

Authors:

Balaji Iyengar,

Edward Gehringer,

Karthikeyan ManivannanAuthors Info & Claims

ISMM '12: Proceedings of the 2012 international symposium on Memory Management

Pages 61 - 72

https://doi.org/10.1145/2258996.2259006

Published: 15 June 2012 Publication History

Abstract

Parallel marking algorithms use multiple threads to walk through the object heap graph and mark each reachable object as live. Parallel marker threads mark an object "live" by atomically setting a bit in a mark-bitmap or a bit in the object header. Most of these parallel algorithms strive to improve the marking throughput by using work-stealing algorithms for load-balancing and to ensure that all participating threads are kept busy. A purely "processor-centric" load-balancing approach in conjunction with a need to atomically set the mark bit, results in significant contention during parallel marking. This limits the scalability and throughput of parallel marking algorithms.

We describe a new non-blocking and lock-free, work-sharing algorithm, the primary goal being to reduce contention during atomic updates of the mark-bitmap by parallel task-threads. Our work-sharing mechanism uses the address of a word in the mark-bitmap as the key to stripe work among parallel task-threads, with only a subset of the task-threads working on each stripe. This filters out most of the contention during parallel marking with 20% improvements in performance.

In case of concurrent and on-the-fly collector algorithms, mutator threads also generate marking-work for the marking task-threads. In these schemes, mutator threads are also provided with thread-local marking stacks where they collect references to potentially "gray" objects, i.e., objects that haven't been "marked-through" by the collector. We note that since this work is generated by mutators when they reference these objects, there is a high likelihood that these objects continue to be present in the processor cache. We describe and evaluate a scheme to distribute mutator generated marking work among the collector's task-threads that is cognizant of the processor and cache topology. We prototype both our algorithms within the C4 [28] collector that ships as part of an industrial strength JVM for the Linux-X86 platform.

References

[1]

Intel® 64 and ia-32 architectures developer's manual: Combined volumes,. URL http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

[2]

Intel® 64 architecture processor topology enumeration,. URL http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/.

[3]

Standard performance evaluation corporation. spec jvm98. URL http://www.spec.org/jvm98/.

[4]

The volano benchmark. URL http://www.volano.com/benchmarks.html.

[5]

N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA, pages 119--129, 1998.

Digital Library

[6]

H. Azatchi, Y. Levanoni, H. Paz, and E. Petrank. An on-the-fly mark and sweep garbage collector based on sliding views. pages 269--281. 10.1145/949305.949329.

Digital Library

[7]

K. Barabash, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, Y. Ossia, A. Owshanko, and E. Petrank. A parallel, incremental, mostly concurrent garbage collector for servers. ACM Trans. Program. Lang. Syst., 27 (6): 1097--1146, 2005.

Digital Library

[8]

VanDrunen, von Dincklage, and Wiedermann}dacapoS. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-Oriented Programing, Systems, Languages, and Applications, pages 169--190, New York, NY, USA, Oct. 2006. ACM Press. http://doi.acm.org/10.1145/1167473.1167488.

Digital Library

[9]

H.-J. Boehm. Reducing garbage collector cache misses. pages 59--64.

Digital Library

[10]

C.-Y. Cher, A. L. Hosking, and T. Vijaykumar. Software prefetching for mark-sweep garbage collection: Hardware analysis and software redesign. pages 199--210. 10.1145/1024393.1024417.

Digital Library

[11]

C. Click, G. Tene, and M. Wolf. The pauseless gc algorithm. In Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments, VEE '05, pages 46--56, New York, NY, USA, 2005. ACM. ISBN 1-59593-047-7. URL http://doi.acm.org/10.1145/1064979.1064988.

Digital Library

[12]

D. Detlefs and T. Printezis. A Generational Mostly-concurrent Garbage Collector. Technical report, Mountain View, CA, USA, 2000.

Digital Library

[13]

D. Detlefs, C. H. Flood, S. Heller, and T. Printezis. Garbage-first garbage collection. In ISMM, pages 37--48, 2004.

Digital Library

[14]

E. W. Dijkstra, L. Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. On-the-fly garbage collection: An exercise in cooperation. In Language Hierarchies and Interfaces: International Summer School, volume 46, pages 43--56. Marktoberdorf, Germany, 1976.

Digital Library

[15]

T. Domani, E. K. Kolodner, and E. Petrank. A generational on-the-fly garbage collector for java. In PLDI, pages 274--284, 2000.

Digital Library

[16]

U. Drepper. What every programmer should know about memory. URL http://www.akkadia.org/drepper/cpumemory.pdf.

[17]

T. Endo and K. Taura. Reducing pause time of conservative collectors. In MSP/ISMM, pages 119--131, 2002.

Digital Library

[18]

T. Endo, K. Taura, and A. Yonezawa. Predicting scalability of parallel garbage collectors on shared memory multiprocessors. In Proceedings of the 15th International Parallel & Distributed Processing Symposium, IPDPS '01, pages 43--, Washington, DC, USA, 2001. IEEE Computer Society. ISBN 0-7695-0990-8. URL http://dl.acm.org/citation.cfm?id=645609.662496.

Digital Library

[19]

C. H. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel garbage collection for shared memory multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1, JVM'01, pages 21--21, Berkeley, CA, USA, 2001. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1267847.1267868.

Digital Library

[20]

R. Garner, S. M. Blackburn, and D. Frampton. Effective prefetch for mark-sweep garbage collection. In ISMM, pages 43--54, 2007.

Digital Library

[21]

R. H. Halstead. Multilisp: A language for concurrent symbolic computation. ACM Trans. Prog. Lang. Syst., 7 (4): 501--538, Oct. 1985. 10.1145/4472.4478.

Digital Library

[22]

R. Jones and C. Ryder. A study of Java object demographics. pages 121--130. 10.1145/1375634.1375652.

Digital Library

[23]

R. Jones, A. Hosking, and E. Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. CRC Applied Algorithms and Data Structures. Chapman & Hall, Aug. 2011. ISBN 978-1420082791.

Digital Library

[24]

T. Ogasawara. Numa-aware memory manager with dominant-thread-based copying gc. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, OOPSLA '09, pages 377--390, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-766-0. URL http://doi.acm.org/10.1145/1640089.1640117.

Digital Library

[25]

Y. Ossia, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, and A. Owshanko. A parallel, incremental and concurrent GC for servers. pages 129--140. 10.1145/512529.512546.

Digital Library

[26]

F. Siebert. Limits of parallel marking garbage collection. In Proceedings of the 7th international symposium on Memory management, ISMM '08, pages 21--29, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-134-7. http://doi.acm.org/10.1145/1375634.1375638. URL http://doi.acm.org/10.1145/1375634.1375638.

Digital Library

[27]

F. Siebert. Concurrent, parallel, real-time garbage-collection. In Proceedings of the 2010 international symposium on Memory management, ISMM '10, pages 11--20, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0054-4. http://doi.acm.org/10.1145/1806651.1806654. URL http://doi.acm.org/10.1145/1806651.1806654.

Digital Library

[28]

G. Tene, B. Iyengar, and M. Wolf. C4: the continuously concurrent compacting collector. In Proceedings of the international symposium on Memory management, ISMM '11, pages 79--88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0. URL http://doi.acm.org/10.1145/1993478.1993491.

Digital Library

[29]

M. M. Tikir and J. K. Hollingsworth. Numa-aware java heaps for server applications. IPDPS '05, pages 108.2--. IEEE Computer Society. ISBN 0-7695-2312-9. URL http://dx.doi.org/10.1109/IPDPS.2005.299.

Digital Library

[30]

M. Wu and X.-F. Li. Task-pushing: a scalable parallel gc marking algorithm without synchronization operations. In IPDPS, pages 1--10, 2007.

Cited By

Pham KCho SLee SNguyen LYeo HJeong ILee SKim NSon Y(2024)ScaleCache: A Scalable Page Cache for Multiple Solid-State DrivesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629588(641-656)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3629588
Carpen-Amarie MDice DMarlier PThomas GFelber P(2015)Evaluating HTM for Pauseless Garbage Collectors in JavaProceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 0310.1109/Trustcom.2015.606(1-8)Online publication date: 20-Aug-2015
https://dl.acm.org/doi/10.1109/Trustcom.2015.606
Kejariwal A(2012)Big Data ChallengesProceedings of the 2012 Second International Conference on Cloud and Green Computing10.1109/CGC.2012.17(702-707)Online publication date: 1-Nov-2012
https://dl.acm.org/doi/10.1109/CGC.2012.17
Show More Cited By

Index Terms

Scalable concurrent and parallel mark
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Recommendations

Scalable concurrent and parallel mark
ISMM '12

Parallel marking algorithms use multiple threads to walk through the object heap graph and mark each reachable object as live. Parallel marker threads mark an object "live" by atomically setting a bit in a mark-bitmap or a bit in the object header. Most ...
Lock-free parallel and concurrent garbage collection by mark&sweep

This paper presents a lock-free algorithm for mark&sweep garbage collection (GC) in a realistic model using synchronization primitives load-linked/store-conditional (LL/SC) or compare-and-swap (CAS) offered by machine architectures. The algorithm is ...
Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign
ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems

Tracing garbage collectors traverse references from live program variables, transitively tracing out the closure of live objects. Memory accesses incurred during tracing are essentially random: a given object may contain references to any other object. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ISMM '12: Proceedings of the 2012 international symposium on Memory Management

June 2012

152 pages

ISBN:9781450313506

DOI:10.1145/2258996

General Chair:
Martin Vechev
ETH Zurich
,
Program Chair:
Kathryn S. McKinley
The University of Texas at Austin and Microsoft Research

ACM SIGPLAN Notices Volume 47, Issue 11
ISMM '12
November 2012
136 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2426642
Issue’s Table of Contents

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 2012

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ISMM '12

Sponsor:

SIGPLAN

ISMM '12: International Symposium on Memory Management

June 15 - 16, 2012

Beijing, China

Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
263
Total Downloads

Downloads (Last 12 months)7
Downloads (Last 6 weeks)2

Reflects downloads up to 14 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Pham KCho SLee SNguyen LYeo HJeong ILee SKim NSon Y(2024)ScaleCache: A Scalable Page Cache for Multiple Solid-State DrivesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629588(641-656)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3629588
Carpen-Amarie MDice DMarlier PThomas GFelber P(2015)Evaluating HTM for Pauseless Garbage Collectors in JavaProceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 0310.1109/Trustcom.2015.606(1-8)Online publication date: 20-Aug-2015
https://dl.acm.org/doi/10.1109/Trustcom.2015.606
Kejariwal A(2012)Big Data ChallengesProceedings of the 2012 Second International Conference on Cloud and Green Computing10.1109/CGC.2012.17(702-707)Online publication date: 1-Nov-2012
https://dl.acm.org/doi/10.1109/CGC.2012.17
Hassanein W(2016)Understanding and improving JVM GC work stealing at the data center scaleACM SIGPLAN Notices10.1145/3241624.292670651:11(46-54)Online publication date: 14-Jun-2016
https://dl.acm.org/doi/10.1145/3241624.2926706
Hassanein WFlood CZhang Z(2016)Understanding and improving JVM GC work stealing at the data center scaleProceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management10.1145/2926697.2926706(46-54)Online publication date: 14-Jun-2016
https://dl.acm.org/doi/10.1145/2926697.2926706
Qian JSrisa-an WLi DJiang HSeth SYang YStansifer RKrall A(2015)SmartStealingProceedings of the Principles and Practices of Programming on The Java Platform10.1145/2807426.2807441(170-181)Online publication date: 8-Sep-2015
https://dl.acm.org/doi/10.1145/2807426.2807441

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents