Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2258996.2259006acmconferencesArticle/Chapter ViewAbstractPublication PagesismmConference Proceedingsconference-collections
research-article

Scalable concurrent and parallel mark

Published: 15 June 2012 Publication History

Abstract

Parallel marking algorithms use multiple threads to walk through the object heap graph and mark each reachable object as live. Parallel marker threads mark an object "live" by atomically setting a bit in a mark-bitmap or a bit in the object header. Most of these parallel algorithms strive to improve the marking throughput by using work-stealing algorithms for load-balancing and to ensure that all participating threads are kept busy. A purely "processor-centric" load-balancing approach in conjunction with a need to atomically set the mark bit, results in significant contention during parallel marking. This limits the scalability and throughput of parallel marking algorithms.
We describe a new non-blocking and lock-free, work-sharing algorithm, the primary goal being to reduce contention during atomic updates of the mark-bitmap by parallel task-threads. Our work-sharing mechanism uses the address of a word in the mark-bitmap as the key to stripe work among parallel task-threads, with only a subset of the task-threads working on each stripe. This filters out most of the contention during parallel marking with 20% improvements in performance.
In case of concurrent and on-the-fly collector algorithms, mutator threads also generate marking-work for the marking task-threads. In these schemes, mutator threads are also provided with thread-local marking stacks where they collect references to potentially "gray" objects, i.e., objects that haven't been "marked-through" by the collector. We note that since this work is generated by mutators when they reference these objects, there is a high likelihood that these objects continue to be present in the processor cache. We describe and evaluate a scheme to distribute mutator generated marking work among the collector's task-threads that is cognizant of the processor and cache topology. We prototype both our algorithms within the C4 [28] collector that ships as part of an industrial strength JVM for the Linux-X86 platform.

References

[1]
Intel® 64 and ia-32 architectures developer's manual: Combined volumes,. URL http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.
[2]
Intel® 64 architecture processor topology enumeration,. URL http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/.
[3]
Standard performance evaluation corporation. spec jvm98. URL http://www.spec.org/jvm98/.
[4]
The volano benchmark. URL http://www.volano.com/benchmarks.html.
[5]
N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA, pages 119--129, 1998.
[6]
H. Azatchi, Y. Levanoni, H. Paz, and E. Petrank. An on-the-fly mark and sweep garbage collector based on sliding views. pages 269--281. 10.1145/949305.949329.
[7]
K. Barabash, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, Y. Ossia, A. Owshanko, and E. Petrank. A parallel, incremental, mostly concurrent garbage collector for servers. ACM Trans. Program. Lang. Syst., 27 (6): 1097--1146, 2005.
[8]
VanDrunen, von Dincklage, and Wiedermann}dacapoS. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-Oriented Programing, Systems, Languages, and Applications, pages 169--190, New York, NY, USA, Oct. 2006. ACM Press. http://doi.acm.org/10.1145/1167473.1167488.
[9]
H.-J. Boehm. Reducing garbage collector cache misses. pages 59--64.
[10]
C.-Y. Cher, A. L. Hosking, and T. Vijaykumar. Software prefetching for mark-sweep garbage collection: Hardware analysis and software redesign. pages 199--210. 10.1145/1024393.1024417.
[11]
C. Click, G. Tene, and M. Wolf. The pauseless gc algorithm. In Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments, VEE '05, pages 46--56, New York, NY, USA, 2005. ACM. ISBN 1-59593-047-7. URL http://doi.acm.org/10.1145/1064979.1064988.
[12]
D. Detlefs and T. Printezis. A Generational Mostly-concurrent Garbage Collector. Technical report, Mountain View, CA, USA, 2000.
[13]
D. Detlefs, C. H. Flood, S. Heller, and T. Printezis. Garbage-first garbage collection. In ISMM, pages 37--48, 2004.
[14]
E. W. Dijkstra, L. Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. On-the-fly garbage collection: An exercise in cooperation. In Language Hierarchies and Interfaces: International Summer School, volume 46, pages 43--56. Marktoberdorf, Germany, 1976.
[15]
T. Domani, E. K. Kolodner, and E. Petrank. A generational on-the-fly garbage collector for java. In PLDI, pages 274--284, 2000.
[16]
U. Drepper. What every programmer should know about memory. URL http://www.akkadia.org/drepper/cpumemory.pdf.
[17]
T. Endo and K. Taura. Reducing pause time of conservative collectors. In MSP/ISMM, pages 119--131, 2002.
[18]
T. Endo, K. Taura, and A. Yonezawa. Predicting scalability of parallel garbage collectors on shared memory multiprocessors. In Proceedings of the 15th International Parallel & Distributed Processing Symposium, IPDPS '01, pages 43--, Washington, DC, USA, 2001. IEEE Computer Society. ISBN 0-7695-0990-8. URL http://dl.acm.org/citation.cfm?id=645609.662496.
[19]
C. H. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel garbage collection for shared memory multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1, JVM'01, pages 21--21, Berkeley, CA, USA, 2001. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1267847.1267868.
[20]
R. Garner, S. M. Blackburn, and D. Frampton. Effective prefetch for mark-sweep garbage collection. In ISMM, pages 43--54, 2007.
[21]
R. H. Halstead. Multilisp: A language for concurrent symbolic computation. ACM Trans. Prog. Lang. Syst., 7 (4): 501--538, Oct. 1985. 10.1145/4472.4478.
[22]
R. Jones and C. Ryder. A study of Java object demographics. pages 121--130. 10.1145/1375634.1375652.
[23]
R. Jones, A. Hosking, and E. Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. CRC Applied Algorithms and Data Structures. Chapman & Hall, Aug. 2011. ISBN 978-1420082791.
[24]
T. Ogasawara. Numa-aware memory manager with dominant-thread-based copying gc. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, OOPSLA '09, pages 377--390, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-766-0. URL http://doi.acm.org/10.1145/1640089.1640117.
[25]
Y. Ossia, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, and A. Owshanko. A parallel, incremental and concurrent GC for servers. pages 129--140. 10.1145/512529.512546.
[26]
F. Siebert. Limits of parallel marking garbage collection. In Proceedings of the 7th international symposium on Memory management, ISMM '08, pages 21--29, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-134-7. http://doi.acm.org/10.1145/1375634.1375638. URL http://doi.acm.org/10.1145/1375634.1375638.
[27]
F. Siebert. Concurrent, parallel, real-time garbage-collection. In Proceedings of the 2010 international symposium on Memory management, ISMM '10, pages 11--20, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0054-4. http://doi.acm.org/10.1145/1806651.1806654. URL http://doi.acm.org/10.1145/1806651.1806654.
[28]
G. Tene, B. Iyengar, and M. Wolf. C4: the continuously concurrent compacting collector. In Proceedings of the international symposium on Memory management, ISMM '11, pages 79--88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0. URL http://doi.acm.org/10.1145/1993478.1993491.
[29]
M. M. Tikir and J. K. Hollingsworth. Numa-aware java heaps for server applications. IPDPS '05, pages 108.2--. IEEE Computer Society. ISBN 0-7695-2312-9. URL http://dx.doi.org/10.1109/IPDPS.2005.299.
[30]
M. Wu and X.-F. Li. Task-pushing: a scalable parallel gc marking algorithm without synchronization operations. In IPDPS, pages 1--10, 2007.

Cited By

View all
  • (2024)ScaleCache: A Scalable Page Cache for Multiple Solid-State DrivesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629588(641-656)Online publication date: 22-Apr-2024
  • (2015)Evaluating HTM for Pauseless Garbage Collectors in JavaProceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 0310.1109/Trustcom.2015.606(1-8)Online publication date: 20-Aug-2015
  • (2012)Big Data ChallengesProceedings of the 2012 Second International Conference on Cloud and Green Computing10.1109/CGC.2012.17(702-707)Online publication date: 1-Nov-2012
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ISMM '12: Proceedings of the 2012 international symposium on Memory Management
June 2012
152 pages
ISBN:9781450313506
DOI:10.1145/2258996
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 47, Issue 11
    ISMM '12
    November 2012
    136 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2426642
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. compare-and-swap instruction
  2. concurrent marking
  3. parallel marking
  4. prefetching
  5. processor and cache topology
  6. scalable parallel algorithms
  7. work-sharing
  8. work-stealing

Qualifiers

  • Research-article

Conference

ISMM '12
Sponsor:

Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)2
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)ScaleCache: A Scalable Page Cache for Multiple Solid-State DrivesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629588(641-656)Online publication date: 22-Apr-2024
  • (2015)Evaluating HTM for Pauseless Garbage Collectors in JavaProceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 0310.1109/Trustcom.2015.606(1-8)Online publication date: 20-Aug-2015
  • (2012)Big Data ChallengesProceedings of the 2012 Second International Conference on Cloud and Green Computing10.1109/CGC.2012.17(702-707)Online publication date: 1-Nov-2012
  • (2016)Understanding and improving JVM GC work stealing at the data center scaleACM SIGPLAN Notices10.1145/3241624.292670651:11(46-54)Online publication date: 14-Jun-2016
  • (2016)Understanding and improving JVM GC work stealing at the data center scaleProceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management10.1145/2926697.2926706(46-54)Online publication date: 14-Jun-2016
  • (2015)SmartStealingProceedings of the Principles and Practices of Programming on The Java Platform10.1145/2807426.2807441(170-181)Online publication date: 8-Sep-2015

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media