research-article

A new approach to parallelising tracing algorithms

Authors:

Cosmin E. Oancea,

Stephen M. Watt

ISMM '09: Proceedings of the 2009 international symposium on Memory management

Pages 10 - 19

https://doi.org/10.1145/1542431.1542434

Published: 19 June 2009

Abstract

Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection and marshaling. Traditional sequential algorithms use a worklist, replacing a node with its unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal require no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors.
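A minimal sketch of the sequential worklist algorithm described above. It assumes a hypothetical adjacency dict `children` mapping integer node ids to the ids of their children, standing in for heap objects and their pointer fields; it is an illustration, not code from the paper.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set


def trace(roots: Iterable[int], children: Dict[int, List[int]]) -> Set[int]:
    """Return every node reachable from `roots`, using a single worklist."""
    visited: Set[int] = set()
    worklist: Deque[int] = deque()
    for r in roots:
        if r not in visited:
            visited.add(r)  # mark on insertion: each node enters the worklist at most once
            worklist.append(r)
    while worklist:
        node = worklist.pop()  # LIFO (depth-first); popleft() would give breadth-first order
        for child in children.get(node, []):
            if child not in visited:  # replace the node with its unvisited children
                visited.add(child)
                worklist.append(child)
    return visited
```

For example, with `graph = {0: [1, 2], 1: [3], 2: [3], 3: [0], 4: [5]}`, `trace([0], graph)` returns `{0, 1, 2, 3}`: the edge from 3 back to 0 does not cause a revisit, and nodes 4 and 5 are unreachable. Marking on insertion is what keeps each node in the worklist at most once; in a processor-oriented parallel version, this test-and-set on `visited` is exactly the step that needs synchronisation when several processors may hold the same node.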

This paper presents a memory-oriented solution: memory is partitioned into segments, and each segment has its own worklist containing only nodes in that segment. At any given time, at most one processor owns a given worklist. By arranging separate single-reader-single-writer forwarding queues to pass nodes from processor i to processor j, we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies.
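The memory-oriented scheme can be illustrated with a single-threaded simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: integer node ids, a hypothetical fixed mapping from nodes to segments (`segment_size` consecutive ids) and from segments to workers (`segment % num_workers`), one worklist per worker standing in for the worklists of all segments it owns, and `collections.deque` standing in for the single-reader-single-writer forwarding queues. Workers run round-robin rather than on separate processors, and ownership never migrates, whereas the paper only requires that at most one processor owns a worklist at a time.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def segmented_trace(roots: List[int], children: Dict[int, List[int]],
                    segment_size: int, num_workers: int) -> Set[int]:
    """Single-threaded simulation of memory-oriented tracing.

    Nodes are grouped into segments of `segment_size` consecutive ids, and
    segment s is owned by worker s % num_workers (a hypothetical, fixed
    assignment). A worker only ever marks nodes in segments it owns.
    """
    def owner(node: int) -> int:
        return (node // segment_size) % num_workers

    marked: List[Set[int]] = [set() for _ in range(num_workers)]
    # One worklist per worker, standing in for the worklists of its segments.
    worklists: List[Deque[int]] = [deque() for _ in range(num_workers)]
    # fwd[i][j]: written only by worker i, read only by worker j.
    fwd: List[List[Deque[int]]] = [[deque() for _ in range(num_workers)]
                                   for _ in range(num_workers)]

    def accept(w: int, node: int) -> None:
        # Only the owning worker tests and sets this mark, so no lock is needed.
        if node not in marked[w]:
            marked[w].add(node)
            worklists[w].append(node)

    for r in roots:
        accept(owner(r), r)

    progress = True
    while progress:  # stop after a full round in which no worker traced a node
        progress = False
        for w in range(num_workers):  # round-robin stands in for parallel execution
            for src in range(num_workers):  # drain incoming forwarding queues
                q = fwd[src][w]
                while q:
                    accept(w, q.popleft())
            while worklists[w]:
                progress = True
                node = worklists[w].pop()
                for child in children.get(node, []):
                    dst = owner(child)
                    if dst == w:
                        accept(w, child)
                    else:
                        fwd[w][dst].append(child)  # hand off to the owner
    return set().union(*marked)
```

Because a node is tested and marked only by the worker owning its segment, `accept` needs no lock; the only cross-worker traffic goes through `fwd[i][j]`, each queue having exactly one writer and one reader. A real multi-threaded version would additionally need a correctly synchronised single-producer single-consumer queue and a distributed termination check, both of which this simulation sidesteps.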

While it is clear that our solution can be more effective on NUMA systems, and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modern multi-core architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.



Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection




Published In


ISMM '09: Proceedings of the 2009 international symposium on Memory management

June 2009

158 pages

ISBN:9781605583471

DOI:10.1145/1542431

General Chair: Hillel Kolodner, IBM Haifa Research

Program Chair: Guy Steele, Sun Microsystems Laboratories

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.


Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ISMM '09: International Symposium on Memory Management

June 19-20, 2009

Dublin, Ireland

Acceptance Rates

ISMM '09 Paper Acceptance Rate 15 of 32 submissions, 47%

Overall Acceptance Rate 72 of 156 submissions, 46%


Bibliometrics

Total Citations: 28

Total Downloads: 290

Downloads (last 12 months): 14

Downloads (last 6 weeks): 2

Reflects downloads up to 12 Nov 2024


Cited By

C. Oancea, T. Robroek, and F. Gieseke. Approximate Nearest-Neighbour Fields via Massively-Parallel Propagation-Assisted K-D Trees. In 2020 IEEE International Conference on Big Data (Big Data), pages 5172-5181, 2020. https://doi.org/10.1109/BigData50022.2020.9378426

U. Degenbaev, J. Eisinger, K. Hara, M. Hlopko, M. Lippautz, and H. Payer. Cross-component garbage collection. In Proceedings of the ACM on Programming Languages 2(OOPSLA), pages 1-24, 2018. https://dl.acm.org/doi/10.1145/3276521

J. Qian, W. Srisa-an, D. Li, H. Jiang, S. Seth, and Y. Yang. SmartStealing. In Proceedings of the Principles and Practices of Programming on The Java Platform, pages 170-181, 2015. https://dl.acm.org/doi/10.1145/2807426.2807441

K. Alnowaiser and J. Singer. Topology-Aware Parallelism for NUMA Copying Collectors. In Revised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing, Volume 9519, pages 191-205, 2015. https://dl.acm.org/doi/10.1007/978-3-319-29778-1_12

T. Henriksen, M. Elsman, and C. Oancea. Size slicing. In Proceedings of the 3rd ACM SIGPLAN Workshop on Functional High-Performance Computing, pages 31-42, 2014. https://dl.acm.org/doi/10.1145/2636228.2636238

K. Alnowaiser and J. Singer. A study of connected object locality in NUMA heaps. In Proceedings of the Workshop on Memory Systems Performance and Correctness, pages 1-9, 2014. https://dl.acm.org/doi/10.1145/2618128.2618132

A. Mycroft and J. Voigt. Notions of aliasing and ownership. In Aliasing in Object-Oriented Programming, pages 59-83, 2013. https://dl.acm.org/doi/10.5555/2554511.2554517

L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A study of the scalability of stop-the-world garbage collectors on multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 229-240, 2013. https://dl.acm.org/doi/10.1145/2451116.2451142. Also in ACM SIGPLAN Notices 48(4) (https://dl.acm.org/doi/10.1145/2499368.2451142) and ACM SIGARCH Computer Architecture News 41(1) (https://dl.acm.org/doi/10.1145/2490301.2451142).
