Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1542431.1542434acmconferencesArticle/Chapter ViewAbstractPublication PagesismmConference Proceedingsconference-collections
research-article

A new approach to parallelising tracing algorithms

Published: 19 June 2009 Publication History

Abstract

Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshaling etc. Traditional sequential algorithms use a worklist, replacing a nodes with their unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal requires no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors.
This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-reader-single-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies.
While it is clear that our solution can be more effective on NUMA systems and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modern multi-cores architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.

References

[1]
Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. Thread Scheduling for Multiprogrammed Multiprocessors. In SPAA, 1998.
[2]
C. Attanasio, D. Bacon, A. Cocchi, and S. Smith. A Comparative Evaluation of Parallel Garbage Collectors. In LCPC, Springer Verlag, pages 177--192, 2001.
[3]
G. Attardi and T. Flagella. A Customisable Memory Management Framework. In USENIX C++ Conference, Cambridge, MA, 1994.
[4]
H. Baker. Actor Systems for Real Time Computation. In Tech. Rep. TR-197, 1978.
[5]
K. Barabash, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, Y. Ossia, A. Owshanko, and E. Petrank. A Parallel, Incremental Mostly Concurrent Garbage Collector for Servers. In ACM Trans. Program. Lang. Syst. 27(6), pages 1097--1146, 2005.
[6]
S. M. Blackburn, P. Cheng, and K. S. McKinley. Oil and Water? High Performance Garbage Collection in Java with MMTk. In ICSE, 2004.
[7]
H. J. Boehm. Reducing Garbage Collector Cache Misses. In ISMM'00.
[8]
C. J. Cheney. A Nonrecursive List Compacting Algorithm. In Communications of the ACM 13 (11), pages 677--678, December, 1970.
[9]
Perry Cheng and Guy E. Blelloch. A Parallel, Real-Time, Garbage-Collector. In PLDI, pages 125--136, 2001.
[10]
Yannis Chicha and Stephen Watt. A Localised Tracing Scheme applied to Garbage Collection. In APLAS, LNCS 4279, 2006.
[11]
A. Demers, M. Weiser, B. Hayes, H. Boehm, D. G. Bobrow, and S. Shenker. Combining Generational and Conservative Garbage Collection: Frameworks and Implementations. In POPL, 1990.
[12]
D. Doligez and X. Leroy. A Concurrent, Generational Garbage Collector for Multithreaded Implementation of ML. In POPL, 1993.
[13]
T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared Memory Machines. In SC'97.
[14]
C. Flood, D. Detlefs, N. Shavit, and C. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In JVM, 2001.
[15]
Robert H. Halstead Jr. Multilisp: A Language for Concurrent Symbolic Computation. In ACM Trans. Program. Lang. Syst. 7(4), pages 501--538, 1985.
[16]
Matthew Hertz, Yi Feng, and Emery D. Berger. Garbage Collection Without Paging. In PLDI, 2005.
[17]
Lorenz Huelsbergen and James R. Larus. A Concurrent Copying Collector for Languages that Distinguish Immutable Data. In SIGPLAN Not., 28(7), pages 73--82, 1993.
[18]
A. Imai and E. Tick. Evaluation of Parallel Copying Garbage Collection on a Shared Memory Multiprocessor. In IEEE Trans. Parallel Distrib. Syst. 4(9), pages 1030--1040, 1993.
[19]
Intel. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide. In http://www.intel.com/products/processor/manuals/index.htm, 2008.
[20]
R. Jones and R. Lins. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. In John Wiley and Sons, July, 1996.
[21]
Simon Marlow, Tim Harris, Roshan P. James, and Simon Peyton Jones. Parallel Generational-Copying Garbage Collection with a Block-Structured Heap. In ISMM, 2008.
[22]
M. M. Michael, M. T. Vechev, and V. A. Saraswat Idempotent Work Stealing. In PPoPP'09, pages 45--54.s
[23]
David Plainfosse and Marc Shapiro. A Survey of Distributed Garbage Collection Techniques. In Broadcast Technical Report, 1994.
[24]
Y. Shuf, M. Gupta, H. Franke, A. Appel, and J. Pal Singh. Creating and Preserving Locality of Java Applications at Allocation and Garbage Collection Times. In OOPSLA, 2002.
[25]
David Siegwart and Martin Hirzel. Improving Locality with Parallel Hierarchical Copying GC. In ISMM, pages 52--63, 2006.
[26]
SUN. The SPARC architecture manual (version 9). In Prentice-Hall, Editors: D. L. Weaver and T. Germond, 1994.

Cited By

View all
  • (2020)Approximate Nearest-Neighbour Fields via Massively-Parallel Propagation-Assisted K-D Trees2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378426(5172-5181)Online publication date: 10-Dec-2020
  • (2018)Cross-component garbage collectionProceedings of the ACM on Programming Languages10.1145/32765212:OOPSLA(1-24)Online publication date: 24-Oct-2018
  • (2015)SmartStealingProceedings of the Principles and Practices of Programming on The Java Platform10.1145/2807426.2807441(170-181)Online publication date: 8-Sep-2015
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ISMM '09: Proceedings of the 2009 international symposium on Memory management
June 2009
158 pages
ISBN:9781605583471
DOI:10.1145/1542431
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 June 2009

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. memory-centric tracing algorithm
  2. parallel

Qualifiers

  • Research-article

Conference

ISMM '09
Sponsor:

Acceptance Rates

ISMM '09 Paper Acceptance Rate 15 of 32 submissions, 47%;
Overall Acceptance Rate 72 of 156 submissions, 46%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)2
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Approximate Nearest-Neighbour Fields via Massively-Parallel Propagation-Assisted K-D Trees2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378426(5172-5181)Online publication date: 10-Dec-2020
  • (2018)Cross-component garbage collectionProceedings of the ACM on Programming Languages10.1145/32765212:OOPSLA(1-24)Online publication date: 24-Oct-2018
  • (2015)SmartStealingProceedings of the Principles and Practices of Programming on The Java Platform10.1145/2807426.2807441(170-181)Online publication date: 8-Sep-2015
  • (2015)Topology-Aware Parallelism for NUMA Copying CollectorsRevised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing - Volume 951910.1007/978-3-319-29778-1_12(191-205)Online publication date: 9-Sep-2015
  • (2014)Size slicingProceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing10.1145/2636228.2636238(31-42)Online publication date: 3-Sep-2014
  • (2014)A study of connected object locality in NUMA heapsProceedings of the workshop on Memory Systems Performance and Correctness10.1145/2618128.2618132(1-9)Online publication date: 13-Jun-2014
  • (2013)Notions of aliasing and ownershipAliasing in Object-Oriented Programming10.5555/2554511.2554517(59-83)Online publication date: 1-Jan-2013
  • (2013)A study of the scalability of stop-the-world garbage collectors on multicoresACM SIGPLAN Notices10.1145/2499368.245114248:4(229-240)Online publication date: 16-Mar-2013
  • (2013)A study of the scalability of stop-the-world garbage collectors on multicoresACM SIGARCH Computer Architecture News10.1145/2490301.245114241:1(229-240)Online publication date: 16-Mar-2013
  • (2013)A study of the scalability of stop-the-world garbage collectors on multicoresProceedings of the eighteenth international conference on Architectural support for programming languages and operating systems10.1145/2451116.2451142(229-240)Online publication date: 16-Mar-2013
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media