research-article

Transactional Acceleration of Concurrent Data Structures

Authors:

Michael SpearAuthors Info & Claims

SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures

Pages 244 - 253

https://doi.org/10.1145/2755573.2755598

Published: 13 June 2015 Publication History

Abstract

Concurrent data structures are a fundamental building block for scalable multi-threaded programs. While Transactional Memory (TM) was originally conceived as a mechanism for simplifying the creation of concurrent data structures, modern hardware TM systems lack the progress properties needed to completely obviate traditional techniques for designing concurrent data structures, especially those requiring nonblocking progress guarantees. In this paper, we introduce the Prefix Transaction Optimization (PTO) technique for employing hardware TM to accelerate existing concurrent data structures. Our technique consists of three stages: the creation of a prefix transaction, the mechanical optimization of the prefix transaction, and then algorithm-specific optimizations to further improve performance. We apply PTO to five nonblocking data structures, and observe speedups of up to 2x at one thread, and up to 3x at 8 threads.

References

[1]

M. Abadi and L. Lamport. The Existence of Refinement Mappings. Theoretical Computer Science, pages 253--284, May 1991.

Digital Library

[2]

C. Blundell, E. C. Lewis, and M. M. K. Martin. Subtleties of Transactional Memory Atomicity Semantics. Computer Architecture Letters, 5(2), Nov. 2006.

Digital Library

[3]

J. Bobba, K. E. Moore, H. Volos, L. Yen, M. D. Hill, M. M. Swift, and D. A. Wood. Performance Pathologies in Hardware Transactional Memory. In Proceedings of the 34th International Symposium on Computer Architecture, San Diego, CA, June 2007.

Digital Library

[4]

I. Calciu, J. Gottschlich, T. Shpeisman, G. Pokam, and M. Herlihy. Invyswell: A Hybrid Transactional Memory for Haswell's Restricted Transactional Memory. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques, Edmonton, AB, Canada, Aug. 2014.

Digital Library

[5]

T. David, R. Guerraoui, T. Che, and V. Trigonakis. Designing ASCY-compliant Concurrent Search Data Structures. Technical Report EPFL-REPORT-203822, Ecole Polytechnique Federale de Lausanne, 2014.

[6]

D. Dice, T. Harris, A. Kogan, Y. Lev, and M. Moir. Pitfalls of Lazy Subscription. In Proceedings of the 6th Workshop on the Theory of Transactional Memory, Paris, France, July 2014.

[7]

D. Dice, Y. Lev, V. Marathe, M. Moir, M. Olszewski, and D. Nussbaum. Simplifying Concurrent Algorithms by Exploiting Hardware™. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, Santorini, Greece, June 2010.

Digital Library

[8]

A. Dragojevic, M. Herlihy, Y. Lev, and M. Moir. On The Power of Hardware Transactional Memory to Simplify Memory Management. In Proceedings of the 30th ACM Symposium on Principles of Distributed Computing, San Jose, CA, June 2011.

Digital Library

[9]

F. Ellen, P. Fatourou, E. Ruppert, and F. van Breugel. Non-blocking Binary Search Trees. In Proceedings of the 29th ACM Symposium on Principles of Distributed Computing, Zurich, Switzerland, July 2010.

Digital Library

[10]

F. Ellen, Y. Lev, V. Luchangco, and M. Moir. SNZI: Scalable NonZero Indicators. In Proceedings of the Twenty-Sixth ACM Symposium on Principles of Distributed Computing, Portland, OR, Aug. 2007.

Digital Library

[11]

P. Fatourou and N. D. Kallimanis. A Highly-Efficient Wait-Free Universal Construction. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, June 2011.

Digital Library

[12]

K. Fraser. Practical Lock-Freedom. PhD thesis, King's College, University of Cambridge, Sept. 2003.

[13]

V. Gramoli. More Than You Ever Wanted to Know about Synchronization. In Proceedings of the 20th ACM Symposium on Principles and Practice of Parallel Programming, San Francisco, CA, Feb. 2015.

Digital Library

[14]

T. Harris. A Pragmatic Implementation of Non-Blocking Linked Lists. In Proceedings of the 15th International Symposium on Distributed Computing, Lisbon, Portugal, Oct. 2001.

Digital Library

[15]

T. Harris, K. Fraser, and I. Pratt. A Practical Multi-word Compare-and-Swap Operation. In Proceedings of the 16th International Conference on Distributed Computing, Toulouse, France, Oct. 2002.

Digital Library

[16]

S. Heller, M. Herlihy, V. Luchangco, M. Moir, W. Scherer, and N. Shavit. A Lazy Concurrent List-Based Set Algorithm. In Proceedings of the 9th international conference on Principles of Distributed Systems, Pisa, Italy, Dec. 2006.

Digital Library

[17]

D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat Combining and the Synchronization-Parallelism Tradeoff. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, Santorini, Greece, June 2010.

Digital Library

[18]

M. Herlihy. A Methodology for Implementing Highly Concurrent Data Structures. In Proceedings of the Second ACM Symposium on Principles and Practice of Parallel Programming, Seattle, WA, Mar. 1990.

Digital Library

[19]

M. Herlihy. Wait-Free Synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124--149, 1991.

Digital Library

[20]

M. Herlihy. A Methodology for Implementing Highly Concurrent Data Objects. ACM Transactions on Programming Languages and Systems, 15(5):745--770, 1993.

Digital Library

[21]

M. P. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. In Proceedings of the 20th International Symposium on Computer Architecture, San Diego, CA, May 1993.

Digital Library

[22]

Intel Corporation. Intel Architecture Instruction Set Extensions Programming (Chapter 8: Transactional Synchronization Extensions). Feb. 2012.

[23]

C. Jacobi, T. Slegel, and D. Greiner. Transactional Memory Architecture and Implementation for IBM System Z. In Proceedings of the 45th International Symposium On Microarchitecture, Vancouver, BC, Canada, Dec. 2012.

Digital Library

[24]

P. Jayanti. f-arrays: Implementation and applications. In Proceedings of the 21st ACM Symposium on Principles of Distributed Computing, Monterey, California, July 2002.

Digital Library

[25]

A. Kogan and E. Petrank. A Methodology for Creating Fast Wait-Free Data Structures. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, New Orleans, LA, Feb. 2012.

Digital Library

[26]

L. Lamport. How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs. IEEE Transactions on Computers, C-28(9):241--248, Sept. 1979.

Digital Library

[27]

Y. Lev and J.-W. Maessen. Split Hardware Transactions: True Nesting of Transactions Using Best-Effort Hardware Transactional Memory. In Proceedings of the 13th ACM Symposium on Principles and Practice of Parallel Programming, Salt Lake City, UT, Feb. 2008.

Digital Library

[28]

Y. Liu, V. Luchangco, and M. Spear. Mindicators: A Scalable Approach to Quiescence. In Proceedings of 33rd International Conference on Distributed Computing Systems, Philadelphia, PA, July 2013.

Digital Library

[29]

Y. Liu and M. Spear. Mounds: Array-Based Concurrent Priority Queues. In Proceedings of the 41st International Conference on Parallel Processing, Pittsburgh, PA, Sept. 2012.

Digital Library

[30]

Y. Liu, K. Zhang, and M. Spear. Dynamic-Sized Nonblocking Hash Tables. In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing, Paris, France, July 2014.

Digital Library

[31]

I. Lotan and N. Shavit. Skiplist-Based Concurrent Priority Queues. In Proceedings of the 14th International Parallel and Distributed Processing Symposium, Cancun, Mexico, May 2000.

Digital Library

[32]

V. Luchangco, M. Moir, and N. Shavit. Nonblocking k-compare-single-swap. In Proceedings of the 15th ACM Symposium on Parallel Algorithms and Architectures, San Diego, CA, June 2003.

Digital Library

[33]

P. E. McKenney. Exploiting Deferred Destruction: An Analysis of Read-Copy-Update Techniques in Operating System Kernels. PhD thesis, OGI School of Science and Engineering at Oregon Health and Sciences University, 2004.

Digital Library

[34]

M. Michael. Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects. IEEE Transactions on Parallel and Distributed Systems, 15(6):491--504, June 2004.

Digital Library

[35]

M. M. Michael and M. L. Scott. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, May 1996.

Digital Library

[36]

A. Morrison and Y. Afek. Fast Concurrent Queues for x86 Processors. In Proceedings of the 18th ACM Symposium on Principles and Practice of Parallel Programming, Shenzhen, China, Feb. 2013.

Digital Library

[37]

S. S. Muchnick. Advanced Compiler Design Implementation. Morgan Kaufmann, 1997.

Digital Library

[38]

N. Neelakantam, R. Rajwar, S. Srinivas, U. Srinivasan, and C. Zilles. Hardware Atomicity for Reliable Software Speculation. In Proceedings of the 34th International Symposium on Computer Architecture, San Diego, CA, June 2007.

Digital Library

[39]

A. Prokopec, N. Bronson, P. Bagwell, and M. Odersky. Concurrent Tries with Efficient Non-Blocking Snapshots. In Proceedings of the 17th ACM Symposium on Principles and Practice of Parallel Programming, Feb. 2012.

Digital Library

[40]

R. Rajwar and J. R. Goodman. Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution. In Proceedings of the 34th IEEE/ACM International Symposium on Microarchitecture, Austin, TX, Dec. 2001.

Digital Library

[41]

N. Shafiei. Non-blocking Patricia Tries with Replace Operations. In Proceedings of 33rd International Conference on Distributed Computing Systems, Philadelphia, PA, July 2013.

Digital Library

[42]

H. Sundell and P. Tsigas. Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems. Journal of Parallel and Distributed Computing, 65:609--627, May 2005.

Digital Library

[43]

S. Timnat, A. Braginsky, A. Kogan, and E. Petrank. Wait-Free Linked-Lists. In Proceedings of the 16th International Conference on Principles of Distributed Systems, Rome, Italy, Dec. 2012.

[44]

A. Wang, M. Gaudet, P. Wu, J. N. Amaral, M. Ohmacht, C. Barton, R. Silvera, and M. Michael. Evaluation of Blue Gene/Q Hardware Support for Transactional Memories. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, Minneapolis, MN, Sept. 2012.

Digital Library

[45]

L. Xiang and M. L. Scott. Compiler Aided Manual Speculation for High Performance Concurrent Data Structures. In Proceedings of the 18th ACM Symposium on Principles and Practice of Parallel Programming, Shenzhen, China, Feb. 2013.

Digital Library

[46]

R. Yoo, C. Hughes, K. Lai, and R. Rajwar. Performance Evaluation of Intel Transactional Synchronization Extensions for High Performance Computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, Nov. 2013.

Digital Library

Cited By

Dice DHerlihy MKogan A(2018)Improving Parallelism in Hardware Transactional MemoryACM Transactions on Architecture and Code Optimization10.1145/317796215:1(1-24)Online publication date: 22-Mar-2018
https://dl.acm.org/doi/10.1145/3177962
Brown TSchiller ESchwarzmann A(2017)A Template for Implementing Fast Lock-free Trees Using HTMProceedings of the ACM Symposium on Principles of Distributed Computing10.1145/3087801.3087834(293-302)Online publication date: 25-Jul-2017
https://dl.acm.org/doi/10.1145/3087801.3087834
Zhou TLuchangco VSpear MScheideler CHajiaghayi M(2017)Hand-Over-Hand Transactions with Precise Memory ReclamationProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3087556.3087587(255-264)Online publication date: 24-Jul-2017
https://dl.acm.org/doi/10.1145/3087556.3087587

Index Terms

Transactional Acceleration of Concurrent Data Structures
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

Looking for efficient implementations of concurrent objects
PaCT'11: Proceedings of the 11th international conference on Parallel computing technologies

As introduced by Taubenfeld, a contention-sensitive implementation of a concurrent object is an implementation such that the overhead introduced by locking is eliminated in the common cases, i.e., when there is no contention or when the operations ...
Unbounded page-based transactional memory
Proceedings of the 2006 ASPLOS Conference

Exploiting thread level parallelism is paramount in the multicore era. Transactions enable programmers to expose such parallelism by greatly simplifying the multi-threaded programming model. Virtualized transactions (unbounded in space and time) are ...
Time-Based Software Transactional Memory

Software transactional memory (STM) is a concurrency control mechanism that is widely considered to be easier to use by programmers than other mechanisms such as locking. The first generations of STMs have either relied on visible read designs, which ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures

June 2015

362 pages

ISBN:9781450335881

DOI:10.1145/2755573

General Chair:
Guy Blelloch
Carnegie Mellon University, USA
,
Program Chair:
Kunal Agrawal
Washington University in St. Louis, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2015

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SPAA '15

Sponsor:

SPAA '15: 27th ACM Symposium on Parallelism in Algorithms and Architectures

June 13 - 15, 2015

Oregon, Portland, USA

Acceptance Rates

SPAA '15 Paper Acceptance Rate 31 of 131 submissions, 24%;

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
198
Total Downloads

Downloads (Last 12 months)7
Downloads (Last 6 weeks)0

Reflects downloads up to 12 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Dice DHerlihy MKogan A(2018)Improving Parallelism in Hardware Transactional MemoryACM Transactions on Architecture and Code Optimization10.1145/317796215:1(1-24)Online publication date: 22-Mar-2018
https://dl.acm.org/doi/10.1145/3177962
Brown TSchiller ESchwarzmann A(2017)A Template for Implementing Fast Lock-free Trees Using HTMProceedings of the ACM Symposium on Principles of Distributed Computing10.1145/3087801.3087834(293-302)Online publication date: 25-Jul-2017
https://dl.acm.org/doi/10.1145/3087801.3087834
Zhou TLuchangco VSpear MScheideler CHajiaghayi M(2017)Hand-Over-Hand Transactions with Precise Memory ReclamationProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3087556.3087587(255-264)Online publication date: 24-Jul-2017
https://dl.acm.org/doi/10.1145/3087556.3087587

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents