Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2755573.2755598acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article

Transactional Acceleration of Concurrent Data Structures

Published: 13 June 2015 Publication History

Abstract

Concurrent data structures are a fundamental building block for scalable multi-threaded programs. While Transactional Memory (TM) was originally conceived as a mechanism for simplifying the creation of concurrent data structures, modern hardware TM systems lack the progress properties needed to completely obviate traditional techniques for designing concurrent data structures, especially those requiring nonblocking progress guarantees. In this paper, we introduce the Prefix Transaction Optimization (PTO) technique for employing hardware TM to accelerate existing concurrent data structures. Our technique consists of three stages: the creation of a prefix transaction, the mechanical optimization of the prefix transaction, and then algorithm-specific optimizations to further improve performance. We apply PTO to five nonblocking data structures, and observe speedups of up to 2x at one thread, and up to 3x at 8 threads.

References

[1]
M. Abadi and L. Lamport. The Existence of Refinement Mappings. Theoretical Computer Science, pages 253--284, May 1991.
[2]
C. Blundell, E. C. Lewis, and M. M. K. Martin. Subtleties of Transactional Memory Atomicity Semantics. Computer Architecture Letters, 5(2), Nov. 2006.
[3]
J. Bobba, K. E. Moore, H. Volos, L. Yen, M. D. Hill, M. M. Swift, and D. A. Wood. Performance Pathologies in Hardware Transactional Memory. In Proceedings of the 34th International Symposium on Computer Architecture, San Diego, CA, June 2007.
[4]
I. Calciu, J. Gottschlich, T. Shpeisman, G. Pokam, and M. Herlihy. Invyswell: A Hybrid Transactional Memory for Haswell's Restricted Transactional Memory. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques, Edmonton, AB, Canada, Aug. 2014.
[5]
T. David, R. Guerraoui, T. Che, and V. Trigonakis. Designing ASCY-compliant Concurrent Search Data Structures. Technical Report EPFL-REPORT-203822, Ecole Polytechnique Federale de Lausanne, 2014.
[6]
D. Dice, T. Harris, A. Kogan, Y. Lev, and M. Moir. Pitfalls of Lazy Subscription. In Proceedings of the 6th Workshop on the Theory of Transactional Memory, Paris, France, July 2014.
[7]
D. Dice, Y. Lev, V. Marathe, M. Moir, M. Olszewski, and D. Nussbaum. Simplifying Concurrent Algorithms by Exploiting Hardware™. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, Santorini, Greece, June 2010.
[8]
A. Dragojevic, M. Herlihy, Y. Lev, and M. Moir. On The Power of Hardware Transactional Memory to Simplify Memory Management. In Proceedings of the 30th ACM Symposium on Principles of Distributed Computing, San Jose, CA, June 2011.
[9]
F. Ellen, P. Fatourou, E. Ruppert, and F. van Breugel. Non-blocking Binary Search Trees. In Proceedings of the 29th ACM Symposium on Principles of Distributed Computing, Zurich, Switzerland, July 2010.
[10]
F. Ellen, Y. Lev, V. Luchangco, and M. Moir. SNZI: Scalable NonZero Indicators. In Proceedings of the Twenty-Sixth ACM Symposium on Principles of Distributed Computing, Portland, OR, Aug. 2007.
[11]
P. Fatourou and N. D. Kallimanis. A Highly-Efficient Wait-Free Universal Construction. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, June 2011.
[12]
K. Fraser. Practical Lock-Freedom. PhD thesis, King's College, University of Cambridge, Sept. 2003.
[13]
V. Gramoli. More Than You Ever Wanted to Know about Synchronization. In Proceedings of the 20th ACM Symposium on Principles and Practice of Parallel Programming, San Francisco, CA, Feb. 2015.
[14]
T. Harris. A Pragmatic Implementation of Non-Blocking Linked Lists. In Proceedings of the 15th International Symposium on Distributed Computing, Lisbon, Portugal, Oct. 2001.
[15]
T. Harris, K. Fraser, and I. Pratt. A Practical Multi-word Compare-and-Swap Operation. In Proceedings of the 16th International Conference on Distributed Computing, Toulouse, France, Oct. 2002.
[16]
S. Heller, M. Herlihy, V. Luchangco, M. Moir, W. Scherer, and N. Shavit. A Lazy Concurrent List-Based Set Algorithm. In Proceedings of the 9th international conference on Principles of Distributed Systems, Pisa, Italy, Dec. 2006.
[17]
D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat Combining and the Synchronization-Parallelism Tradeoff. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, Santorini, Greece, June 2010.
[18]
M. Herlihy. A Methodology for Implementing Highly Concurrent Data Structures. In Proceedings of the Second ACM Symposium on Principles and Practice of Parallel Programming, Seattle, WA, Mar. 1990.
[19]
M. Herlihy. Wait-Free Synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124--149, 1991.
[20]
M. Herlihy. A Methodology for Implementing Highly Concurrent Data Objects. ACM Transactions on Programming Languages and Systems, 15(5):745--770, 1993.
[21]
M. P. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. In Proceedings of the 20th International Symposium on Computer Architecture, San Diego, CA, May 1993.
[22]
Intel Corporation. Intel Architecture Instruction Set Extensions Programming (Chapter 8: Transactional Synchronization Extensions). Feb. 2012.
[23]
C. Jacobi, T. Slegel, and D. Greiner. Transactional Memory Architecture and Implementation for IBM System Z. In Proceedings of the 45th International Symposium On Microarchitecture, Vancouver, BC, Canada, Dec. 2012.
[24]
P. Jayanti. f-arrays: Implementation and applications. In Proceedings of the 21st ACM Symposium on Principles of Distributed Computing, Monterey, California, July 2002.
[25]
A. Kogan and E. Petrank. A Methodology for Creating Fast Wait-Free Data Structures. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, New Orleans, LA, Feb. 2012.
[26]
L. Lamport. How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs. IEEE Transactions on Computers, C-28(9):241--248, Sept. 1979.
[27]
Y. Lev and J.-W. Maessen. Split Hardware Transactions: True Nesting of Transactions Using Best-Effort Hardware Transactional Memory. In Proceedings of the 13th ACM Symposium on Principles and Practice of Parallel Programming, Salt Lake City, UT, Feb. 2008.
[28]
Y. Liu, V. Luchangco, and M. Spear. Mindicators: A Scalable Approach to Quiescence. In Proceedings of 33rd International Conference on Distributed Computing Systems, Philadelphia, PA, July 2013.
[29]
Y. Liu and M. Spear. Mounds: Array-Based Concurrent Priority Queues. In Proceedings of the 41st International Conference on Parallel Processing, Pittsburgh, PA, Sept. 2012.
[30]
Y. Liu, K. Zhang, and M. Spear. Dynamic-Sized Nonblocking Hash Tables. In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing, Paris, France, July 2014.
[31]
I. Lotan and N. Shavit. Skiplist-Based Concurrent Priority Queues. In Proceedings of the 14th International Parallel and Distributed Processing Symposium, Cancun, Mexico, May 2000.
[32]
V. Luchangco, M. Moir, and N. Shavit. Nonblocking k-compare-single-swap. In Proceedings of the 15th ACM Symposium on Parallel Algorithms and Architectures, San Diego, CA, June 2003.
[33]
P. E. McKenney. Exploiting Deferred Destruction: An Analysis of Read-Copy-Update Techniques in Operating System Kernels. PhD thesis, OGI School of Science and Engineering at Oregon Health and Sciences University, 2004.
[34]
M. Michael. Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects. IEEE Transactions on Parallel and Distributed Systems, 15(6):491--504, June 2004.
[35]
M. M. Michael and M. L. Scott. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, May 1996.
[36]
A. Morrison and Y. Afek. Fast Concurrent Queues for x86 Processors. In Proceedings of the 18th ACM Symposium on Principles and Practice of Parallel Programming, Shenzhen, China, Feb. 2013.
[37]
S. S. Muchnick. Advanced Compiler Design Implementation. Morgan Kaufmann, 1997.
[38]
N. Neelakantam, R. Rajwar, S. Srinivas, U. Srinivasan, and C. Zilles. Hardware Atomicity for Reliable Software Speculation. In Proceedings of the 34th International Symposium on Computer Architecture, San Diego, CA, June 2007.
[39]
A. Prokopec, N. Bronson, P. Bagwell, and M. Odersky. Concurrent Tries with Efficient Non-Blocking Snapshots. In Proceedings of the 17th ACM Symposium on Principles and Practice of Parallel Programming, Feb. 2012.
[40]
R. Rajwar and J. R. Goodman. Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution. In Proceedings of the 34th IEEE/ACM International Symposium on Microarchitecture, Austin, TX, Dec. 2001.
[41]
N. Shafiei. Non-blocking Patricia Tries with Replace Operations. In Proceedings of 33rd International Conference on Distributed Computing Systems, Philadelphia, PA, July 2013.
[42]
H. Sundell and P. Tsigas. Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems. Journal of Parallel and Distributed Computing, 65:609--627, May 2005.
[43]
S. Timnat, A. Braginsky, A. Kogan, and E. Petrank. Wait-Free Linked-Lists. In Proceedings of the 16th International Conference on Principles of Distributed Systems, Rome, Italy, Dec. 2012.
[44]
A. Wang, M. Gaudet, P. Wu, J. N. Amaral, M. Ohmacht, C. Barton, R. Silvera, and M. Michael. Evaluation of Blue Gene/Q Hardware Support for Transactional Memories. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, Minneapolis, MN, Sept. 2012.
[45]
L. Xiang and M. L. Scott. Compiler Aided Manual Speculation for High Performance Concurrent Data Structures. In Proceedings of the 18th ACM Symposium on Principles and Practice of Parallel Programming, Shenzhen, China, Feb. 2013.
[46]
R. Yoo, C. Hughes, K. Lai, and R. Rajwar. Performance Evaluation of Intel Transactional Synchronization Extensions for High Performance Computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, Nov. 2013.

Cited By

View all
  • (2018)Improving Parallelism in Hardware Transactional MemoryACM Transactions on Architecture and Code Optimization10.1145/317796215:1(1-24)Online publication date: 22-Mar-2018
  • (2017)A Template for Implementing Fast Lock-free Trees Using HTMProceedings of the ACM Symposium on Principles of Distributed Computing10.1145/3087801.3087834(293-302)Online publication date: 25-Jul-2017
  • (2017)Hand-Over-Hand Transactions with Precise Memory ReclamationProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3087556.3087587(255-264)Online publication date: 24-Jul-2017

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures
June 2015
362 pages
ISBN:9781450335881
DOI:10.1145/2755573
  • General Chair:
  • Guy Blelloch,
  • Program Chair:
  • Kunal Agrawal
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. concurrency
  2. lock-freedom
  3. synchronization
  4. transactional memory

Qualifiers

  • Research-article

Funding Sources

Conference

SPAA '15

Acceptance Rates

SPAA '15 Paper Acceptance Rate 31 of 131 submissions, 24%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)0
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2018)Improving Parallelism in Hardware Transactional MemoryACM Transactions on Architecture and Code Optimization10.1145/317796215:1(1-24)Online publication date: 22-Mar-2018
  • (2017)A Template for Implementing Fast Lock-free Trees Using HTMProceedings of the ACM Symposium on Principles of Distributed Computing10.1145/3087801.3087834(293-302)Online publication date: 25-Jul-2017
  • (2017)Hand-Over-Hand Transactions with Precise Memory ReclamationProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3087556.3087587(255-264)Online publication date: 24-Jul-2017

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media