DOI: 10.1145/2684464.2684476
research-article

On the Performance of Delegation over Cache-Coherent Shared Memory

Published: 04 January 2015

Abstract

Delegation is a thread synchronization technique in which all access to shared data is performed by a dedicated server thread. When a client thread needs to access the shared data, it sends a request to the server and waits for a response. This paper studies how to implement delegation over cache-coherent shared memory, with the goal of optimizing it for high throughput. Whereas client-server communication naturally fits message-passing systems, an efficient implementation over cache-coherent shared memory requires careful optimization. We demonstrate optimizations that significantly improve delegation performance on two modern x86 processors (the Intel Xeon Westmere and the AMD Opteron Magny-Cours), yielding counter, stack and queue implementations that outperform the best known alternatives in a large number of cases. Our optimized delegation solution achieves 1.4x (resp. 2x) higher throughput than the most efficient state-of-the-art delegation solution on the Intel Xeon (resp. AMD Opteron).
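
The request/response pattern described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Slot`, `DelegationServer`, `delegate` and `run_clients` are hypothetical, the shared object is a simple counter, and the per-client slot layout is an assumption. Because it runs under CPython, it shows only the structure of the protocol (clients publish a request in their own slot and spin until the server answers), not the cache-level optimizations or the throughput the paper measures.

```python
import threading
import time


class Slot:
    """One request/response slot per client (hypothetical layout)."""

    def __init__(self):
        self.request = None        # written by the client, read by the server
        self.response = None       # written by the server, read by the client
        self.has_request = False   # set by the client, cleared by the server
        self.has_response = False  # set by the server, cleared by the client


class DelegationServer:
    """Dedicated server thread that owns a shared counter."""

    def __init__(self, num_clients):
        self.slots = [Slot() for _ in range(num_clients)]
        self.counter = 0           # shared data: only the server thread modifies it
        self.running = True
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def _serve(self):
        # Scan every client slot and execute pending requests on behalf of clients.
        while self.running:
            for slot in self.slots:
                if slot.has_request:
                    amount = slot.request
                    slot.has_request = False
                    self.counter += amount
                    slot.response = self.counter
                    slot.has_response = True   # publish the response last
            time.sleep(0)                      # yield so client threads can run

    def delegate(self, client_id, amount):
        """Client side: post a request, then wait for the server's response."""
        slot = self.slots[client_id]
        slot.request = amount
        slot.has_request = True                # publish the request last
        while not slot.has_response:
            time.sleep(0)                      # spin, yielding the interpreter
        slot.has_response = False
        return slot.response

    def stop(self):
        self.running = False
        self.thread.join()


def run_clients(num_clients, ops_per_client):
    """Start the server, let each client delegate increments, return the total."""
    server = DelegationServer(num_clients)

    def client(client_id):
        for _ in range(ops_per_client):
            server.delegate(client_id, 1)

    threads = [threading.Thread(target=client, args=(cid,))
               for cid in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.stop()
    return server.counter


if __name__ == "__main__":
    print(run_clients(4, 200))
```

Since only the server thread ever modifies `counter`, no lock is needed around the increment; the clients synchronize solely through their own slots. A native implementation would additionally need atomic flags with explicit memory ordering and would have to consider how slots map onto cache lines, which is the kind of tuning the paper investigates.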


Cited By

  • (2019) Cost Evaluation of Synchronization Algorithms for Multicore Architectures. Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics, pages 697--713. DOI: 10.4018/978-1-5225-7598-6.ch051. Online publication date: 2019
  • (2018) Cost Evaluation of Synchronization Algorithms for Multicore Architectures. Encyclopedia of Information Science and Technology, Fourth Edition, pages 3989--4003. DOI: 10.4018/978-1-5225-2255-3.ch346. Online publication date: 2018
  • (2017) Scalable Adaptive NUMA-Aware Lock. IEEE Transactions on Parallel and Distributed Systems, 28(6):1754--1769. DOI: 10.1109/TPDS.2016.2630695. Online publication date: 1-Jun-2017


Published In

ICDCN '15: Proceedings of the 16th International Conference on Distributed Computing and Networking
January 2015
360 pages
ISBN:9781450329286
DOI:10.1145/2684464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Cache-coherent shared memory
  2. delegation
  3. mutual exclusion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICDCN '15

Article Metrics

  • Downloads (Last 12 months): 8
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 23 Nov 2024

