DOI: 10.1145/3490148.3538578
research-article
Open access

Performance Analysis and Modelling of Concurrent Multi-access Data Structures

Published: 11 July 2022

Abstract

The major impediment to scaling concurrent data structures is memory contention at shared data structure access-points, which serialises threads and hinders parallelism. To address this challenge, a significant amount of work in the literature has proposed multi-access techniques that improve the parallelism of concurrent data structures. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures, especially in a shared memory setting.
In this paper, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study the behaviour of the two popular random access patterns, shared (Remote) and exclusive (Local) access, and of the two atomic primitives most commonly used for designing lock-free data structures, Compare-and-Swap and Fetch-and-Add. We model concurrent multi-accesses by splitting the thread execution procedure into five logical sessions: i) side-work, ii) access-point search, iii) access-point acquisition, iv) access-point data acquisition, and v) access-point data operation. We model the acquisition of an access-point as a system of closed queuing networks with parallel servers, and data acquisition in terms of where the data is located within the memory system.
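To make the closed queuing network idea concrete, the following is a minimal sketch of textbook exact Mean Value Analysis (MVA) for a closed, single-class network. It is not the paper's model: it assumes each access-point is a single-server FIFO station, that threads pick access-points uniformly at random, and that side-work can be lumped into a fixed think time. The function name `mva_throughput` and all parameter names are hypothetical.

```python
def mva_throughput(num_threads, num_access_points, service_time, think_time):
    """Exact MVA for a closed network: threads cycle between a think phase
    (side-work) and one of several identical access-point stations.

    Assumptions (illustrative only): single-server stations, uniform random
    choice of access-point, identical service time at every station.
    """
    visit_ratio = 1.0 / num_access_points      # uniform random access
    queue_len = [0.0] * num_access_points      # mean queue lengths with 0 threads
    throughput = 0.0
    for n in range(1, num_threads + 1):
        # Arrival theorem: an arriving thread sees the mean queue length
        # of the same network populated with n - 1 threads.
        response = [service_time * (1.0 + q) for q in queue_len]
        cycle_time = think_time + sum(visit_ratio * r for r in response)
        throughput = n / cycle_time            # operations per time unit
        # Little's law applied per station gives the new mean queue lengths.
        queue_len = [throughput * visit_ratio * r for r in response]
    return throughput


if __name__ == "__main__":
    # Compare one access-point against four under the same load.
    for k in (1, 4):
        print(k, mva_throughput(num_threads=8, num_access_points=k,
                                service_time=1.0, think_time=1.0))
```

Under these assumptions a single access-point caps throughput at one operation per service time, while spreading the same threads over several access-points raises that cap, which is the effect multi-access designs aim for.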
We evaluate our model on a set of concurrent data structure designs, including a counter, a stack and a FIFO queue. The evaluation is carried out on two state-of-the-art multi-core processors: an Intel Xeon Phi CPU 7290 with 72 physical cores and an Intel Xeon E5-2695 with 14 physical cores. Our model predicts the throughput of the given concurrent data structures with 80% to 100% accuracy on both architectures.
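As an illustration of what a multi-access counter with a shared (Remote) random access pattern might look like, here is a minimal sketch. It is an assumption-laden stand-in, not one of the designs evaluated in the paper: the class `MultiAccessCounter` and the helper `run_workers` are hypothetical, and a per-access-point lock substitutes for a hardware Fetch-and-Add, which Python does not expose.

```python
import random
import threading


class MultiAccessCounter:
    """Counter split into several sub-counters (access-points)."""

    def __init__(self, num_access_points):
        self._values = [0] * num_access_points
        # One lock per access-point stands in for an atomic Fetch-and-Add.
        self._locks = [threading.Lock() for _ in range(num_access_points)]

    def increment(self, rng=random):
        # Access-point search: pick a sub-counter uniformly at random.
        index = rng.randrange(len(self._values))
        # Access-point acquisition followed by the data operation.
        with self._locks[index]:
            self._values[index] += 1
        return index

    def read(self):
        # Exact only when no increments are running concurrently.
        return sum(self._values)


def run_workers(counter, num_threads, increments_per_thread):
    """Drive the counter from several threads and return the final total."""
    def work(seed):
        rng = random.Random(seed)  # per-thread generator, no shared RNG state
        for _ in range(increments_per_thread):
            counter.increment(rng)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()


if __name__ == "__main__":
    print(run_workers(MultiAccessCounter(4), num_threads=8, increments_per_thread=1000))
```

Spreading increments over several sub-counters reduces the chance that two threads contend for the same access-point, at the cost of a more expensive read that must visit every sub-counter.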


Published In

SPAA '22: Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures
July 2022
464 pages
ISBN: 9781450391467
DOI: 10.1145/3490148
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache
  2. concurrency
  3. data structures
  4. locality
  5. lock-free
  6. multi-access
  7. multi-core
  8. parallel programming
  9. parallelism
  10. performance modelling
  11. queuing theorem
  12. semantic relaxation

Qualifiers

  • Research-article

Funding Sources

  • SIDA/Bright Project under the Makerere-Sweden bilateral research programme 2015-2020
  • The Swedish Foundation for International Cooperation in Research and Higher Education (STINT)
  • The Swedish Research Council (VR)

Conference

SPAA '22

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
