
A Family of Relaxed Concurrent Queues for Low-Latency Operations and Item Transfers

Published: 16 December 2022

Abstract

Producer-consumer communication over shared memory is a critical function of current scalable systems. Queues that provide low latency and high throughput on highly utilized systems can improve the overall performance perceived by end users. To address this demand, we prioritize achieving both high operation performance and high item-transfer speed. The Relaxed Concurrent Queues (RCQs) are a family of queues that we designed and implemented for that purpose. Our key idea is a relaxed ordering model that splits the enqueue and dequeue operations into a stage of sequential assignment to a queue slot and a stage of concurrent execution across the slots. At each slot, we impose no order restrictions among operations of the same type. We define several variants of the RCQ algorithms with respect to offered concurrency, required hardware instructions, supported operations, occupied memory space, and precondition handling. For specific RCQ algorithms, we provide pseudocode definitions and reason about their correctness and progress properties. Additionally, we theoretically estimate and experimentally validate the worst-case distance between an RCQ algorithm and a strict first-in-first-out (FIFO) queue. We developed prototype implementations of the RCQ algorithms and experimentally compared them with several representative strict FIFO and relaxed data structures over a range of workload and system settings. The RCQS algorithm is a provably linearizable lock-free member of the RCQ family. We show experimentally that RCQS achieves a factors-to-orders-of-magnitude advantage over state-of-the-art strict and relaxed queue algorithms across several latency and throughput statistics of queue operations and item transfers.
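The two-stage idea described in the abstract (sequential assignment of each operation to a slot, then concurrent execution across slots) can be illustrated with a minimal ticket-based slot queue. This is only an illustrative sketch of the general scheme, not the paper's RCQS algorithm: it is blocking rather than lock-free, its capacity is bounded by the slot count, and all names (`SlotQueue`, `SLOTS`, `turn`) are hypothetical. A fetch-and-add on a ticket counter performs the sequential assignment; the per-slot handoff then proceeds independently on each slot.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Illustrative sketch only: a bounded, blocking slot queue in the spirit of
// the two-stage scheme (sequential slot assignment, concurrent per-slot
// execution). Names and structure are hypothetical, not from the paper.
constexpr std::size_t SLOTS = 8;  // capacity bound of this sketch

struct Slot {
    // Per-slot epoch: value 2*round means "empty for round", 2*round+1 "full".
    std::atomic<uint64_t> turn{0};
    int item{0};
};

class SlotQueue {
    Slot slots_[SLOTS];
    std::atomic<uint64_t> enq_ticket_{0};  // sequential assignment for enqueues
    std::atomic<uint64_t> deq_ticket_{0};  // sequential assignment for dequeues

public:
    void enqueue(int v) {
        // Stage 1: fetch-and-add assigns this operation a unique ticket,
        // which maps to a slot and a round.
        uint64_t t = enq_ticket_.fetch_add(1, std::memory_order_relaxed);
        Slot& s = slots_[t % SLOTS];
        uint64_t round = t / SLOTS;
        // Stage 2: operate on the assigned slot, independently of other slots.
        // Wait until the slot is empty for our round (blocking in this sketch).
        while (s.turn.load(std::memory_order_acquire) != 2 * round)
            std::this_thread::yield();
        s.item = v;
        s.turn.store(2 * round + 1, std::memory_order_release);  // mark full
    }

    int dequeue() {
        uint64_t t = deq_ticket_.fetch_add(1, std::memory_order_relaxed);
        Slot& s = slots_[t % SLOTS];
        uint64_t round = t / SLOTS;
        // Wait until the slot holds the item for our round.
        while (s.turn.load(std::memory_order_acquire) != 2 * round + 1)
            std::this_thread::yield();
        int v = s.item;
        s.turn.store(2 * round + 2, std::memory_order_release);  // empty, next round
        return v;
    }
};
```

In this sketch, operations on distinct slots never contend after the initial fetch-and-add, which is the intuition behind splitting assignment from execution; the lock-free precondition handling and linearizability argument of the actual RCQ algorithms are beyond its scope.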


Cited By

  • (2024) Diciclo: Flexible User-level Services for Efficient Multitenant Isolation. ACM Transactions on Computer Systems 42, 1-2, 1-47. https://doi.org/10.1145/3639404. Online publication date: 13-Feb-2024.
  • (2024) How to Relax Instantly: Elastic Relaxation of Concurrent Data Structures. Euro-Par 2024: Parallel Processing, 119-133. https://doi.org/10.1007/978-3-031-69583-4_9. Online publication date: 26-Aug-2024.


Published In

ACM Transactions on Parallel Computing, Volume 9, Issue 4
December 2022, 102 pages
ISSN: 2329-4949
EISSN: 2329-4957
DOI: 10.1145/3572851

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2022
Online AM: 04 October 2022
Accepted: 29 September 2022
Revised: 22 September 2022
Received: 10 May 2021
Published in TOPC Volume 9, Issue 4


Author Tags

  1. Producer-consumer
  2. data structures
  3. lock-free
  4. blocking
  5. linearizable
  6. array-based queues
  7. relaxed ordering

Qualifiers

  • Research-article
  • Refereed

