Non-blocking synchronization (NBS) has significant advantages over blocking synchronization: The same code can ran on uniprocessors, asynchronous handlers, and on shared memory multiprocessors. NBS is deadlock-free, aids fault-tolerance, eliminates interference between synchronization and the scheduler, and can increase total system throughput. These advantages are becoming even more important with the increased use of parallelism and multiprocessors, and as the cost of a delay increases relative to processor speed. This thesis demonstrates that non-blocking synchronization is practical as the sole co-ordination mechanism in systems by showing that careful design and implementation of operating system software makes implementing efficient non-blocking synchronization far easier, by demonstrating that DCAS &parl0;Double-Compare-and-Swap&parr0; is the necessary and sufficient primitive for implementing NBS, and by demonstrating that efficient hardware DCAS is practical for RISC processors. This thesis presents non-blocking implementations of common data-structures sufficient to implement an operating system kernel. These out-perform all non-blocking implementations of the same data-structures and are comparable to spin-locks under no contention. They exploit properties of well- designed systems and depend on DCAS . I present an O(n) non-blocking implementation of CAS n with extensions that support multi-objects, a contention-reduction technique based on DCAS that is fault-tolerant and OS-independent yet performs as well as the best previously published techniques, and two implementations of dynamic, software transactional memory (STM) that support multi-object updates, and have O(w) overhead cost (for w writes in an update) in the absence of preemption. Finally, I demonstrate that the proposed OS implementation of DCAS is inefficient, and present a design of an efficient, hardware, DCAS implementation that is specific to the R4000 processor; however, the observations that make implementation practical are generally applicable. In short, the incremental costs of adding binary atomic synchronization primitives are very low, given that designers have already implemented unary atomic synchronization primitives.
Cited By
- Hoffmann J, Marmar M and Shao Z Quantitative Reasoning for Proving Lock-Freedom Proceedings of the 2013 28th Annual ACM/IEEE Symposium on Logic in Computer Science, (124-133)
- Shin E, Kim I, Kim J and Eom Y Strata Proceedings of the 2011 international conference on Computational science and Its applications - Volume Part V, (217-231)
- Gidenstam A, Papatriantafilou M and Tsigas P (2010). NBmalloc, Algorithmica, 58:2, (304-338), Online publication date: 1-Oct-2010.
- Gidenstam A and Papatriantafilou M (2009). LFTHREADS, ACM SIGARCH Computer Architecture News, 36:5, (88-92), Online publication date: 20-Dec-2008.
- Gidenstam A and Papatriantafilou M LFTHREADS Proceedings of the 11th international conference on Principles of distributed systems, (217-231)
- del Cuvillo J, Zhu W and Gao G Landing openMP on cyclops-64 Proceedings of the 3rd conference on Computing frontiers, (41-50)
- Hart T, McKenney P and Brown A Making lockless synchronization fast Proceedings of the 20th international conference on Parallel and distributed processing, (21-21)
- Attiya H and Hillel E Built-in coloring for highly-concurrent doubly-linked lists Proceedings of the 20th international conference on Distributed Computing, (31-45)
- Attiya H and Hendler D Time and space lower bounds for implementations using k-CAS Proceedings of the 19th international conference on Distributed Computing, (169-183)
- Gidenstam A, Papatriantafilou M and Tsigas P Allocating memory in a lock-free manner Proceedings of the 13th annual European conference on Algorithms, (329-342)
- Michael M (2004). Hazard Pointers, IEEE Transactions on Parallel and Distributed Systems, 15:6, (491-504), Online publication date: 1-Jun-2004.
- Doherty S, Detlefs D, Groves L, Flood C, Luchangco V, Martin P, Moir M, Shavit N and Steele G DCAS is not a silver bullet for nonblocking algorithm design Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, (216-224)
- Boehm H An almost non-blocking stack Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, (40-49)
- Sundell H and Tsigas P Lock-free and practical doubly linked list-based deques using single-word compare-and-swap Proceedings of the 8th international conference on Principles of Distributed Systems, (240-255)
- Dice D and Garthwaite A (2019). Mostly lock-free malloc, ACM SIGPLAN Notices, 38:2 supplement, (163-174), Online publication date: 15-Feb-2003.
- Luchangco V, Moir M and Shavit N Nonblocking k-compare-single-swap Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, (314-323)
- Herlihy M, Luchangco V and Moir M Obstruction-Free Synchronization Proceedings of the 23rd International Conference on Distributed Computing Systems
- Dice D and Garthwaite A Mostly lock-free malloc Proceedings of the 3rd international symposium on Memory management, (163-174)
- Michael M High performance dynamic lock-free hash tables and list-based sets Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, (73-82)
- Michael M Safe memory reclamation for dynamic lock-free objects using atomic reads and writes Proceedings of the twenty-first annual symposium on Principles of distributed computing, (21-30)
- Dice D Implementing fast javaTM monitors with relaxed-locks Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1, (13-13)
- Detlefs D, Martin P, Moir M and Steele G Lock-free reference counting Proceedings of the twentieth annual ACM symposium on Principles of distributed computing, (190-199)
Recommendations
DCAS is not a silver bullet for nonblocking algorithm design
SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architecturesDespite years of research, the design of efficient nonblocking algorithms remains difficult. A key reason is that current shared-memory multiprocessor architectures support only single-location synchronisation primitives such as compare-and-swap (CAS) ...