The Moore's-Law-driven performance of simple instructions has improved by orders of magnitude over the past two decades, but shared-memory multiprocessor (SMMP) synchronization operations have not kept pace. SMMP software uses synchronization operations heavily, thus suffering degraded performance and scalability. As a result, many traditional SMMP algorithms are now obsolete.
This dissertation presents read-copy update (RCU), a reader-writer synchronization mechanism in which read-side critical sections incur virtually zero synchronization overhead, thereby achieving near-ideal performance for read-mostly workloads. Write-side critical sections incur substantial synchronization overhead, deferring destruction and maintaining multiple versions of data structures in order to accommodate the synchronization-free read-side critical sections. In addition, writers use some mechanism, such as locking, to ensure orderly updates.
Readers provide a signal enabling writers to determine when it is safe to complete destructive operations, but this signal may be deferred, permitting a single signal operation to serve multiple read-side RCU critical sections.
These read-side signals are observed by a specialized garbage collector, which carries out destructive operations once it is safe to do so. Garbage collectors are typically implemented in a manner similar to a barrier computation. Production-quality garbage collectors batch destructive operations, amortizing signal-observation overhead over many updates.
Although RCU is not itself new, its use has been quite specialized. This dissertation rectifies this situation by showing how RCU can be implemented efficiently in operating system kernels, by demonstrating its system-level performance and complexity benefits, and by providing a set of design patterns that make RCU more generally applicable.
This dissertation compares RCU to traditional synchronization mechanisms, including locking and non-blocking synchronization, using both analytic and empirical methods. The empirical methods include both informal micro-benchmarks and formal system-level benchmarks. These benchmarks show performance benefits ranging from tens of percent to an order of magnitude and little or no increase in code complexity.
Finally, this dissertation demonstrates that RCU has practical value by (1) outlining its use in several production systems, two of which have seen extensive datacenter use, one of which this author designed and implemented, and (2) documenting its widespread use in the Linux 2.6 kernel.
Cited By
- Humphries J, Natu N, Chaugule A, Weisse O, Rhoden B, Don J, Rizzo L, Rombakh O, Turner P and Kozyrakis C ghOSt Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, (588-604)
- Feldman Y, Khyzha A, Enea C, Morrison A, Nanevski A, Rinetzky N and Shoham S (2020). Proving highly-concurrent traversals correct, Proceedings of the ACM on Programming Languages, 4:OOPSLA, (1-29), Online publication date: 13-Nov-2020.
- Meyer R and Wolff S (2019). Decoupling lock-free data structures from memory reclamation for static analysis, Proceedings of the ACM on Programming Languages, 3:POPL, (1-31), Online publication date: 2-Jan-2019.
- Khyzha A, Attiya H, Gotsman A and Rinetzky N (2018). Safe privatization in transactional memory, ACM SIGPLAN Notices, 53:1, (233-245), Online publication date: 23-Mar-2018.
- Prasad A and Gopinath K A frugal approach to reduce RCU grace period overhead Proceedings of the Thirteenth EuroSys Conference, (1-15)
- Kashyap S, Min C, Kim K and Kim T A scalable ordering primitive for multicore machines Proceedings of the Thirteenth EuroSys Conference, (1-15)
- Khyzha A, Attiya H, Gotsman A and Rinetzky N Safe privatization in transactional memory Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (233-245)
- Zhan Y and Porter D Versioned Programming Proceedings of the 9th ACM International on Systems and Storage Conference, (1-12)
- Tassarotti J, Dreyer D and Vafeiadis V (2015). Verifying read-copy-update in a logic for weak memory, ACM SIGPLAN Notices, 50:6, (110-120), Online publication date: 7-Aug-2015.
- Tassarotti J, Dreyer D and Vafeiadis V Verifying read-copy-update in a logic for weak memory Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, (110-120)
- Liu Y, Zhou T and Spear M Transactional Acceleration of Concurrent Data Structures Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures, (244-253)
- Tsai C, Zhan Y, Reddy J, Jiao Y, Zhang T and Porter D How to get more value from your file system directory cache Proceedings of the 25th Symposium on Operating Systems Principles, (441-456)
- Turon A, Vafeiadis V and Dreyer D GPS Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, (691-707)
- Tsai C, Arora K, Bandi N, Jain B, Jannen W, John J, Kalodner H, Kulkarni V, Oliveira D and Porter D Cooperation and security isolation of library OSes for multi-process applications Proceedings of the Ninth European Conference on Computer Systems, (1-14)
- Turon A, Vafeiadis V and Dreyer D (2014). GPS, ACM SIGPLAN Notices, 49:10, (691-707), Online publication date: 31-Dec-2015.
- Arbel M and Attiya H Concurrent updates with RCU Proceedings of the 2014 ACM symposium on Principles of distributed computing, (196-205)
- Kuang J, Waddington D and Tian C Towards a scalable microkernel personality for multicore processors Proceedings of the 19th international conference on Parallel Processing, (620-632)
- Gotsman A, Rinetzky N and Yang H Verifying concurrent memory reclamation algorithms with grace Proceedings of the 22nd European conference on Programming Languages and Systems, (249-269)
- Tazaki H, Mancini E, Camara D, Turletti T and Dabbous W MSWIM demo abstract Proceedings of the 11th ACM international symposium on Mobility management and wireless access, (29-32)
- Al Bahra S (2013). Nonblocking Algorithms and Scalable Multicore Programming, Queue, 11:5, (40-64), Online publication date: 1-May-2013.
- Al Bahra S (2013). Nonblocking algorithms and scalable multicore programming, Communications of the ACM, 56:7, (50-61), Online publication date: 1-Jul-2013.
- Desnoyers M and Dagenais M (2012). Lockless multi-core high-throughput buffering scheme for kernel tracing, ACM SIGOPS Operating Systems Review, 46:3, (65-81), Online publication date: 18-Dec-2012.
- Howard P and Walpole J A case for relativistic programming Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability, (33-38)
- Boehm H Position paper Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability, (9-14)
- Clements A, Kaashoek M and Zeldovich N (2012). Scalable address spaces using RCU balanced trees, ACM SIGPLAN Notices, 47:4, (199-210), Online publication date: 1-Jun-2012.
- Boehm H Can seqlocks get along with programming language memory models? Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, (12-20)
- Clements A, Kaashoek M and Zeldovich N (2012). Scalable address spaces using RCU balanced trees, ACM SIGARCH Computer Architecture News, 40:1, (199-210), Online publication date: 18-Apr-2012.
- Clements A, Kaashoek M and Zeldovich N Scalable address spaces using RCU balanced trees Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, (199-210)
- Dalessandro L, Dice D, Scott M, Shavit N and Spear M Transactional mutex locks Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II, (2-13)
- Triplett J, McKenney P and Walpole J (2010). Scalable concurrent hash tables via relativistic programming, ACM SIGOPS Operating Systems Review, 44:3, (102-109), Online publication date: 17-Aug-2010.
- McKenney P, Michael M, Triplett J and Walpole J (2010). Why the grass may not be greener on the other side, ACM SIGOPS Operating Systems Review, 44:3, (93-101), Online publication date: 17-Aug-2010.
- Dalessandro L, Spear M and Scott M (2010). NOrec, ACM SIGPLAN Notices, 45:5, (67-78), Online publication date: 1-May-2010.
- Dalessandro L, Spear M and Scott M NOrec Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (67-78)
- Porter D, Hofmann O, Rossbach C, Benn A and Witchel E Operating System Transactions Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, (161-176)
- Boehm H and Spertus M Garbage collection in the next C++ standard Proceedings of the 2009 international symposium on Memory management, (30-38)
- McKenney P and Walpole J (2008). Introducing technology into the Linux kernel, ACM SIGOPS Operating Systems Review, 42:5, (4-17), Online publication date: 1-Jul-2008.
- McKenney P, Michael M and Walpole J Why the grass may not be greener on the other side Proceedings of the 4th workshop on Programming languages and operating systems, (1-5)
- Boehm H Reordering constraints for pthread-style locks Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, (173-182)
- Hart T, McKenney P and Brown A Making lockless synchronization fast Proceedings of the 20th international conference on Parallel and distributed processing, (21-21)
Index Terms
- Exploiting deferred destruction: an analysis of read-copy-update techniques in operating system kernels
Recommendations
Safety of Deferred Update in Transactional Memory
ICDCS '13: Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing SystemsTransactional memory allows the user to declare sequences of instructions as speculative transactions that can either commit or abort. If a transaction commits, it appears to be executed sequentially, so that the committed transactions constitute a ...
Deferred Runtime Pipelining for contentious multicore software transactions
EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019DRP is a new concurrency control protocol for software transactional memory that achieves high throughput, even for skewed workloads that exhibit high contention. DRP builds on prior works that chop transactions into pieces to expose more concurrency ...
Speculative client execution in deferred update replication
MW4NG '14: Proceedings of the 9th Workshop on Middleware for Next Generation Internet ComputingDeferred Update Replication (DUR) is a powerful replication technique that allows parallelism of clients' execution while a global certification phase checks the validity of the transactional execution against workloads running on remote nodes. The well-...