Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/563998.564036acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
Article

Speculative lock elision: enabling highly concurrent multithreaded execution

Published: 01 December 2001 Publication History

Abstract

Serialization of threads due to critical sections is a fundamental bottleneck to achieving high performance in multithreaded programs. Dynamically, such serialization may be unnecessary because these critical sections could have safely executed concurrently without locks. Current processors cannot fully exploit such parallelism because they do not have mechanisms to dynamically detect such false inter-thread dependences.We propose Speculative Lock Elision (SLE), a novel micro-architectural technique to remove dynamically unnecessary lock-induced serialization and enable highly concurrent multithreaded execution. The key insight is that locks do not always have to be acquired for a correct execution. Synchronization instructions are predicted as being unnecessary and elided. This allows multiple threads to concurrently execute critical sections protected by the same lock. Misspeculation due to inter-thread data conflicts is detected using existing cache mechanisms and rollback is used for recovery. Successful speculative elision is validated and committed without acquiring the lock.SLE can be implemented entirely in microarchitecture without instruction set support and without system-level modifications, is transparent to programmers, and requires only trivial additional hardware support. SLE can provide programmers a fast path to writing correct high-performance multithreaded programs.

References

[1]
J. Allemany and E. W. Felten. Performance issues in non-blocking synchronization on shared-memory multiprocessors. In Proceedings of the 11th ACM Symposium on Principles of Distributed Computing, pages 125-134, August 1992.
[2]
L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowaczyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. In Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.
[3]
B. N. Bershad. Practical Considerations for Lock-Free Concurrent Objects. Technical Report CMU-CS-91-183, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, September 1991.
[4]
D. C. Burger and T. M. Austin. The SimpleScalar Tool Set, Version 2.0. TR #1342, Computer Sciences Department, University of Wisconsin, Madison, June 1997.
[5]
M. J. Carey, D. J. DeWitt, M. J. Franklin, N. E. Hall, M. L. McAuliffe, J. F. Naughton, D. T. Schuh, M. H. Solomon, C. K. Tan, O. G. Tsatalos, S. J. White, and M. J. Zwilling. Shoring Up Persistent Applications. In Proceedings of the 1994 ACM SIGMOD International Conference on the Management of Data, pages 383-394, May 1994.
[6]
A. Charlesworth, A. Phelps, R. Williams, and G. Gilbert. Gigaplane-XB: Extending the Ultra Enterprise Family. In Proceedings of the Symposium on High Performance Interconnects V, pages 97-112, August 1997.
[7]
S-E. Choi and E. C. Lewis. A Study of Common Pitfalls in Simple Multi-Threaded Programs. In Proceedings of the Thirty-First ACM SIGCSE Technical Symposium on Computer Science Education, March 2000.
[8]
Alpha Architecture Handbook Version 4, October 1998.
[9]
Intel Corporation. Hyper-Threading Technology. http://developer.intel.com/technology/hyperthread/, 2001.
[10]
International Business Machines Corporation. The PowerPC Architecture: A Specification for a New Family of RISC Processors. Morgan Kaufman, San Francisco, CA, 1998.
[11]
M. Franklin and G. S. Sohi. A Hardware Mechanism for Dynamic Reordering of Memory References. In IEEE Transactions on Computers, 45(5), May 1996.
[12]
K. Gharachorloo, A. Gupta, and J. Hennessy. Two Techniques to Enhance the Performance of Memory Consistency Models. In Proceedings of the 1991 International Conference on Parallel Processing, pages 355-364, August 1991.
[13]
C. Gniady, B. Falsafi, and T. N. Vijaykumar. Is SC + ILP = RC? In Proceedings of the 26th Annual International Symposium on Computer Architecture, pages 162-171, May 1999.
[14]
S. Gopal, T. N. Vijaykumar, J. E. Smith, and G. S. Sohi. Speculative Versioning Cache. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, pages 195-205, February 1998.
[15]
M. Herlihy. A methodology for implementing highly concurrent data objects, In ACM Transactions on Programming Languages and Systems 15, 5, 745-770, 1993.
[16]
M. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289-300, May 1993.
[17]
M. D. Hill. Multiprocessors Should Support Simple Memory Consistency Models. IEEE Computer, 1998.
[18]
Intel Corporation. Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual, January 1996.
[19]
E. H. Jensen, G. W. Hagensen, and J. M. Broughton. A New Approach to Exclusive Data Access in Shared Memory Multiprocessors. Technical Report UCRL-97663, Lawrence Livermore National Laboratory, Livermore, CA, November 1987.
[20]
A. Käigi, D. Burger, and J. R. Goodman. Efficient Synchronization: Let Them Eat QOLB. In Proceedings of the 24th Annual International Symposium on Computer Architecture, pages 170-180, June 1997.
[21]
J. Kahle. A DuaI-CPU Processor Chip. In Proceedings of the 1999 International Microprocessor Forum, October 1999.
[22]
T. S. Kawaf, D. J. Shakshober, and D. C. Stanley. Performance Analysis Using Very Large Memory on the 64-bit AlphaServer System. Digital Technical Journal, 8(3), 1996.
[23]
H. T. Kung and J. T. Robinson. On Optimistic methods of Concurrency Control. In ACM Transactions on Database Systems, Vol. 6, No. 2, June 1981, pages 213-226., 1981.
[24]
L. Lamport. Concurrent reading and writing. Communications of the ACM, 20(11), 1977.
[25]
J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Proceedings of the 24th Annual International Symposium on Computer Architecture, pages 241-251, June 1997.
[26]
D. D. Lee and R. H. Katz. Using Cache Mechanisms to Exploit Nonrefreshing DRAM's for On-Chip Memories. IEEE Journal of Solid-State Circuits, 26(4), April 1991.
[27]
K. M. Lepak and M. H. Lipasti. On the Value Locality of Store Instructions. In Proceedings of the 27th Annual International Symposium on Computer Architecture, pages 182-191, June 2000.
[28]
C. Mohan. Less Optimism About Optimistic Concurrency Control. In Proceedings of the Second International Workshop on Research Issues on Data Engineering: Transaction and Query Processing, pages 199-204, February 1992.
[29]
C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks using Write-Ahead Logging. ACM Transactions on Database Systems, 17(1), March 1992.
[30]
R. Rajwar, A. Kägi, and J. R. Goodman. Improving the Throughput of Synchronization by Insertion of Delays. In Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, pages 168-179, January 2000.
[31]
P. Ranganathan, V. S. Pai, and S. V. Adve. Using Speculative Retirement and Larger Instruction Windows to Narrow the Performance Gap between Memory Consistency Models. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 199-210, June 1997.
[32]
L. Rudolph and Z. Segall. Dynamic Decentralized Cache Schemes for MIMD Parallel Processors. In Proceedings of the 11th Annual International Symposium on Computer Architecture, pages 340-347, June 1984.
[33]
Silicon Graphics Inc., Mountain View, CA. MIPS R10000 Microprocessor User's Manual Version 2.0, 1996.
[34]
J. P. Singh, W. D. Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared Memory. Computer Architecture News, 20(1):5-44, March 1992.
[35]
J. E. Smith and A. R. Pleszkun. Implementation of Precise Interrupts in Pipelined Processors. In Proceedings of the 12th Annual International Symposium on Computer Architecture, pages 36-44, June 1985.
[36]
D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. Fast Checkpoint/Recovery to Support Kilo-Instruction Speculation and Hardware Fault Tolerance. TR #1420, Computer Sciences Department, University of Wisconsin, Madison, October 2000.
[37]
J. M. Stone, H. S. Stone, R Heidelberger, and J. Turek. Multiple Reservations and the Oklahoma Update. IEEE Parallel & Distributed Technology, 1(4):58-71, November 1993.
[38]
S. Storino and J. Borkenhagen. A Multi-Threaded 64-bit PowerPC Commercial RISC Processor design. In Proceedings of the 11th Annual International Symposium on High-Performance Chips, August 1999.
[39]
A. Thomasian. Concurrency Control: Methods, Performance, and Analysis. ACM Computing Surveys, 30(1), March 1998.
[40]
S. C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 24-36, June 1995.

Cited By

View all
  • (2023)A Prediction System ServiceProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575714(48-60)Online publication date: 27-Jan-2023
  • (2022)Free atomicsProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527385(14-26)Online publication date: 18-Jun-2022
  • (2021)Beyond MPIACM SIGMOD Record10.1145/3456859.345686249:4(12-17)Online publication date: 10-Mar-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
December 2001
355 pages
ISBN:0769513697

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 December 2001

Check for updates

Qualifiers

  • Article

Conference

MICRO-34
Sponsor:

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)11
  • Downloads (Last 6 weeks)1
Reflects downloads up to 14 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)A Prediction System ServiceProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575714(48-60)Online publication date: 27-Jan-2023
  • (2022)Free atomicsProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527385(14-26)Online publication date: 18-Jun-2022
  • (2021)Beyond MPIACM SIGMOD Record10.1145/3456859.345686249:4(12-17)Online publication date: 10-Mar-2021
  • (2020)Crafty: efficient, HTM-compatible persistent transactionsProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3385991(59-74)Online publication date: 11-Jun-2020
  • (2020)A parallel two-stage genetic algorithm for route planningProceedings of the 2020 Genetic and Evolutionary Computation Conference Companion10.1145/3377929.3398116(1739-1746)Online publication date: 8-Jul-2020
  • (2020)Memory Tagging: Minimalist Synchronization for Scalable Concurrent Data StructuresProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400213(37-49)Online publication date: 6-Jul-2020
  • (2019)BRAVOProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358835(315-328)Online publication date: 10-Jul-2019
  • (2019)Improving a Genetic Algorithm for Route Planning Using Parallelism with Speculative ExecutionPractice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning)10.1145/3332186.3333251(1-5)Online publication date: 28-Jul-2019
  • (2019)Simplifying Transactional Memory Support in C++ACM Transactions on Architecture and Code Optimization10.1145/332879616:3(1-24)Online publication date: 25-Jul-2019
  • (2019)CoNDAProceedings of the 46th International Symposium on Computer Architecture10.1145/3307650.3322266(629-642)Online publication date: 22-Jun-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media