
Fast synchronization for chip multiprocessors

Published: 01 November 2005

Abstract

This paper presents a novel mechanism for barrier synchronization on chip multiprocessors (CMPs). By forcing the invalidation of selected I-cache lines, the mechanism starves threads of instructions and thus forces their execution to stop. Threads are released once all of them have entered the barrier. We evaluated this mechanism using SMTSim and report much better (and, most importantly, flatter) performance than lock-based barriers supported by existing microprocessors.
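For readers unfamiliar with the software baseline, the following is a minimal sketch of a lock-based barrier: a centralized sense-reversing barrier built from a lock-protected counter and a shared flag. It is not the I-cache mechanism proposed in the paper, and it is not necessarily the exact baseline the authors measured; the names `LockBarrier` and `run_demo` are hypothetical and introduced only for illustration.

```python
import threading
import time


class LockBarrier:
    """Centralized sense-reversing barrier (illustrative sketch only)."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.count = 0                  # threads that have arrived in this episode
        self.sense = False              # shared flag, flipped when the barrier opens
        self.lock = threading.Lock()    # protects count and the flag flip
        self.local = threading.local()  # holds each thread's private sense

    def wait(self):
        # Every thread flips its private sense once per barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count += 1
            last = self.count == self.n_threads
            if last:
                self.count = 0          # reset for the next episode
                self.sense = my_sense   # release all waiting threads
        if not last:
            # Spin until the last arriver flips the shared flag.
            while self.sense != my_sense:
                time.sleep(0)           # yield so other threads can make progress


def run_demo(n_threads=4, n_phases=3):
    """Run workers through several phases separated by the barrier."""
    barrier = LockBarrier(n_threads)
    log = []                            # (phase, thread_id) in the order recorded
    log_lock = threading.Lock()

    def worker(tid):
        for phase in range(n_phases):
            with log_lock:
                log.append((phase, tid))
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `run_demo(4, 3)` returns a log in which every entry for one phase precedes every entry for the next, which is the ordering guarantee a barrier provides. In this scheme every arriving thread contends for the same lock and spins on the same flag, which is one plausible reason the cost of such barriers grows with the number of threads.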



Published In

ACM SIGARCH Computer Architecture News, Volume 33, Issue 4
Special issue: dasCMP'05
November 2005
130 pages
ISSN: 0163-5964
DOI: 10.1145/1105734

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Cited By

  • (2019) Time-predictable synchronization support with a shared scratchpad memory. Microprocessors and Microsystems 64, pp. 34-42. DOI: 10.1016/j.micpro.2018.09.014. Online publication date: Feb-2019.
  • (2014) Instruction-based high-efficient synchronization in a many-core Network-on-Chip processor. 2014 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2193-2196. DOI: 10.1109/ISCAS.2014.6865604. Online publication date: Jun-2014.
  • (2014) Measurement of the latency parameters of the Multi-BSP model. The Journal of Supercomputing 67:2, pp. 565-584. DOI: 10.1007/s11227-013-1018-4. Online publication date: 1-Feb-2014.
  • (2011) DSBS. Proceedings of the 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 1030-1037. DOI: 10.1109/TrustCom.2011.141. Online publication date: 16-Nov-2011.
  • (2011) Hardware synchronization for embedded multi-core processors. 2011 IEEE International Symposium of Circuits and Systems (ISCAS), pp. 2557-2560. DOI: 10.1109/ISCAS.2011.5938126. Online publication date: May-2011.
  • (2011) A bridging model for multi-core computing. Journal of Computer and System Sciences 77:1, pp. 154-166. DOI: 10.1016/j.jcss.2010.06.012. Online publication date: 1-Jan-2011.
  • (2010) Handling shared variable synchronization in multi-core Network-on-Chips with distributed memory. 23rd IEEE International SOC Conference, pp. 467-472. DOI: 10.1109/SOCC.2010.5784680. Online publication date: Sep-2010.
  • (2010) Supporting Efficient Synchronization in Multi-core NoCs Using Dynamic Buffer Allocation Technique. Proceedings of the 2010 IEEE Annual Symposium on VLSI, pp. 462-463. DOI: 10.1109/ISVLSI.2010.16. Online publication date: 5-Jul-2010.
  • (2007) Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms. Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, pp. 145-149. DOI: 10.1145/1289881.1289908. Online publication date: 30-Sep-2007.
