Article

Free access

Synchronizing large VLSI processor arrays

Authors:

Allan L. Fisher,

H. T. KungAuthors Info & Claims

ISCA '83: Proceedings of the 10th annual international symposium on Computer architecture

Pages 54 - 58

https://doi.org/10.1145/800046.801638

Published: 13 June 1983 Publication History

PDF eReader

Abstract

Highly parallel VLSI computing structures consist of many processing elements operating simultaneously. In order for such processing elements to communicate among themselves, some provision must be made for synchronization of data transfer. The simplest means of synchronization is the use of a global clock. Unfortunately, large clocked systems can be difficult to implement because of the inevitable problem of clock skews and delays, which can be especially acute in VLSI systems as feature sizes shrink. For the near term, good engineering and technology improvements can be expected to maintain the feasibility of clocking in such systems; however, clock distribution problems crop up in any technology as systems grow. An alternative means of enforcing necessary synchronization is the use of self-timed, asynchronous schemes, at the cost of increased design complexity and hardware cost. Realizing that different circumstances call for different synchronization methods, this paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best-possible synchronization schemes for large processor arrays are proposed.

One set of models is based on assumptions that allow the use of a pipelined clocking scheme, where more than one clock event is propagated at a time. In this case, it is shown that even assuming that physical variations along clock lines can produce skews between wires of the same length, any one-dimensional processor array can be correctly synchronized by a global pipelined clock while enjoying desirable properties such as modularity, expandability and robustness. This result cannot be extended to two-dimensional arrays, however—the paper shows that under this assumption, it is impossible to run a clock such that the maximum clock skew between two communicating cells will be bounded by a constant as systems grow. For such cases or where pipelined clocking is unworkable, a synchronization scheme incorporating both clocked and “asynchronous” elements is proposed.

References

[1]

Aleliunas, R.and Rosenberg, A.L., "On Embedding Rectangular Grids in Square Grids," IEEE Trans. Computers, Vol. C-31, No. 9, September 1982, pp. 907-913.

Google Scholar

[2]

Bentley, J.L. and Kung, H.T., "A Tree Machine for Searching Problems," Proceedings of 1979 International Conference on Parallel Processing, IEEE, August 1979, pp. 257-266.

Google Scholar

[3]

Franklin, M.A. and Wann. D.F., "Asynchronous and Clocked Control Structures for VLSI Interconnection Networks," Proceedings of the 9th International Symposium on Computer Architecture, April 1982, pp. 50-59.

Digital Library

Google Scholar

[4]

Kung, H.T., "Why Systolic Architectures?," Computer Magazine, Vol. 15, No. 1, January 1982, pp. 37-46.

Digital Library

Google Scholar

[5]

Kung, H.T. and Leiserson, C.E., "Systolic Arrays (for VLSI)," Sparse Matrix Proceedings 1978. Duff, I.S. and Stewart, G.W., eds., Society for Industrial and Applied Mathematics, 1979, pp. 256-282, A slightly different version appears in Introduction to VLSI Systems by C.A. Mead and L.A. Conway, Addison-Wesley, 1980, Section 8.3.

Google Scholar

[6]

Kung, S.Y. and Gal-Ezer, R.J., "Synchronous vs. Asynchronous Computation in VLSI Array Processors," Proceedings of the SPIE, vol. 341, May 1982, pp. 53-65.

Crossref

Google Scholar

[7]

Lipton, R.J., Eisenstat, S.C. and DeMillo, R.A., "Space and Time Hierarchies for Classes of Control Structures and Data Structures," Journal of the ACM, Vol. 23, No. 4, October 1976, pp. 720-732.

Digital Library

Google Scholar

[8]

Mead, C.A. and Rem, M., "Cost and Performance of VLSI Computing Structures," IEEE Journal of Solid State Circuits, Vol. SC-14, No. 2, April 1979, pp. 455-462.

Crossref

Google Scholar

[9]

Paterson, M.S., Ruzzo, W.L. and Snyder. L., "Bounds on Minimax Edge Length for Complete Binary Trees," Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, ACM SIGACT, May 1981, pp. 293-299.

Digital Library

Google Scholar

[10]

Seitz, C.L., "Self-Timed VLSI Systems," Proceedings of Conference on Very Large Scale Integration: Architecture, Design, Fabrication, California Institute of Technology, January 1979, pp. 345-355.

Google Scholar

[11]

Thompson, C.D.,A Complexity Theory for VLSI, PhD dissertation, Carnegie-Mellon University, Computer Science Department, 1980.

Digital Library

Google Scholar

Cited By

View all

Yu XYung M(2005)Scheduling task-tree with additive scales on parallel/distributed machinesComputing and Combinatorics10.1007/BFb0030883(607-616)Online publication date: 20-Jun-2005
https://doi.org/10.1007/BFb0030883
Myoupo JFabret A(2005)Designing modular linear systolic arrays using dependence graph regular partitionsParallel Processing: CONPAR 92—VAPP V10.1007/3-540-55895-0_421(265-270)Online publication date: 29-May-2005
https://doi.org/10.1007/3-540-55895-0_421
Myoupo J(2005)A fully-pipelined solutions constructor for dynamic programming problemsAdvances in Computing and Information — ICCI '9110.1007/3-540-54029-6_191(421-430)Online publication date: 1-Jun-2005
https://doi.org/10.1007/3-540-54029-6_191
Show More Cited By

Index Terms

Synchronizing large VLSI processor arrays

Recommendations

Synchronizing Large VLSI Processor Arrays

Highly parallel VLSI computing structures consist of many processing elements operating simultaneously. In order for such processing elements to communicate among themselves, some provision must be made for synchronization of data transfer. The simplest ...
Synchronizing large VLSI processor arrays

Highly parallel VLSI computing structures consist of many processing elements operating simultaneously. In order for such processing elements to communicate among themselves, some provision must be made for synchronization of data transfer. The simplest ...
A Hybrid Approach for Synchronizing Clocks in Distributed Systems
Cloud Computing – CLOUD 2019
Abstract
The art of synchronizing clocks across a wide area network has got a new dimension when it comes to the reality of achieving the demand for high-accuracy synchronization; even for local or small computing systems. Before implementing any clock ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

ISCA '83: Proceedings of the 10th annual international symposium on Computer architecture

June 1983

424 pages

ISBN:0897911016

DOI:10.1145/800046

ACM SIGARCH Computer Architecture News Volume 11, Issue 3
June 1983
413 pages
ISSN:0163-5964
DOI:10.1145/1067651
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 1983

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Upcoming Conference

ISCA '25

Sponsor:
sigarch

The 52nd Annual International Symposium on Computer Architecture

June 21 - 25, 2025

Tokyo , Japan

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

26
Total Citations
View Citations
284
Total Downloads

Downloads (Last 12 months)41
Downloads (Last 6 weeks)13

Reflects downloads up to 28 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Yu XYung M(2005)Scheduling task-tree with additive scales on parallel/distributed machinesComputing and Combinatorics10.1007/BFb0030883(607-616)Online publication date: 20-Jun-2005
https://doi.org/10.1007/BFb0030883
Myoupo JFabret A(2005)Designing modular linear systolic arrays using dependence graph regular partitionsParallel Processing: CONPAR 92—VAPP V10.1007/3-540-55895-0_421(265-270)Online publication date: 29-May-2005
https://doi.org/10.1007/3-540-55895-0_421
Myoupo J(2005)A fully-pipelined solutions constructor for dynamic programming problemsAdvances in Computing and Information — ICCI '9110.1007/3-540-54029-6_191(421-430)Online publication date: 1-Jun-2005
https://doi.org/10.1007/3-540-54029-6_191
Fridman JManolakos E(2001)Distributed Memory Parallel Architecture Based on Modular Linear Arrays for 2-D Separable Transforms ComputationJournal of VLSI Signal Processing Systems10.5555/509243.281323528:3(187-203)Online publication date: 1-Jul-2001
https://dl.acm.org/doi/10.5555/509243.2813235
Mila de la Roca BRandon E(1998)Design of a parallel neural processorProceedings of the 1998 Second IEEE International Caracas Conference on Devices, Circuits and Systems. ICCDCS 98. On the 70th Anniversary of the MOSFET and 50th of the BJT. (Cat. No.98TH8350)10.1109/ICCDCS.1998.705817(109-112)Online publication date: 1998
https://doi.org/10.1109/ICCDCS.1998.705817
Myoupo JFabret A(1996)A Modular Systolic Linearization of the Warshall-Floyd AlgorithmIEEE Transactions on Parallel and Distributed Systems10.1109/71.5037697:5(449-455)Online publication date: 1-May-1996
https://dl.acm.org/doi/10.1109/71.503769
Rappoport (1994)The red-blue algorithm for dynamic programming on linear arraysProceedings of the 1994 6th IEEE Symposium on Parallel and Distributed Processing10.1109/SPDP.1994.346142(484-488)Online publication date: 26-Oct-1994
https://dl.acm.org/doi/10.1109/SPDP.1994.346142
Levison JKuroda I(1993)An asynchronous communication protocol for internode connections in a scalable processor arrayProceedings of IEEE Workshop on VLSI Signal Processing10.1109/VLSISP.1993.404456(489-497)Online publication date: 1993
https://doi.org/10.1109/VLSISP.1993.404456
Myoupo J(1993)Mapping dynamic programming onto modular linear systolic arraysDistributed Computing10.1007/BF022427056:3(165-179)Online publication date: 1-Apr-1993
https://dl.acm.org/doi/10.1007/BF02242705
Yu XGhosal DSnyder L(1992)Optimal dynamic scheduling of task tree on constant-dimensional architecturesProceedings of the fourth annual ACM symposium on Parallel algorithms and architectures10.1145/140901.140916(138-146)Online publication date: 1-Jun-1992
https://dl.acm.org/doi/10.1145/140901.140916
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations

Synchronizing Large VLSI Processor Arrays

Synchronizing large VLSI processor arrays

A Hybrid Approach for Synchronizing Clocks in Distributed Systems