A performance evaluation of cluster architectures

Authors:

Jean-Loup Baer

SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Pages 237 - 247

https://doi.org/10.1145/258612.258692

Published: 01 June 1997

Abstract

This paper investigates the performance of shared-memory cluster-based architectures in which each cluster is a shared-bus multiprocessor augmented with a protocol processor that maintains cache coherence across clusters. For a fixed number of processors, sixteen in this study, we evaluate the performance of various cluster configurations. We also consider the impact of adding a remote shared cache to each cluster. We use Mean Value Analysis to estimate the latencies of the various types of cache misses and the overall execution time. The service demands on shared resources are characterized in detail by examining the sub-requests issued in resolving cache misses. In addition to the architectural system parameters and the service demands on resources, the analytical model needs application-specific parameters. These, in particular cache miss profiles, are obtained by trace-driven simulation of three benchmarks.

Our results show that without remote caches the performance of cluster-based architectures is mixed. In some configurations, the negative effects of the longer latency of inter-cluster misses and of contention on the protocol processor are too large to be offset by the lower contention on the data buses. For two of the three applications, the best results are obtained when the system has clusters of size 2 or 4. The cluster-based architectures with remote caches consistently outperform the single-bus system for all three applications. We also exercise the model with parameters reflecting the current technology trend in which processors become faster relative to the bus and memory. Under these new conditions, our results show a clear performance advantage for the cluster-based architectures, with or without remote caches, over single-bus systems.
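As a rough illustration of the Mean Value Analysis technique named in the abstract, the sketch below implements the textbook exact MVA recursion for a closed, single-class queueing network. It is not the paper's model: the function name `mva`, the three resources (data bus, memory, protocol processor), and all numeric service demands are hypothetical values chosen only to show the mechanics.

```python
def mva(service_demands, think_time, n_customers):
    """Exact single-class MVA for a closed network of queueing centers.

    service_demands: per-visit-weighted service demand at each center
    think_time:      time a customer spends outside the centers
                     (here: a processor computing between misses)
    n_customers:     number of circulating customers (here: processors)
    Returns (throughput, response_time, per-center mean queue lengths).
    """
    queue = [0.0] * len(service_demands)  # queue lengths with 0 customers
    throughput = 0.0
    response = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue lengths
        # of the same network with one customer fewer.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole network.
        throughput = n / (think_time + response)
        # Little's law applied to each center.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


if __name__ == "__main__":
    # Hypothetical demands (in cycles per miss) for a data bus, a memory
    # module, and a protocol processor; 16 processors as in the study.
    demands = [4.0, 6.0, 10.0]
    x, r, q = mva(demands, think_time=200.0, n_customers=16)
    print(f"throughput = {x:.4f} misses/cycle")
    print(f"mean miss latency = {r:.2f} cycles")
    print("mean queue lengths =", [round(v, 3) for v in q])
```

The mean miss latency returned here grows with contention at the most heavily loaded resource, which is the kind of effect the abstract attributes to the protocol processor and the data buses.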




Information & Contributors

Information

Published In

SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

June 1997

302 pages

ISBN: 0897919092

DOI: 10.1145/258612

Chairmen: John Zahorjan (Univ. of Washington, Seattle), Albert Greenberg (AT&T Research)

Editor: Scott Leutenegger (Univ. of Denver, Denver, CO)

ACM SIGMETRICS Performance Evaluation Review, Volume 25, Issue 1

June 1997

298 pages

ISSN: 0163-5999

DOI: 10.1145/258623

Chairmen: John Zahorjan (Univ. of Washington, Seattle), Albert Greenberg (AT&T Research)

Editor: Scott Leutenegger (Univ. of Denver, Denver, CO)

Copyright © 1997 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

SIGMETRICS97: ACM SIGMETRICS Conference on Measurement & Modeling of Computer Systems

Sponsor: SIGMETRICS

June 15 - 18, 1997

Seattle, Washington, USA

Acceptance Rates

SIGMETRICS '97 Paper Acceptance Rate 25 of 130 submissions, 19%

Overall Acceptance Rate 459 of 2,691 submissions, 17%


Bibliometrics

Total Citations: 7
Total Downloads: 634
Downloads (Last 12 months): 60
Downloads (Last 6 weeks): 14

Reflects downloads up to 17 Nov 2024

Citations

Cited By

Aguilar-Saborit J, Muntés-Mulero V, Zuzarte C, Larriba-Pey J (2007). Star join revisited. Data & Knowledge Engineering, 63:3, pp. 997-1015. Online publication date: 1-Dec-2007.
https://dl.acm.org/doi/10.1016/j.datak.2007.06.008
Cremonesi P, Turrin R (2006). Performance models for hierarchical grid architectures. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, pp. 278-285. Online publication date: 28-Sep-2006.
https://dl.acm.org/doi/10.1109/ICGRID.2006.311026
Cremonesi P, Gennaro C (2002). Integrated Performance Models for SPMD Applications and MIMD Architectures. IEEE Transactions on Parallel and Distributed Systems, 13:12, pp. 1320-1332. Online publication date: 1-Dec-2002.
https://dl.acm.org/doi/10.1109/TPDS.2002.1158268
Cremonesi P, Gennaro C (2002). Integrated Performance Models for SPMD Applications and MIMD Architectures. IEEE Transactions on Parallel and Distributed Systems, 13:7, pp. 745-757. Online publication date: 1-Jul-2002.
https://dl.acm.org/doi/10.1109/TPDS.2002.1019862
Eager D, Sorin D, Vernon M (2000). AMVA techniques for high service time variability. ACM SIGMETRICS Performance Evaluation Review, 28:1, pp. 217-228. Online publication date: 1-Jun-2000.
https://dl.acm.org/doi/10.1145/345063.339418
Eager D, Sorin D, Vernon M, Brandwajn A, Kurose J, Nain P (2000). AMVA techniques for high service time variability. Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pp. 217-228. Online publication date: 17-Jun-2000.
https://dl.acm.org/doi/10.1145/339331.339418
Sung Woo Chung, Seong Tae Jhang, Chu Shik Jhon (1998). PANDA: ring-based multiprocessor system using new snooping protocol. Proceedings 1998 International Conference on Parallel and Distributed Systems (Cat. No.98TB100250), pp. 10-17. Online publication date: 1998.
https://doi.org/10.1109/ICPADS.1998.741012
