A performance evaluation of cluster architectures

Published: 01 June 1997

Abstract

This paper investigates the performance of shared-memory cluster-based architectures where each cluster is a shared-bus multiprocessor augmented with a protocol processor maintaining cache coherence across clusters. For a given number of processors, sixteen in this study, we evaluate the performance of various cluster configurations. We also consider the impact of adding a remote shared cache in each cluster. We use Mean Value Analysis to estimate the cache miss latencies of various types and the overall execution time. The service demands of shared resources are characterized in detail by examining the sub-requests issued in resolving cache misses. In addition to the architectural system parameters and the service demands on resources, the analytical model needs parameters pertinent to applications. The latter, in particular cache miss profiles, are obtained by trace-driven simulation of three benchmarks.

Our results show that without remote caches the performance of cluster-based architectures is mixed. In some configurations, the negative effects of the longer latency of inter-cluster misses and of the contention on the protocol processor are too large to counterbalance the lower contention on the data buses. For two of the three applications, the best results are obtained when the system has clusters of size 2 or 4. The cluster-based architectures with remote caches consistently outperform the single-bus system for all three applications. We also exercise the model with parameters reflecting the current trend in technology, which makes the processor relatively faster than the bus and memory. Under these new conditions, our results show a clear performance advantage for the cluster-based architectures, with or without remote caches, over single-bus systems.
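The abstract names Mean Value Analysis as the modeling technique but does not give the model itself. As a rough illustration only, the sketch below implements the textbook exact MVA recursion for a closed, single-class queueing network with a think-time (delay) term. It is not the paper's model: the function name `mva`, the two resources ("bus" and "memory"), and every numeric value are assumptions chosen for illustration. Only the population of sixteen processors comes from the abstract.

```python
def mva(service_demands, think_time, n_customers):
    """Exact single-class MVA for a closed queueing network.

    service_demands: per-visit-weighted demand at each queueing center
                     (hypothetical units, e.g. cycles per memory request).
    think_time:      time a customer spends outside the queueing centers
                     (here: processor compute time between misses).
    n_customers:     population size (here: number of processors).
    Returns (throughput, residence_times, queue_lengths).
    """
    queue = [0.0] * len(service_demands)
    residence = [0.0] * len(service_demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue lengths
        # of the same network with one customer fewer.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        # Little's law applied to the whole network.
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each center.
        queue = [throughput * r for r in residence]
    return throughput, residence, queue


if __name__ == "__main__":
    # Illustrative values only; none of these numbers come from the paper.
    demands = [4.0, 2.0]      # assumed demands on "bus" and "memory"
    compute_time = 50.0       # assumed think time between misses
    x, r, q = mva(demands, compute_time, 16)
    print("throughput:", x)
    print("residence times:", r)
    print("mean miss latency:", sum(r))
```

In a model of this shape, the sum of the residence times plays the role of a mean miss latency that includes contention, and comparing it across different demand vectors is one way to compare configurations. A model of a cluster architecture would need more resources (for example, a protocol processor per cluster) and more than one request type, which this sketch does not attempt.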

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 25, Issue 1
June 1997
298 pages
ISSN: 0163-5999
DOI: 10.1145/258623

SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 1997
302 pages
ISBN: 0897919092
DOI: 10.1145/258612
Publisher

Association for Computing Machinery

New York, NY, United States
