Article (DOI: 10.1145/258612.258692)

A performance evaluation of cluster architectures

Published: 01 June 1997

Abstract

This paper investigates the performance of shared-memory cluster-based architectures in which each cluster is a shared-bus multiprocessor augmented with a protocol processor that maintains cache coherence across clusters. For a fixed number of processors (sixteen in this study), we evaluate the performance of various cluster configurations. We also consider the impact of adding a remote shared cache to each cluster. We use Mean Value Analysis to estimate the latencies of the various types of cache misses and the overall execution time. The service demands on shared resources are characterized in detail by examining the sub-requests issued in resolving cache misses. In addition to the architectural system parameters and the service demands on resources, the analytical model needs application-specific parameters. These, in particular the cache miss profiles, are obtained by trace-driven simulation of three benchmarks.

Our results show that, without remote caches, the performance of cluster-based architectures is mixed. In some configurations, the negative effects of the longer latency of inter-cluster misses and of the contention on the protocol processor are too large to counterbalance the lower contention on the data buses. For two of the three applications, the best results are obtained when the system has clusters of size 2 or 4. The cluster-based architectures with remote caches consistently outperform the single-bus system for all three applications. We also exercise the model with parameters reflecting the current technology trend, in which processors are becoming faster relative to the bus and memory. Under these conditions, our results show a clear performance advantage for the cluster-based architectures, with or without remote caches, over single-bus systems.
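The Mean Value Analysis step can be illustrated with a minimal sketch. It assumes a far simpler model than the paper's: a single-class closed queueing network in which each of the sixteen processors is a customer that computes for a fixed "think time" between cache misses, and each shared resource (bus, memory, protocol processor) is one queueing center, which is closest to a single-bus system. The miss types, service times, and the function names `service_demands` and `exact_mva` are hypothetical. The per-miss demand at resource k is accumulated from sub-requests as D_k = sum over miss types t of p_t * s_{t,k}, where p_t is the fraction of misses of type t and s_{t,k} is the service time its sub-requests need at k.

```python
from typing import Dict, Tuple


def service_demands(miss_mix: Dict[str, float],
                    sub_request_times: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Per-miss service demand at each resource: D_k = sum_t p_t * s_{t,k}.

    miss_mix maps a (hypothetical) miss type to its fraction of all misses;
    sub_request_times maps each miss type to the time its sub-requests
    spend at each resource.
    """
    demands: Dict[str, float] = {}
    for miss_type, fraction in miss_mix.items():
        for resource, time in sub_request_times[miss_type].items():
            demands[resource] = demands.get(resource, 0.0) + fraction * time
    return demands


def exact_mva(demands: Dict[str, float], think_time: float,
              customers: int) -> Tuple[float, float, Dict[str, float]]:
    """Exact single-class MVA for a closed network of queueing centers.

    Returns (throughput in misses per time unit, mean latency per miss,
    residence time at each resource) at the given customer population.
    """
    queue = {k: 0.0 for k in demands}
    throughput, response, residence = 0.0, 0.0, dict(queue)
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving request sees the mean queue
        # lengths of the network with one fewer customer (n - 1).
        residence = {k: d * (1.0 + queue[k]) for k, d in demands.items()}
        response = sum(residence.values())
        # Response-time law for a closed system with think time.
        throughput = n / (think_time + response)
        # Little's law applied to each center gives the new queue lengths.
        queue = {k: throughput * r for k, r in residence.items()}
    return throughput, response, residence


if __name__ == "__main__":
    # Hypothetical inputs, in processor cycles.
    mix = {"local": 0.7, "remote": 0.3}
    times = {
        "local": {"bus": 2.0, "memory": 6.0},
        "remote": {"bus": 4.0, "protocol_processor": 5.0, "memory": 6.0},
    }
    demands = service_demands(mix, times)
    x, r, res = exact_mva(demands, think_time=50.0, customers=16)
    print(f"throughput: {x:.4f} misses/cycle, mean miss latency: {r:.1f} cycles")
    for resource, t in res.items():
        print(f"  {resource}: {t:.1f} cycles of residence per miss")
```

The recursion runs once per population value, so its cost grows linearly with the number of processors. Throughput can never exceed 1 / max_k D_k, which is one way heavy load on a single shared center, such as a protocol processor, can offset lower contention elsewhere.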


Published In

SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 1997
302 pages
ISBN: 0897919092
DOI: 10.1145/258612


Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SIGMETRICS '97

Acceptance Rates

SIGMETRICS '97 Paper Acceptance Rate 25 of 130 submissions, 19%;
Overall Acceptance Rate 459 of 2,691 submissions, 17%

