Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Modeling cost/performance of a parallel computer simulator

Published: 01 January 1997 Publication History
First page of PDF

References

[1]
AGARWAL, A., HOROWITZ, M., AND HENNESSY, J. 1989. An analytical cache model. ACM Trans. Comput. Syst. 7, 2 (May), 184-215.
[2]
AGARWAL, A., SITES, R. L., AND HOROWITZ, M. 1986. ATUM: A new technique for capturing address traces using microcode. In Proceedings of the 13th Annual International Symposium on Computer Architecture (June), 119-127.
[3]
AYANI, R. 1989. A parallel simulation scheme based on the distance between objects. In Proceedings of the SCS Multiconference on Distributed Simulation (March), 113-118.
[4]
BAILEY, D., BARTON, J., LASINSKI, T., AND SIMON, H. 1991. The nas parallel benchmarks. Tech. Rep. RNR-91-002 Revision 2 (Aug.), Ames Research Center.
[5]
BOOTHE, B. 1992. Fast accurate simulation of large shared memory multiprocessors. Tech. Rep. CSD 92/682 (Jan.), Computer Science Division (EECS), University of California at Berkeley.
[6]
BREWER, E. A., DELLAROCAS, C. N., COLBROOK, A., AND WEIHL, W. 1991. Proteus: A highperformance parallel-architecture simulator. Tech. Rep. MIT/LCS/TR-516 (Sept.), MIT Laboratory for Computer Science.
[7]
CMELIK, R. F. AND KEPPEL, D. 1994. Shade: A fast instruction-set simulator for execution profiling. In Proceedings of the 1994 ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems (May).
[8]
COVINGTON, R., MADALA, S., MEHTA, V., JUMP, J., AND SINCLAIR, J. 1988. The Rice parallel processing testbed. In Proceedings of the 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (May), 4-11.
[9]
DAVIS, H., GOLDSCHMIDT, S. R., AND HENNESSY, J. 1991. Multiprocessor simulation and tracing using Tango. In Proceedings of the 1991 International Conference on Parallel Processing (Vol. II Software) (Aug.), II99-107.
[10]
EGGERS, S. J., KEPPEL, D. R., KOLDINGER, E. J., AND LEVY, H. M. 1990. Techniques for efficient inline tracing on a shared-memory multiprocessor. In Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (May), 37-47.
[11]
FALSAFI, B. AND WOOD, D.A. 1994. Cost/performance of a parallel computer simulator. In Proceedings of the Eighth Workshop on Parallel and Distributed Simulation (PADS '94) (July).
[12]
FUJIMOTO, R.M. 1983. Simon: A simulator of multicomputer networks. Tech. Rep. UCB/ CSD 83/137, ERL, University of California, Berkeley.
[13]
FUJIMOTO, R. M. 1990. Parallel discrete event simulation. Commun. ACM 33, 10 (Oct.), 30 -53.
[14]
HILL, M. D., LARUS, J. R., REINHARDT, S. K., AND WOOD, D.A. 1993. Cooperative shared memory: Software and hardware for scalable multiprocessors. ACM Trans. Comput. Syst. 11, 4 (Nov.), 300-318. Earlier version appeared in Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V).
[15]
HILLIS, W. D. AND TUCKER, L.W. 1993. The CM-5 connection machine: A scalable supercomputer. Commun. ACM 36, 11 (Nov.), 31-40.
[16]
JAIN, R. 1991. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley & Sons, New York.
[17]
JASWINDER, P., SINGH, J. L. H. AND GUPTA, A. 1993. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer 26, 7 (July), 42-50.
[18]
KRUSKAL, C. P. AND WEISS, A. 1985. Allocating independent subtasks on parallel processors. IEEE Trans. Softw. Eng. 11, 10 (Oct.), 1001-1016.
[19]
LARUS, J. R. AND SCHNARR, E. 1995. Eel: Machine-independent executable editing. In Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation (PLDI) (June), 291-300.
[20]
LAZOWSKA, E. D., ZAHORJAN, J., GRAHAM, G. S., AND SEVCIK, K.C. 1984. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Englewood Cliffs, NJ.
[21]
LEISERSON, C. E., ABUHAMDEH, Z. S., DOUGLAS, D. C., FEYNMAN, C. R., GANMUKHI, M. N., HILL, J. V., HILLIS, W. D., KUSZMAUL, B. C., PIERRE, M. A. S., WELLS, D. S., WONG, M. C., YANG, S.-W., AND ZAK, R. 1993. The network architecture of the connection machine CM-5. In Proceedings of the Fifth ACM Symposium on Parallel Algorithms and Architectures (SPAA) (July).
[22]
LI, K. AND HUDAK, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov.), 321-359.
[23]
LIN, J.-J. 1992. Efficient parallel simulation for designing multiprocessor system. Ph.D. Thesis, University of Michigan, Ann Arbor.
[24]
LUBACHEVSKY, B. D. 1989. Efficient distributed event-driven simulations of multiple-loop networks. Commun. ACM 32, 2 (Jan.), 111-123.
[25]
MENDELSON, A., THIEBAUT, D., AND PRADHAN, D. 1990a. Modeling of live and true sharing in multi-cache memory systems. In Proceedings of the 1990 International Conference on Parallel Processing (Vol. I Architecture) (1990).
[26]
MENDELSON, A., THIEBAUT, D., AND PRADHAN, D. 1990b. Modeling live and dead lines in cache memory systems. Tech. Rep. TR-90-CSE-14, Department of Electrical and Computer Engineering, University of Massachusetts.
[27]
MISRA, J. 1986. Distributed-discrete event simulation. ACM Comput. Surv. 18, 1 (March), 39-65.
[28]
NICOL, D. 1992. Conservative parallel simulation of priority class queueing networks. IEEE Trans. Parallel Distrib. Syst. 3, 3 (May), 398-412.
[29]
PRZYBYLSKI, S.A. 1994. New DRAM Technologies: A Comprehensive Analysis of the New Architectures. MicroDesign Resources.
[30]
REIDENBACH, E. 1993. CHALLENGE Server Periodic Table. Silicon Graphics Computer Systems.
[31]
REINHARDT, S. K., FALSAFI, B., AND WOOD, D.A. 1993a. Kernel support for the Wisconsin Wind Tunnel. In Proceedings of the Usenix Symposium on Microkernels and Other Kernel Architectures (Sept.).
[32]
REINHARDT, S. K., HILL, M. D., LARUS, J. R., LEBECK, A. R., LEWIS, J. C., AND WOOD, D. A. 1993b. The Wisconsin Wind Tunnel: Virtual prototyping of parallel computers. In Proceedings of the 1993 ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems (May), 48-60.
[33]
ROESSLER, B. 1996. Personal communication.
[34]
ROSENBLUM, M., HERROD, S. A., WITCHEL, E., AND GUPTA, A. 1995. Fast and accurate multiprocessor simulation: The simos approach. IEEE Parallel and Distrib. Technol. 3, 4 (Fall).
[35]
SEMICONDUCTOR INDUSTRY ASSOCIATION. 1994. The national technology roadmap for semiconductors, http://www.sematech.org/public/roadmap/.
[36]
SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1992. Splash: Stanford parallel applications for shared memory. Comput. Arch. News 20, 1 (March), 5-44.
[37]
SPEC. 1990. Spec benchmark suite release 1.0.
[38]
STEINMAN, J. S. 1992. Speedes: A multiple-synchronization environment for parallel discrete-event simulation. Int. J. Comput. Simul. 2, 251-286.
[39]
THIEBAUT, D. AND STONE, H.S. 1987. Footprints in the cache. ACM Trans. Comput. Syst. 5, 4 (Nov.), 305-329.
[40]
UHLIG, R., NAGLE, D., MUDGE, T., AND SECHREST, S. 1994. Tapeworm ii: A new method for measuring OS effects on memory architecture performance. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI) (Oct.), 132-144.
[41]
WOOD, D. A., CHANDRA, S., FALSAFI, B., HILL, M. D., LARUS, J. R., LEBECK, A. R., LEWIS, J. C., MUKHERJEE, S. S., PALACHARLA, S., AND REINHARDT, S.K. 1993. Mechanisms for cooperative shared memory. In Proceedings of the 20th Annual International Symposium on Computer Architecture (May), 156-168. Also appeared in CMG Trans., Spring 1994.
[42]
WOOD, D. A. AND HILL, M.D. 1995. Cost-effective parallel computing. IEEE Comput. 28, 2 (Feb.), 69-72.

Cited By

View all

Recommendations

Reviews

Guenter Haring

The cost/performance ratio of simulating a hypothetical target parallel computer at system level using a commercial host parallel computer is examined. The authors give a comprehensive description of the performance model used in previous papers. They develop an analytic performance model of the Wisconsin Wind Tunnel (WWT) to accurately predict the simulation speedup, which is the ratio of the time for simulating N target nodes on p host nodes and on one host node. The WWT simulates cache-coherent shared-memory machines on a message-passing CM-5 using direct execution, which exploits the commonality between the instruction sets of the simulated and the host computer. In WWT, the interprocess interactions are managed by dividing program execution into lock-step quanta. The model, described in the main part of the paper, predicts the running time of a quantum. To accurately model the WWT's performance, the running time is broken down on the critical path into four components: the sum of the processing times, the direct overheads, the indirect overheads, and the quantum synchronization time. The variability in event processing time is modeled using a slightly modified version of Kruskal and Weiss's model for fork-join parallel programs based on standard analysis-of-variance techniques. WWT uses separate address space for each target node, causing direct context switch overhead, the frequency of which is modeled by the maximum of binomial random variables. To predict the interference of multiple targets in the host cache and TLB (indirect overhead), a generalization of Thie`baut and Stone's footprint model is used, by allowing sharing among address spaces of multiple (more than two) processes. The performance model extracts parameters from a fully parallel simulation of a specific target system and uses them to predict the performance of the same simulation running on a different number of host nodes. The model was validated using five different benchmarks. A more detailed explanation of the observed results would have improved this part of the paper. The part on modeling cost/performance starts by describing the assumptions about how the target and the simulating system scale, and their consequences for the model. Then the authors introduce simple cost models for uniprocessors, small-scale bus-based shared-memory multiprocessors, and large-scale parallel supercomputers. These models allow them to perform a detailed cost/performance analysis by varying the number of host processors and the number of target nodes per host node. With this kind of analysis, the optimum number K=N/p of target nodes simulated per host node can be found. K is essentially independent of p . Furthermore, the authors show under which conditions the parallel simulation is not only faster but more cost-effective than sequential simulation. In the examples used, the authors give the cost-effectiveness for target systems that have more than 32 nodes.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 1997
Published in TOMACS Volume 7, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. conservative discrete-event simulation
  2. memory systems
  3. performance analysis
  4. shared-memory multiprocessors

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)43
  • Downloads (Last 6 weeks)9
Reflects downloads up to 21 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2022)SNN Simulation Performance Prediction: A Nonempirical MethodJournal of Circuits, Systems and Computers10.1142/S021812662250183331:10Online publication date: 30-Mar-2022
  • (2017)Thread Data Sharing in CacheACM SIGPLAN Notices10.1145/3155284.301875952:8(103-115)Online publication date: 26-Jan-2017
  • (2017)Cache Exclusivity and SharingACM Transactions on Architecture and Code Optimization10.1145/313443714:4(1-26)Online publication date: 14-Nov-2017
  • (2017)Thread Data Sharing in CacheProceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3018743.3018759(103-115)Online publication date: 26-Jan-2017
  • (2016)The gig economyXRDS: Crossroads, The ACM Magazine for Students10.1145/301349623:2(38-41)Online publication date: 15-Dec-2016
  • (2016)A Probabilistic Lifestyle-Based Trajectory Model for Social Strength Inference from Human Trajectory DataACM Transactions on Information Systems10.1145/294806435:1(1-28)Online publication date: 3-Sep-2016
  • (2016)Diversity, Serendipity, Novelty, and CoverageACM Transactions on Interactive Intelligent Systems10.1145/29267207:1(1-42)Online publication date: 19-Dec-2016
  • (2016)TapirACM SIGOPS Operating Systems Review10.1145/2883591.288360249:2(51-56)Online publication date: 20-Jan-2016
  • (2016)The communication design of WeChatCommunication Design Quarterly10.1145/2875501.28755034:1(23-35)Online publication date: 8-Jan-2016
  • (2014)Performance Metrics and Models for Shared CacheJournal of Computer Science and Technology10.1007/s11390-014-1460-729:4(692-712)Online publication date: 4-Jul-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media