Bounds on the speedup and efficiency of partial synchronization in parallel processing systems

Published: 03 January 1995

Abstract

In this paper, we derive bounds on the speedup and efficiency of applications that schedule tasks on a set of parallel processors. We assume that the application runs an algorithm that consists of N iterations and that, before starting its (i+1)st iteration, a processor must wait for data (i.e., synchronize) calculated in the ith iteration by a subset of the other processors of the system. Processing times and interconnections between iterations are modeled by random variables with possibly deterministic distributions. Scientific applications consisting of iterations of recursive equations are examples of applications that can be modeled within this formulation. We consider the efficiency of such applications and show that, although efficiency decreases as the number of processors increases, it has a nonzero limit as the number of processors grows to infinity. We obtain a lower bound for the efficiency by solving an equation that depends on the distribution of task service times and the expected number of tasks that must be synchronized. We also show that the lower bound is approached if the topology of the processor graph is “spread-out,” a notion we define in the paper.
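The abstract does not state the equation for the lower bound. The sketch below shows one standard large-deviations calculation of this type (the first-moment argument used for the "last birth" in branching processes). It is offered as an illustration under assumptions, not as the paper's own equation. Assume each task has on average m predecessors (counting the processor's own previous iteration) and that service times are i.i.d. with Cramér rate function I. Then the expected number of length-n dependency paths whose total service time exceeds na is at most m^n e^{-nI(a)}, so time per iteration grows no faster than a*, where a* > E[X] solves I(a*) = log m. That gives an efficiency of at least E[X]/a*. For exponential service times with rate μ (an assumed distribution), I(a) = μa − 1 − log(μa). The helper names below are hypothetical.

```python
import math


def exp_rate_function(a: float, mu: float) -> float:
    """Cramer rate function of Exp(mu) service times: I(a) = mu*a - 1 - log(mu*a), a > 0."""
    x = mu * a
    return x - 1.0 - math.log(x)


def growth_rate_bound(m: float, mu: float = 1.0, tol: float = 1e-12) -> float:
    """Solve I(a) = log(m) for the root a* >= 1/mu by bisection.

    m is the assumed mean number of predecessors per task, counting the
    processor's own previous iteration, so m >= 1 (m = 1 means no synchronization).
    """
    if m < 1.0:
        raise ValueError("m must be at least 1")
    target = math.log(m)
    lo, hi = 1.0 / mu, 2.0 / mu          # I(1/mu) = 0 at the mean service time
    while exp_rate_function(hi, mu) < target:
        hi *= 2.0                        # I is increasing for a > 1/mu, so this terminates
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if exp_rate_function(mid, mu) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def efficiency_lower_bound(m: float, mu: float = 1.0) -> float:
    """Mean service time 1/mu divided by the bound a* on time per iteration."""
    return (1.0 / mu) / growth_rate_bound(m, mu)


if __name__ == "__main__":
    for m in (1, 2, 3, 5):
        print(f"m = {m}: efficiency >= {efficiency_lower_bound(m):.3f}")
```

Because I depends on a only through μa, this bound does not depend on μ. It equals 1 when m = 1, decreases as m grows, and stays strictly positive for every finite m.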

      Reviews

      Wai Sum Lai

      Chang and Nelson improve our understanding of the fundamental limits on the speedup achievable by executing iterative algorithms in parallel on multiple processors. This paper focuses on the fundamental limitations that task synchronization imposes when task execution times are random. Prior analyses were mainly based on deterministic computation models and thus ignored the impact of stochastic effects on machine performance. The model in this paper allows synchronization time to overlap with task execution, thereby capturing the overhead due to resource sharing more accurately. Moreover, it can analyze partial synchronization. This is a difficult problem because, under partial synchronization, the random epochs at which different processors start iterations are highly correlated. The mathematical approach the authors use to account for these dependencies is noteworthy. The paper contains two major theoretical results. First, it shows that, for applications requiring only partial synchronization, the efficiency has a nonzero lower bound as the number of processors approaches infinity. Second, for the problem of mapping tasks onto processors, it shows that the structure of the processor task graph can significantly affect the attainable efficiency: the more spread out this graph, the lower the efficiency. This has implications for the design of task graphs for parallel algorithms.
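      The review's point about spread-out task graphs can be illustrated with a small Monte Carlo sketch. It assumes the recursion T_j(n) = X_j(n) + max over k in {j} ∪ S_j(n) of T_k(n−1), read off the abstract's model, with i.i.d. Exp(1) service times. The hypothetical `simulate` helper estimates efficiency as N·E[X] divided by the mean completion time. The two topologies are illustrative choices, not the paper's definition of "spread-out." In the clustered topology, processors form fixed groups that synchronize with each other. In the spread topology, each processor's dependencies are redrawn uniformly at random every iteration.

```python
import random


def simulate(P: int, N: int, d: int, topology: str, seed: int = 0) -> float:
    """Monte Carlo estimate of efficiency under an assumed partial-synchronization model.

    T_j(n) = X_j(n) + max over k in {j} U S_j(n) of T_k(n-1), with X ~ Exp(1),
    where S_j(n) contains d processors other than j. Efficiency is estimated as
    N * E[X] / (mean over j of T_j(N)), with E[X] = 1.
    """
    if topology not in ("clustered", "spread"):
        raise ValueError("topology must be 'clustered' or 'spread'")
    m = d + 1                                   # predecessors per task, including j itself
    if topology == "clustered" and P % m:
        raise ValueError("clustered topology needs P to be a multiple of d + 1")
    rng = random.Random(seed)
    T = [0.0] * P                               # completion times after the previous iteration
    for _ in range(N):
        new_T = [0.0] * P
        for j in range(P):
            if topology == "clustered":
                # Fixed groups of size m: every member waits for the whole group.
                start = (j // m) * m
                preds = range(start, start + m)
            else:
                # d distinct other processors, redrawn uniformly at random every iteration.
                picks = rng.sample(range(P - 1), d)
                preds = [j] + [k + 1 if k >= j else k for k in picks]
            new_T[j] = rng.expovariate(1.0) + max(T[k] for k in preds)
        T = new_T
    return N / (sum(T) / P)


if __name__ == "__main__":
    for topology in ("clustered", "spread"):
        eff = simulate(P=500, N=200, d=1, topology=topology, seed=1)
        print(f"{topology:>9}: estimated efficiency {eff:.3f}")
```

      With d = 1, the clustered case can be checked by hand. Each group's completion time grows by the maximum of two Exp(1) variables per iteration, whose mean is 1.5, so its efficiency should be near 2/3. The spread case overlaps its dependency paths far less, so it should come out lower, which is consistent with the review's remark.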

      Information

      Published In

      Journal of the ACM  Volume 42, Issue 1
      Jan. 1995
      289 pages
      ISSN:0004-5411
      EISSN:1557-735X
      DOI:10.1145/200836

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 03 January 1995
      Published in JACM Volume 42, Issue 1

      Author Tags

      1. large deviations theory
      2. synchronization

      Qualifiers

      • Article

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)55
      • Downloads (Last 6 weeks)7
      Reflects downloads up to 12 Nov 2024

      Citations

      Cited By

      • (2024) An efficient 2-D flood inundation modelling based on a data-driven approach. Journal of Hydrology: Regional Studies 52, 101741. DOI: 10.1016/j.ejrh.2024.101741. Online publication date: Apr-2024.
      • (2015) Modeling, Profiling, and Debugging the Energy Consumption of Mobile Devices. ACM Computing Surveys 48:3, 1-40. DOI: 10.1145/2840723. Online publication date: 22-Dec-2015.
      • (2015) Data-driven Human Mobility Modeling. ACM Computing Surveys 48:3, 1-39. DOI: 10.1145/2840722. Online publication date: 22-Dec-2015.
      • (2014) Software-Defined Networking Paradigms in Wireless Networks: A Survey. ACM Computing Surveys 47:2, 1-11. DOI: 10.1145/2655690. Online publication date: 12-Nov-2014.
      • (2014) Analysis of Fork/Join and Related Queueing Systems. ACM Computing Surveys 47:2, 1-71. DOI: 10.1145/2628913. Online publication date: 25-Aug-2014.
      • (2013) Distributed Computing Design Methods for Multicore Application Programming. Advanced Materials Research 756-759, 1295-1299. DOI: 10.4028/www.scientific.net/AMR.756-759.1295. Online publication date: Sep-2013.
      • (2010) Analysis of Delays Caused by Local Synchronization. SIAM Journal on Computing 39:8, 3860-3884. DOI: 10.1137/080723090. Online publication date: 1-Dec-2010.
      • (2006) A performance analysis of local synchronization. Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, 254-260. DOI: 10.1145/1148109.1148154. Online publication date: 30-Jul-2006.
      • (2005) On the throughput of multicasting with incremental forward error correction. IEEE Transactions on Information Theory 51:3, 900-918. DOI: 10.1109/TIT.2004.842634. Online publication date: 1-Mar-2005.
      • (1996) On the exponentiality of stochastic linear systems under the max-plus algebra. IEEE Transactions on Automatic Control 41:8, 1182-1188. DOI: 10.1109/9.533680. Online publication date: Jan-1996.
