
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms

Published: 01 January 1998

Abstract

This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step.
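
To make the step-cost rule concrete, here is a minimal sketch (in Python, with hypothetical helper names; not code from the paper) of how one parallel step is charged under the three models, under the simplifying assumption that each processor issues at most one shared-memory access per step.

    # Toy accounting of one PRAM step under three contention rules.
    # step_cost_qrqw / step_cost_crcw / step_is_legal_erew are names
    # invented for this sketch.
    from collections import Counter

    def step_cost_qrqw(accesses):
        """accesses: list of memory locations touched by the processors in one step.
        The QRQW rule charges the step the maximum queue length at any one location."""
        if not accesses:
            return 1  # an idle step still takes one time unit
        return max(Counter(accesses).values())

    def step_cost_crcw(accesses):
        """The CRCW rule charges unit time no matter how many processors hit one location."""
        return 1

    def step_is_legal_erew(accesses):
        """The EREW rule simply forbids two processors touching the same location."""
        return len(accesses) == len(set(accesses))

    # Example: 8 processors all read location 0 -> QRQW cost 8, CRCW cost 1,
    # and the step is not even legal on an EREW PRAM.
    hot_spot = [0] * 8
    print(step_cost_qrqw(hot_spot), step_cost_crcw(hot_spot), step_is_legal_erew(hot_spot))
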
The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of $\sqrt{\log n}$ between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
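
As an illustration of the contention trade-off the model captures (a generic doubling broadcast sketched in Python; not one of the paper's algorithms, and broadcast_by_doubling is an invented name), the value is spread so that each simulated shared cell is written once and read by at most one processor per round. This gives O(log n) low-contention rounds on a QRQW PRAM, whereas having all n processors read a single hot location in one step would be charged time proportional to n.

    import math

    def broadcast_by_doubling(value, n):
        """Simulate rounds in which the set of cells holding `value` doubles;
        per round, each cell is read by at most one processor and written by at most one."""
        cells = [None] * n          # simulated shared memory, one cell per processor
        cells[0] = value
        have = 1                    # number of cells that already hold the value
        rounds = 0
        while have < n:
            for i in range(min(have, n - have)):
                cells[have + i] = cells[i]   # processor i copies its cell into one fresh cell
            have = min(2 * have, n)
            rounds += 1
        return cells, rounds

    cells, rounds = broadcast_by_doubling(42, 16)
    assert all(c == 42 for c in cells) and rounds == math.ceil(math.log2(16))
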

References

[1]
F. Abolhassan, J. Keller, and W. J. Paul, On the cost‐effectiveness of PRAMs, in Proc. 3rd IEEE Symp. on Parallel and Distributed Processing, Dallas, TX, 1991, pp. 2–9.
[2]
R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, and B. Smith, The Tera computer system, in Proc. 1990 International Conf. on Supercomputing, Amsterdam, The Netherlands, 1990, pp. 1–6.
[3]
Y. Aumann and M. O. Rabin, Clock construction in fully asynchronous parallel systems and PRAM simulation, in Proc. 33rd IEEE Symp. on Foundations of Computer Science, Pittsburgh, PA, 1992, pp. 147–156.
[4]
P. Beame, M. Kik, and M. Kutyłowski, Information broadcasting by exclusive‐write PRAMs, Parallel Process. Lett., 4 (1994), pp. 159–169.
[5]
G. Bell, Ultracomputers: A teraflop before its time, Comm. Assoc. Comput. Mach., 35 (1992), pp. 26–47.
[6]
G. E. Blelloch, Scans as primitive parallel operations, IEEE Trans. Comput., C‐38 (1989), pp. 1526–1538.
[7]
G. E. Blelloch, Prefix sums and their applications, in A Synthesis of Parallel Algorithms, J. H. Reif, ed., Morgan‐Kaufmann, San Mateo, CA, 1993, pp. 35–60.
[8]
G. E. Blelloch, S. Chatterjee, J. C. Hardwick, J. Sipelstein, and M. Zagha, Implementation of a portable nested data‐parallel language, in Proc. 4th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, San Diego, CA, 1993, pp. 102–111.
[9]
Graph Theory with Applications to Algorithms and Computer Science, Proceedings of the Fifth International Conference held at Western Michigan University, Kalamazoo, MI, June 4–8, 1984, Wiley‐Interscience, John Wiley & Sons, New York, 1985, xv+810 pp.
[10]
R. P. Brent, The parallel evaluation of general arithmetic expressions, J. Assoc. Comput. Mach., 21 (1974), pp. 201–208.
[11]
R. Cole and O. Zajicek, The APRAM: Incorporating asynchrony into the PRAM model, in Proc. 1st ACM Symp. on Parallel Algorithms and Architectures, Santa Fe, NM, 1989, pp. 169–178.
[12]
S. Cook, C. Dwork, and R. Reischuk, Upper and lower time bounds for parallel random access machines without simultaneous writes, SIAM J. Comput., 15 (1986), pp. 87–97.
[13]
D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, LogP: Towards a realistic model of parallel computation, in Proc. 4th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, San Diego, CA, 1993, pp. 1–12.
[14]
R. Cypher, Valiant’s Maximum Algorithm with Sequential Memory Accesses, Tech. Rep. TR 88‐03‐08, Department of Computer Science, University of Washington, Seattle, WA, 1988.
[15]
W. J. Dally, J. S. Keen, and M. D. Noakes, The J‐Machine architecture and evaluation, in Proc. 1993 IEEE Compcon Spring, San Francisco, CA, 1993, pp. 183–188.
[16]
S. R. Dickey and R. Kenner, Hardware combining and scalability, in Proc. 4th ACM Symp. on Parallel Algorithms and Architectures, San Diego, CA, 1992, pp. 296–305.
[17]
M. Dietzfelbinger, M. Kutyłowski, and R. Reischuk, Exact lower time bounds for computing Boolean functions on CREW PRAMs, J. Comput. System Sci., 48 (1994), pp. 231–254.
[18]
R. Drefenstedt and D. Schmidt, On the physical design of butterfly networks for PRAMs, in Proc. 4th IEEE Symp. on the Frontiers of Massively Parallel Computation, McLean, VA, 1992, pp. 202–209.
[19]
C. Dwork, M. Herlihy, and O. Waarts, Contention in shared memory algorithms, J. ACM, 44 (1997), pp. 779–805.
[20]
F. Fich, M. Kowaluk, M. Kutyłowski, K. Loryś, and P. Ragde, Retrieval of scattered information by EREW, CREW, and CRCW PRAMs, Comput. Complexity, 5 (1995), pp. 113–131.
[21]
F. Fich, F. Meyer auf der Heide, P. Ragde, and A. Wigderson, One, two, three,..., infinity: Lower bounds for parallel computation, in Proc. 17th ACM Symp. on Theory of Computing, Providence, RI, 1985, pp. 48–58.
[22]
S. Fortune and J. Wyllie, Parallelism in random access machines, in Proc. 10th ACM Symp. on Theory of Computing, San Diego, CA, 1978, pp. 114–118.
[23]
S. Frank, H. Burkhardt III, and J. Rothnie, The KSR1: Bridging the gap between shared memory and MPPs, in Proc. 1993 IEEE Compcon Spring, San Francisco, CA, 1993, pp. 285–294.
[24]
P. B. Gibbons, A more practical PRAM model, in Proc. 1st ACM Symp. on Parallel Algorithms and Architectures, Santa Fe, NM, 1989, pp. 158–168. Full version in The Asynchronous PRAM: A Semi‐synchronous Model for Shared Memory MIMD Machines, Ph.D. thesis, U. C. Berkeley, CA, 1989.
[25]
P. B. Gibbons, Asynchronous PRAM algorithms, in A Synthesis of Parallel Algorithms, J. H. Reif, ed., Morgan‐Kaufmann, San Mateo, CA, 1993, pp. 957–997.
[26]
P. B. Gibbons, Y. Matias, and V. Ramachandran, QRQW: Accounting for Concurrency in PRAMs and Asynchronous PRAMs, Tech. Rep., AT&T Bell Laboratories, Murray Hill, NJ, 1993.
[27]
P. B. Gibbons, Y. Matias, and V. Ramachandran, Efficient low‐contention parallel algorithms, in Proc. 6th ACM Symp. on Parallel Algorithms and Architectures, Cape May, NJ, 1994, pp. 236–247.
[28]
P. B. Gibbons, Y. Matias, and V. Ramachandran, The QRQW PRAM: Accounting for contention in parallel algorithms, in Proc. 5th ACM‐SIAM Symp. on Discrete Algorithms, Arlington, VA, 1994, pp. 638–648.
[29]
P. B. Gibbons, Y. Matias, and V. Ramachandran, Efficient low‐contention parallel algorithms, J. Comput. System Sci., 53 (1996), pp. 417–442.
Special issue devoted to selected papers from the 1994 ACM Symp. on Parallel Algorithms and Architectures.
[30]
P. B. Gibbons, Y. Matias, and V. Ramachandran, The queue‐read queue‐write asynchronous PRAM model, Theoret. Comput. Sci., 196 (1998), pp. 3–29.
Special issue devoted to selected papers from EURO‐PAR’96.
[31]
J. Gil and Y. Matias, Fast hashing on a PRAM—designing by expectation, ACM, New York, 1991, pp. 271–280.
[32]
J. Gil and Y. Matias, An effective load balancing policy for geometric decaying algorithms, J. Parallel Distributed Comput., 36 (1996), pp. 185–188.
[33]
J. Gil, Y. Matias, and U. Vishkin, Towards a theory of nearly constant time parallel algorithms, in Proc. 32nd IEEE Symp. on Foundations of Computer Science, San Juan, Puerto Rico, 1991, pp. 698–710.
[34]
L. A. Goldberg, Y. Matias, and S. Rao, An optical simulation of shared memory, in Proc. 6th ACM Symp. on Parallel Algorithms and Architectures, Cape May, NJ, 1994, pp. 257–267.
[35]
M. Goodrich, Using approximation algorithms to design parallel algorithms that may ignore processor allocation, in Proc. 32nd IEEE Symp. on Foundations of Computer Science, San Juan, Puerto Rico, 1991, pp. 711–722.
[36]
A. Greenberg, On the time complexity of broadcast communication schemes, in Proc. 14th ACM Symp. on Theory of Computing, San Francisco, CA, 1982, pp. 354–364.
[37]
W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), pp. 13–30.
[38]
IBM Corporation, IBM Scalable POWERparallel Systems 9076 SP2 and Enhancements for SP1, 1994. Hardware announcement.
[39]
J. JáJá, An Introduction to Parallel Algorithms, Addison–Wesley, Reading, MA, 1992.
[40]
R. M. Karp and V. Ramachandran, Parallel algorithms for shared‐memory machines, in Handbook of Theoretical Computer Science, Vol. A, J. van Leeuwen, ed., Elsevier, Amsterdam, 1990, pp. 869–941.
[41]
R. E. Kessler and J. L. Schwarzmeier, CRAY T3D: A new dimension for Cray research, in Proc. 1993 IEEE Compcon Spring, San Francisco, CA, 1993, pp. 176–182.
[42]
R. E. Ladner and M. J. Fischer, Parallel prefix computation, J. Assoc. Comput. Mach., 27 (1980), pp. 831–838.
[43]
F. T. Leighton, Methods for message routing in parallel machines, in Proc. 24th ACM Symp. on Theory of Computing, Victoria, British Columbia, Canada, 1992, pp. 77–96. Invited paper.
[44]
C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, S.‐W. Yang, and R. Zak, The network architecture of the Connection Machine CM‐5, in Proc. 4th ACM Symp. on Parallel Algorithms and Architectures, San Diego, CA, 1992, pp. 272–285.
[45]
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, The directory‐based cache coherence protocol for the DASH multiprocessor, in Proc. 17th International Symp. on Computer Architecture, Seattle, WA, 1990, pp. 148–159.
[46]
D. Lenoski, J. Laudon, K. Gharachorloo, W.‐D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam, The Stanford DASH multiprocessor, IEEE Comput., 25 (1992), pp. 63–79.
[47]
P. Liu, W. Aiello, and S. Bhatt, An atomic model for message‐passing, in Proc. 5th ACM Symp. on Parallel Algorithms and Architectures, Velen, Germany, 1993, pp. 154–163.
[48]
P. D. MacKenzie and V. Ramachandran, ERCW PRAMs and optical communication, Theoret. Comput. Sci., 196 (1998), pp. 153–180.
[49]
P. D. MacKenzie, private communication, Austin, TX, 1994.
[50]
C. Martel, A. Park, and R. Subramonian, Work‐optimal asynchronous algorithms for shared memory parallel computers, SIAM J. Comput., 21 (1992), pp. 1070–1099.
[51]
MasPar Computer Corporation, MasPar System Overview, document 9300‐0100, revision A3, 749 North Mary Avenue, Sunnyvale, CA 94086, Mar. 1991.
[52]
Y. Matias, Highly Parallel Randomized Algorithmics, Ph.D. thesis, Tel Aviv University, Israel, 1992.
[53]
Y. Matias and U. Vishkin, Converting high probability into nearly‐constant time—with applications to parallel hashing, in Proc. 23rd ACM Symp. on Theory of Computing, New Orleans, LA, 1991, pp. 307–316.
[54]
N. Nishimura, Asynchronous shared memory parallel computation, in Proc. 2nd ACM Symp. on Parallel Algorithms and Architectures, Crete, Greece, 1990, pp. 76–84.
[55]
G. F. Pfister and V. A. Norton, “Hot spot” contention and combining in multistage interconnection networks, IEEE Trans. Comput., C‐34 (1985), pp. 943–948.
[56]
L. Prechelt, Measurements of MasPar MP‐1216A Communication Operations, Tech. Rep., Institut für Programmstrukturen und Datenorganisation, Universität Karlsruhe, Karlsruhe, Germany, 1992.
[57]
A. G. Ranade, Fluent parallel computation, Ph.D. thesis, Department of Computer Science, Yale University, New Haven, CT, 1989.
[58]
J. H. Reif, ed., A Synthesis of Parallel Algorithms, Morgan‐Kaufmann, San Mateo, CA, 1993.
[59]
M. Schmidt‐Voigt, Efficient parallel communication with the nCUBE 2S processor, Parallel Comput., 20 (1994), pp. 509–530.
[60]
L. Snyder, Type architecture, shared memory and the corollary of modest potential, Annual Review of Computer Science, Vol. I (1986), pp. 289–317.
[61]
L. G. Valiant, A bridging model for parallel computation, Commun. Assoc. Comput. Mach., 33 (1990), pp. 103–111.
[62]
L. G. Valiant, General purpose parallel architectures, in Handbook of Theoretical Computer Science, Vol. A, J. van Leeuwen, ed., Elsevier, Amsterdam, 1990, pp. 943–971.
[63]
L. G. Valiant, A Combining Mechanism for Parallel Computers, Tech. Rep. TR‐24‐92, Harvard University, Cambridge, MA, 1992.
[64]
A. Yao, Probabilistic computations: Towards a unified measure of complexity, in Proc. 18th IEEE Symp. on Foundations of Computer Science, Providence, RI, 1977, pp. 222–227.



    Published In

    SIAM Journal on Computing, Volume 28, Issue 2
    1998
    453 pages

    Publisher

    Society for Industrial and Applied Mathematics

    United States

    Publication History

    Published: 01 January 1998

    Author Tags (MSC codes)

    1. 68Q05
    2. 68Q22
    3. 68Q25

    Author Tags (keywords)

    1. models of parallel computation
    2. parallel algorithms
    3. PRAM
    4. memory contention
    5. work-time framework

    Qualifiers

    • Research-article


    Cited By

    • (2022) Performance Analysis and Modelling of Concurrent Multi-access Data Structures, Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 333–344. DOI: 10.1145/3490148.3538578. Online publication date: 11-Jul-2022.
    • (2020) Improving the Space-Time Efficiency of Matrix Multiplication Algorithms, Workshop Proceedings of the 49th International Conference on Parallel Processing, pp. 1–10. DOI: 10.1145/3409390.3409404. Online publication date: 17-Aug-2020.
    • (2019) New Performance Modeling Methods for Parallel Data Processing Applications, ACM Transactions on Modeling and Computer Simulation, 29(3), pp. 1–24. DOI: 10.1145/3309684. Online publication date: 18-Jun-2019.
    • (2018) Parallel Working-Set Search Structures, Proceedings of the 30th Symposium on Parallelism in Algorithms and Architectures, pp. 321–332. DOI: 10.1145/3210377.3210390. Online publication date: 11-Jul-2018.
    • (2017) References, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018804. Online publication date: 9-Jun-2017.
    • (2017) Conclusion and Future Work, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018803. Online publication date: 9-Jun-2017.
    • (2017) Parallel Wavelet Tree Construction, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018802. Online publication date: 9-Jun-2017.
    • (2017) Parallel Lempel-Ziv Factorization, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018801. Online publication date: 9-Jun-2017.
    • (2017) Parallel Computation of Longest Common Prefixes, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018800. Online publication date: 9-Jun-2017.
    • (2017) Parallel Cartesian Tree and Suffix Tree Construction, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018799. Online publication date: 9-Jun-2017.
