
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms

Published: 01 January 1998

Abstract

This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step.
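
To make the step-cost rule concrete, here is a minimal sketch (in Python, with hypothetical helper names; not code from the paper) of how one parallel step is charged under the three models, under the simplifying assumption that each processor issues at most one shared-memory access per step.

    # Toy accounting of one PRAM step under three contention rules.
    # step_cost_qrqw / step_cost_crcw / step_is_legal_erew are names
    # invented for this sketch.
    from collections import Counter

    def step_cost_qrqw(accesses):
        """accesses: list of memory locations touched by the processors in one step.
        The QRQW rule charges the step the maximum queue length at any one location."""
        if not accesses:
            return 1  # an idle step still takes one time unit
        return max(Counter(accesses).values())

    def step_cost_crcw(accesses):
        """The CRCW rule charges unit time no matter how many processors hit one location."""
        return 1

    def step_is_legal_erew(accesses):
        """The EREW rule simply forbids two processors touching the same location."""
        return len(accesses) == len(set(accesses))

    # Example: 8 processors all read location 0 -> QRQW cost 8, CRCW cost 1,
    # and the step is not even legal on an EREW PRAM.
    hot_spot = [0] * 8
    print(step_cost_qrqw(hot_spot), step_cost_crcw(hot_spot), step_is_legal_erew(hot_spot))
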
The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of $\sqrt{\log n}$ between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
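
As an illustration of the contention trade-off the model captures (a generic doubling broadcast sketched in Python; not one of the paper's algorithms, and broadcast_by_doubling is an invented name), the value is spread so that each simulated shared cell is written once and read by at most one processor per round. This gives O(log n) low-contention rounds on a QRQW PRAM, whereas having all n processors read a single hot location in one step would be charged time proportional to n.

    import math

    def broadcast_by_doubling(value, n):
        """Simulate rounds in which the set of cells holding `value` doubles;
        per round, each cell is read by at most one processor and written by at most one."""
        cells = [None] * n          # simulated shared memory, one cell per processor
        cells[0] = value
        have = 1                    # number of cells that already hold the value
        rounds = 0
        while have < n:
            for i in range(min(have, n - have)):
                cells[have + i] = cells[i]   # processor i copies its cell into one fresh cell
            have = min(2 * have, n)
            rounds += 1
        return cells, rounds

    cells, rounds = broadcast_by_doubling(42, 16)
    assert all(c == 42 for c in cells) and rounds == math.ceil(math.log2(16))
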

References

[1]
F. Abolhassan, J. Keller, and W. J. Paul, On the cost‐effectiveness of PRAMs, in Proc. 3rd IEEE Symp. on Parallel and Distributed Processing, Dallas, TX, 1991, pp. 2–9.
[2]
R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, and B. Smith, The Tera computer system, in Proc. 1990 International Conf. on Supercomputing, Amsterdam, The Netherlands, 1990, pp. 1–6.
[3]
Y. Aumann and M. O. Rabin, Clock construction in fully asynchronous parallel systems and PRAM simulation, in Proc. 33rd IEEE Symp. on Foundations of Computer Science, Pittsburgh, PA, 1992, pp. 147–156.
[4]
P. Beame, M. Kik, and M. Kutyłowski, Information broadcasting by exclusive‐write PRAMs, Parallel Process. Lett., 4 (1994), pp. 159–169.
[5]
G. Bell, Ultracomputers: A teraflop before its time, Comm. Assoc. Comput. Mach., 35 (1992), pp. 26–47.
[6]
G. E. Blelloch, Scans as primitive parallel operations, IEEE Trans. Comput., C‐38 (1989), pp. 1526–1538.
[7]
G. E. Blelloch, Prefix sums and their applications, in A Synthesis of Parallel Algorithms, J. H. Reif, ed., Morgan‐Kaufmann, San Mateo, CA, 1993, pp. 35–60.
[8]
G. E. Blelloch, S. Chatterjee, J. C. Hardwick, J. Sipelstein, and M. Zagha, Implementation of a portable nested data‐parallel language, in Proc. 4th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, San Diego, CA, 1993, pp. 102–111.
[9]
Graph Theory with Applications to Algorithms and Computer Science, Proceedings of the Fifth International Conference held at Western Michigan University, Kalamazoo, MI, June 4–8, 1984, Wiley‐Interscience, John Wiley & Sons, New York, 1985, xv+810 pp.
[10]
R. P. Brent, The parallel evaluation of general arithmetic expressions, J. Assoc. Comput. Mach., 21 (1974), pp. 201–208.
[11]
R. Cole and O. Zajicek, The APRAM: Incorporating asynchrony into the PRAM model, in Proc. 1st ACM Symp. on Parallel Algorithms and Architectures, Santa Fe, NM, 1989, pp. 169–178.
[12]
S. Cook, C. Dwork, and R. Reischuk, Upper and lower time bounds for parallel random access machines without simultaneous writes, SIAM J. Comput., 15 (1986), pp. 87–97.
[13]
D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, LogP: Towards a realistic model of parallel computation, in Proc. 4th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, San Diego, CA, 1993, pp. 1–12.
[14]
R. Cypher, Valiant’s Maximum Algorithm with Sequential Memory Accesses, Tech. Rep. TR 88‐03‐08, Department of Computer Science, University of Washington, Seattle, WA, 1988.
[15]
W. J. Dally, J. S. Keen, and M. D. Noakes, The J‐Machine architecture and evaluation, in Proc. 1993 IEEE Compcon Spring, San Francisco, CA, 1993, pp. 183–188.
[16]
S. R. Dickey and R. Kenner, Hardware combining and scalability, in Proc. 4th ACM Symp. on Parallel Algorithms and Architectures, San Diego, CA, 1992, pp. 296–305.
[17]
M. Dietzfelbinger, M. Kutyłowski, and R. Reischuk, Exact lower time bounds for computing Boolean functions on CREW PRAMs, J. Comput. System Sci., 48 (1994), pp. 231–254.
[18]
R. Drefenstedt and D. Schmidt, On the physical design of butterfly networks for PRAMs, in Proc. 4th IEEE Symp. on the Frontiers of Massively Parallel Computation, McLean, VA, 1992, pp. 202–209.
[19]
C. Dwork, M. Herlihy, and O. Waarts, Contention in shared memory algorithms, J. ACM, 44 (1997), pp. 779–805.
[20]
F. Fich, M. Kowaluk, M. Kutyłowski, K. Loryś, and P. Ragde, Retrieval of scattered information by EREW, CREW, and CRCW PRAMs, Comput. Complexity, 5 (1995), pp. 113–131.
[21]
F. Fich, F. Meyer auf der Heide, P. Ragde, and A. Wigderson, One, two, three,..., infinity: Lower bounds for parallel computation, in Proc. 17th ACM Symp. on Theory of Computing, Providence, RI, 1985, pp. 48–58.
[22]
S. Fortune and J. Wyllie, Parallelism in random access machines, in Proc. 10th ACM Symp. on Theory of Computing, San Diego, CA, 1978, pp. 114–118.
[23]
S. Frank, H. Burkhardt III, and J. Rothnie, The KSR1: Bridging the gap between shared memory and MPPs, in Proc. 1993 IEEE Compcon Spring, San Francisco, CA, 1993, pp. 285–294.
[24]
P. B. Gibbons, A more practical PRAM model, in Proc. 1st ACM Symp. on Parallel Algorithms and Architectures, Santa Fe, NM, 1989, pp. 158–168. Full version in The Asynchronous PRAM: A Semi‐synchronous Model for Shared Memory MIMD Machines, Ph.D. thesis, U. C. Berkeley, CA, 1989.
[25]
P. B. Gibbons, Asynchronous PRAM algorithms, in A Synthesis of Parallel Algorithms, J. H. Reif, ed., Morgan‐Kaufmann, San Mateo, CA, 1993, pp. 957–997.
[26]
P. B. Gibbons, Y. Matias, and V. Ramachandran, QRQW: Accounting for Concurrency in PRAMs and Asynchronous PRAMs, Tech. Rep., AT&T Bell Laboratories, Murray Hill, NJ, 1993.
[27]
P. B. Gibbons, Y. Matias, and V. Ramachandran, Efficient low‐contention parallel algorithms, in Proc. 6th ACM Symp. on Parallel Algorithms and Architectures, Cape May, NJ, 1994, pp. 236–247.
[28]
P. B. Gibbons, Y. Matias, and V. Ramachandran, The QRQW PRAM: Accounting for contention in parallel algorithms, in Proc. 5th ACM‐SIAM Symp. on Discrete Algorithms, Arlington, VA, 1994, pp. 638–648.
[29]
P. B. Gibbons, Y. Matias, and V. Ramachandran, Efficient low‐contention parallel algorithms, J. Comput. System Sci., 53 (1996), pp. 417–442.
Special issue devoted to selected papers from the 1994 ACM Symp. on Parallel Algorithms and Architectures.
[30]
P. B. Gibbons, Y. Matias, and V. Ramachandran, The queue‐read queue‐write asynchronous PRAM model, Theoret. Comput. Sci., 196 (1998), pp. 3–29.
Special issue devoted to selected papers from EURO‐PAR’96.
[31]
J. Gil and Y. Matias, Fast hashing on a PRAM—designing by expectation, ACM, New York, 1991, pp. 271–280.
[32]
J. Gil and Y. Matias, An effective load balancing policy for geometric decaying algorithms, J. Parallel Distributed Comput., 36 (1996), pp. 185–188.
[33]
J. Gil, Y. Matias, and U. Vishkin, Towards a theory of nearly constant time parallel algorithms, in Proc. 32nd IEEE Symp. on Foundations of Computer Science, San Juan, Puerto Rico, 1991, pp. 698–710.
[34]
L. A. Goldberg, Y. Matias, and S. Rao, An optical simulation of shared memory, in Proc. 6th ACM Symp. on Parallel Algorithms and Architectures, Cape May, NJ, 1994, pp. 257–267.
[35]
M. Goodrich, Using approximation algorithms to design parallel algorithms that may ignore processor allocation, in Proc. 32nd IEEE Symp. on Foundations of Computer Science, San Juan, Puerto Rico, 1991, pp. 711–722.
[36]
A. Greenberg, On the time complexity of broadcast communication schemes, in Proc. 14th ACM Symp. on Theory of Computing, San Francisco, CA, 1982, pp. 354–364.
[37]
W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), pp. 13–30.
[38]
IBM Corporation, IBM Scalable POWERparallel Systems 9076 SP2 and Enhancements for SP1, 1994. Hardware announcement.
[39]
J. JáJá, An Introduction to Parallel Algorithms, Addison–Wesley, Reading, MA, 1992.
[40]
R. M. Karp and V. Ramachandran, Parallel algorithms for shared‐memory machines, in Handbook of Theoretical Computer Science, Vol. A, J. van Leeuwen, ed., Elsevier, Amsterdam, 1990, pp. 869–941.
[41]
R. E. Kessler and J. L. Schwarzmeier, CRAY T3D: A new dimension for Cray research, in Proc. 1993 IEEE Compcon Spring, San Francisco, CA, 1993, pp. 176–182.
[42]
R. E. Ladner and M. J. Fischer, Parallel prefix computation, J. Assoc. Comput. Mach., 27 (1980), pp. 831–838.
[43]
F. T. Leighton, Methods for message routing in parallel machines, in Proc. 24th ACM Symp. on Theory of Computing, Victoria, British Columbia, Canada, 1992, pp. 77–96. Invited paper.
[44]
C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, S.‐W. Yang, and R. Zak, The network architecture of the Connection Machine CM‐5, in Proc. 4th ACM Symp. on Parallel Algorithms and Architectures, San Diego, CA, 1992, pp. 272–285.
[45]
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, The directory‐based cache coherence protocol for the DASH multiprocessor, in Proc. 17th International Symp. on Computer Architecture, Seattle, WA, 1990, pp. 148–159.
[46]
D. Lenoski, J. Laudon, K. Gharachorloo, W.‐D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam, The Stanford DASH multiprocessor, IEEE Comput., 25 (1992), pp. 63–79.
[47]
P. Liu, W. Aiello, and S. Bhatt, An atomic model for message‐passing, in Proc. 5th ACM Symp. on Parallel Algorithms and Architectures, Velen, Germany, 1993, pp. 154–163.
[48]
P. D. MacKenzie and V. Ramachandran, ERCW PRAMs and optical communication, Theoret. Comput. Sci., 196 (1998), pp. 153–180.
[49]
P. D. MacKenzie, private communication, Austin, TX, 1994.
[50]
C. Martel, A. Park, and R. Subramonian, Work‐optimal asynchronous algorithms for shared memory parallel computers, SIAM J. Comput., 21 (1992), pp. 1070–1099.
[51]
MasPar Computer Corporation, MasPar System Overview, document 9300‐0100, revision A3, 749 North Mary Avenue, Sunnyvale, CA 94086, Mar. 1991.
[52]
Y. Matias, Highly Parallel Randomized Algorithmics, Ph.D. thesis, Tel Aviv University, Israel, 1992.
[53]
Y. Matias and U. Vishkin, Converting high probability into nearly‐constant time—with applications to parallel hashing, in Proc. 23rd ACM Symp. on Theory of Computing, New Orleans, LA, 1991, pp. 307–316.
[54]
N. Nishimura, Asynchronous shared memory parallel computation, in Proc. 2nd ACM Symp. on Parallel Algorithms and Architectures, Crete, Greece, 1990, pp. 76–84.
[55]
G. F. Pfister and V. A. Norton, “Hot spot” contention and combining in multistage interconnection networks, IEEE Trans. Comput., C‐34 (1985), pp. 943–948.
[56]
L. Prechelt, Measurements of MasPar MP‐1216A Communication Operations, Tech. Rep., Institut für Programmstrukturen und Datenorganisation, Universität Karlsruhe, Karlsruhe, Germany, 1992.
[57]
A. G. Ranade, Fluent parallel computation, Ph.D. thesis, Department of Computer Science, Yale University, New Haven, CT, 1989.
[58]
J. H. Reif, ed., A Synthesis of Parallel Algorithms, Morgan‐Kaufmann, San Mateo, CA, 1993.
[59]
M. Schmidt‐Voigt, Efficient parallel communication with the nCUBE 2S processor, Parallel Comput., 20 (1994), pp. 509–530.
[60]
L. Snyder, Type architecture, shared memory and the corollary of modest potential, Annual Review of Computer Science, Vol. I (1986), pp. 289–317.
[61]
L. G. Valiant, A bridging model for parallel computation, Commun. Assoc. Comput. Mach., 33 (1990), pp. 103–111.
[62]
L. G. Valiant, General purpose parallel architectures, in Handbook of Theoretical Computer Science, Vol. A, J. van Leeuwen, ed., Elsevier, Amsterdam, 1990, pp. 943–971.
[63]
L. G. Valiant, A Combining Mechanism for Parallel Computers, Tech. Rep. TR‐24‐92, Harvard University, Cambridge, MA, 1992.
[64]
A. Yao, Probabilistic computations: Towards a unified measure of complexity, in Proc. 18th IEEE Symp. on Foundations of Computer Science, Providence, RI, 1977, pp. 222–227.



    Published In

    SIAM Journal on Computing, Volume 28, Issue 2
    1998
    453 pages

    Publisher

    Society for Industrial and Applied Mathematics

    United States

    Publication History

    Published: 01 January 1998

    Author Tags (MSC codes)

    1. 68Q05
    2. 68Q22
    3. 68Q25

    Author Tags (keywords)

    1. models of parallel computation
    2. parallel algorithms
    3. PRAM
    4. memory contention
    5. work-time framework

    Qualifiers

    • Research-article


    Cited By

    • (2022) Performance Analysis and Modelling of Concurrent Multi-access Data Structures, Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 333–344. DOI: 10.1145/3490148.3538578. Online publication date: 11-Jul-2022.
    • (2020) Improving the Space-Time Efficiency of Matrix Multiplication Algorithms, Workshop Proceedings of the 49th International Conference on Parallel Processing, pp. 1–10. DOI: 10.1145/3409390.3409404. Online publication date: 17-Aug-2020.
    • (2019) New Performance Modeling Methods for Parallel Data Processing Applications, ACM Transactions on Modeling and Computer Simulation, 29(3), pp. 1–24. DOI: 10.1145/3309684. Online publication date: 18-Jun-2019.
    • (2018) Parallel Working-Set Search Structures, Proceedings of the 30th Symposium on Parallelism in Algorithms and Architectures, pp. 321–332. DOI: 10.1145/3210377.3210390. Online publication date: 11-Jul-2018.
    • (2017) References, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018804. Online publication date: 9-Jun-2017.
    • (2017) Conclusion and Future Work, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018803. Online publication date: 9-Jun-2017.
    • (2017) Parallel Wavelet Tree Construction, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018802. Online publication date: 9-Jun-2017.
    • (2017) Parallel Lempel-Ziv Factorization, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018801. Online publication date: 9-Jun-2017.
    • (2017) Parallel Computation of Longest Common Prefixes, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018800. Online publication date: 9-Jun-2017.
    • (2017) Parallel Cartesian Tree and Suffix Tree Construction, Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. DOI: 10.1145/3018787.3018799. Online publication date: 9-Jun-2017.
