DOI: 10.1145/125826.125898
Article
Free access

A new approach for automatic parallelization of blocked linear Algebra computations

Published: 01 August 1991

References

[1]
M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. Lam, O. Menzilcioglu, and J. A. Webb. The Warp Computer: Architecture, Implementation, and Performance. IEEE Transactions on Computers, C-36(12):1523-1538, December 1987.
[2]
S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWarp: An integrated solution to high-speed parallel computing. In Proceedings of the Supercomputing Conference, pages 330--339, November 1988.
[3]
S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 70--81, Seattle, WA, May 1990.
[4]
M. Dayde and I. Duff. Level 3 BLAS in LU Factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF. The International Journal of Supercomputer Applications, 3(2):40--70, 1989.
[5]
M. Dayde and I. Duff. Use of parallel level 3 BLAS in LU Factorization on three vector multiprocessors: the Alliant FX/80, the CRAY-2, and the IBM 3090 VF. In Proceedings of the 1990 International Conference on Supercomputing, pages 82--95, Amsterdam, The Netherlands, June 1990.
[6]
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, March 1990.
[7]
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson. An extended set of Fortran basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1-17, March 1988.
[8]
J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, and D. Sorensen. Prospectus for the development of a linear algebra library for high-performance computers. Technical Report ANL-MCS-TM-97, Argonne National Lab, September 1987.
[9]
J. Dongarra and S. Ostrouchov. LAPACK block factorization algorithms on the Intel iPSC/860. Technical Report CS-90-115, Computer Science Department, University of Tennessee, October 1990.
[10]
W. Gentleman and H. T. Kung. Matrix triangularization by systolic arrays. In Proceedings of SPIE Symposium, Vol. 298, Real-Time Signal Processing IV, pages 19-26, August 1981.
[11]
K. Gallivan, W. Jalby, U. Meier, and A. Sameh. Impact of hierarchical memory systems on linear algebra algorithm design. The International Journal of Supercomputer Applications, 3(2):40--70, 1989.
[12]
S. Hiranandani, K. Kennedy, and C. Tseng. Compiler support for machine independent parallel programming in Fortran D. Technical Report TR90-149, Dept. of Computer Science, Rice University, February 1991.
[13]
H. T. Kung. Why Systolic Architectures? Computer Magazine, 15(1):37-46, January 1982.
[14]
H. T. Kung and C. Leiserson. Systolic Arrays (for VLSI). In Sparse Matrix Proceedings 1978, edited by I. S. Duff and G. W. Stewart. A slightly different version appears in Introduction to VLSI Systems by C. A. Mead and L. A. Conway, Addison-Wesley, 1980, Section 8.3, pp. 37-46.
[15]
C. Lawson, R. Hanson, R. Kincaid, and F. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 16:308--323, 1979.
[16]
H. Ribas. Automatic Generation of Systolic Programs from Nested Loops. Ph.D. thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, June 1990.
[17]
P. S. Tseng. A Parallelizing Compiler for Distributed Memory Parallel Computers. Ph.D. thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, May 1989.



Published In

Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing
August 1991
920 pages
ISBN:0897914597
DOI:10.1145/125826
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

SC '91

Acceptance Rates

Supercomputing '91 Paper Acceptance Rate 83 of 215 submissions, 39%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 48
  • Downloads (Last 6 weeks): 16
Reflects downloads up to 20 Nov 2024

Cited By

  • (2017) Optimization of Triangular and Banded Matrix Operations Using 2d-Packed Layouts. ACM Transactions on Architecture and Code Optimization 14:4 (1-19). DOI: 10.1145/3162016. Online publication date: 18-Dec-2017
  • (2012) FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture. IEEE Transactions on Computers 61:1 (60-72). DOI: 10.1109/TC.2011.24. Online publication date: 1-Jan-2012
  • (2012) A High Performance and Memory Efficient LU Decomposer on FPGAs. IEEE Transactions on Computers 61:3 (366-378). DOI: 10.1109/TC.2010.278. Online publication date: 1-Mar-2012
  • (2006) On the PVM computations of transitive closure and algebraic path problems. Recent Advances in Parallel Virtual Machine and Message Passing Interface (338-345). DOI: 10.1007/BFb0056593. Online publication date: 2-Jun-2006
  • (2004) Performance directed energy management for main memory and disks. ACM SIGOPS Operating Systems Review 38:5 (271-283). DOI: 10.1145/1037949.1024425. Online publication date: 7-Oct-2004
  • (2004) Heat-and-run. ACM SIGOPS Operating Systems Review 38:5 (260-270). DOI: 10.1145/1037949.1024424. Online publication date: 7-Oct-2004
  • (2004) Locality phase prediction. ACM SIGOPS Operating Systems Review 38:5 (165-176). DOI: 10.1145/1037949.1024414. Online publication date: 7-Oct-2004
  • (2004) Deconstructing storage arrays. ACM SIGOPS Operating Systems Review 38:5 (59-71). DOI: 10.1145/1037949.1024401. Online publication date: 7-Oct-2004
  • (2002) Observations on Parallel Computation of Transitive and Max-Closure Problems. Recent Advances in Parallel Virtual Machine and Message Passing Interface (217-225). DOI: 10.1007/3-540-45825-5_37. Online publication date: 18-Sep-2002
  • (2001) PVM Computation of the Transitive Closure: The Dependency Graph Approach. Recent Advances in Parallel Virtual Machine and Message Passing Interface (249-256). DOI: 10.1007/3-540-45417-9_35. Online publication date: 11-Sep-2001
