Article

A new approach for automatic parallelization of blocked linear algebra computations

Authors:

H. T. Kung,

Jaspal Subhlok

Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing

Pages 122 - 129

https://doi.org/10.1145/125826.125898

Published: 01 August 1991

References

[1] M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. Lam, O. Menzilcioglu, and J. A. Webb. The Warp Computer: Architecture, Implementation, and Performance. IEEE Transactions on Computers, C-36(12):1523-1538, December 1987.

[2] S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWarp: An integrated solution to high-speed parallel computing. In Proceedings of the Supercomputing Conference, pages 330-339, November 1988.

[3] S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 70-81, Seattle, WA, May 1990.

[4] M. Dayde and I. Duff. Level 3 BLAS in LU Factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF. The International Journal of Supercomputer Applications, 3(2):40-70, 1989.

[5] M. Dayde and I. Duff. Use of parallel level 3 BLAS in LU Factorization on three vector multiprocessors: the Alliant FX/80, the CRAY2, and the IBM 3090 VF. In Proceedings of the 1990 International Conference on Supercomputing, pages 82-95, Amsterdam, The Netherlands, June 1990.

[6] J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, March 1990.

[7] J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson. An extended set of Fortran basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1-17, March 1988.

[8] J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, and D. Sorensen. Prospectus for the development of a linear algebra library for high-performance computers. Technical Report ANL-MCS-TM-97, Argonne National Lab, September 1987.

[9] J. Dongarra and S. Ostrouchov. LAPACK block factorization algorithms on the Intel iPSC/860. Technical Report CS-90-115, Computer Science Department, University of Tennessee, October 1990.

[10] W. Gentleman and H. T. Kung. Matrix triangularization by systolic arrays. In Proceedings of SPIE Symposium, Vol. 298, Real-Time Signal Processing IV, pages 19-26, August 1981.

[11] K. Gallivan, W. Jalby, U. Meier, and A. Sameh. Impact of hierarchical memory systems on linear algebra algorithm design. The International Journal of Supercomputer Applications, 3(2):40-70, 1989.

[12] S. Hiranandani, K. Kennedy, and C. Tseng. Compiler support for machine-independent parallel programming in Fortran D. Technical Report TR90-149, Dept. of Computer Science, Rice University, February 1991.

[13] H. T. Kung. Why Systolic Architectures? Computer Magazine, 15(1):37-46, January 1982.

[14] H. T. Kung and C. Leiserson. Systolic Arrays (for VLSI). In Sparse Matrix Proceedings 1978, edited by I. S. Duff and G. W. Stewart. A slightly different version appears in Introduction to VLSI Systems by C. A. Mead and L. A. Conway, Addison-Wesley, 1980, Section 8.3, pp. 37-46.

[15] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5(3):308-323, 1979.

[16] H. Ribas. Automatic Generation of Systolic Programs from Nested Loops. Ph.D. thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, June 1990.

[17] P. S. Tseng. A Parallelizing Compiler for Distributed Memory Parallel Computers. Ph.D. thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, May 1989.

Cited By

Baroudi T, Seghir R, Loechner V (2017). Optimization of Triangular and Banded Matrix Operations Using 2d-Packed Layouts. ACM Transactions on Architecture and Code Optimization, 14:4 (1-19). DOI: 10.1145/3162016. Online publication date: 18-Dec-2017.
https://dl.acm.org/doi/10.1145/3162016
Jaiswal M, Chandrachoodan N (2012). FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture. IEEE Transactions on Computers, 61:1 (60-72). DOI: 10.1109/TC.2011.24. Online publication date: 1-Jan-2012.
https://dl.acm.org/doi/10.1109/TC.2011.24
Wu G, Dou Y, Sun J, Peterson G (2012). A High Performance and Memory Efficient LU Decomposer on FPGAs. IEEE Transactions on Computers, 61:3 (366-378). DOI: 10.1109/TC.2010.278. Online publication date: 1-Mar-2012.
https://dl.acm.org/doi/10.1109/TC.2010.278

Published In

Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing

August 1991

920 pages

ISBN: 0897914597

DOI: 10.1145/125826

Conference Chair:
Ray Elliott

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SC '91

Sponsor:

SIGARCH
IEEE-CS

SC '91: International Conference for High Performance Computing, Networking, Storage and Analysis

November 18-22, 1991

Albuquerque, New Mexico, USA

Acceptance Rates

Supercomputing '91 Paper Acceptance Rate 83 of 215 submissions, 39%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics

Total Citations: 22
Total Downloads: 358
Downloads (last 12 months): 69
Downloads (last 6 weeks): 14

Reflects downloads up to 03 Mar 2025

Citations

Chan K, Gibbons A, Pias M, Rytter W (2006). On the PVM computations of transitive closure and algebraic path problems. Recent Advances in Parallel Virtual Machine and Message Passing Interface (338-345). DOI: 10.1007/BFb0056593. Online publication date: 2-Jun-2006.
https://doi.org/10.1007/BFb0056593
Li X, Li Z, David F, Zhou P, Zhou Y, Adve S, Kumar S (2004). Performance directed energy management for main memory and disks. ACM SIGOPS Operating Systems Review, 38:5 (271-283). DOI: 10.1145/1037949.1024425. Online publication date: 7-Oct-2004.
https://dl.acm.org/doi/10.1145/1037949.1024425
Gomaa M, Powell M, Vijaykumar T (2004). Heat-and-run. ACM SIGOPS Operating Systems Review, 38:5 (260-270). DOI: 10.1145/1037949.1024424. Online publication date: 7-Oct-2004.
https://dl.acm.org/doi/10.1145/1037949.1024424
Shen X, Zhong Y, Ding C (2004). Locality phase prediction. ACM SIGOPS Operating Systems Review, 38:5 (165-176). DOI: 10.1145/1037949.1024414. Online publication date: 7-Oct-2004.
https://dl.acm.org/doi/10.1145/1037949.1024414
Denehy T, Bent J, Popovici F, Arpaci-Dusseau A, Arpaci-Dusseau R (2004). Deconstructing storage arrays. ACM SIGOPS Operating Systems Review, 38:5 (59-71). DOI: 10.1145/1037949.1024401. Online publication date: 7-Oct-2004.
https://dl.acm.org/doi/10.1145/1037949.1024401
Pagourtzis A, Potapov I, Rytter W (2002). Observations on Parallel Computation of Transitive and Max-Closure Problems. Recent Advances in Parallel Virtual Machine and Message Passing Interface (217-225). DOI: 10.1007/3-540-45825-5_37. Online publication date: 18-Sep-2002.
https://doi.org/10.1007/3-540-45825-5_37
Pagourtzis A, Potapov I, Rytter W (2001). PVM Computation of the Transitive Closure: The Dependency Graph Approach. Recent Advances in Parallel Virtual Machine and Message Passing Interface (249-256). DOI: 10.1007/3-540-45417-9_35. Online publication date: 11-Sep-2001.
https://doi.org/10.1007/3-540-45417-9_35
