short-paper

Cache-oblivious Matrix Multiplication for Exact TU Factorisation

Authors:

Fatima K. Abu Salem,

Mira Al ArabAuthors Info & Claims

PASCO 2017: Proceedings of the International Workshop on Parallel Symbolic Computation

Article No.: 13, Pages 1 - 10

https://doi.org/10.1145/3115936.3115949

Published: 23 July 2017 Publication History

Abstract

We present a cache-oblivious adaptation of matrix multiplication to be incorporated in the parallel TU decomposition for rectangular matrices over finite fields, based on the Morton-hybrid space-filling curve representation. To realise this, we introduce the concepts of alignment and containment of sub-matrices under the Morton-hybrid layout. We redesign the decompositions within the recursive matrix multiplication to force the base case to avoid all jumps in address space, at the expense of extra recursive matrix multiplication (MM) calls. We show that the resulting cache oblivious adaptation has low span, and our experiments demonstrate that its sequential evaluation order demonstrates significant improvement in run-time, despite the recursion overhead. We also observe orders of magnitude reductions in cache misses, which promises to yield a highly I/O efficient multithreaded deployment of this algorithm on parallel machines with private or shared caches.

References

[1]

F. K. Abu Salem and M. Al Arab. "Comparative study of space filling curves for cache oblivious TU Decomposition", extd. report, http://arxiv.org/abs/1612.06069

[2]

M. D. Adams and D. S. Wise. "Fast additions on masked integers", in SIGPLAN Not., 41(5):39--45, 2006.

Digital Library

[3]

M. D. Adams and D. S. Wise. "Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms", in MSPC '06, 41--50, ACM Press, 2006.

Digital Library

[4]

G. Blelloch, P. B. Gibbons, and H.-V. Simhadri. "Low depth cache-oblivious algorithms", in SPAA 2010, pp. 189--199, ACM Press, 2010.

Digital Library

[5]

S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. "A portable programming interface for performance evaluation on modern processors", in Int. J. High Perform. Comput. Appl., 14(3):189--204, 2000.

Digital Library

[6]

N. Chen, N. Wang, and B. Shi. "A new algorithm for encoding and decoding the hilbert order", in Softw. Pract. Exper., 37(8):897--908, 2007.

Digital Library

[7]

T. Cormen, C. Leiserson, R. Rivest and C. Stein. Introduction to Algorithms.

[8]

J.G. Dumas and J.L. Roch. "A parallel block algorithm for the exact triangulization of rectangular matrices", in SPAA 2001, pp. 324--325, ACM Press, 2001.

Digital Library

[9]

Jean-Guillaume Dumas and Jean-Louis Roch. "A parallel block algorithm for exact triangulizations", in Parallel Comp., 28 (11), 1531--1548, 2002.

Digital Library

[10]

Jeremy D. Frens and David S. Wise. "QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism", in PPoPP '03, pp. 144--154.

Digital Library

[11]

O. H. Ibarra, S. Moran, and L. E. Rosier, "A note on the parallel complexity of computing the rank of order n matrices", in Inf. Proc. Lett., 11(4-5):162--162, 1980.

[12]

O. H. Ibarra, S. Moran, and R. Hui, "A generalization of the fast LUP matrix decomposition algorithm and applications", in J. of Algs., 3(1):45--56, 1982.

[13]

X. Liu and G. Schrack. "Encoding and decoding the Hilbert order", in Softw. Pract. Exper., 26(12):1335--1346, 1996.

Digital Library

[14]

P. Merkey. "Z-ordering and UPC", online, Michigan Tech. Univ., June 2003.

[15]

H. Niederreiter. "Factorization of polynomials and some linear algebra problems over finite fields", in Lin. Alg. Appl., vol. 192, pp. 301--328, 1993.

[16]

H. Niederreiter. "A new efficient factorization algorithm for polynomials over small finite fields", in App. Alg.in Eng. Comm. Comp., vol. 4, pp. 81--87, 1993.

Digital Library

[17]

H. Niederreiter and R. Göttfert. "Factorization of polynomials over finite fields and characteristic sequences", in J. Symb. Comput., 16(5):401--412, 1993.

Digital Library

[18]

R. Raman and D. Stephen Wise. "Converting to and from dilated integers", in IEEE Trans. Comp., 57(4):567--573, 2008.

Digital Library

Index Terms

Cache-oblivious Matrix Multiplication for Exact TU Factorisation
1. Computing methodologies
  1. Symbolic and algebraic manipulation
    1. Symbolic and algebraic algorithms
2. Mathematics of computing
  1. Mathematical software

Recommendations

Cache-Oblivious Algorithms

This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: ...
Cache-Oblivious Sparse Matrix-Vector Multiplication by Using Sparse Matrix Partitioning Methods

In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the ...
Matrix multiplication: a case study of enhanced data cache utilization

Modern machines present two challenges to algorithm engineers and compiler writers: They have superscalar, super-pipelined structure, and they have elaborate memory subsystems specifically designed to reduce latency and increase bandwidth. Matrix ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

PASCO 2017: Proceedings of the International Workshop on Parallel Symbolic Computation

July 2017

91 pages

ISBN:9781450352888

DOI:10.1145/3115936

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Heriot-Watt University: Heriot-Watt University

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 July 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Short-paper
Research
Refereed limited

Conference

PASCO 2017

PASCO 2017: International Workshop on Parallel Symbolic Computation

July 23 - 24, 2017

Kaiserslautern, Germany

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
60
Total Downloads

Downloads (Last 12 months)4
Downloads (Last 6 weeks)0

Reflects downloads up to 26 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten