Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3115936.3115949acmotherconferencesArticle/Chapter ViewAbstractPublication PagesissacConference Proceedingsconference-collections
short-paper

Cache-oblivious Matrix Multiplication for Exact TU Factorisation

Published: 23 July 2017 Publication History

Abstract

We present a cache-oblivious adaptation of matrix multiplication to be incorporated in the parallel TU decomposition for rectangular matrices over finite fields, based on the Morton-hybrid space-filling curve representation. To realise this, we introduce the concepts of alignment and containment of sub-matrices under the Morton-hybrid layout. We redesign the decompositions within the recursive matrix multiplication to force the base case to avoid all jumps in address space, at the expense of extra recursive matrix multiplication (MM) calls. We show that the resulting cache oblivious adaptation has low span, and our experiments demonstrate that its sequential evaluation order demonstrates significant improvement in run-time, despite the recursion overhead. We also observe orders of magnitude reductions in cache misses, which promises to yield a highly I/O efficient multithreaded deployment of this algorithm on parallel machines with private or shared caches.

References

[1]
F. K. Abu Salem and M. Al Arab. "Comparative study of space filling curves for cache oblivious TU Decomposition", extd. report, http://arxiv.org/abs/1612.06069
[2]
M. D. Adams and D. S. Wise. "Fast additions on masked integers", in SIGPLAN Not., 41(5):39--45, 2006.
[3]
M. D. Adams and D. S. Wise. "Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms", in MSPC '06, 41--50, ACM Press, 2006.
[4]
G. Blelloch, P. B. Gibbons, and H.-V. Simhadri. "Low depth cache-oblivious algorithms", in SPAA 2010, pp. 189--199, ACM Press, 2010.
[5]
S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. "A portable programming interface for performance evaluation on modern processors", in Int. J. High Perform. Comput. Appl., 14(3):189--204, 2000.
[6]
N. Chen, N. Wang, and B. Shi. "A new algorithm for encoding and decoding the hilbert order", in Softw. Pract. Exper., 37(8):897--908, 2007.
[7]
T. Cormen, C. Leiserson, R. Rivest and C. Stein. Introduction to Algorithms.
[8]
J.G. Dumas and J.L. Roch. "A parallel block algorithm for the exact triangulization of rectangular matrices", in SPAA 2001, pp. 324--325, ACM Press, 2001.
[9]
Jean-Guillaume Dumas and Jean-Louis Roch. "A parallel block algorithm for exact triangulizations", in Parallel Comp., 28 (11), 1531--1548, 2002.
[10]
Jeremy D. Frens and David S. Wise. "QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism", in PPoPP '03, pp. 144--154.
[11]
O. H. Ibarra, S. Moran, and L. E. Rosier, "A note on the parallel complexity of computing the rank of order n matrices", in Inf. Proc. Lett., 11(4-5):162--162, 1980.
[12]
O. H. Ibarra, S. Moran, and R. Hui, "A generalization of the fast LUP matrix decomposition algorithm and applications", in J. of Algs., 3(1):45--56, 1982.
[13]
X. Liu and G. Schrack. "Encoding and decoding the Hilbert order", in Softw. Pract. Exper., 26(12):1335--1346, 1996.
[14]
P. Merkey. "Z-ordering and UPC", online, Michigan Tech. Univ., June 2003.
[15]
H. Niederreiter. "Factorization of polynomials and some linear algebra problems over finite fields", in Lin. Alg. Appl., vol. 192, pp. 301--328, 1993.
[16]
H. Niederreiter. "A new efficient factorization algorithm for polynomials over small finite fields", in App. Alg.in Eng. Comm. Comp., vol. 4, pp. 81--87, 1993.
[17]
H. Niederreiter and R. Göttfert. "Factorization of polynomials over finite fields and characteristic sequences", in J. Symb. Comput., 16(5):401--412, 1993.
[18]
R. Raman and D. Stephen Wise. "Converting to and from dilated integers", in IEEE Trans. Comp., 57(4):567--573, 2008.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
PASCO 2017: Proceedings of the International Workshop on Parallel Symbolic Computation
July 2017
91 pages
ISBN:9781450352888
DOI:10.1145/3115936
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Heriot-Watt University: Heriot-Watt University

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 July 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Cache-oblivious Algorithms
  2. Finite Fields
  3. Locality of reference
  4. Matrix Multiplication
  5. Morton-hybrid Layout
  6. Space-filling Curves
  7. TU Decomposition

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

PASCO 2017

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 60
    Total Downloads
  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 26 Jan 2025

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media