Export Citations
Save this search
Please login to be able to save your searches and receive alerts for new content matching your search criteria.
- research-articleOctober 2021
Processor-Aware Cache-Oblivious Algorithms✱
ICPP '21: Proceedings of the 50th International Conference on Parallel ProcessingArticle No.: 55, Pages 1–10https://doi.org/10.1145/3472456.3472506Frigo et al. proposed an ideal cache model and a recursive technique to design sequential cache-efficient algorithms in a cache-oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend the technique to an arbitrary ...
- extended-abstractJuly 2020
Balanced Partitioning of Several Cache-Oblivious Algorithms
SPAA '20: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and ArchitecturesPages 575–577https://doi.org/10.1145/3350755.3400214Frigo et al. proposed an ideal cache model and a recursive cache-oblivious technique to design sequential cache-efficient algorithms in an oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend the technique to an ...
Cache-oblivious High-performance Similarity Join
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataPages 87–104https://doi.org/10.1145/3299869.3319859A similarity join combines vectors based on a distance condition. Typically, such algorithms apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this paper, we propose to refine the pairs in an order defined by a ...
- research-articleJanuary 2018
Cache-Oblivious Buffer Heap and Cache-Efficient Computation of Shortest Paths in Graphs
ACM Transactions on Algorithms (TALG), Volume 14, Issue 1Article No.: 1, Pages 1–33https://doi.org/10.1145/3147172We present the buffer heap, a cache-oblivious priority queue that supports Delete-Min, Delete, and a hybrid Insert/Decrease-Key operation in O(1/B log2 N/M) amortized block transfers from main memory, where M and B are the (unknown) cache size and block ...
- research-articleOctober 2017
Autogen: Automatic Discovery of Efficient Recursive Divide-8-Conquer Algorithms for Solving Dynamic Programming Problems
- Rezaul Chowdhury,
- Pramod Ganapathi,
- Stephen Tschudi,
- Jesmin Jahan Tithi,
- Charles Bachmeier,
- Charles E. Leiserson,
- Armando Solar-Lezama,
- Bradley C. Kuszmaul,
- Yuan Tang
ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 1Article No.: 4, Pages 1–30https://doi.org/10.1145/3125632We present Autogen—an algorithm that for a wide class of dynamic programming (DP) problems automatically discovers highly efficient cache-oblivious parallel recursive divide-and-conquer algorithms from inefficient iterative descriptions of DP ...
-
- research-articleJuly 2017
Provably Efficient Scheduling of Cache-oblivious Wavefront Algorithms
SPAA '17: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and ArchitecturesPages 339–350https://doi.org/10.1145/3087556.3087586Iterative wavefront algorithms for evaluating dynamic programming recurrences exploit optimal parallelism but show poor cache performance. Tiled-iterative wavefront algorithms achieve optimal cache complexity and high parallelism but are cache-aware and ...
- posterJanuary 2017
POSTER: Provably Efficient Scheduling of Cache-Oblivious Wavefront Algorithms
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPages 435–436https://doi.org/10.1145/3018743.3019031Standard cache-oblivious recursive divide-and-conquer algorithms for evaluating dynamic programming recurrences have optimal serial cache complexity but often have lower parallelism compared with iterative wavefront algorithms due to artificial ...
Also Published in:
ACM SIGPLAN Notices: Volume 52 Issue 8 - research-articleJune 2016
Anti-Persistence on Persistent Storage: History-Independent Sparse Tables and Dictionaries
- Michael A. Bender,
- Jonathan W. Berry,
- Rob Johnson,
- Thomas M. Kroeger,
- Samuel McCauley,
- Cynthia A. Phillips,
- Bertrand Simon,
- Shikha Singh,
- David Zage
PODS '16: Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsPages 289–302https://doi.org/10.1145/2902251.2902276We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data ...
- research-articleFebruary 2016
AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs
- Rezaul Chowdhury,
- Pramod Ganapathi,
- Jesmin Jahan Tithi,
- Charles Bachmeier,
- Bradley C. Kuszmaul,
- Charles E. Leiserson,
- Armando Solar-Lezama,
- Yuan Tang
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingArticle No.: 10, Pages 1–12https://doi.org/10.1145/2851141.2851167We present AUTOGEN---an algorithm that for a wide class of dynamic programming (DP) problems automatically discovers highly efficient cache-oblivious parallel recursive divide-and-conquer algorithms from inefficient iterative descriptions of DP ...
Also Published in:
ACM SIGPLAN Notices: Volume 51 Issue 8 - ArticleMay 2015
High-Performance Energy-Efficient Recursive Dynamic Programming with Matrix-Multiplication-Like Flexible Kernels
IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing SymposiumPages 303–312https://doi.org/10.1109/IPDPS.2015.107Dynamic Programming (DP) problems arise in wide range of application areas spanning from logistics to computational biology. In this paper, we show how to obtain high-performing parallel implementations for a class of Problems by reducing them to highly ...
- research-articleJune 2014
The input/output complexity of triangle enumeration
PODS '14: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsPages 224–233https://doi.org/10.1145/2594538.2594552We consider the well-known problem of enumerating all triangles of an undirected graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let E be the number of edges, M<E the size of internal memory, and B the block size. ...
- articleJanuary 2014
High-Level Strategies for Parallel Shared-Memory Sparse Matrix-Vector Multiplication
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 25, Issue 1Pages 116–125https://doi.org/10.1109/TPDS.2013.31The sparse matrix-vector multiplication is an important computational kernel, but is hard to efficiently execute even in the sequential case. The problems--namely low arithmetic intensity, inefficient cache use, and limited memory bandwidth--are ...
- ArticleMay 2012
NUMA Aware Iterative Stencil Computations on Many-Core Systems
IPDPS '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing SymposiumPages 461–473https://doi.org/10.1109/IPDPS.2012.50Temporal blocking in iterative stencil computations allows to surpass the performance of peak system bandwidth that holds for a single stencil computation. However, the effectiveness of temporal blocking depends strongly on the tiling scheme, which must ...
- research-articleJanuary 2012
Cache-Oblivious Algorithms
ACM Transactions on Algorithms (TALG), Volume 8, Issue 1Article No.: 4, Pages 1–22https://doi.org/10.1145/2071379.2071383This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: ...
- research-articleFebruary 2011
Data management for SSDs for large-scale interactive graphics applications
I3D '11: Symposium on Interactive 3D Graphics and GamesPages 175–182https://doi.org/10.1145/1944745.1944775Solid state drives (SSDs) are emerging as an alternative storage medium to HDDs. SSDs have performance characteristics (e.g., fast random reads) that are very different from those of HDDs. Because of the high performance of SSDs, there are increasingly ...
- research-articleSeptember 2010
Binary Mesh Partitioning for Cache-Efficient Visualization
IEEE Transactions on Visualization and Computer Graphics (ITVC), Volume 16, Issue 5Pages 815–828https://doi.org/10.1109/TVCG.2010.19One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO ...
- posterMay 2010
Porting existing cache-oblivious linear algebra HPC modules to larrabee architecture
CF '10: Proceedings of the 7th ACM international conference on Computing frontiersPages 91–92https://doi.org/10.1145/1787275.1787298Cache-obliviousness represents an important but relatively new concept for cache optimization. As cache-oblivious algorithms perform well on architectures with arbitrary cache configurations, the programming effort required for porting and optimizing ...
- articleJuly 2009
Cache-Oblivious Sparse Matrix-Vector Multiplication by Using Sparse Matrix Partitioning Methods
SIAM Journal on Scientific Computing (SISC), Volume 31, Issue 4Pages 3128–3154https://doi.org/10.1137/080733243In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the ...
- ArticleJune 2007
EaseDB: a cache-oblivious in-memory query processor
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of dataPages 1064–1066https://doi.org/10.1145/1247480.1247607We propose to demonstrate EaseDB, the first cache-oblivious queryprocessor for in-memory relational query processing. The cache-oblivious notion from the theory community refers to the property that no parameters in an algorithm or a data structure need ...
- ArticleNovember 2006
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
SC '06: Proceedings of the 2006 ACM/IEEE conference on SupercomputingPages 78–eshttps://doi.org/10.1145/1188455.1188538Dynamic programming has been one of the most efficient approaches to sequence analysis and structure prediction in biology. However, their performance is limited due to the drastic increase in both the number of biological data and variety of the ...