Deshmukh S, Yokota R and Bosilca G.
(2023). Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors. ACM Transactions on Mathematical Software. 49:3. (1-29). Online publication date: 30-Sep-2023. https://doi.org/10.1145/3595178