
Optimal multistream sequential prefetching in a shared cache

Published: 01 October 2007

Abstract

Prefetching is a widely used technique in modern data storage systems. We study sequential prefetching, the most widely deployed class of prefetching algorithms. Two problems plague state-of-the-art sequential prefetching algorithms: (i) cache pollution, which occurs when prefetched data replaces more useful prefetched or demand-paged data, and (ii) prefetch wastage, which happens when prefetched data is evicted from the cache before it can be used.
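To make prefetch wastage concrete, here is a minimal, hypothetical sketch (not from the paper): many equal-rate sequential streams share a small LRU cache while a fixed-degree prefetcher reads ahead on every miss. Once the aggregate prefetched footprint exceeds the cache size, prefetched blocks are evicted before their streams reach them. All names, parameter values, and the simulation itself are illustrative assumptions, not the authors' experimental setup.

```python
# Illustrative sketch only (assumptions: LRU shared cache, equal-rate streams,
# fixed-degree prefetch on every miss). It is NOT the paper's experiment; it
# merely shows how prefetch wastage arises when the aggregate prefetched
# footprint of many streams exceeds the shared cache.
from collections import OrderedDict

def wastage_demo(num_streams=100, degree=16, cache_blocks=512, accesses=20000):
    cache = OrderedDict()            # block -> True once read after insertion
    next_block = [0] * num_streams   # next demand block of each stream
    wasted = 0

    def insert(block, used):
        nonlocal wasted
        if block in cache:
            return
        if len(cache) >= cache_blocks:
            _, was_used = cache.popitem(last=False)   # evict LRU block
            if not was_used:
                wasted += 1          # prefetched but evicted before any use
        cache[block] = used

    for i in range(accesses):
        s = i % num_streams          # streams advance in round-robin order
        blk = (s, next_block[s])
        next_block[s] += 1
        if blk in cache:             # demand hit (possibly on prefetched data)
            cache[blk] = True
            cache.move_to_end(blk)
        else:                        # demand miss: fetch block and read ahead
            insert(blk, used=True)
            for k in range(1, degree + 1):
                insert((s, blk[1] + k), used=False)
    return wasted

print("prefetched blocks evicted before use:", wastage_demo())
```

With these illustrative numbers the combined read-ahead of 100 streams far exceeds the 512-block cache, so most prefetched blocks are evicted unused; shrinking the degree or enlarging the cache drives the wastage count toward zero.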
A sequential prefetching algorithm can have a fixed or adaptive degree of prefetch and can be either synchronous (it can prefetch only on a miss) or asynchronous (it can also prefetch on a hit). To capture these distinctions we define four classes of prefetching algorithms: fixed synchronous (FS), fixed asynchronous (FA), adaptive synchronous (AS), and adaptive asynchronous (AsynchA). We find that the relatively unexplored class of AsynchA algorithms is in fact the most promising for sequential prefetching. We provide the first formal analysis of the criteria necessary for optimal throughput when using an AsynchA algorithm in a cache shared by multiple steady sequential streams. We then provide a simple implementation called AMP (adaptive multistream prefetching), which adapts the degree of prefetch and the trigger distance of each stream accordingly, leading to near-optimal performance for any kind of sequential workload and cache size.
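For intuition about the AsynchA class, the sketch below shows the two per-stream knobs involved: the degree of prefetch (how many blocks to read ahead) and the trigger distance (how early, on a hit, the next asynchronous prefetch is issued). The adaptation rules, names, and defaults here are hypothetical illustrations of the class; they are not the AMP policy derived in the paper.

```python
# Hypothetical per-stream adaptive asynchronous (AsynchA-style) prefetcher.
# The two knobs -- degree of prefetch and trigger distance -- are the ones the
# paper analyzes; the grow/shrink rules below are illustrative assumptions,
# not AMP's actual adaptation policy.

class StreamPrefetcher:
    def __init__(self, degree=4, trigger=2, max_degree=64):
        self.degree = degree           # p: blocks read ahead per prefetch
        self.trigger = trigger         # g: remaining prefetched blocks at
                                       #    which the next prefetch is issued
        self.max_degree = max_degree
        self.last_prefetched = -1      # highest block prefetched so far

    def on_access(self, block, hit, saw_wastage):
        """Return a (first, last) block range to prefetch, or None.

        block        -- demand-accessed block of this stream
        hit          -- block was found in the cache
        saw_wastage  -- a prefetched block of this stream was evicted unused
                        since the previous access
        """
        issue = None
        if not hit:
            # Synchronous case: the stream outran its prefetched data,
            # so grow the degree (illustrative rule) and read ahead.
            self.degree = min(self.degree + 1, self.max_degree)
            issue = (block + 1, block + self.degree)
        elif self.last_prefetched - block <= self.trigger:
            # Asynchronous case: hit near the end of the prefetched run,
            # so issue the next prefetch before a miss can occur.
            issue = (self.last_prefetched + 1,
                     self.last_prefetched + self.degree)
        if saw_wastage:
            # Prefetched data was evicted before use: back off (illustrative).
            self.degree = max(1, self.degree - 1)
        if issue is not None:
            self.last_prefetched = issue[1]
        return issue
```

A cache manager would call on_access for every read belonging to a detected sequential stream and issue the returned range to the backing disks; the point of the sketch is only that both the degree and the trigger distance are per-stream quantities that can be tuned from feedback.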
Our experimental setup consisted of an IBM xSeries 345 dual-processor server running Linux with five SCSI disks. We observe that AMP convincingly outperforms all the contending members of the FA, FS, and AS classes for any number of streams and over all cache sizes. As anecdotal evidence, in an experiment with 100 concurrent sequential streams and varying cache sizes, AMP surpasses the FA, FS, and AS algorithms by 29--172%, 12--24%, and 21--210%, respectively, while outperforming OBL (one-block lookahead) by a factor of 8. Even for complex workloads like SPC1-Read, AMP is consistently the best-performing algorithm. For the SPC2 video-on-demand workload, AMP can sustain at least 25% more streams than the next best algorithm. Furthermore, for a workload consisting of short sequences, where optimality is more elusive, AMP outperforms all the other contenders in overall performance.
Finally, we implemented AMP in the state-of-the-art enterprise storage system, the IBM System Storage DS8000 series. We demonstrated that AMP dramatically improves performance for common sequential and batch processing workloads and delivers up to a twofold increase in sequential read capacity.



Published In

ACM Transactions on Storage, Volume 3, Issue 3
October 2007, 183 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1288783

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adaptive prefetching
  2. asynchronous prefetching
  3. cache pollution
  4. degree of prefetch
  5. fixed prefetching
  6. multistream read
  7. optimal prefetching
  8. prefetch wastage
  9. prestaging
  10. sequential prefetching
  11. synchronous prefetching
  12. trigger distance
