
Optimal multistream sequential prefetching in a shared cache

Published: 01 October 2007

Abstract

Prefetching is a widely used technique in modern data storage systems. We study sequential prefetching, the most widely deployed class of prefetching algorithms. Two problems plague state-of-the-art sequential prefetching algorithms: (i) cache pollution, which occurs when prefetched data replaces more useful prefetched or demand-paged data, and (ii) prefetch wastage, which happens when prefetched data is evicted from the cache before it can be used.
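To make prefetch wastage concrete, here is a minimal, hypothetical sketch (not from the paper): many equal-rate sequential streams share a small LRU cache while a fixed-degree prefetcher reads ahead on every miss. Once the aggregate prefetched footprint exceeds the cache size, prefetched blocks are evicted before their streams reach them. All names, parameter values, and the simulation itself are illustrative assumptions, not the authors' experimental setup.

```python
# Illustrative sketch only (assumptions: LRU shared cache, equal-rate streams,
# fixed-degree prefetch on every miss). It is NOT the paper's experiment; it
# merely shows how prefetch wastage arises when the aggregate prefetched
# footprint of many streams exceeds the shared cache.
from collections import OrderedDict

def wastage_demo(num_streams=100, degree=16, cache_blocks=512, accesses=20000):
    cache = OrderedDict()            # block -> True once read after insertion
    next_block = [0] * num_streams   # next demand block of each stream
    wasted = 0

    def insert(block, used):
        nonlocal wasted
        if block in cache:
            return
        if len(cache) >= cache_blocks:
            _, was_used = cache.popitem(last=False)   # evict LRU block
            if not was_used:
                wasted += 1          # prefetched but evicted before any use
        cache[block] = used

    for i in range(accesses):
        s = i % num_streams          # streams advance in round-robin order
        blk = (s, next_block[s])
        next_block[s] += 1
        if blk in cache:             # demand hit (possibly on prefetched data)
            cache[blk] = True
            cache.move_to_end(blk)
        else:                        # demand miss: fetch block and read ahead
            insert(blk, used=True)
            for k in range(1, degree + 1):
                insert((s, blk[1] + k), used=False)
    return wasted

print("prefetched blocks evicted before use:", wastage_demo())
```

With these illustrative numbers the combined read-ahead of 100 streams far exceeds the 512-block cache, so most prefetched blocks are evicted unused; shrinking the degree or enlarging the cache drives the wastage count toward zero.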
A sequential prefetching algorithm can have a fixed or adaptive degree of prefetch and can be either synchronous (it can prefetch only on a miss) or asynchronous (it can also prefetch on a hit). To capture these distinctions we define four classes of prefetching algorithms: fixed synchronous (FS), fixed asynchronous (FA), adaptive synchronous (AS), and adaptive asynchronous (AsynchA). We find that the relatively unexplored class of AsynchA algorithms is in fact the most promising for sequential prefetching. We provide the first formal analysis of the criteria necessary for optimal throughput when using an AsynchA algorithm in a cache shared by multiple steady sequential streams. We then provide a simple implementation called AMP (adaptive multistream prefetching), which adapts the degree of prefetch and the trigger distance of each stream accordingly, leading to near-optimal performance for any kind of sequential workload and cache size.
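For intuition about the AsynchA class, the sketch below shows the two per-stream knobs involved: the degree of prefetch (how many blocks to read ahead) and the trigger distance (how early, on a hit, the next asynchronous prefetch is issued). The adaptation rules, names, and defaults here are hypothetical illustrations of the class; they are not the AMP policy derived in the paper.

```python
# Hypothetical per-stream adaptive asynchronous (AsynchA-style) prefetcher.
# The two knobs -- degree of prefetch and trigger distance -- are the ones the
# paper analyzes; the grow/shrink rules below are illustrative assumptions,
# not AMP's actual adaptation policy.

class StreamPrefetcher:
    def __init__(self, degree=4, trigger=2, max_degree=64):
        self.degree = degree           # p: blocks read ahead per prefetch
        self.trigger = trigger         # g: remaining prefetched blocks at
                                       #    which the next prefetch is issued
        self.max_degree = max_degree
        self.last_prefetched = -1      # highest block prefetched so far

    def on_access(self, block, hit, saw_wastage):
        """Return a (first, last) block range to prefetch, or None.

        block        -- demand-accessed block of this stream
        hit          -- block was found in the cache
        saw_wastage  -- a prefetched block of this stream was evicted unused
                        since the previous access
        """
        issue = None
        if not hit:
            # Synchronous case: the stream outran its prefetched data,
            # so grow the degree (illustrative rule) and read ahead.
            self.degree = min(self.degree + 1, self.max_degree)
            issue = (block + 1, block + self.degree)
        elif self.last_prefetched - block <= self.trigger:
            # Asynchronous case: hit near the end of the prefetched run,
            # so issue the next prefetch before a miss can occur.
            issue = (self.last_prefetched + 1,
                     self.last_prefetched + self.degree)
        if saw_wastage:
            # Prefetched data was evicted before use: back off (illustrative).
            self.degree = max(1, self.degree - 1)
        if issue is not None:
            self.last_prefetched = issue[1]
        return issue
```

A cache manager would call on_access for every read belonging to a detected sequential stream and issue the returned range to the backing disks; the point of the sketch is only that both the degree and the trigger distance are per-stream quantities that can be tuned from feedback.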
Our experimental setup consisted of an IBM xSeries 345 dual-processor server running Linux with five SCSI disks. We observe that AMP convincingly outperforms all the contending members of the FA, FS, and AS classes for any number of streams and over all cache sizes. As anecdotal evidence, in an experiment with 100 concurrent sequential streams and varying cache sizes, AMP surpasses the FA, FS, and AS algorithms by 29--172%, 12--24%, and 21--210%, respectively, while outperforming OBL (one-block lookahead) by a factor of 8. Even for complex workloads like SPC1-Read, AMP is consistently the best-performing algorithm. For the SPC2 video-on-demand workload, AMP can sustain at least 25% more streams than the next best algorithm. Furthermore, for a workload consisting of short sequences, where optimality is more elusive, AMP outperforms all the other contenders in overall performance.
Finally, we implemented AMP in the state-of-the-art enterprise storage system, the IBM System Storage DS8000 series. We demonstrated that AMP dramatically improves performance for common sequential and batch processing workloads and delivers up to a twofold increase in sequential read capacity.



Published In

ACM Transactions on Storage, Volume 3, Issue 3
October 2007, 183 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1288783

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adaptive prefetching
  2. asynchronous prefetching
  3. cache pollution
  4. degree of prefetch
  5. fixed prefetching
  6. multistream read
  7. optimal prefetching
  8. prefetch wastage
  9. prestaging
  10. sequential prefetching
  11. synchronous prefetching
  12. trigger distance
