Abstract
Hot data identification is crucial for many applications, yet few investigations have examined the subject. Existing studies focus almost exclusively on frequency. However, identifying hot data effectively requires weighing recency and frequency equally. Moreover, previous studies make hot data decisions at the data block level. Such a fine-grained decision suits flash-based storage particularly well because its random access performance is comparable to its sequential access performance. Hard disk drives (HDDs), however, show a significant performance disparity between sequential and random access. Therefore, unlike flash-based storage, exploiting the asymmetric access performance of HDDs requires a coarse-grained decision. This paper proposes a novel hot data identification scheme that adopts multiple bloom filters to efficiently characterize recency as well as frequency. Consequently, compared with a state-of-the-art scheme, it not only consumes 50% less memory and incurs up to 58% less computational overhead, but also lowers false identification rates by up to 65%. Moreover, we apply the scheme to a next-generation HDD technology, Shingled Magnetic Recording (SMR), to verify its effectiveness. For this, we design a new SMR drive based on hot data identification that makes a coarse-grained decision. The experiments demonstrate the importance and benefits of accurate hot data identification, which improves the proposed SMR drive's performance by up to 42%.
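As an illustration only, the following Python sketch shows one way a set of bloom filters could capture both frequency and recency. It is not the scheme evaluated in this paper: the class name `MultiBloomHotness`, the round-robin insertion, the periodic reset, the linear recency weights, and every parameter value are assumptions chosen for this sketch.

```python
import hashlib


class MultiBloomHotness:
    """Hypothetical hot-data detector built from several small bloom filters.

    Frequency is approximated by how many filters contain a key.
    Recency is approximated by weighting recently reset filters more heavily.
    """

    def __init__(self, num_filters=4, bits=2048, num_hashes=2,
                 decay_period=512, threshold=2.0):
        # One byte per bit position, for readability rather than compactness.
        self.filters = [bytearray(bits) for _ in range(num_filters)]
        self.bits = bits
        self.num_hashes = num_hashes
        self.decay_period = decay_period  # writes between two resets (assumed)
        self.threshold = threshold        # hotness cut-off (assumed)
        self.writes = 0
        self.next_insert = 0  # round-robin starting filter for recording
        self.next_reset = 0   # filter that will be cleared at the next decay

    def _positions(self, key):
        # Derive num_hashes bit positions from a single SHA-256 digest.
        digest = hashlib.sha256(str(key).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.num_hashes)]

    @staticmethod
    def _contains(filt, positions):
        return all(filt[p] for p in positions)

    def record_write(self, key):
        positions = self._positions(key)
        n = len(self.filters)
        # Record the key in the first filter (round-robin) that lacks it,
        # so repeated writes spread over more filters.
        for step in range(n):
            filt = self.filters[(self.next_insert + step) % n]
            if not self._contains(filt, positions):
                for p in positions:
                    filt[p] = 1
                break
        self.next_insert = (self.next_insert + 1) % n
        self.writes += 1
        # Decay: periodically clear one filter so old accesses fade out.
        if self.writes % self.decay_period == 0:
            self.filters[self.next_reset] = bytearray(self.bits)
            self.next_reset = (self.next_reset + 1) % n

    def hotness(self, key):
        positions = self._positions(key)
        n = len(self.filters)
        score = 0.0
        for age in range(n):
            # age 0 is the filter to be reset next (oldest information);
            # age n - 1 is the most recently reset filter (newest).
            idx = (self.next_reset + age) % n
            if self._contains(self.filters[idx], positions):
                score += (age + 1) / n
        return score

    def is_hot(self, key):
        return self.hotness(key) >= self.threshold
```

In this sketch, a key written four times without any reset appears in all four filters and scores 0.25 + 0.5 + 0.75 + 1.0 = 2.5, so it is classified as hot under the assumed threshold of 2.0. Once a filter holding that key is reset, the score drops, which is how recency enters the decision. Because bloom filters admit false positives, the score can overestimate hotness but never underestimates the number of filters that truly recorded a key.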
Acknowledgment
We would like to thank David Schwaderer (Samsung Semiconductor Inc., USA) for his valuable comments and proofreading.
Cite this article
Park, D., He, W. & Du, D.H.C. Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision. J. Comput. Sci. Technol. 33, 79–97 (2018). https://doi.org/10.1007/s11390-018-1809-4
DOI: https://doi.org/10.1007/s11390-018-1809-4