DOI: 10.5555/3358807.3358841
Article

Cognitive SSD: a deep learning engine for in-storage data retrieval

Published: 10 July 2019

Abstract

Data analysis and retrieval are widely used components of existing artificial intelligence systems. However, each request must traverse every layer of the I/O stack, which moves a large amount of irrelevant data between secondary storage, DRAM, and the on-chip cache. This leads to high response latency and rising energy consumption. To address this issue, we propose Cognitive SSD, an energy-efficient engine for deep-learning-based unstructured data retrieval. In Cognitive SSD, a flash-accessing accelerator named DLG-x is placed alongside the flash memory to perform near-data deep learning and graph search. These in-SSD deep learning and graph search functions are exposed to users as library APIs via an NVMe command extension. Experimental results on an FPGA-based prototype show that the proposed Cognitive SSD reduces latency by 69.9% on average compared with CPU-based solutions on conventional SSDs, and reduces overall system power consumption by up to 34.4% and 63.0% compared with CPU- and GPU-based solutions, respectively, that deliver comparable performance.
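The abstract does not spell out how the graph search step works, so the following is only a generic sketch of graph-based nearest-neighbor retrieval, not the DLG-x design. It assumes feature vectors have already been produced by some deep learning model, and that a neighbor graph over those vectors exists. The function name `greedy_graph_search`, the dictionary layout, and the use of Euclidean distance are all assumptions made for illustration.

```python
import heapq
import math


def greedy_graph_search(graph, vectors, query, entry, k=1):
    """Best-first search over a neighbor graph (hypothetical helper).

    graph:   dict mapping node id -> list of neighbor ids (assumed prebuilt)
    vectors: dict mapping node id -> feature vector (tuple of floats)
    query:   feature vector of the query item
    entry:   node id where the search starts
    Returns up to k (distance, node id) pairs, closest first.
    """
    def dist(node):
        return math.dist(vectors[node], query)

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap of nodes still to expand
    results = []                         # max-heap (negated distance) of best k

    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop when the closest unexpanded node is worse than the worst kept result.
        if len(results) == k and d > -results[0][0]:
            break
        heapq.heappush(results, (-d, node))
        if len(results) > k:
            heapq.heappop(results)       # drop the current worst result
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                heapq.heappush(candidates, (dist(neighbor), neighbor))

    return sorted((-neg_d, node) for neg_d, node in results)


# Toy data: four points on a line, linked as a chain.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_graph_search(graph, vectors, (2.9, 0.0), entry=0, k=1))
```

The search only touches vectors reachable along graph edges from the entry point, which is the property that makes this family of methods a plausible fit for near-data execution: most stored vectors are never read. The result is approximate in general, since a poorly connected graph can hide the true nearest neighbor.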


Cited By

  • (2020) NetTLP. Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, pages 141-156. DOI: 10.5555/3388242.3388253. Online publication date: 25-Feb-2020


Published In

USENIX ATC '19: Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference
July 2019
1076 pages
ISBN:9781939133038

Sponsors

  • VMware
  • Nutanix
  • NSF
  • Facebook
  • ORACLE

Publisher

USENIX Association

United States
