DOI: 10.1145/3173162.3173167
research-article

Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support

Published: 19 March 2018

Abstract

With the recent trend of promoting Field-Programmable Gate Arrays (FPGAs) to first-class citizens in accelerating compute-intensive applications in networking, cloud services and artificial intelligence, FPGAs face two major challenges in sustaining competitive advantages in performance and energy efficiency for diverse cloud workloads: 1) limited configuration capability for supporting light-weight computations and on-chip data storage to accelerate emerging search- and data-intensive applications; and 2) lack of architectural support to hide reconfiguration overhead for assisting virtualization in a cloud computing environment. In this paper, we propose a reconfigurable memory-oriented computing fabric, namely Liquid Silicon-Monona (L-Si), enabled by an emerging nonvolatile memory technology, i.e., RRAM, to address these two challenges. Specifically, L-Si addresses the first challenge by virtue of a new architecture comprising a 2D array of physically identical but functionally configurable building blocks. It, for the first time, extends the configuration capabilities of existing FPGAs from computation to the whole spectrum ranging from computation to data storage. It allows users to better customize hardware by flexibly partitioning hardware resources between computation and memory, greatly benefiting emerging search- and data-intensive applications. To address the second challenge, L-Si provides scalable multi-context architectural support to minimize reconfiguration overhead for assisting virtualization. In addition, we provide compiler support to facilitate the programming of applications written in high-level programming languages (e.g., OpenCL) and frameworks (e.g., TensorFlow, MapReduce) while fully exploiting the unique architectural capability of L-Si. Our evaluation results show that L-Si achieves 99.6% area reduction, 1.43× throughput improvement and 94.0% power reduction on search-intensive benchmarks, as compared with the FPGA baseline. For neural network benchmarks, on average, L-Si achieves 52.3× speedup, 113.9× energy reduction and 81% area reduction over the FPGA baseline. In addition, the multi-context architecture of L-Si reduces the context switching time to ∼10ns, compared with an off-the-shelf FPGA (∼100ms), greatly facilitating virtualization.
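The abstract mixes two ways of reporting gains: percentage reductions (area, power) and multiplicative factors (speedup, energy). As a rough reader's aid, the sketch below converts the quoted percentages into equivalent factors and works out the ratio between the two context switching times. The function names are hypothetical, and the ∼10ns and ∼100ms inputs are the approximate figures from the abstract, so the results are order-of-magnitude estimates only.

```python
# Figures quoted in the abstract. The two switching times are approximate
# ("~10ns" and "~100ms"), so the derived ratio is an order-of-magnitude value.
LSI_CONTEXT_SWITCH_S = 10e-9   # L-Si multi-context switch, about 10 ns
FPGA_RECONFIG_S = 100e-3       # off-the-shelf FPGA reconfiguration, about 100 ms


def reduction_to_factor(percent_reduction: float) -> float:
    """Convert an 'X% reduction' into an 'N times smaller' factor."""
    remaining = 1.0 - percent_reduction / 100.0
    return 1.0 / remaining


def time_ratio(baseline_s: float, proposed_s: float) -> float:
    """Return how many times shorter the proposed time is than the baseline."""
    return baseline_s / proposed_s


area_factor = reduction_to_factor(99.6)    # search-intensive benchmarks, area
power_factor = reduction_to_factor(94.0)   # search-intensive benchmarks, power
switch_ratio = time_ratio(FPGA_RECONFIG_S, LSI_CONTEXT_SWITCH_S)

print(f"area:   about {area_factor:.0f}x smaller")
print(f"power:  about {power_factor:.1f}x lower")
print(f"switch: about {switch_ratio:.0e}x faster")
```

Read this way, a 99.6% area reduction corresponds to roughly a 250-fold smaller footprint, and the gap between the two switching times is about seven orders of magnitude.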

Information & Contributors

Information

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN: 9781450349116
DOI: 10.1145/3173162
  • ACM SIGPLAN Notices, Volume 53, Issue 2
    ASPLOS '18
    February 2018
    809 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3296957
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. binarized neural network
  2. monolithic 3D stacking
  3. multi-context architecture
  4. non-volatile memory
  5. processing-in-memory
  6. reconfigurable architecture
  7. ternary content addressable memory

Conference

ASPLOS '18

Acceptance Rates

ASPLOS '18 paper acceptance rate: 56 of 319 submissions (18%)
Overall acceptance rate: 535 of 2,713 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 50
  • Downloads (last 6 weeks): 4
Reflects downloads up to 09 Nov 2024

Cited By

  • (2021) Evaluating Neural Network-Inspired Analog-to-Digital Conversion With Low-Precision RRAM. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40:5 (808-821). DOI: 10.1109/TCAD.2020.3013563. Online publication date: May 2021
  • (2020) Liquid Silicon: A Nonvolatile Fully Programmable Processing-in-Memory Processor With Monolithically Integrated ReRAM. IEEE Journal of Solid-State Circuits 55:4 (908-919). DOI: 10.1109/JSSC.2019.2963005. Online publication date: Apr 2020
  • (2020) A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (684-695). DOI: 10.1109/IPDPS47924.2020.00076. Online publication date: May 2020
  • (2024) Towards High-Throughput Neural Network Inference with Computational BRAM on Nonvolatile FPGAs. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1-6). DOI: 10.23919/DATE58400.2024.10546738. Online publication date: 25 Mar 2024
  • (2021) Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (88-96). DOI: 10.1109/FCCM51124.2021.00018. Online publication date: May 2021
  • (2020) CATCAM: Constant-time Alteration Ternary CAM with Scalable In-Memory Architecture. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (342-355). DOI: 10.1109/MICRO50266.2020.00038. Online publication date: Oct 2020
  • (2020) Endurance-Aware RRAM-Based Reconfigurable Architecture using TCAM Arrays. 2020 30th International Conference on Field-Programmable Logic and Applications (FPL) (40-46). DOI: 10.1109/FPL50879.2020.00018. Online publication date: Aug 2020
