Open access

Resistive GP-SIMD Processing-In-Memory

Authors:

Shahar Kvatinsky,

Ran Ginosar

ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 4

Article No.: 57, Pages 1 - 22

https://doi.org/10.1145/2845084

Published: 06 January 2016

Abstract

GP-SIMD, a novel hybrid general-purpose SIMD architecture, addresses the challenge of data synchronization through in-memory computing, combining data storage and massively parallel processing. In this article, we explore a resistive implementation of the GP-SIMD architecture. In resistive GP-SIMD, a novel resistive row- and column-addressable 4F² crossbar replaces the modified CMOS 190F² SRAM storage previously proposed for the GP-SIMD architecture. The resistive crossbar allows scaling GP-SIMD from a few million to a few hundred million processing units on a single silicon die. The performance, power consumption, and power efficiency of a resistive GP-SIMD are compared with those of the CMOS version. We find that PiM architectures and, specifically, GP-SIMD benefit more than other many-core architectures from using resistive memory. A framework for in-place arithmetic operations on a single multivalued resistive cell is explored, demonstrating its potential to become a building block for next-generation PiM architectures.
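
As a rough illustration of the storage-density argument in the abstract, the sketch below compares the two cell footprints it quotes (4F² for the resistive crossbar cell, 190F² for the modified CMOS SRAM cell). The function names are hypothetical, and the calculation deliberately ignores peripheral circuitry, processing-unit area, and any technology-node differences, so it reflects only the cell-area ratio and not the full scaling result reported in the article.

```python
# Illustrative only: the two cell areas are the figures quoted in the abstract,
# expressed in units of F^2 (F = minimum feature size).
RESISTIVE_CELL_AREA_F2 = 4.0    # resistive crossbar cell
CMOS_SRAM_CELL_AREA_F2 = 190.0  # modified CMOS SRAM cell


def density_gain(old_cell_area_f2: float, new_cell_area_f2: float) -> float:
    """Return how many new cells fit in the area of one old cell."""
    if old_cell_area_f2 <= 0 or new_cell_area_f2 <= 0:
        raise ValueError("cell areas must be positive")
    return old_cell_area_f2 / new_cell_area_f2


def cells_in_area(total_area_f2: float, cell_area_f2: float) -> int:
    """Upper bound on the cell count in an area, ignoring peripheral
    circuitry (a simplifying assumption, not taken from the article)."""
    if cell_area_f2 <= 0:
        raise ValueError("cell area must be positive")
    return int(total_area_f2 // cell_area_f2)


gain = density_gain(CMOS_SRAM_CELL_AREA_F2, RESISTIVE_CELL_AREA_F2)
print(f"cell-level storage density gain: {gain:.1f}x")
```

Under these assumptions, the cell-area ratio alone is 190 / 4 = 47.5; the processing-unit counts quoted in the abstract depend on further design factors that this sketch does not model.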

Supplementary Material

TACO1204-57 (taco1204-57.pdf)

Slide deck associated with this paper




Index Terms

Resistive GP-SIMD Processing-In-Memory
1. Computer systems organization
  1. Architectures
    1. Other architectures
    2. Parallel architectures
      1. Multiple instruction, multiple data

Information & Contributors

Information

Published In

ACM Transactions on Architecture and Code Optimization Volume 12, Issue 4

January 2016

848 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2836331

Editor:
Koen De Bosschere
Ghent University

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 January 2016

Accepted: 01 November 2015

Revised: 01 October 2015

Received: 01 March 2015

Published in TACO Volume 12, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)
Viterbi Fellowship
Hasso-Plattner Institute (HPI)

Bibliometrics

Total Citations: 24
Total Downloads: 735
Downloads (Last 12 months): 79
Downloads (Last 6 weeks): 13

Reflects downloads up to 18 Nov 2024

Citations

Cited By

Jangra P, Duhan M. (2024). In-memory computing: characteristics, spintronics, and neural network applications insights. Multiscale and Multidisciplinary Modeling, Experiments and Design 7:6 (5005-5029). Online publication date: 9-Jul-2024.
https://doi.org/10.1007/s41939-024-00517-0
Al-Hawaj K, Ta T, Cebry N, Agwa S, Afuye O, Hall E, Golden C, Apsel A, Batten C. (2023). EVE: Ephemeral Vector Engines. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (691-704). Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071074
Gebregiorgis A, Du Nguyen H, Yu J, Bishnoi R, Taouil M, Catthoor F, Hamdioui S. (2022). A Survey on Memory-centric Computer Architectures. ACM Journal on Emerging Technologies in Computing Systems 18:4 (1-50). Online publication date: 25-Oct-2022.
https://dl.acm.org/doi/10.1145/3544974
Yavits L, Kaplan R, Ginosar R. (2022). GIRAF: General Purpose In-Storage Resistive Associative Framework. IEEE Transactions on Parallel and Distributed Systems 33:2 (276-287). Online publication date: 1-Feb-2022.
https://doi.org/10.1109/TPDS.2021.3065448
Ferreira J, Falcao G, Gomez-Luna J, Alser M, Orosa L, Sadrosadati M, Kim J, Oliveira G, Shahroodi T, Nori A, Mutlu O. (2022). pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables. 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO) (900-919). Online publication date: Oct-2022.
https://doi.org/10.1109/MICRO56248.2022.00067
Garzón E, Yavits L, Lanuzza M, Teman A, Webster J. (2022). Emerging Memory Structures for VLSI Circuits. Wiley Encyclopedia of Electrical and Electronics Engineering (1-28). Online publication date: 12-May-2022.
https://doi.org/10.1002/047134608X.W8438
Caminal H, Yang K, Srinivasa S, Ramanathan A, Al-Hawaj K, Wu T, Narayanan V, Batten C, Martinez J. (2021). CAPE: A Content-Addressable Processing Engine. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (557-569). Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00054
Xu N, Park T, Yoon K, Hwang C. (2021). In-Memory Stateful Logic Computing Using Memristors: Gate, Calculation, and Application. physica status solidi (RRL) – Rapid Research Letters 15:9. Online publication date: 6-Aug-2021.
https://doi.org/10.1002/pssr.202100208
Nguyen H, Yu J, Lebdeh M, Taouil M, Hamdioui S, Catthoor F. (2020). A Classification of Memory-Centric Computing. ACM Journal on Emerging Technologies in Computing Systems 16:2 (1-26). Online publication date: 30-Jan-2020.
https://dl.acm.org/doi/10.1145/3365837
Yu J, Nane R, Ashraf I, Taouil M, Hamdioui S, Corporaal H, Bertels K. (2020). Skeleton-Based Synthesis Flow for Computation-in-Memory Architectures. IEEE Transactions on Emerging Topics in Computing 8:2 (545-558). Online publication date: 1-Apr-2020.
https://doi.org/10.1109/TETC.2017.2760927
