DOI: 10.1145/3386263.3407581
research-article

IMC-Sort: In-Memory Parallel Sorting Architecture using Hybrid Memory Cube

Published: 07 September 2020

Abstract

Processing-in-memory (PIM) architectures have gained significant importance as an alternative to von Neumann architectures, alleviating the memory wall and technology scaling problems. PIM architectures have achieved significant latency and energy improvements for emerging and widely used workloads such as deep neural networks, graph analytics, databases, and computational genomics. In this work, we propose IMC-Sort, a PIM-based accelerator architecture for sorting. Sorting is one of the most fundamental and widely used algorithms in applications such as databases, networking, and data analytics. The IMC-Sort architecture augments the Hybrid Memory Cube (HMC) memory system by incorporating a custom sorting network in the logic layer of each HMC vault. IMC-Sort uses an optimized folded bitonic sort and merge network to sort input sequences of arbitrary length at each vault, and an optimized address mapping mechanism to distribute the input data across HMC vaults. The sorted results of the individual vaults are merged using the same vault sorting networks, which communicate with one another through the HMC's crossbar network. Overall, IMC-Sort achieves 16.8x and 1.1x speedups and 375.5x and 13.6x energy savings compared to a widely used CPU implementation and a state-of-the-art near-memory custom sort accelerator, respectively.
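As a rough software illustration of the ideas named in the abstract, the sketch below implements Batcher's bitonic sorting network and a toy per-vault flow. It is not the paper's hardware design: the function names, the padding with infinity to handle arbitrary lengths, the round-robin distribution of data across vaults, and the use of `heapq.merge` for the cross-vault merge are all assumptions made for illustration (the abstract states that the actual design uses a folded network, an optimized address mapping, and the vault sorting networks themselves for merging).

```python
import heapq
import math


def compare_exchange(a, i, j, ascending):
    # One comparator of the network: order a[i], a[j] in the requested direction.
    if (a[i] > a[j]) == ascending:
        a[i], a[j] = a[j], a[i]


def bitonic_merge(a, lo, n, ascending):
    # Turn a bitonic sequence a[lo:lo+n] (n a power of two) into a sorted one.
    if n <= 1:
        return
    half = n // 2
    for i in range(lo, lo + half):
        compare_exchange(a, i, i + half, ascending)
    bitonic_merge(a, lo, half, ascending)
    bitonic_merge(a, lo + half, half, ascending)


def bitonic_sort(a, lo, n, ascending=True):
    # Sort a[lo:lo+n] in place; n must be a power of two.
    if n <= 1:
        return
    half = n // 2
    bitonic_sort(a, lo, half, True)           # ascending half
    bitonic_sort(a, lo + half, half, False)   # descending half -> bitonic sequence
    bitonic_merge(a, lo, n, ascending)


def sort_any_length(values):
    # Assumption: handle arbitrary lengths by padding to the next power of two
    # with +infinity, then dropping the padding after sorting.
    n = len(values)
    if n == 0:
        return []
    size = 1 << (n - 1).bit_length()
    padded = list(values) + [math.inf] * (size - n)
    bitonic_sort(padded, 0, size, True)
    return padded[:n]


def vault_sort(values, num_vaults=4):
    # Hypothetical data distribution: round-robin interleaving across vaults.
    chunks = [values[v::num_vaults] for v in range(num_vaults)]
    # Each "vault" sorts its own chunk independently.
    runs = [sort_any_length(chunk) for chunk in chunks]
    # Stand-in for the cross-vault merge step (software k-way merge).
    return list(heapq.merge(*runs))
```

Calling `vault_sort(data, num_vaults=4)` on a list of numbers returns the same ordering as Python's built-in `sorted(data)`. The point of the sketch is only the structure: independent local sorts followed by a merge of sorted runs.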

Supplementary Material

MP4 File (3386263.3407581.mp4)
Presentation video

Published In

GLSVLSI '20: Proceedings of the 2020 on Great Lakes Symposium on VLSI
September 2020
597 pages
ISBN: 9781450379441
DOI: 10.1145/3386263
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hybrid memory cube
  2. merge
  3. processing-in-memory
  4. sort
  5. vault

Qualifiers

  • Research-article

Funding Sources

  • Semiconductor Research Corporation

Conference

GLSVLSI '20
GLSVLSI '20: Great Lakes Symposium on VLSI 2020
September 7 - 9, 2020
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%

Article Metrics

  • Downloads (Last 12 months): 74
  • Downloads (Last 6 weeks): 14
Reflects downloads up to 01 Oct 2024

Cited By

  • (2024) Fully Digital, Standard-Cell-Based Multifunction Compute-in-Memory Arrays for Genome Sequencing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:1 (30-41). DOI: 10.1109/TVLSI.2023.3308262. Online publication date: 1-Jan-2024
  • (2024) DL-Sort: A Hybrid Approach to Scalable Hardware-Accelerated Fully-Streaming Sorting. IEEE Transactions on Circuits and Systems II: Express Briefs 71:5 (2549-2553). DOI: 10.1109/TCSII.2024.3377255. Online publication date: May-2024
  • (2024) On Key–Value Sort With Active Compute Memory. IEEE Transactions on Computers 73:5 (1341-1356). DOI: 10.1109/TC.2024.3371773. Online publication date: May-2024
  • (2024) In-memory computing: characteristics, spintronics, and neural network applications insights. Multiscale and Multidisciplinary Modeling, Experiments and Design. DOI: 10.1007/s41939-024-00517-0. Online publication date: 9-Jul-2024
  • (2023) Sky-Sorter: A Processing-in-Memory Architecture for Large-Scale Sorting. IEEE Transactions on Computers 72:2 (480-493). DOI: 10.1109/TC.2022.3169434. Online publication date: 1-Feb-2023
  • (2022) Sorting in Memristive Memory. ACM Journal on Emerging Technologies in Computing Systems 18:4 (1-21). DOI: 10.1145/3517181. Online publication date: 13-Oct-2022
  • (2022) Multi-Function CIM Array for Genome Alignment Applications built with Fully Digital Flow. 2022 IEEE Nordic Circuits and Systems Conference (NorCAS) (1-7). DOI: 10.1109/NorCAS57515.2022.9934470. Online publication date: 25-Oct-2022
  • (2022) Pulley: An Algorithm/Hardware Co-Optimization for In-Memory Sorting. IEEE Computer Architecture Letters 21:2 (109-112). DOI: 10.1109/LCA.2022.3208255. Online publication date: 1-Jul-2022
  • (2022) ReCSA: a dedicated sort accelerator using ReRAM-based content addressable memory. Frontiers of Computer Science 17:2. DOI: 10.1007/s11704-022-1322-3. Online publication date: 8-Aug-2022
