research-article

MMNNN: A tree-based Multicast Mechanism for NoC-based deep Neural Network accelerators

Published: 01 September 2021

Abstract

Network-on-Chip (NoC) devices have been widely used in multiprocessor systems. In recent years, NoC-based Deep Neural Network (DNN) accelerators have been proposed, in which neural computing devices are connected by a NoC. Such designs dramatically reduce the off-chip memory accesses of these platforms. However, the large number of one-to-many packet transfers significantly degrades performance when carried over traditional unicast channels. We propose a multicast mechanism for NoC-based DNN accelerators, called Multicast Mechanism for NoC-based Neural Network accelerator (MMNNN). It comprises a tree-based multicast routing algorithm that scales well and minimizes the number of packets in the network, together with a router architecture for single-flit packets. The proposed router transfers flits to multiple destinations in a single process and has no head-of-line blocking issue, offering higher throughput and lower latency than traditional wormhole router architectures. Simulation results show that the proposed multicast mechanism performs well in classification latency, average packet latency, and energy consumption.
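To illustrate why one-to-many traffic is costly over unicast channels, the following minimal sketch counts link traversals on a 2D mesh. It is not the MMNNN routing algorithm: it assumes plain dimension-order (XY) routing, and the function names (`xy_path`, `unicast_link_traversals`, `tree_multicast_link_traversals`) are hypothetical, chosen only for this example.

```python
def xy_path(src, dst):
    """Return the links ((node, next_node) pairs) on the XY route from src to dst."""
    x, y = src
    links = []
    # Travel along the X dimension first.
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    # Then travel along the Y dimension.
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_link_traversals(src, dsts):
    # One packet per destination; each packet pays for its full path.
    return sum(len(xy_path(src, d)) for d in dsts)


def tree_multicast_link_traversals(src, dsts):
    # Merge the XY paths into a tree: a shared link carries the flit once,
    # and it is replicated only where the paths diverge.
    return len({link for d in dsts for link in xy_path(src, d)})


if __name__ == "__main__":
    src = (0, 0)
    dsts = [(3, 0), (3, 1), (3, 2), (3, 3)]  # one column of a 4x4 mesh
    print(unicast_link_traversals(src, dsts))         # 18
    print(tree_multicast_link_traversals(src, dsts))  # 6
```

In this toy case the four unicast packets cross 18 links in total, while a single flit replicated along the merged tree crosses 6. The actual savings in a DNN accelerator depend on the mapping of neurons to nodes and on the routing algorithm, which this sketch does not model.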

Cited By

  • (2023) URMP: using reconfigurable multicast path for NoC-based deep neural network accelerators, The Journal of Supercomputing 79:13 (14827-14847), doi:10.1007/s11227-023-05255-7. Online publication date: 1-Sep-2023

    Published In

    Microprocessors & Microsystems  Volume 85, Issue C
    Sep 2021
    236 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Author Tags

    1. Network-on-Chip
    2. Deep Neural Network (DNN) accelerator
    3. Multicast routing algorithm
    4. Router architecture

    Qualifiers

    • Research-article
