Exploration of distributed shared memory architectures for NoC-based multiprocessors

Published: 01 October 2007

Abstract

Multiprocessor system-on-chip (MP-SoC) platforms are an emerging trend for embedded multimedia applications. To enable MP-SoC platforms, scalable communication-centric interconnect fabrics, such as networks-on-chip (NoCs), have recently been proposed. Shared memory is one of the key elements in designing MP-SoCs, since it provides data exchange and synchronization support. This paper focuses on the energy/delay exploration of a distributed shared memory architecture suitable for low-power on-chip multiprocessors based on NoCs. A mechanism is proposed for data allocation on the distributed shared memory space, dynamically managed by an on-chip hardware memory management unit (HwMMU). Moreover, the use of the HwMMU primitives for the migration, replication, and compaction of shared data is discussed. Experimental results show the impact of different distributed shared memory configurations on a selected set of parallel benchmark applications from the power/performance perspective. Furthermore, a case study of a graph exploration algorithm is discussed, accounting for the effects of core mapping and network topology on energy and performance at the system level.

      Published In

      Journal of Systems Architecture: the EUROMICRO Journal, Volume 53, Issue 10
      October 2007
      118 pages

      Publisher

      Elsevier North-Holland, Inc.

      United States

      Author Tags

      1. Design space exploration
      2. Low-power design
      3. Multiprocessor systems-on-chip
      4. Network-on-chip

      Cited By

      • (2019) Multi-objective Spiking Neural Network Hardware Mapping Based on Immune Genetic Algorithm. Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation, pp. 745-757. DOI: 10.1007/978-3-030-30487-4_58. Online publication date: 17-Sep-2019
      • (2014) SPMCloud. ACM Transactions on Design Automation of Electronic Systems 19:3, pp. 1-45. DOI: 10.1145/2611755. Online publication date: 23-Jun-2014
      • (2013) Power-aware dynamic memory management on many-core platforms utilizing DVFS. ACM Transactions on Embedded Computing Systems 13:1s, pp. 1-25. DOI: 10.1145/2536747.2536762. Online publication date: 6-Dec-2013
      • (2012) System-level synthesis of memory architecture for stream processing sub-systems of a MPSoC. Proceedings of the 49th Annual Design Automation Conference, pp. 672-677. DOI: 10.1145/2228360.2228481. Online publication date: 3-Jun-2012
      • (2012) Fault Resilient Real-Time Design for NoC Architectures. Proceedings of the 2012 IEEE/ACM Third International Conference on Cyber-Physical Systems, pp. 75-84. DOI: 10.1109/ICCPS.2012.16. Online publication date: 17-Apr-2012
      • (2011) Energy-efficient cache coherence protocol for NoC-based MPSoCs. Proceedings of the 24th symposium on Integrated circuits and systems design, pp. 215-220. DOI: 10.1145/2020876.2020925. Online publication date: 30-Aug-2011
      • (2010) PM-COSYN. Proceedings of the Conference on Design, Automation and Test in Europe, pp. 1590-1595. DOI: 10.5555/1870926.1871308. Online publication date: 8-Mar-2010
      • (2010) Introducing mNUMA. Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, pp. 1-10. DOI: 10.1145/2020373.2020379. Online publication date: 12-Oct-2010
