SPMCloud: Towards the Single-Chip Embedded ScratchPad Memory-Based Storage Cloud

Published: 23 June 2014

Abstract

The era of cloud computing on a chip is enabled by the aggressive move towards many-core platforms and the rapid adoption of Networks-on-Chip (NoCs). As a result, there is a need for large-scale distributed on-chip shared memories that are reliable, low power, and seamlessly manageable. In this work, we propose SPMCloud, a novel scratchpad-memory-based, cloud-inspired volatile storage subsystem designed to meet the needs of future-generation many-core platforms. SPMCloud combines several concepts, including: (1) a highly scalable data-center-like memory subsystem that exploits two enterprise-network-inspired memory configurations, namely embedded Network Attached Storage (eNAS) and embedded Storage Area Network (eSAN), and (2) on-demand allocation of reliable memory space through memory virtualization and the use of embedded RAIDs (E-RAIDs). Our experimental results on Mediabench/CHStone benchmarks show that SPMCloud's fully distributed reliable memory subsystems achieve 48% energy savings and 70% latency reduction on average over state-of-the-art NoC memory reliability techniques. We then evaluate the scalability of SPMCloud and compare it with traditional SPM allocation policies. SPMCloud's dynamic allocator outperforms the best competition by an average of 60% (eNAS) and 46% (eSAN) when the platform runs at 250 MHz, and by an average of 80% (eNAS) and 40% (eSAN) when it runs at 1 GHz. Moreover, SPMCloud achieves an average of 83% energy savings across all configurations (numbers of cores) with respect to the best competitors when running at 250 MHz and 1 GHz. We also study the SPM hit ratio across the allocation policies discussed in this article and show that, on average, SPMCloud's priority-driven dynamic allocation policy achieves a 93.5% SPM hit ratio, 0.6% higher than that of the closest allocation policy. The eNAS and eSAN configurations achieve an average of 67.9% and 29% reduction in execution time, respectively, over the best competitor. Similarly, they achieve an average of 82.7% and 82.3% energy savings, respectively, over the best competitor. Finally, we evaluate the scalability of SPMCloud and its performance and energy efficiency when supporting some of the heavier E-RAID levels. The eNAS/eSAN configurations with SECDED achieve an average of 51.5% and 34.9% reduction in execution time, respectively, over the best competitor with SECDED. Similarly, the eNAS/eSAN configurations with E-RAID Level 1 + SECDED achieve an average of 82.3% and 75.6% energy savings, respectively, over the best competitor.

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 19, Issue 3
June 2014
257 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2634048
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2014
Accepted: 01 February 2014
Revised: 01 October 2013
Received: 01 January 2012
Published in TODAES Volume 19, Issue 3

Author Tags

  1. Network-on-chip
  2. distributed memories
  3. many-core platforms
  4. reliability
  5. virtualization

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2022) An interactive and dynamic scratchpad memory management strategy for multi-core processors. Microprocessors & Microsystems 92:C. DOI: 10.1016/j.micpro.2022.104565. Online publication date: 1-Jul-2022
  • (2018) ShaVe-ICE. ACM Transactions on Embedded Computing Systems 17:2 (1-25). DOI: 10.1145/3157667. Online publication date: 5-Feb-2018
  • (2016) Many-Core Real-Time Task Scheduling with Scratchpad Memory. IEEE Transactions on Parallel and Distributed Systems 27:10 (2953-2966). DOI: 10.1109/TPDS.2016.2516519. Online publication date: 1-Oct-2016
  • (2016) Automatic management of Software Programmable Memories in Many-core Architectures. IET Computers & Digital Techniques 10:6 (288-298). DOI: 10.1049/iet-cdt.2016.0024. Online publication date: 1-Nov-2016
  • (2014) A Reliability-Aware Address Mapping Strategy for NAND Flash Memory Storage Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33:11 (1623-1631). DOI: 10.1109/TCAD.2014.2347929. Online publication date: Nov-2014
