Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

FOS: a low-power cache organization for multicores

Published: 01 October 2019 Publication History

Abstract

The cache hierarchy of current multicore processors typically consists of one or two levels of private caches per core and a large shared last-level cache. This approach incurs area and energy wasting due to oversizing the private cache space, data replication through the inclusive cache levels, as well as the use of highly set-associative caches. In this paper, we claim that although this is the commonly adopted approach, it presents important design issues that can be addressed by a more energy efficient organization. This work proposes Flat On-chip Storage (FOS), a novel cache organization that, aimed at addressing energy and area on low-power processors, resolves the mentioned issues. For this purpose, FOS combines L2 and L3 cache levels into a single one, organized as a flat space, and composed of a pool of private small cache slices. These slices are initially powered off to save energy, and they are powered on and assigned to cores provided that the system performance is expected to improve. To provide fast and uniform access from the private L1 caches to the FOS’s cache slices, multiple architectural challenges are overcome, which entails the design of a custom optical network-on-chip. Experimental results show that FOS achieves significant energy savings on both static and dynamic energy over conventional cache organizations with the same storage capacity. FOS static energy savings are as much as 60% over an electrically connected shared cache; these savings grow up to 75% compared to optically connected baselines. Moreover, despite deactivating part of the cache space, FOS achieves similar performance values as those achieved by conventional approaches.

References

[1]
Awasthi M, Sudan K, Balasubramonian R, Carter J (2009) Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches. In: 2009 IEEE 15th International Symposium on High Performance Computer Architecture, pp 250–261. https://doi.org/10.1109/HPCA.2009.4798260
[2]
Baer J, Low D, Crowley P, Sidhwaney N (2003) Memory hierarchy design for a multiprocessor look-up engine. In: 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003)
[3]
Bahirat S, Pasricha S (2014) Meteor: hybrid photonic ring-mesh network-on-chip for multicore architectures. ACM Trans Embed Comput Syst 13(3s):116:1–116:33. https://doi.org/10.1145/2567940
[4]
Bartolini S, Grani P (2012) A simple on-chip optical interconnection for improving performance of coherency traffic in CMPS. In: 15th Euromicro Conference on Digital System Design, pp 312–318. https://doi.org/10.1109/DSD.2012.13
[5]
Beckmann BM, Marty MR, Wood DA (2006) ASR: adaptive selective replication for CMP caches. In: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39. IEEE Computer Society, Washington, DC, USA, pp 443–454. https://doi.org/10.1109/MICRO.2006.10
[6]
Beckmann N, Sanchez D (2013) Jigsaw: scalable software-defined caches. In: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT ’13. IEEE Press, Piscataway, NJ, USA, pp 213–224. https://doi.org/10.1109/PACT.2013.6618818
[7]
Bergman K, Carloni LP, Bibermani AC, Hendry G (2014) Photonic network-on-chip design, vol 68. Springer, New York
[8]
Chang J, Sohi GS (2006) Cooperative caching for chip multiprocessors. In: Proceedings 33rd Annual International Symposium on Computer Architecture, pp 264–276. https://doi.org/10.1109/ISCA.2006.17
[9]
Chen G, Chen H, Haurylau M, Nelson N, Fauchet PM, Friedman EG, Albonesi D (2005) Predictions of CMOS compatible on-chip optical interconnect. In: Proceedings of International Workshop on System Level Interconnect Prediction, SLIP ’05, pp 13–20
[10]
Chishti Z, Powell MD, Vijaykumar TN (2005) Optimizing replication, communication, and capacity allocation in cmps. SIGARCH Comput Archit News 33(2):357–368. https://doi.org/10.1145/1080695.1070001
[11]
Cho S, Jin L (2006) Managing distributed, shared l2 caches through os-level page allocation. In: 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’06), pp 455–468. https://doi.org/10.1109/MICRO.2006.31
[12]
Cianchetti MJ, Kerekes JC, Albonesi DH (2009) Phastlane: a rapid transit optical routing network. In: Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA’09, pp 441–450. https://doi.org/10.1145/1555754.1555809
[13]
Demir Y, Hardavellas N (2015) Parka: thermally insulated nanophotonic interconnects. In: NOCS ’15, pp 1:1–1:8. https://doi.org/10.1145/2786572.2786597
[14]
Duan GH, Fedeli JM, Keyvaninia S, Thomson D (2012) 10 gb/s integrated tunable hybrid iii-v/si laser and silicon mach-zehnder modulator. In: European Conference and Exhibition on Optical Communication. https://doi.org/10.1364/ECEOC.2012.Tu.4.E.2
[15]
Dybdahl H, Stenstrom P (2007) An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors. In: 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pp 2–12. https://doi.org/10.1109/HPCA.2007.346180
[16]
García A, Fernández R, Garca JM, Bartolini S (2014) Managing resources dynamically in hybrid photonic-electronic networks-on-chip. Concurr Comput Pract Exp 26(15):2530–2550. https://doi.org/10.1002/cpe.3332
[17]
Hardavellas N, Ferdman M, Falsafi B, Ailamaki A (2009) Reactive NUCA: near-optimal block placement and replication in distributed caches. SIGARCH Comput Archit News 37(3):184–195. https://doi.org/10.1145/1555815.1555779
[18]
Herrero E, González J, Canal R (2008) Distributed cooperative caching. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT ’08, pp 134–143. https://doi.org/10.1145/1454115.1454136
[19]
Herrero E, González J, Canal R (2010) Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors. In: Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pp 419–428. https://doi.org/10.1145/1815961.1816018
[20]
Huh J, Kim C, Shafi H, Zhang L, Burger D, Keckler SW (2005) A NUCA substrate for flexible CMP cache sharing. In: Proceedings of the 19th Annual International Conference on Supercomputing, ICS ’05. ACM, pp 31–40. https://doi.org/10.1145/1088149.1088154
[21]
Kahng AB, Li B, Peh LS, Samadi K (2009) Orion 2.0: a fast and accurate NoC power and area model for early-stage design space exploration. In: DATE. European Design and Automation Association, pp 423–428
[22]
Kaxiras S, Hu Z, Martonosi M (2001) Cache decay: exploiting generational behavior to reduce cache leakage power. In: Proceedings of the 28th Annual International Symposium on Computer Architecture, ISCA’01, pp 240–251
[23]
Kim S, Chandra D, Solihin D (2004) Fair cache sharing and partitioning in a chip multiprocessor architecture. In: PACT, pp 111–122
[24]
Merino J, Puente V, Gregorio JA (2010) ESP-NUCA: a low-cost adaptive non-uniform cache architecture. In: HPCA-16 2010 the Sixteenth International Symposium on High-performance Computer Architecture, pp 1–10. https://doi.org/10.1109/HPCA.2010.5416641
[25]
Morris R, Kodi AK, Louri A (2012) Dynamic reconfiguration of 3d photonic networks-on-chip for maximizing performance and improving fault tolerance. In: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp 282–293. https://doi.org/10.1109/MICRO.2012.34
[26]
Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0: a tool to model large caches. In: HP Laboratories
[27]
Pang J, Dwyer C, Lebeck AR (2013) Exploiting emerging technologies for nanoscale photonic networks-on-chip. In: Proceedings of 6th International Workshop on NoC Architectures, NoCArc ’13, pp 53–58
[28]
Petit S, Sahuquillo J, Such JM, Kaeli DR (2005) Exploiting temporal locality in drowsy cache policies. In: Proceedings of the Second Conference on Computing Frontiers, Ischia, Italy, 4–6 May 2005, pp 371–377
[29]
Pons L, Selfa V, Sahuquillo J, Petit S, Pons J (2018) Improving system turnaround time with intel CAT by identifying LLC critical applications. In: Euro-Par 2018—Parallel Processing—24th International Conference on Parallel and Distributed Computing, Turin, Italy, 27–31 Aug 2018, Proceedings, pp 603–615. https://doi.org/10.1007/978-3-319-96983-1_43
[30]
Qureshi M, Patt Y (2006) Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches. In: MICRO, pp 423–432
[31]
Rivers JA, Tam ES, Tyson GS, Davidson ES, Farrens MK (1998) Utilizing reuse information in data cache management. In: Proceedings of the 12th International Conference on Supercomputing, ICS 1998, Melbourne, Australia, 13–17 July 1998, pp 449–456. https://doi.org/10.1145/277830.277941
[32]
Rosenfeld P, Cooper-Balis E, Jacob B (2011) Dramsim2: a cycle accurate memory system simulator. IEEE Comput Archit Lett 10:16–19. https://doi.org/10.1109/L-CA.2011.4
[33]
Sahuquillo J, Pont A (1999) The filter cache: a run-time cache management approach1. In: 25th EUROMICRO ’99 Conference, Informatics: Theory and Practice for the New Millenium, 8–10 Sept 1999, Milan, Italy, pp 1424–1431. https://doi.org/10.1109/EURMIC.1999.794504
[34]
Sahuquillo J, Pont A (2000) Splitting the data cache: a survey. IEEE Concurr 8(3):30–35. https://doi.org/10.1109/4434.865890
[35]
Selfa V, Sahuquillo J, Eeckhout L, Petit S, Gómez ME (2017) Application clustering policies to address system fairness with intel’s cache allocation technology. In: 26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017, Portland, OR, USA, 9–13 Sept 2017, pp 194–205. https://doi.org/10.1109/PACT.2017.19
[36]
Shacham A, Bergman K, Carloni L (2007) On the design of a photonic network-on-chip. In: Networks-on-Chip, NOCS 2007, pp 53–64
[37]
Soref R, Bennett B (1987) Electrooptical effects in silicon. IEEE J Quantum Electron 23(1):123–129. https://doi.org/10.1109/JQE.1987.1073206
[38]
Henning JL (2006) SPEC CPU2006 benchmark descriptions. SIGARCH Comput Archit News 34(4):1–17. https://doi.org/10.1145/1186736.1186737
[39]
Tsai PA, Beckmann N, Sanchez D (2017) Jenga: software-defined cache hierarchies. SIGARCH Comput Archit News 45(2):652–665. https://doi.org/10.1145/3140659.3080214
[40]
Ubal R, Sahuquillo J, Petit S, Lopez P (2007) Multi2sim: a simulation framework to evaluate multicore-multithreaded processors. In: International Symposium on Computer Architecture and High Performance Computing, pp 62–68. https://doi.org/10.1109/SBAC-PAD.2007.17
[41]
Valero A, Sahuquillo J, Petit S, López P, Duato J (2012) Combining recency of information with selective random and a victim cache in last-level caches. ACM Trans Archit Code Optim 9(3):16:1–16:20. https://doi.org/10.1145/2355585.2355589
[42]
Vantrease D, Binkert N, Schreiber R, Lipasti M (2009) Light speed arbitration and flow control for nanophotonic interconnects. In: Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium, pp 304–315
[43]
Werner S, Navaridas J, Lujan M (2017) Designing low-power, low-latency networks-on-chip by optimally combining electrical and optical links. In: 2017 IEEE International Symposium of High Performance Computer Architecture

Index Terms

  1. FOS: a low-power cache organization for multicores
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image The Journal of Supercomputing
        The Journal of Supercomputing  Volume 75, Issue 10
        Oct 2019
        878 pages

        Publisher

        Kluwer Academic Publishers

        United States

        Publication History

        Published: 01 October 2019

        Author Tags

        1. Cache hierarchy
        2. Multicores
        3. Energy efficiency

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • 0
          Total Citations
        • 0
          Total Downloads
        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 14 Nov 2024

        Other Metrics

        Citations

        View Options

        View options

        Get Access

        Login options

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media