Abstract
High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to state-of-the-art processor designs due to Arm’s licensing business model. This fact gives them greater flexibility to implement custom HPC-specific designs. In this paper, we undertake a study to quantify the different energy-performance trade-offs when architecting a processor based on mobile market technology. Through detailed simulations over a representative set of benchmarks, our results show that: (i) a modest amount of last-level cache per core is sufficient, leading to significant power and area savings; (ii) in-order cores offer favorable trade-offs when compared to out-of-order cores for a wide range of benchmarks; and (iii) heterogeneous configurations help to improve processor performance and energy efficiency.
Notes
The Xeon Phi product line has been discontinued.
Acknowledgements
This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). The Mont-Blanc project receives funding from the EU's H2020 Framework Programme (H2020/2014-2020) under Grant Agreements Nos. 671697 and 779877. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104. M. Casas has been partially supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the European Union (Contract 2013 BP B 00243). Finally, A. Armejach has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva postdoctoral fellowship number FJCI-2015-24753.
Cite this article
Armejach, A., Casas, M. & Moretó, M. Design trade-offs for emerging HPC processors based on mobile market technology. J Supercomput 75, 5717–5740 (2019). https://doi.org/10.1007/s11227-019-02819-4
DOI: https://doi.org/10.1007/s11227-019-02819-4