Design trade-offs for emerging HPC processors based on mobile market technology

The Journal of Supercomputing

Abstract

High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to state-of-the-art processor designs due to Arm’s licensing business model. This fact gives them greater flexibility to implement custom HPC-specific designs. In this paper, we undertake a study to quantify the different energy-performance trade-offs when architecting a processor based on mobile market technology. Through detailed simulations over a representative set of benchmarks, our results show that: (i) a modest amount of last-level cache per core is sufficient, leading to significant power and area savings; (ii) in-order cores offer favorable trade-offs when compared to out-of-order cores for a wide range of benchmarks; and (iii) heterogeneous configurations help to improve processor performance and energy efficiency.

Notes

  1. The Xeon Phi product line has been discontinued.

Acknowledgements

This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). The Mont-Blanc project receives funding from the EU's H2020 Framework Programme (H2020/2014-2020) under Grant Agreements Nos. 671697 and 779877. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104. M. Casas has been partially supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the European Union (Contract 2013 BP B 00243). Finally, A. Armejach has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva postdoctoral fellowship number FJCI-2015-24753.

Author information

Corresponding author

Correspondence to Adrià Armejach.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Armejach, A., Casas, M. & Moretó, M. Design trade-offs for emerging HPC processors based on mobile market technology. J Supercomput 75, 5717–5740 (2019). https://doi.org/10.1007/s11227-019-02819-4

  • DOI: https://doi.org/10.1007/s11227-019-02819-4
