DOI: 10.1145/2503210.2503281
research-article

Supercomputing with commodity CPUs: are mobile SoCs ready for HPC?

Published: 17 November 2013

Abstract

In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in high-performance computing. This transformation has been so effective that the June 2013 TOP500 list is still dominated by x86.
In 2013, the largest commodity market in computing is not PCs or servers but mobile computing, comprising smartphones and tablets, most of which are built with ARM-based SoCs. This suggests that, once mobile SoCs deliver sufficient performance, they can help reduce the cost of HPC.
This paper addresses this question in detail. We analyze the trend in mobile SoC performance, comparing it with the similar trend in the 1990s. We also present our experience evaluating performance and efficiency of mobile SoCs, deploying a cluster and evaluating the network and scalability of production applications. In summary, we give a first answer as to whether mobile SoCs are ready for HPC.

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair: William Gropp
  • Program Chair: Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC13

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2024) An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, pages 7-16. DOI: 10.1145/3636480.3637094. Online publication date: 11-Jan-2024.
  • (2023) Junkyard Computing: Repurposing Discarded Smartphones to Minimize Carbon. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 400-412. DOI: 10.1145/3575693.3575710. Online publication date: 27-Jan-2023.
  • (2023) Mobile crowd computing: potential, architecture, requirements, challenges, and applications. The Journal of Supercomputing, 80(2):2223-2318. DOI: 10.1007/s11227-023-05545-0. Online publication date: 29-Jul-2023.
  • (2022) A Satellite-Born Server Design with Massive Tiny Chips Towards In-Space Computing. 2022 IEEE International Conference on Satellite Computing (Satellite), pages 1-6. DOI: 10.1109/Satellite55519.2022.00009. Online publication date: Nov-2022.
  • (2022) Position Paper: Renovating Edge Servers with ARM SoCs. 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), pages 216-223. DOI: 10.1109/SEC54971.2022.00024. Online publication date: Dec-2022.
  • (2022) Renée: New Life for Old Phones. IEEE Embedded Systems Letters, 14(3):135-138. DOI: 10.1109/LES.2022.3147409. Online publication date: Sep-2022.
  • (2022) Quantitative Characterization of Scientific Computing Clusters. High Performance Computing, pages 47-62. DOI: 10.1007/978-3-031-23821-5_4. Online publication date: 21-Dec-2022.
  • (2021) Mont-Blanc 2020: Towards Scalable and Power Efficient European HPC Processors. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 136-141. DOI: 10.23919/DATE51398.2021.9474093. Online publication date: 1-Feb-2021.
  • (2021) A64FX – Your Compiler You Must Decide! 2021 IEEE International Conference on Cluster Computing (CLUSTER), pages 736-740. DOI: 10.1109/Cluster48925.2021.00109. Online publication date: Sep-2021.
  • (2020) DVFS and Its Architectural Simulation Models for Improving Energy Efficiency of Complex Embedded Systems in Early Design Phase. Computers, 9(1):2. DOI: 10.3390/computers9010002. Online publication date: 7-Jan-2020.
