research-article

Supercomputing with commodity CPUs: are mobile SoCs ready for HPC?

Authors:

Nikola Rajovic,

Paul M. Carpenter,

Nikola Puzovic,

Mateo Valero

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No.: 40, Pages 1 - 12

https://doi.org/10.1145/2503210.2503281

Published: 17 November 2013

Abstract

In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in high-performance computing. This transformation has been so effective that the June 2013 TOP500 list is still dominated by x86 processors.

In 2013, the largest commodity market in computing is not PCs or servers but mobile computing, comprising smartphones and tablets, most of which are built with ARM-based SoCs. This suggests that, once mobile SoCs deliver sufficient performance, they can help reduce the cost of HPC.

This paper examines that suggestion in detail. We analyze the trend in mobile SoC performance, comparing it with the similar trend for commodity desktop processors in the 1990s. We also present our experience evaluating the performance and efficiency of mobile SoCs, deploying a cluster, and evaluating its network and the scalability of production applications. In summary, we give a first answer to whether mobile SoCs are ready for HPC.






Published In


SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2013

1123 pages

ISBN: 9781450323789

DOI: 10.1145/2503210

General Chair: William Gropp, University of Illinois at Urbana-Champaign, Urbana, Illinois

Program Chair: Satoshi Matsuoka, Tokyo Institute of Technology, Tokyo, Japan

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States




Conference

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 17-21, 2013

Denver, Colorado

Sponsors: SIGHPC, SIGARCH, IEEE-CS

Acceptance Rates

SC '13 paper acceptance rate: 91 of 449 submissions (20%)

Overall acceptance rate: 1,516 of 6,373 submissions (24%)


Bibliometrics & Citations

Article Metrics

95 total citations
1,234 total downloads
Downloads (last 12 months): 22
Downloads (last 6 weeks): 3

Reflects downloads up to 24 Nov 2024


Cited By

Pereira R, Roussel A, Tsuji M, Carribault P, Sato M, Murai H, Gautier T (2024). An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, pp. 7-16. Online publication date: 11-Jan-2024.
https://dl.acm.org/doi/10.1145/3636480.3637094

Switzer J, Marcano G, Kastner R, Pannuto P, Aamodt T, Jerger N, Swift M (2023). Junkyard Computing: Repurposing Discarded Smartphones to Minimize Carbon. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 400-412. Online publication date: 27-Jan-2023.
https://dl.acm.org/doi/10.1145/3575693.3575710

Pramanik P, Pal S, Choudhury P (2023). Mobile crowd computing: potential, architecture, requirements, challenges, and applications. The Journal of Supercomputing, 80:2, pp. 2223-2318. Online publication date: 29-Jul-2023.
https://doi.org/10.1007/s11227-023-05545-0

Xu M, Zhang L, Li H, Xing R, Sun Q (2022). A Satellite-Born Server Design with Massive Tiny Chips Towards In-Space Computing. 2022 IEEE International Conference on Satellite Computing (Satellite), pp. 1-6. Online publication date: Nov-2022.
https://doi.org/10.1109/Satellite55519.2022.00009

Xu M, Zhang L, Wang S (2022). Position Paper: Renovating Edge Servers with ARM SoCs. 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), pp. 216-223. Online publication date: Dec-2022.
https://doi.org/10.1109/SEC54971.2022.00024

Switzer J, Siu E, Ramesh S, Hu R, Zadorian E, Kastner R (2022). Renée: New Life for Old Phones. IEEE Embedded Systems Letters, 14:3, pp. 135-138. Online publication date: Sep-2022.
https://doi.org/10.1109/LES.2022.3147409

Vivas A, Castro H (2022). Quantitative Characterization of Scientific Computing Clusters. High Performance Computing, pp. 47-62. Online publication date: 21-Dec-2022.
https://doi.org/10.1007/978-3-031-23821-5_4

Armejach A, Brank B, Cortina J, Dolique F, Hayes T, Ho N, Lagadec P, Lemaire R, Lopez-Paradis G, Marliac L, Moreto M, Marcuello P, Pleiter D, Tan X, Derradji S (2021). Mont-Blanc 2020: Towards Scalable and Power Efficient European HPC Processors. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 136-141. Online publication date: 1-Feb-2021.
https://doi.org/10.23919/DATE51398.2021.9474093

Domke J (2021). A64FX – Your Compiler You Must Decide! 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 736-740. Online publication date: Sep-2021.
https://doi.org/10.1109/Cluster48925.2021.00109

Haririan P (2020). DVFS and Its Architectural Simulation Models for Improving Energy Efficiency of Complex Embedded Systems in Early Design Phase. Computers, 9:1, article 2. Online publication date: 7-Jan-2020.
https://doi.org/10.3390/computers9010002
