Abstract
Large-scale particle physics experiments face challenging demands for high-throughput computing resources both now and in the future. New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) ms with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600–700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
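The client side of this inference-as-a-service pattern can be illustrated with a short sketch: a CPU worker ships one preprocessed image to a remote (cloud or edge) service and blocks on the returned classification scores, matching the batch-of-one processing model quoted above. This is a minimal illustration only, assuming a gRPC-style request/response interface; the module, service, and message names (inference_pb2, InferenceStub, ImageRequest, Classify) are hypothetical placeholders, not the actual Brainwave or SonicCMS interfaces.

```python
# Minimal sketch of a CPU-side client for remote FPGA inference.
# All service/message names are hypothetical; the stubs below would be
# generated by protoc from a (hypothetical) inference.proto definition.
import grpc
import numpy as np

import inference_pb2        # hypothetical generated protobuf messages
import inference_pb2_grpc   # hypothetical generated gRPC service stub


def classify_image(pixels: np.ndarray,
                   host: str = "fpga-service.example.org:50051",
                   timeout_s: float = 0.1) -> list:
    """Send one preprocessed image (batch size of one) to a remote
    inference service and return the class scores."""
    with grpc.insecure_channel(host) as channel:
        stub = inference_pb2_grpc.InferenceStub(channel)
        request = inference_pb2.ImageRequest(
            data=pixels.astype(np.float32).tobytes(),
            shape=list(pixels.shape),
        )
        # Blocking remote call; the FPGA service performs the ResNet-50
        # forward pass and returns the output scores.
        reply = stub.Classify(request, timeout=timeout_s)
        return list(reply.scores)
```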
Notes
We use the term cloud service synonymously with a service accessed remotely.
We use the term edge service synonymously with a service accessed on-premises, or on-prem.
Significant effort is required to adapt TensorFlow to the multithreading pattern used in CMSSW; hence, the latest version of TensorFlow is usually not available for use in the experiment's software.
For that matter, CPU comparisons can also be nuanced when considering devices with many cores and large RAM. However, such devices do not fit within the CMSSW computing model.
Acknowledgements
We would like to thank the entire Microsoft Azure Machine Learning, Bing, and Project Brainwave teams for developing the acceleration platform and for the opportunity to preview and study it. In particular, we would like to acknowledge Doug Burger, Eric Chung, Jeremy Fowers, Daniel Lo, Kalin Ovtcharov, and Andrew Putnam for their support and enthusiasm. We would like to thank Lothar Bauerdick and Oliver Gutsche for seed funding through USCMS computing operations. We would like to thank Alex Himmel and other NOvA collaborators for support and comments on the manuscript. Part of this work was conducted at "iBanks," the AI GPU cluster at Caltech. We acknowledge NVIDIA, SuperMicro, and the Kavli Foundation for their support of "iBanks." Part of this work was conducted using Google Cloud resources provided by the MIT Quest for Intelligence program. Part of this work is supported through IRIS-HEP under NSF grant 1836650. We thank the organizers of the publicly available top tagging dataset (and others like it) for providing benchmarks for the physics community. The authors thank the NOvA collaboration for the use of its Monte Carlo software tools and data and for the review of this manuscript. This work was supported by the US Department of Energy and the US National Science Foundation. NOvA receives additional support from the Department of Science and Technology, India; the European Research Council; the MSMT CR, Czech Republic; the RAS, RMES, and RFBR, Russia; CNPq and FAPEG, Brazil; and the State and University of Minnesota. We are grateful for the contributions of the staff at the Ash River Laboratory, Argonne National Laboratory, and Fermilab.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J.D., B.H., S.J., B.K., M.L., K.P., N.T., and A.T. are supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. P.H. and D.R. are supported by a Massachusetts Institute of Technology University grant. M.P., J.N., and V.L. received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant agreement No. 772369). V.L. also received funding from the Ministry of Education, Science, and Technological Development of the Republic of Serbia under project ON171017. S-C.H. is supported by the DOE Office of Science, Office of High Energy Physics Early Career Research program under Award No. DE-SC0015971. S.H., M.T., and D.W. are supported by F5 Networks. Z.W. is supported by the National Science Foundation under Grant Nos. 1606321 and 115164.
About this article
Cite this article
Duarte, J., Harris, P., Hauck, S. et al. FPGA-Accelerated Machine Learning Inference as a Service for Particle Physics Computing. Comput Softw Big Sci 3, 13 (2019). https://doi.org/10.1007/s41781-019-0027-2