DOI: 10.1145/3526064.3534112
research-article

Predicting Slow Network Transfers in Scientific Computing

Published: 27 June 2022

Abstract

Data access throughput is one of the key performance metrics in scientific computing, particularly for distributed data-intensive applications. While a body of prior work has focused on elephant connections that consume a significant fraction of network bandwidth, this study focuses on predicting slow connections that create bottlenecks in distributed workflows. We analyze network traffic logs collected between January 2019 and May 2021 at the National Energy Research Scientific Computing Center (NERSC). Based on the patterns observed in this data, we define a set of features for identifying low-performing data transfers. Through extensive feature engineering and feature selection, we identify a number of new features that significantly improve prediction performance. With these new features, even a relatively simple decision tree model predicts slow connections with an F1 score as high as 0.945.
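The abstract reports a decision tree's performance as an F1 score. The code below is a minimal sketch of that evaluation setup. It rests on several assumptions:

- The transfer records are hypothetical: `Transfer`, with made-up fields `throughput_mbps` and `feature`.
- "Slow" is defined by an arbitrary 20 Mbps cutoff (`threshold_mbps`), because the abstract does not state the authors' definition.

The sketch labels transfers and fits a depth-one decision tree (a decision stump) on a single feature by minimizing Gini impurity. It then scores the predictions with F1 = 2PR/(P + R), where P is precision and R is recall on the slow class. It illustrates the technique only; it is not the authors' method, features, or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical record type: the abstract does not describe the NERSC log schema.
@dataclass
class Transfer:
    throughput_mbps: float  # observed throughput of the connection
    feature: float          # one engineered feature (hypothetical)


Stump = Tuple[float, int, int]  # (split threshold, left-leaf label, right-leaf label)


def label_slow(transfers: List[Transfer], threshold_mbps: float) -> List[int]:
    """Label a transfer 1 (slow) if its throughput is below a hypothetical cutoff, else 0."""
    return [1 if t.throughput_mbps < threshold_mbps else 0 for t in transfers]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (0.0 for an empty or pure list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def majority(labels: List[int]) -> int:
    """Majority class of a leaf; ties and empty leaves default to 0 (not slow)."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(x: List[float], y: List[int]) -> Stump:
    """Fit a depth-1 decision tree on one feature by minimizing weighted Gini impurity.

    Samples with x <= threshold go to the left leaf, the rest to the right leaf.
    """
    n = len(x)
    best = None
    for thr in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, thr, majority(left), majority(right))
    _, thr, left_label, right_label = best
    return thr, left_label, right_label


def predict_stump(stump: Stump, x: List[float]) -> List[int]:
    """Route each sample to a leaf and return that leaf's label."""
    thr, left_label, right_label = stump
    return [left_label if xi <= thr else right_label for xi in x]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (slow) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data for illustration only (not from the NERSC logs).
transfers = [
    Transfer(5.0, 0.9), Transfer(8.0, 0.8), Transfer(12.0, 0.7),
    Transfer(150.0, 0.2), Transfer(200.0, 0.1), Transfer(90.0, 0.3),
]
y = label_slow(transfers, threshold_mbps=20.0)
x = [t.feature for t in transfers]
stump = fit_stump(x, y)
print(stump)                                 # (0.3, 0, 1)
print(f1_score(y, predict_stump(stump, x)))  # 1.0
```

A stump is the smallest possible decision tree, chosen here so the splitting criterion and the F1 computation stay visible. A realistic pipeline would grow a deeper tree over many engineered features, for example with scikit-learn's `DecisionTreeClassifier`, and would evaluate it on held-out transfers rather than on the training data.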

Cited By

  • (2023) Leveraging History to Predict Infrequent Abnormal Transfers in Distributed Workflows. Sensors 23(12): 5485. DOI: 10.3390/s23125485. Online publication date: 10-Jun-2023.

Information

Published In

ACM Conferences
SNTA '22: Fifth International Workshop on Systems and Network Telemetry and Analytics
June 2022
62 pages
ISBN:9781450393157
DOI:10.1145/3526064
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. machine learning
  2. network transfer
  3. prediction
  4. scientific computing
  5. slow connection

Funding Sources

  • U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research

Conference

HPDC '22

Acceptance Rates

Overall Acceptance Rate 22 of 106 submissions, 21%
