DOI: 10.1145/3626203.3670632
Poster

Orchestrating a DNN training job using an iScheduler Framework: a use case

Published: 17 July 2024

Abstract

Orchestrating DNN training jobs efficiently at HPC centers such as the Ohio Supercomputer Center (OSC), the Texas Advanced Computing Center (TACC), and the San Diego Supercomputer Center (SDSC) is crucial given the prevalence of AI-driven workloads. However, managing these workloads effectively requires a deep understanding of the available resources, allocation policies, and suitable execution configurations. Current approaches often lead to job interruptions, prolonged wait times, and inefficient resource utilization. To address these challenges, we propose deploying the iScheduler framework. This framework aims to automate workflow orchestration for DNN training by estimating resource needs (using existing state-of-the-art estimation models) and generating an infrastructure-aware execution plan. In this study, we demonstrate the practical application of iScheduler to orchestrating a user-specific DNN training workflow, showcasing its capabilities in optimizing resource allocation and scheduling. This poster examines the use case in depth and presents all user interactions with iScheduler along with its responses.

Information

Published In

PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing
July 2024
608 pages
ISBN: 9798400704192
DOI: 10.1145/3626203
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AI4CI
  2. AI4OPT
  3. ML
  4. estimation scalability
  5. execution time estimation
  6. job scheduling
  7. model
  8. workflow orchestration

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

PEARC '24

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 34
  • Downloads (Last 12 months): 34
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 16 Nov 2024
