Search results for keyword: job scheduling

research-article

NetLLM: Adapting Large Language Models for Networking

ACM SIGCOMM '24: Proceedings of the ACM SIGCOMM 2024 Conference, Pages 661–678, https://doi.org/10.1145/3651890.3672268

Many networking tasks now employ deep learning (DL) to solve complex prediction and optimization problems. However, current design philosophy of DL-based algorithms entails intensive engineering overhead due to the manual design of deep neural networks (...

poster

Orchestrating a DNN training job using an iScheduler Framework: a use case

PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 108, Pages 1–3, https://doi.org/10.1145/3626203.3670632

Orchestrating DNN training jobs efficiently on HPC centers such as Ohio Supercomputer Center (OSC), Texas Advanced Computing Center (TACC), and San Diego Supercomputer Center (SDSC) is crucial due to the prevalence of AI-driven workloads. However, ...

short-paper

Reference Implementation of Smart Scheduler: A CI-Aware, AI-Driven Scheduling Framework for HPC Workloads

PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 75, Pages 1–4, https://doi.org/10.1145/3626203.3670555

Many modern scientific workloads in HPC centers rely heavily on AI-driven tasks, particularly deep neural network (DNN) training workloads. Efficiently managing and scheduling these workloads via SLURM interfaces requires users to comprehensively ...

research-article

Open Access

Genetic-based Constraint Programming for Resource Constrained Job Scheduling

GECCO '24: Proceedings of the Genetic and Evolutionary Computation Conference, Pages 942–951, https://doi.org/10.1145/3638529.3654046

Resource constrained job scheduling is a hard combinatorial optimisation problem that originates in the mining industry. Off-the-shelf solvers cannot solve this problem satisfactorily in reasonable time-frames, while other solution methods such as ...

research-article

HPC Jobs Classification and Resource Prediction to Minimize Job Failures

CompSysTech '24: Proceedings of the International Conference on Computer Systems and Technologies 2024, Pages 95–101, https://doi.org/10.1145/3674912.3674914

In this work, we focus on HPC job classification and resource prediction. HPC users at Concordia University come from different departments, and not all of them have an IT background. One of the most challenging issues for users is how to properly ...

keynote

Demistifying HPC-Quantum integration: it's all about scheduling

Paolo Viviani

HPQCI '24: Proceedings of the 2024 Workshop on High Performance and Quantum Computing Integration, Pages 1–3, https://doi.org/10.1145/3659996.3673223

Recent research on the integration between HPC and quantum computer was mostly focused on the software stack and quantum circuit compilation aspects, neglecting critical issues like HPC resource allocation and job scheduling given the scarcity of QPUs, ...

research-article

Open Access

A Design Framework for the Simulation of Distributed Quantum Computing

HPQCI '24: Proceedings of the 2024 Workshop on High Performance and Quantum Computing Integration, Pages 4–10, https://doi.org/10.1145/3659996.3660035

The growing demand for large-scale quantum computers is pushing research on Distributed Quantum Computing (DQC). Recent experimental efforts have demonstrated some of the building blocks for such a design. DQC systems are clusters of quantum processing ...

research-article

FuncMem: Reducing Cold Start Latency in Serverless Computing Through Memory Prediction and Adaptive Task Execution

SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, Pages 131–138, https://doi.org/10.1145/3605098.3636033

Because serverless computing can scale automatically and affordably, it has become a popular choice for cloud-based services. However, despite these advantages, a serverless architecture is not suitable for applications requiring instantaneous executions ...

research-article

Open Access

DPS: Adaptive Power Management for Overprovisioned Systems

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 27, Pages 1–14, https://doi.org/10.1145/3581784.3607091

Maximizing performance under a power budget is essential for HPC systems and has inspired the development of many power management frameworks. These can be broadly characterized into two groups: model-based and stateless. Model-based frameworks use ...

research-article

Free

A Dual-Agent Scheduler for Distributed Deep Learning Jobs on Public Cloud via Reinforcement Learning

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 2776–2788, https://doi.org/10.1145/3580305.3599241

Public cloud GPU clusters are becoming emerging platforms for training distributed deep learning jobs. Under this training paradigm, the job scheduler is a crucial component to improve user experiences, i.e., reducing training fees and job completion ...

research-article

Orchestration of materials science workflows for heterogeneous resources at large scale

International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 37, Issue 3-4, Pages 260–271, https://doi.org/10.1177/10943420231167800

In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., ...

research-article

Research on overall energy consumption optimization method for data center based on deep reinforcement learning

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology (JIFS), Volume 44, Issue 5, Pages 7333–7349, https://doi.org/10.3233/JIFS-223769

With the rapid development of cloud computing, there are more and more large-scale data centers, which makes the energy management of data centers more complex. In order to achieve better energy-saving effect, it is necessary to solve the problems of ...

research-article

Towards Data Minimization and Access Control for Immunization Data Sharing

MEDES '22: Proceedings of the 14th International Conference on Management of Digital EcoSystems, Pages 16–23, https://doi.org/10.1145/3508397.3564837

The global interest in taking preventative measures has intensified in response to the coronavirus epidemic. The vaccination certificate is one approach used to control the spread of disease while enabling people to carry out their regular lives and ...

research-article

SMART: Speedup Job Completion Time by Scheduling Reduce Tasks

Journal of Computer Science and Technology (JCST), Volume 37, Issue 4, Pages 763–778, https://doi.org/10.1007/s11390-022-2118-5

Distributed computing systems have been widely used as the amount of data grows exponentially in the era of information explosion. Job completion time (JCT) is a major metric for assessing their effectiveness. How to reduce the JCT for these ...

research-article

Open Access

A Hyper-Heuristic for the Preemptive Single Machine Scheduling Problem to Minimize the Total Weighted Tardiness

Vadim Romanuke

Applied Computer Systems (ACSS), Volume 27, Issue 1, Pages 1–12, https://doi.org/10.2478/acss-2022-0001

A problem of minimizing the total weighted tardiness in the preemptive single machine scheduling for discrete manufacturing is considered. A hyper-heuristic is presented, which is composed of 24 various heuristics, to find an approximately optimal ...
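The objective named in this abstract is conventionally defined as the sum over jobs of w_j · max(0, C_j − d_j), where C_j is the completion time, d_j the due date and w_j the weight of job j. The sketch below only evaluates that objective for a given preemptive schedule; it is not the paper's hyper-heuristic, and the `Job` type and the segment-list schedule representation are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    processing: int  # total processing time p_j (assumed integer, >= 0)
    due: int         # due date d_j
    weight: int      # tardiness weight w_j


def completion_times(schedule, jobs):
    """Hypothetical representation: a preemptive schedule is a list of
    (job_name, duration) segments run back to back from time 0 on one machine.
    Returns C_j, the end time of each job's final segment."""
    remaining = {j.name: j.processing for j in jobs}
    completion = {j.name: 0 for j in jobs if j.processing == 0}
    t = 0
    for name, duration in schedule:
        if duration <= 0 or duration > remaining[name]:
            raise ValueError(f"invalid segment for job {name}")
        t += duration
        remaining[name] -= duration
        if remaining[name] == 0:
            completion[name] = t
    if any(r != 0 for r in remaining.values()):
        raise ValueError("schedule does not finish every job")
    return completion


def total_weighted_tardiness(schedule, jobs):
    """Sum of w_j * max(0, C_j - d_j) over all jobs."""
    c = completion_times(schedule, jobs)
    return sum(j.weight * max(0, c[j.name] - j.due) for j in jobs)


jobs = [Job("A", 3, 4, 1), Job("B", 2, 2, 3)]
print(total_weighted_tardiness([("A", 1), ("B", 2), ("A", 2)], jobs))  # 4
print(total_weighted_tardiness([("B", 2), ("A", 3)], jobs))            # 1
```

A heuristic for this problem would search over such segment lists; an evaluator like this is the fitness function any of them would need.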

research-article

Teaching learning-based optimisation for job scheduling in computational grids

International Journal of Advanced Intelligence Paradigms (IJAIP), Volume 21, Issue 1-2, Pages 72–86, https://doi.org/10.1504/ijaip.2022.121030

Grid computing is a framework that enables the sharing, selection and aggregation of geographically distributed resources dynamically to meet the current and growing computational demands. Job scheduling is a key issue of grid computing and its algorithm ...

research-article

MG-FIM: A Multi-GPU Fast Iterative Method Using Adaptive Domain Decomposition

SIAM Journal on Scientific Computing (SISC), Volume 44, Issue 1, Pages C54–C76, https://doi.org/10.1137/21M1414644

Applying the latest parallel computing technology has become a recent trend in Eikonal equation solvers. Many recent studies have focused on parallelization of Eikonal solvers for multithreaded CPUs or single GPU systems. However, multi-GPU Eikonal solvers ...

research-article

Public Access

Good Things Come to Those Who Wait: Optimizing Job Waiting in the Cloud

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing, Pages 229–242, https://doi.org/10.1145/3472883.3487007

Cloud-enabled schedulers execute jobs on either fixed resources or those acquired on demand from cloud platforms. Thus, these schedulers must define not only a scheduling policy, which selects which jobs run when fixed resources become available, but ...

poster

Cost-effective data analytics across multiple cloud regions

SIGCOMM '21: Proceedings of the SIGCOMM '21 Poster and Demo Sessions, Pages 1–3, https://doi.org/10.1145/3472716.3472842

We propose a cloud-native data analytics engine for processing data stored among geographically distributed cloud regions with reduced cost. A job is split into subtasks and placed across regions based on factors including prices of compute resources and ...

extended-abstract

An Algorithmic Framework for Approximating Maximin Share Allocation of Chores

EC '21: Proceedings of the 22nd ACM Conference on Economics and Computation, Pages 630–631, https://doi.org/10.1145/3465456.3467555

We consider the problem of fairly dividing m indivisible chores among n agents. The fairness measure we consider here is the maximin share. The previous best known result is that there always exists a 4/3-approximation maximin share allocation[3]. With ...
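For chores with additive costs, an agent's maximin share is usually defined as the smallest possible maximum bundle cost over all partitions of the m chores into n bundles. An allocation is an α-approximation when every agent's cost is at most α times its own maximin share. The brute-force sketch below only illustrates these definitions; it enumerates all n^m partitions, so it suits tiny instances only, and it is not the algorithmic framework of this paper. The function names and the cost-matrix layout are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def maximin_share_chores(costs, n):
    """Brute-force MMS for one agent: minimize, over every assignment of
    chores to n bundles, the cost of the most expensive bundle."""
    best = float("inf")
    for assignment in product(range(n), repeat=len(costs)):
        loads = [0] * n
        for chore, bundle in enumerate(assignment):
            loads[bundle] += costs[chore]
        best = min(best, max(loads))
    return best


def is_alpha_mms(allocation, cost_matrix, alpha):
    """allocation[i] lists the chore indices given to agent i;
    cost_matrix[i][c] is agent i's (additive) cost for chore c."""
    n = len(cost_matrix)
    for i, bundle in enumerate(allocation):
        mms = maximin_share_chores(cost_matrix[i], n)
        if sum(cost_matrix[i][c] for c in bundle) > alpha * mms:
            return False
    return True


costs = [[3, 3, 2, 2, 2], [3, 3, 2, 2, 2]]  # two agents with identical costs
print(maximin_share_chores(costs[0], 2))                       # 6
print(is_alpha_mms([[0, 1], [2, 3, 4]], costs, 1))             # True
print(is_alpha_mms([[0, 1, 2], [3, 4]], costs, 1))             # False
print(is_alpha_mms([[0, 1, 2], [3, 4]], costs, Fraction(4, 3)))  # True
```

Passing α as a `Fraction` keeps the comparison exact; with a float such as `4/3`, a bundle cost lying exactly on the bound could be misjudged by rounding.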
