Fan et al., 2022 - Google Patents
Dras: Deep reinforcement learning for cluster scheduling in high performance computingFan et al., 2022
View PDF- Document ID
- 6224890545849754358
- Author
- Fan Y
- Li B
- Favorite D
- Singh N
- Childers T
- Rich P
- Allcock W
- Papka M
- Lan Z
- Publication year
- Publication venue
- IEEE Transactions on Parallel and Distributed Systems
External Links
Snippet
Cluster schedulers are crucial in high-performance computing (HPC). They determine when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with …
- 230000002787 reinforcement 0 title abstract description 32
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
- G06Q10/063—Operations research or analysis
- G06Q10/0631—Resource planning, allocation or scheduling for a business operation
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation, e.g. linear programming, "travelling salesman problem" or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tuli et al. | COSCO: Container orchestration using co-simulation and gradient based optimization for fog computing environments | |
Fan et al. | Deep reinforcement agent for scheduling in HPC | |
Zhang et al. | RLScheduler: an automated HPC batch job scheduler using reinforcement learning | |
CN110737529B (en) | Short-time multi-variable-size data job cluster scheduling adaptive configuration method | |
Yan et al. | HANSEL: Adaptive horizontal scaling of microservices using Bi-LSTM | |
Fan et al. | Dras: Deep reinforcement learning for cluster scheduling in high performance computing | |
Mahmoud et al. | Multiobjective task scheduling in cloud environment using decision tree algorithm | |
Tong et al. | DDQN-TS: A novel bi-objective intelligent scheduling algorithm in the cloud environment | |
Li et al. | Weighted double deep Q-network based reinforcement learning for bi-objective multi-workflow scheduling in the cloud | |
Li et al. | OKCM: improving parallel task scheduling in high-performance computing systems using online learning | |
CN113641445B (en) | Cloud resource self-adaptive configuration method and system based on depth deterministic strategy | |
Ye et al. | SHWS: Stochastic hybrid workflows dynamic scheduling in cloud container services | |
Mohammadzadeh et al. | Energy-aware workflow scheduling in fog computing using a hybrid chaotic algorithm | |
Perez et al. | Multi-objective reinforcement learning for responsive grids | |
Jalali Khalil Abadi et al. | A comprehensive survey on scheduling algorithms using fuzzy systems in distributed environments | |
Davami et al. | Distributed scheduling method for multiple workflows with parallelism prediction and DAG prioritizing for time constrained cloud applications | |
Cui et al. | Cloud workflow scheduling algorithm based on reinforcement learning | |
Baheri | Mars: Multi-scalable actor-critic reinforcement learning scheduler | |
Fan | Intelligent Job Scheduling on High Performance Computing Systems | |
Funika et al. | Automatic management of cloud applications with use of proximal policy optimization | |
Perez et al. | Responsive elastic computing | |
Nissanke et al. | Probabilistic performance analysis in multiprocessor scheduling | |
Fomperosa et al. | Task scheduler for heterogeneous data centres based on deep reinforcement learning | |
de Freitas Cunha et al. | An SMDP approach for Reinforcement Learning in HPC cluster schedulers | |
Farooqi et al. | Exploring Hybrid Classical-Quantum Compute Systems through Simulation |