Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3307681.3326605acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
research-article
Public Access

Preemptive Multi-Queue Fair Queuing

Published: 17 June 2019 Publication History

Abstract

Fair queuing (FQ) algorithms have been widely adopted in computer systems to share resources among multiple users. Modern operating systems and hypervisors use variants of FQ algorithms to implement the critical OS resource management -- the thread scheduler. While the existing FQ algorithms enforce fair CPU allocation on a per-core basis, there lacks an algorithm to fairly allocate CPU on multiple cores. This common deficiency in state-of-the-art multicore schedulers causes unfair CPU allocations to parallel programs using blocking synchronization, leading to severe performance degradation. Parallel threads that frequently block due to synchronization exhibit deceptive idleness and are penalized by the thread scheduler. To this end, we propose a preemptive multi-queue fair queuing (P-MQFQ) algorithm that uses a centralized queue to fairly dispatch threads from different programs based on their received CPU bandwidth from multiple cores. We demonstrate that P-MQFQ can be approximated by augmenting the existing load balancing in the OS without requiring to implement the centralized queue or undermining scalability. We implement P-MQFQ in Linux and Xen, respectively, and show significantly improved utilization and performance for parallel programs.

References

[1]
J. Ahn, C. H. Park, and J. Huh. Micro-sliced Virtual Processors to Hide the Effect of Discontinuous CPU Availability for Consolidated Systems. In Proceedings of the 47th IEEE/ACM International Symposium on Microarchitecture (Micro-47), 2014.
[2]
Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2.
[3]
D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks Summary and Preliminary Results. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (SC'91), 1991.
[4]
J. C. R. Bennett and H. Zhang. Hierarchical Packet Fair Queueing Algorithm. In Proceedings of the ACM Symposium on Communications Architectures and Protocols (SIGCOMM'96), 1996.
[5]
J. C. R. Bennett and H. Zhang. WF2Q: Worst-case Fair Weighted Fair Queueing. In Proceedings of 15th International Conference on Networking for Global Communications (INFOCOM'96), 1996.
[6]
J. M. Blanquer and B. Ozden. Fair Queueing for Aggregated Multiple Links. In Proceedings of the ACM Symposium on Communications Architectures and Protocols (SIGCOMM'01), 2001.
[7]
M. Blasgen, J. Gray, M. Mitoma, and T. Price. The Convoy Phenomenon. SIGOPS Operating System Review, 13(2), 1979.
[8]
A. Chandra, M. Adler, P. Goyal, and P. Shenoy. Surplus Fair Scheduling: A Proportional-share CPU Scheduling Algorithm for Symmetric Multiprocessors. In Proceedings of the 4th USENIX Symposium on Operating Systems Design and Implementation (OSDI'00), 2000.
[9]
A. Chandra, M. Adler, and P. Shenoy. Deadline Fair Scheduling: Bridging the Theory and Practice of Proportionate Fair Scheduling in Multiprocessor Systems. In Proceedings of 7th IEEE Symposium on Real-time Technology and Applications (RTTAS'01), 2001.
[10]
A. Chandra and P. Shenoy. Hierarchical Scheduling forSymmetric Multiprocessors. IEEE Transaction on Parallel and Distributed Systems., 19, 2008.
[11]
L. Cheng, J. Rao, and F. C. M. Lau. vScale: Automatic and Efficient Processor Scaling for SMP Virtual Machines. In Proceedings of the 11th European Conference on Computer Systems (EuroSys'16), 2016.
[12]
C. Delimitrou and C. Kozyrakis. Quasar: Resource-efficient and QoS-aware Cluster Management. In Proceedings of the 19th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS'14), 2014.
[13]
A. Demers, S. Keshav, and S. Shenker. Analysis and Simulation of a Fair Queueing Algorithm. In Proceedings of Symposium on Communications Architecture and Protocals (SIGCOMM'89), 1989.
[14]
X. Ding, P. B. Gibbons, M. A. Kozuch, and J. Shan. Gleaner: Mitigating the Blockwaiter Wakeup Problem for Virtualized Multicore Applications. In Proceedings of the 2014 USENIX Annual Technical Conference (ATC'14), 2014.
[15]
K. J. Duda and D. R. Cheriton. Borrowed-virtual-time Scheduling: Supporting Latency-sensitive Threads in a General-purpose Scheduler. In Proceedings of the 7th ACM Symposium on Operating Systems Principles (SOSP'99), 1999.
[16]
R. Essick. An Event-based Fair Share Scheduler. In Proceedings of the Winter 1990 USENIX Conference, 1990.
[17]
T. Friebel and S. Biemueller. How to Deal With Lock Holder Preemption. In Xen Developer Summit, 2008.
[18]
S. J. Golestani. A Self-clocked Fair Queueing Scheme for Broadband Applications. In Proceedings of 13th International Conference on Networking for Global Communications (INFOCOM'94), 1994.
[19]
Google Cloud Platform. http://cloud.google.com/compute.
[20]
P. Goyal, H. M. Vin, and H. Chen. Start-time Fair Queueing: A Scheduling Algorithm for Integrated Services Packet Scheduling Networks. In Proceedings of the ACM Symposium on Communications Architectures and Protocols (SIGCOMM'96), 1996.
[21]
A. G. Greenberg and N. Madras. How Fair is Fair Queueing. Journal of ACM, 39, 1992.
[22]
G. J. Henry. The Fair Share Scheduler. AT&T Bell Laboratories Technical Journal, 63, 1984.
[23]
S. Iyer and P. Druschel. Anticipatory Scheduling: A Disk Scheduling Framework to Overcome Deceptive Idleness in Synchronous I/O. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP'01), 2001.
[24]
W. Jin, J. S. Chase, and J. Kaur. Interposed Proportional Sharing for Storage Service Utility. In Proceedings of International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'04), 2004.
[25]
S. T. Jones, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Antfarm: Tracking Processes in a Virtual Machine Environment. In Proceedings of the 2006 USENIX Annual Technical Conference (ATC'06), 2006.
[26]
J. Kay and P. Lauder. The Fair Share Scheduler. Communications of the ACM, 31, 1988.
[27]
H. Kim, S. Kim, J. Jeong, J. Lee, and S. Maeng. Demand-Based Coordinated Scheduling for SMP VMs. In Proceedings of 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'13), 2013.
[28]
L. Kleinrock. Queueing System. 1975.
[29]
T. Li, D. Baumberger, and S. Hahn. Efficient and Scalable Multiprocessor Fair Scheduling Using Distributed Weighted Round Robin. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Computing (PPoPP'09), 2009.
[30]
J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible collocations. In Proceedings of the 44th IEEE/ACM International Symposium on Microarchitecture (Micro-44), 2011.
[31]
I. Molnar. Completely Fair Scheduler. In Linux Journal, 2009.
[32]
J. Nagle. On Packet Switches with Infinite Storage. In RFC 970, FACC Palo Alto, 1985.
[33]
J. Nieh and M. S. Lam. The Design, Implementation and Evaluation of SMART: A Scheduler for Multimedia Applications. In Proceedings of the 16th ACMSymposium on Operating Systems Principles (SOSP'97), 1997.
[34]
J. Nieh, C. Vaill, and H. Zhong. Virtual-Time-Round-Robin: An O(1) Proportional Share Scheduler . In Proceedings of the 2001 USENIX Annual Technical Conference (ATC'11), 2011.
[35]
S. Orathai and K. S. Hyong. Is Co-Scheduling Too Expensive for SMP VMs? In Proceedings of the 6th European Conference on Computer Systems (EuroSys'11), 2011.
[36]
J. K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Proceedings of the IEEE Distributed Compute System, 1982.
[37]
J. Ouyang and J. R. Lange. Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud. In Proceedings of the 9th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environment (VEE'13), 2013.
[38]
S. Panneerselvam and M. M. Swift. Chameleon: Operating System Support for Dynamic Processors. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'12), 2012.
[39]
A. K. Parekh and R. G. Gallager. A Generalized Processor Sharing Approach to Flow Control in Integrated Service Networks: the Single-node Case. ACM Transaction on Networking., 1, 1993.
[40]
A. K. Parekh and R. G. Gallager. A Generalized Processor Sharing Approach to Flow Control in Integrated Service Networks: the Multiple Node Case. ACM Transaction on Networking., 2, 1994.
[41]
J. Rao, K. Wang, X. Zhou, and C. Xu. Optimizing Virtual Machine Scheduling in NUMA Multicore Systems. In Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA'13), 2013.
[42]
M. Shreedhar and G. Varghese. Efficient Fair Queueing Using Deficit Round Robin. In Proceedings of the ACM Symposium on Communications Architectures and Protocols (SIGCOMM'95), 1995.
[43]
D. Shue, M. J. Freedman, and A. Shaikh. Performance Isolation and Fairness for Multi-tenant Cloud Storage. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI'12)), 2012.
[44]
I. Stoica, H. Adbel-Wahab, and K. Jeffay. On the Duality between Resource Reservation and Proportional Share Resource Allocation. In Proceedings of Multimedia Computing and Networking, 1996.
[45]
The Princeton Application Repository for Shared-Memory Computers (PARSEC) . http://parsec.cs.princeton.edu/.
[46]
C. A. Waldspurger and W. E. Weihl. Lottery Scheduling: Flexible Proportionalshare Resource Management. In Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation (OSDI'94), 1994.
[47]
C. A.Waldspurger andW. E.Weihl. Stride Scheduling: Determinstic Proportionalshare Resource Management. In MIT Technical Report, 1995.
[48]
P. M. Wells, K. Chakraborty, and G. S. Sohi. Hardware Support for Spin Management in Overcommitted Virtual Machines. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT'06)), 2006.
[49]
C. Weng, Z. Zhang, M. Li, and X. Lu. The Hybrid Scheduling Framework for Virtual Machine Systems. In Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environment (VEE'09), 2009.
[50]
Windows Azure Open Cloud Platform. http://www.windowsazure.com.
[51]
C. Xu, S. Gamage, P. N. Rao, A. Kangarlou, R. R. Kompella, and D. Xu. vSlicer: Latency-aware Virtual Machine Scheduling via Differentiated-frequency CPU Slicing . In Proceedings of the 21th International Symposium on High Performance Parallel and Distributed Computing (HPDC'12)), 2012.
[52]
H. Zhang and S. Keshav. Comparison of Rate-Based Service Dispilines. In Proceedings of the ACM Symposium on Communications Architectures and Protocols (SIGCOMM'91), 1991.
[53]
L. Zhang. VirtualClock: A New Traffic Control Algorithm for Packet Switching Networks. In Proceedings of the ACM Symposium on Communications Architectures and Protocols (SIGCOMM'90), 1990.

Cited By

View all
  • (2024)Explorations and Exploitation for Parity-based RAIDs with Ultra-fast SSDsACM Transactions on Storage10.1145/362799220:1(1-32)Online publication date: 30-Jan-2024
  • (2024)A Systematic Investigation of Hardware and Software in Electric Vehicular PlatformProceedings of the 2024 ACM Southeast Conference10.1145/3603287.3651203(9-17)Online publication date: 18-Apr-2024
  • (2023)CONTC: A Traffic Control System for Container Overlay Networks2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS)10.1109/IWQoS57198.2023.10188758(1-10)Online publication date: 19-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
HPDC '19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
June 2019
278 pages
ISBN:9781450366700
DOI:10.1145/3307681
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 June 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cloud computing
  2. multicore
  3. performance
  4. scheduling

Qualifiers

  • Research-article

Funding Sources

Conference

HPDC '19
Sponsor:

Acceptance Rates

HPDC '19 Paper Acceptance Rate 22 of 106 submissions, 21%;
Overall Acceptance Rate 166 of 966 submissions, 17%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)254
  • Downloads (Last 6 weeks)26
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Explorations and Exploitation for Parity-based RAIDs with Ultra-fast SSDsACM Transactions on Storage10.1145/362799220:1(1-32)Online publication date: 30-Jan-2024
  • (2024)A Systematic Investigation of Hardware and Software in Electric Vehicular PlatformProceedings of the 2024 ACM Southeast Conference10.1145/3603287.3651203(9-17)Online publication date: 18-Apr-2024
  • (2023)CONTC: A Traffic Control System for Container Overlay Networks2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS)10.1109/IWQoS57198.2023.10188758(1-10)Online publication date: 19-Jun-2023
  • (2022)Improving scalability of database systems by reshaping user parallel I/OProceedings of the Seventeenth European Conference on Computer Systems10.1145/3492321.3519570(592-609)Online publication date: 28-Mar-2022
  • (2021)Parallelizing packet processing in container overlay networksProceedings of the Sixteenth European Conference on Computer Systems10.1145/3447786.3456241(261-276)Online publication date: 21-Apr-2021
  • (2021)Quantifying context switch overhead of artificial intelligence workloads on the cloud and edgesProceedings of the 36th Annual ACM Symposium on Applied Computing10.1145/3412841.3441993(1182-1189)Online publication date: 22-Mar-2021

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media