DOI: 10.1109/MICRO.2014.53
tutorial

SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers

Published: 13 December 2014

Abstract

One of the key challenges for improving efficiency in warehouse scale computers (WSCs) is to improve server utilization while guaranteeing the quality of service (QoS) of latency-sensitive applications. To this end, prior work has proposed techniques to precisely predict performance and QoS interference to identify 'safe' application co-locations. However, such techniques are only applicable to resources shared across cores. Achieving such precise interference prediction on real-system simultaneous multithreading (SMT) architectures has been a significantly challenging open problem due to the complexity introduced by sharing resources within a core.
In this paper, we demonstrate through a real-system investigation that the fundamental difference between resource sharing behaviors on CMP and SMT architectures calls for a redesign of the way we model interference. For SMT servers, the interference observed on different shared resources, including private caches, memory ports, and integer and floating-point functional units, is uncorrelated. This insight suggests the necessity of decoupling interference into multiple resource sharing dimensions. In this work, we propose SMiTe, a methodology that enables precise performance prediction for SMT co-location on real-system commodity processors. With a set of Rulers, which are carefully designed software stressors that apply pressure to a multidimensional space of shared resources, we quantify application sensitivity and contentiousness in a decoupled manner. We then establish a regression model that combines the sensitivity and contentiousness in each dimension to predict performance interference. Using this methodology, we are able to precisely predict the performance interference in SMT co-location with an average error of 2.80% on SPEC CPU2006 and 1.79% on CloudSuite. Our evaluation shows that SMiTe allows us to improve the utilization of WSCs by up to 42.57% while enforcing an application's QoS requirements.
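To make the regression step concrete, the following is a minimal sketch and not the authors' implementation. It assumes, as an illustration only, that the predicted degradation is a linear combination of per-dimension products (sensitivity of the latency-sensitive application times contentiousness of its co-runner) plus a constant term. The dimension names, the helper functions (`features`, `solve`, `fit`, `predict`, `is_safe`), and all numbers are hypothetical; the training data is synthetic and generated from made-up weights so that the fit can be checked.

```python
import random

# Hypothetical dimension names, loosely following the shared resources named in the abstract.
DIMENSIONS = ["private_cache", "memory_port", "int_unit", "fp_unit"]


def features(sensitivity, contentiousness):
    """One product per dimension (sensitivity x contentiousness) plus a bias term."""
    return [s * c for s, c in zip(sensitivity, contentiousness)] + [1.0]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, degradations):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * d for r, d in zip(rows, degradations)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights, sensitivity, contentiousness):
    """Predicted fractional slowdown of the sensitive application under co-location."""
    return sum(w * f for w, f in zip(weights, features(sensitivity, contentiousness)))


def is_safe(weights, sensitivity, contentiousness, max_degradation):
    """A co-location is treated as 'safe' if predicted slowdown stays within the QoS budget."""
    return predict(weights, sensitivity, contentiousness) <= max_degradation


# Synthetic, noise-free training set built from made-up weights (assumption, not paper data).
rng = random.Random(0)
true_weights = [0.5, 0.3, 0.2, 0.1, 0.02]
rows, degradations = [], []
for _ in range(30):
    sens = [rng.random() for _ in DIMENSIONS]
    cont = [rng.random() for _ in DIMENSIONS]
    row = features(sens, cont)
    rows.append(row)
    degradations.append(sum(w * f for w, f in zip(true_weights, row)))

weights = fit(rows, degradations)
```

In a real deployment the sensitivity and contentiousness vectors would come from co-running each application with the Rulers, and the degradations from measured co-location runs. A scheduler could then call `is_safe` with an application's QoS budget before placing a batch job on the sibling SMT context.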

Published In

MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture
December 2014
697 pages
ISBN:9781479969982

Publisher

IEEE Computer Society

United States

Author Tags

  1. datacenter
  2. quality of service
  3. simultaneous multithreading
  4. warehouse scale computer

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

MICRO-47

Acceptance Rates

MICRO-47 Paper Acceptance Rate 53 of 279 submissions, 19%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%
