Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Optimal Symbiosis and Fair Scheduling in Shared Cache

Published: 01 April 2017 Publication History

Abstract

On multi-core processors, applications are run sharing the cache. This paper presents optimization theory to co-locate applications to minimize cache interference and maximize performance. The theory precisely specifies MRC-based composition, optimization, and correctness conditions. The paper also presents a new technique called footprint symbiosis to obtain the best shared cache performance under fair CPU allocation as well as a new sampling technique which reduces the cost of locality analysis. When sampling and optimization are combined, the paper shows that it takes less than 0.1 second analysis per program to obtain a co-run that is within 1.5 percent of the best possible performance. In an exhaustive evaluation with 12,870 tests, the best prior work improves co-run performance by 56 percent on average. The new optimization improves it by another 29 percent. Without single co-run test, footprint symbiosis is able to choose co-run choices that are just 8 percent slower than the best co-run solutions found with exhaustive testing.

References

[1]
M. Arnold and B. G. Ryder, “A framework for reducing the cost of instrumented code,” in Proc. ACM SIGPLAN Conf. Program. Language Des. Implementation, Jun. 2001, pp. 168–179.
[2]
E. Berg and E. Hagersten, “StatCache: A probabilistic approach to efficient and accurate data locality analysis,” in Proc. IEEE Int. Symp. Perform. Anal. Syst. Softw., 2004, pp. 20–27.
[3]
K. Beyls and E. H. D'Hollander, “Discovery of locality-improving refactoring by reuse path analysis,” in Proc. High Perform. Comput. Commun., 2006, pp. 220–229.
[4]
M. D. Bond, K. E. Coons, and K. S. McKinley, “PACER: Proportional detection of data races,” in Proc. ACM SIGPLAN Conf. Program. Language Des. Implementation, 2010, pp. 255–268.
[5]
J. F. Cantin and M. D. Hill, “Cache performance for SPEC CPU2000 benchmarks,” 2003. {Online}. Available: http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data
[6]
C. Cascaval, E. Duesterwald, P. F. Sweeney, and R. W. Wisniewski, “Multiple page size modeling and optimization,” in Proc. Int. Conf. Parallel Archit. Compilation Tech., 2005, pp. 339–349.
[7]
T. M. Chilimbi and M. Hirzel, “Dynamic hot data stream prefetching for general-purpose programs,” in Proc. ACM SIGPLAN Conf. Program. Language Des. Implementation, 2002, pp. 199–209.
[8]
C. Delimitrou and C. Kozyrakis, “Paragon: QoS-aware scheduling for heterogeneous datacenters,” in Proc. Int. Conf. Archit. Support Program. Languages Operating Syst., 2013, pp. 77–88.
[9]
P. J. Denning, “The working set model for program behaviour,” Commun. ACM, vol. Volume 11, no. Issue 5, pp. 323–333, 1968.
[10]
P. J. Denning and S. C. Schwartz, “Properties of the working set model,” Commun. ACM, vol. Volume 15, no. Issue 3, pp. 191–198, 1972.
[11]
C. Ding, X. Xiang, B. Bao, H. Luo, Y. Luo, and X. Wang, “Performance metrics and models for shared cache,” J. Comput. Sci. Technol., vol. Volume 29, no. Issue 4, pp. 692–712, 2014.
[12]
Y. Jiang, X. Shen, J. Chen, and R. Tripathi, “Analysis and approximation of optimal co-scheduling on chip multiprocessors,” in Proc. Int. Conf. Parallel Archit. Compilation Tech., 2008, pp. 220–229.
[13]
Y. Jiang, K. Tian, and X. Shen, “Combining locality analysis with online proactive job co-scheduling in chip multiprocessors,” in Proc. Int. Conf. High Perform. Embedded Archit. Compilers, 2010, pp. 201–215.
[14]
Y. Jiang, K. Tian, X. Shen, J. Zhang, J. Chen, and R. Tripathi, “The complexity of optimal job co-scheduling on chip multiprocessors and heuristics-based solutions,” IEEE Trans. Parallel Distrib. Syst., vol. Volume 22, no. Issue 7, pp. 1192–1205, 2011.
[15]
, 15, 2016. {Online}. Available: https://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt
[16]
L. Liu, Y. Li, Z. Cui, Y. Bao, M. Chen, and C. Wu, “Going vertical in memory management: Handling multiplicity by multi-policy,” in Proc. Int. Symp. Comput. Archit., 2014, pp. 169–180.
[17]
X. Liu, K. Sharma, and J. M. Mellor-Crummey, “ArrayTool: A lightweight profiler to guide array regrouping,” in Proc. Int. Conf. Parallel Archit. Compilation Tech., 2014, pp. 405–416.
[18]
J. Mars, L. Tang, K. Skadron, M. L. Soffa, and R. Hundt, “Increasing utilization in modern warehouse-scale computers using bubble-up,” IEEE Micro, vol. Volume 32, no. Issue 3, pp. 88–99, 2012.
[19]
R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger, “Evaluation techniques for storage hierarchies,” IBM Syst. J., vol. Volume 9, no. Issue 2, pp. 78–117, 1970.
[20]
T. Moseley, A. Shye, V. J. Reddi, D. Grunwald, and R. Peri, “Shadow profiling: Hiding instrumentation costs with parallelism,” in Proc. Int. Symp. Code Generation Optimization, 2007, pp. 198–208.
[21]
A. Sandberg, A. Sembrant, E. Hagersten, and D. Black-Schaffer, “Modeling performance variation due to cache sharing,” in Proc. Int. Symp. High-Perform. Comput. Archit., 2013, pp. 155–166.
[22]
D. L. Schuff, M. Kulkarni, and V. S. Pai, “Accelerating multicore reuse distance analysis with sampling and parallelization,” in Proc. Int. Conf. Parallel Archit. Compilation Tech., 2010, pp. 53–64.
[23]
R. Sen and D. A. Wood, “Reuse-based online models for caches,” in Proc. Int. Conf. Meas. Model. Comput. Syst., 2013, pp. 279–292.
[24]
A. J. Smith, “On the effectiveness of set associative page mapping and its applications in main memory management,” in Proc. Int. Conf. Softw. Eng., 1976, pp. 286–292.
[25]
A. Snavely and D. M. Tullsen, “Symbiotic jobscheduling for a simultaneous multithreading processor,” in Proc. Int. Conf. Archit. Support Program. Languages Operating Syst., 2000, pp. 234–244.
[26]
X.-H. Sun and D. Wang, “APC: A performance metric of memory systems,” SIGMETRICS Perform. Eval. Rev., vol. Volume 40, no. Issue 2, pp. 125–130, 2012.
[27]
D. K. Tam, R. Azimi, L. Soares, and M. Stumm, “RapidMRC: Approximating L2 miss rate curves on commodity systems for online optimizations,” in Proc. Int. Conf. Archit. Support Program. Languages Operating Syst., 2009, pp. 121–132.
[28]
S. Wallace and K. Hazelwood, “SuperPin: Parallelizing dynamic instrumentation for real-time performance,” in Proc. Int. Symp. Code Generation Optimization, 2007, pp. 209–220.
[29]
J. Wires, S. Ingram, Z. Drudi, N. J. Harvey, A. Warfield, and C. Data, “Characterizing storage workloads with counter stacks,” in Proc. Symp. Operating Syst. Des. Implementation, 2014, pp. 335–349.
[30]
M.-J. Wu, M. Zhao, and D. Yeung, “Studying multicore processor scaling via reuse distance analysis,” in Proc. Int. Symp. Comput. Archit., 2013, pp. 499–510.
[31]
X. Xiang, B. Bao, T. Bai, C. Ding, and T. M. Chilimbi, “All-window profiling and composable models of cache sharing,” in Proc. ACM SIGPLAN Symp. Principles Practice Parallel Program., 2011, pp. 91–102.
[32]
X. Xiang, B. Bao, C. Ding, and Y. Gao, “Linear-time modeling of program working set in shared cache,” in Proc. Int. Conf. Parallel Archit. Compilation Tech., 2011, pp. 350–360.
[33]
X. Xiang, B. Bao, C. Ding, and K. Shen, “Cache conscious task regrouping on multicore processors,” in Proc. IEEE Int. Symp. Cluster Comput. Grid, 2012, pp. 603–611.
[34]
X. Xiang, C. Ding, H. Luo, and B. Bao, “HOTL: A higher order theory of locality,” in Proc. Int. Conf. Archit. Support Program. Languages Operating Syst., 2013, pp. 343–356.
[35]
X. Zhang, S. Dwarkadas, and K. Shen, “Towards practical page coloring-based multicore cache management,” in Proc. 4th ACM Eur. Conf. Comput. Syst., 2009, pp. 89–102.
[36]
Y. Zhong and W. Chang, “Sampling-based program locality approximation,” in Proc. Int. Symp. Memory Manage., 2008, pp. 91–100.
[37]
Y. Zhong, X. Shen, and C. Ding, “Program locality analysis using reuse distance,” ACM Trans. Program. Languages Syst., vol. Volume 31, no. Issue 6, pp. 1–39, 2009.
[38]
S. Zhuravlev, S. Blagodurov, and A. Fedorova, “Addressing shared resource contention in multicore processors via scheduling,” in Proc. Int. Conf. Archit. Support Program. Languages Operating Syst., 2010, pp. 129–142.
[39]
S. Kim, D. Chandra, and Y. Solihin, “Fair cache sharing and partitioning in a chip multiprocessor architecture,” in Proc. 13th Int. Conf. Parallel Archit. Compilation Tech., 2004, pp. 111–122.
[40]
D. Chandra, F. Guo, S. Kim, and Y. Solihin, “Predicting inter-thread cache contention on a chip multi-processor architecture,” in Proc. Int. Symp. High-Perform. Comput. Archit., 2005, pp. 340–351.
[41]
G. E. Suh, S. Devadas, and L. Rudolph, “Analytical cache models with applications to cache partitioning,” in Proc. Int. Conf. Supercomputing, 2001, pp. 1–12.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 28, Issue 4
April 2017
310 pages

Publisher

IEEE Press

Publication History

Published: 01 April 2017

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2022)CARL: Compiler Assigned Reference LeasingACM Transactions on Architecture and Code Optimization10.1145/349873019:1(1-28)Online publication date: 17-Mar-2022
  • (2019)EMBAProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337863(1-12)Online publication date: 5-Aug-2019
  • (2018)Locality analysis through static parallel samplingACM SIGPLAN Notices10.1145/3296979.319240253:4(557-570)Online publication date: 11-Jun-2018
  • (2018)Locality analysis through static parallel samplingProceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3192366.3192402(557-570)Online publication date: 11-Jun-2018
  • (2018)DCAPSProceedings of the Thirteenth EuroSys Conference10.1145/3190508.3190511(1-15)Online publication date: 23-Apr-2018

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media