Abstract.
Simultaneous multithreaded (SMT) processors use shared on-chip caches, which yield better cost-performance ratios. However, sharing a cache between simultaneously executing threads can cause excessive conflict misses. This paper proposes software solutions for dynamically partitioning the shared cache of an SMT processor, using three methods that originate in the optimizing compilers literature: dynamic tiling, copying and block data layouts. The paper presents an algorithm that combines these transformations with two mechanisms that detect cache sharing between threads and react to it at runtime. The first mechanism uses minimal kernel extensions; the second uses information collected from the processor's hardware counters. Our experimental results show that for regular, perfect loop nests, these transformations are very effective in coping with shared caches. When the caches are shared between threads from the same address space, performance improves by 16-29% on average. Similar improvements are observed when the caches are shared between threads from different address spaces. To our knowledge, this is the first work to present an all-software approach for managing shared caches on SMT processors. It is also one of the first performance and program optimization studies conducted on a commercial SMT-based multiprocessor using Intel's Hyper-Threading Technology.
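To make the combination of tiling and copying concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes that a detection mechanism (not shown) reports how many threads currently share the cache, and it simply gives each thread an equal share of the capacity. The names `tile_size`, `copy_tile` and `tiled_matmul`, the equal-split rule, and the choice of three resident tiles of 8-byte elements are all hypothetical and chosen for illustration only.

```python
import math


def tile_size(cache_bytes, sharers, elem_bytes=8, tiles_in_flight=3):
    """Pick a square tile edge T for this thread's share of the cache.

    Assumption: the cache is split evenly among `sharers` threads, and
    `tiles_in_flight` T x T tiles of `elem_bytes`-sized elements must fit
    in that share.
    """
    share = cache_bytes // sharers
    return max(1, math.isqrt(share // (tiles_in_flight * elem_bytes)))


def copy_tile(m, r0, c0, t, n):
    """Copy the (at most) t x t tile of m at (r0, c0) into a fresh buffer.

    Copying gives the tile its own compact storage, which is the role the
    copying transformation plays in avoiding conflicts between tile rows.
    """
    return [row[c0:min(c0 + t, n)] for row in m[r0:min(r0 + t, n)]]


def tiled_matmul(a, b, n, t):
    """Compute C = A * B for n x n matrices using t x t tiles with copying."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, t):
        for kk in range(0, n, t):
            a_tile = copy_tile(a, ii, kk, t, n)      # rows ii.., cols kk..
            for jj in range(0, n, t):
                b_tile = copy_tile(b, kk, jj, t, n)  # rows kk.., cols jj..
                for i, a_row in enumerate(a_tile):
                    c_row = c[ii + i]
                    for k, a_ik in enumerate(a_row):
                        b_row = b_tile[k]
                        for j, b_kj in enumerate(b_row):
                            c_row[jj + j] += a_ik * b_kj
    return c
```

A caller would recompute the tile edge whenever the reported number of sharers changes, for example `t = tile_size(512 * 1024, sharers=2)` for a hypothetical 512 KB cache shared by two threads, and pass it to `tiled_matmul`. A smaller tile keeps each thread's working set within its share of the cache at the cost of more tile copies.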
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nikolopoulos, D.S. (2003). Code and Data Transformations for Improving Shared Cache Performance on SMT Processors. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_5
DOI: https://doi.org/10.1007/978-3-540-39707-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20359-9
Online ISBN: 978-3-540-39707-6
eBook Packages: Springer Book Archive