Abstract
Text Cloning occurs when a processor stores the same text multiple times in its shared caches. Text Cloning has several causes, which we classify as either Extrinsic or Intrinsic.
Extrinsic Text Cloning can arise from user and software practices, or middleware policies, that create multiple copies of a binary and execute those copies concurrently on the same processor.
Intrinsic Text Cloning can happen when an instruction cache is Virtually Indexed/Virtually Tagged. A simultaneous multithreaded processor that employs such a cache will map different processes of the same binary to different instruction cache space because of their distinct process identifiers.
Text Cloning can hurt performance, especially on simultaneous multithreaded processors, because concurrent processes compete for cache space to store the same instruction blocks.
Experimental results on simultaneous multithreaded processors indicate that the performance overhead of this type of undesirable cloning is significant.
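The two causes can be illustrated with a toy model. The sketch below is not the paper's simulator: it assumes an idealized cache with unlimited capacity, modelled as a dictionary, and the names `Fetch`, `vivt_key`, `pipt_key`, `count_clones`, the 64-byte block size, and all addresses are hypothetical. It counts how many cached blocks duplicate content that is already cached, first when two processes share one binary (Intrinsic) and then when each process runs its own copy of the binary (Extrinsic).

```python
from dataclasses import dataclass

BLOCK_SIZE = 64  # bytes per cache block (assumed for illustration)


@dataclass(frozen=True)
class Fetch:
    pid: int      # process identifier of the fetching process
    vaddr: int    # virtual address of the instruction
    paddr: int    # physical address after translation
    content: str  # hypothetical label standing in for the instruction bytes


def vivt_key(f: Fetch):
    # Virtually indexed/virtually tagged: the block is identified by the
    # virtual block address plus the process identifier.
    return (f.pid, f.vaddr // BLOCK_SIZE)


def pipt_key(f: Fetch):
    # Physically tagged: the block is identified by its physical block address.
    return f.paddr // BLOCK_SIZE


def count_clones(fetches, key_fn):
    """Return how many cached blocks hold content that is already cached."""
    cache = {}
    for f in fetches:
        cache[key_fn(f)] = f.content
    contents = list(cache.values())
    return len(contents) - len(set(contents))


# Intrinsic case: two processes of the SAME binary share one physical text page.
INTRINSIC = [
    Fetch(pid=1, vaddr=0x400000, paddr=0x10000, content="text:0"),
    Fetch(pid=2, vaddr=0x400000, paddr=0x10000, content="text:0"),
]

# Extrinsic case: each process runs its own COPY of the binary, so the same
# text lives in two different physical pages.
EXTRINSIC = [
    Fetch(pid=1, vaddr=0x400000, paddr=0x10000, content="text:0"),
    Fetch(pid=2, vaddr=0x400000, paddr=0x20000, content="text:0"),
]

if __name__ == "__main__":
    for name, trace in (("intrinsic", INTRINSIC), ("extrinsic", EXTRINSIC)):
        print(name,
              "VIVT clones:", count_clones(trace, vivt_key),
              "physically tagged clones:", count_clones(trace, pipt_key))
```

In this model the shared binary is cloned only under the process-tagged virtual scheme, whereas the copied binary is cloned under both schemes, since physical tagging cannot recognise identical text stored at different physical addresses.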
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kleanthous, M., Sazeides, Y., Dikaiakos, M.D. (2011). Extrinsic and Intrinsic Text Cloning. In: Varbanescu, A.L., Molnos, A., van Nieuwpoort, R. (eds) Computer Architecture. ISCA 2010. Lecture Notes in Computer Science, vol 6161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24322-6_26
DOI: https://doi.org/10.1007/978-3-642-24322-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24321-9
Online ISBN: 978-3-642-24322-6
eBook Packages: Computer Science, Computer Science (R0)