Abstract
Since the beginning of the information era, the demand for computing power has grown relentlessly. Whenever technology advances enough to fulfill the needs of the time, new and more complex problems arise, such that the technology again becomes insufficient to solve them. In the past, performance increased mainly through instruction-level parallelism (ILP), with the introduction of deeper pipelines, out-of-order execution, and speculative execution. Raising the clock frequency was also an important way to improve performance. However, the ILP that compilers and architectures can exploit is reaching its limits (Caparros Cabezas and Stanley-Marbell 2011). Increasing the clock frequency is also reaching its limits, because it raises energy consumption, which is a major concern for current and future architectures (Tolentino and Cameron 2012).
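To make the ILP argument concrete, the following C sketch (our illustration, not taken from the chapter) contrasts a summation loop whose additions form a dependency chain, limiting how many instructions a superscalar out-of-order core can overlap, with a variant using four independent accumulators that exposes more parallel work to the functional units:

    /* ilp.c - illustrative only; compile with, e.g., gcc -O2 ilp.c */
    #include <stddef.h>

    /* Each addition depends on the previous one, so the core cannot
     * overlap the additions: ILP is limited by the dependency chain. */
    double sum_dependent(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent accumulators break the chain; the additions
     * can issue in parallel on multiple functional units. */
    double sum_unrolled(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)   /* handle the remaining elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

On a superscalar processor the unrolled variant typically runs faster, but its speedup is bounded by the number of functional units and the floating-point latency, which illustrates why the exploitable ILP eventually saturates.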
References
Azimi R, Tam DK, Soares L, Stumm M (2009) Enhancing Operating system support for multicore processors by using hardware performance monitoring. ACM SIGOPS Oper Syst Rev 43(2):56–65. https://doi.org/10.1145/1531793.1531803
Bach M, Charney M, Cohn R, Demikhovsky E, Devor T, Hazelwood K, Jaleel A, Luk CK, Lyons G, Patil H, Tal A (2010) Analyzing parallel programs with Pin. IEEE Comput 43(3):34–41
Barrow-Williams N, Fensch C, Moore S (2009) A communication characterisation of SPLASH-2 and PARSEC. In: IEEE international symposium on workload characterization (IISWC), pp 86–97. https://doi.org/10.1109/IISWC.2009.5306792
Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: International conference on parallel architectures and compilation techniques (PACT), pp 72–81
Borkar S, Chien AA (2011) The future of microprocessors. Commun ACM 54(5):67–77
Caparros Cabezas V, Stanley-Marbell P (2011) Parallelism and data movement characterization of contemporary application classes. In: ACM symposium on parallelism in algorithms and architectures (SPAA)
Casavant TL, Kuhl JG (1988) A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans Softw Eng 14(2):141–154
Chishti Z, Powell MD, Vijaykumar TN (2005) Optimizing replication, communication, and capacity allocation in CMPs. ACM SIGARCH Comput Archit News 33(2):357–368. https://doi.org/10.1145/1080695.1070001
Corbet J (2012a) AutoNUMA: the other approach to NUMA scheduling. http://lwn.net/Articles/488709/
Corbet J (2012b) Toward better NUMA scheduling. http://lwn.net/Articles/486858/
Coteus PW, Knickerbocker JU, Lam CH, Vlasov YA (2011) Technologies for exascale systems. IBM J Res Develop 55(5):14:1–14:12. https://doi.org/10.1147/JRD.2011.2163967
Cruz EHM, Diener M, Pilla LL, Navaux POA (2016c) Hardware-assisted thread and data mapping in hierarchical multicore architectures. ACM Trans Archit Code Optim 13(3):1–25. https://doi.org/10.1145/2975587
Dashti M, Fedorova A, Funston J, Gaud F, Lachaize R, Lepers B, Quéma V, Roth M (2013) Traffic management: a holistic approach to memory placement on NUMA systems. In: Architectural support for programming languages and operating systems (ASPLOS), pp 381–393
Diener M, Cruz EHM, Navaux POA, Busse A, Heiß HU (2014) kMAF: automatic kernel-level management of thread and data affinity. In: International conference on parallel architectures and compilation techniques (PACT), pp 277–288
Diener M, Cruz EHM, Navaux POA, Busse A, Heiß HU (2015a) Communication-aware process and thread mapping using online communication detection. Parallel Comput 43(March):43–63
Diener M, Cruz EHM, Pilla LL, Dupros F, Navaux POA (2015b) Characterizing communication and page usage of parallel applications for thread and data mapping. Perform Eval 88–89(June):18–36
Feliu J, Sahuquillo J, Petit S, Duato J (2012) Understanding cache hierarchy contention in CMPs to improve job scheduling. In: International parallel and distributed processing symposium (IPDPS). https://doi.org/10.1109/IPDPS.2012.54
Gabriel E, Fagg GE, Bosilca G, Angskun T, Dongarra JJ, Squyres JM, Sahay V, Kambadur P, Barrett B, Lumsdaine A (2004) Open MPI: goals, concept, and design of a next generation MPI implementation. In: Recent advances in parallel virtual machine and message passing interface
Intel (2012) 2nd generation Intel Core processor family. Tech. Rep., September
Jin H, Frumkin M, Yan J (1999) The OpenMP implementation of NAS parallel benchmarks and its performance. Tech. Rep., October, NASA
LaRowe RP, Holliday MA, Ellis CS (1992) An analysis of dynamic page placement on a NUMA multiprocessor. ACM SIGMETRICS Perform Eval Rev 20(1):23–34
Marathe J, Thakkar V, Mueller F (2010) Feedback-directed page placement for ccNUMA via hardware-generated memory traces. J Parallel Distrib Comput 70(12):1204–1219
Martin MMK, Hill MD, Sorin DJ (2012) Why on-chip cache coherence is here to stay. Commun ACM 55(7):78. https://doi.org/10.1145/2209249.2209269
OpenMP (2013) OpenMP application program interface. Tech. Rep., July
Tolentino M, Cameron K (2012) The optimist, the pessimist, and the global race to exascale in 20 megawatts. IEEE Comput 45(1):95–97
Torrellas J (2009) Architectures for extreme-scale computing. IEEE Comput 42(11):28–35
Verghese B, Devine S, Gupta A, Rosenblum M (1996) OS support for improving data locality on CC-NUMA compute servers. Tech. Rep., February