Abstract
Since the beginning of the information era, the demand for computing power has grown relentlessly. Whenever technology advances enough to fulfill the needs of the time, new and more complex problems arise, such that the technology again becomes insufficient to solve them. In the past, performance increased mainly through instruction-level parallelism (ILP), with the introduction of deeper pipelines, out-of-order execution, and speculative execution. Raising the clock frequency was also an important way to improve performance. However, the ILP that compilers and architectures can exploit is reaching its limits (Caparros Cabezas and Stanley-Marbell 2011). Increasing the clock frequency is also reaching its limits, because it raises energy consumption, which is a major concern for current and future architectures (Tolentino and Cameron 2012).
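To make the ILP argument concrete, the following C sketch (our illustration, not taken from the chapter) contrasts a summation loop whose additions form a dependency chain, limiting how many instructions a superscalar out-of-order core can overlap, with a variant using four independent accumulators that exposes more parallel work to the functional units:

    /* ilp.c - illustrative only; compile with, e.g., gcc -O2 ilp.c */
    #include <stddef.h>

    /* Each addition depends on the previous one, so the core cannot
     * overlap the additions: ILP is limited by the dependency chain. */
    double sum_dependent(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent accumulators break the chain; the additions
     * can issue in parallel on multiple functional units. */
    double sum_unrolled(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)   /* handle the remaining elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

On a superscalar processor the unrolled variant typically runs faster, but its speedup is bounded by the number of functional units and the floating-point latency, which illustrates why the exploitable ILP eventually saturates.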
References
Azimi R, Tam DK, Soares L, Stumm M (2009) Enhancing Operating system support for multicore processors by using hardware performance monitoring. ACM SIGOPS Oper Syst Rev 43(2):56–65. https://doi.org/10.1145/1531793.1531803
Bach M, Charney M, Cohn R, Demikhovsky E, Devor T, Hazelwood K, Jaleel A, Luk CK, Lyons G, Patil H, Tal A (2010) Analyzing parallel programs with Pin. IEEE Comput 43(3):34–41
Barrow-Williams N, Fensch C, Moore S (2009) A communication characterisation of SPLASH-2 and PARSEC. In: IEEE international symposium on workload characterization (IISWC), pp 86–97. https://doi.org/10.1109/IISWC.2009.5306792
Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: International conference on parallel architectures and compilation techniques (PACT), pp 72–81
Borkar S, Chien AA (2011) The future of microprocessors. Commun ACM 54(5):67–77
Caparros Cabezas V, Stanley-Marbell P (2011) Parallelism and data movement characterization of contemporary application classes. In: ACM symposium on parallelism in algorithms and architectures (SPAA)
Casavant TL, Kuhl JG (1988) A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans Softw Eng 14(2):141–154
Chishti Z, Powell MD, Vijaykumar TN (2005) Optimizing replication, communication, and capacity allocation in CMPs. ACM SIGARCH Comput Archit News 33(2):357–368. https://doi.org/10.1145/1080695.1070001
Corbet J (2012a) AutoNUMA: the other approach to NUMA scheduling. http://lwn.net/Articles/488709/
Corbet J (2012b) Toward better NUMA scheduling. http://lwn.net/Articles/486858/
Coteus PW, Knickerbocker JU, Lam CH, Vlasov YA (2011) Technologies for exascale systems. IBM J Res Develop 55(5):14:1–14:12. https://doi.org/10.1147/JRD.2011.2163967
Cruz EHM, Diener M, Pilla LL, Navaux POA (2016c) Hardware-assisted thread and data mapping in hierarchical multicore architectures. ACM Trans Archit Code Optim 13(3):1–25. https://doi.org/10.1145/2975587
Dashti M, Fedorova A, Funston J, Gaud F, Lachaize R, Lepers B, Quéma V, Roth M (2013) Traffic management: a holistic approach to memory placement on NUMA systems. In: Architectural support for programming languages and operating systems (ASPLOS), pp 381–393
Diener M, Cruz EHM, Navaux POA, Busse A, Heiß HU (2014) kMAF: automatic kernel-level management of thread and data affinity. In: International conference on parallel architectures and compilation techniques (PACT), pp 277–288
Diener M, Cruz EHM, Navaux POA, Busse A, Heiß HU (2015a) Communication-aware process and thread mapping using online communication detection. Parallel Comput 43(March):43–63
Diener M, Cruz EHM, Pilla LL, Dupros F, Navaux POA (2015b) Characterizing communication and page usage of parallel applications for thread and data mapping. Perform Eval 88–89(June):18–36
Feliu J, Sahuquillo J, Petit S, Duato J (2012) Understanding cache hierarchy contention in CMPs to improve job scheduling. In: International parallel and distributed processing symposium (IPDPS). https://doi.org/10.1109/IPDPS.2012.54
Gabriel E, Fagg GE, Bosilca G, Angskun T, Dongarra JJ, Squyres JM, Sahay V, Kambadur P, Barrett B, Lumsdaine A (2004) Open MPI: goals, concept, and design of a next generation MPI implementation. In: Recent advances in parallel virtual machine and message passing interface
Intel (2012) 2nd generation Intel Core processor family. Tech. Rep., September
Jin H, Frumkin M, Yan J (1999) The OpenMP implementation of NAS parallel benchmarks and its performance. Tech. Rep., October, NASA
LaRowe RP, Holliday MA, Ellis CS (1992) An analysis of dynamic page placement on a NUMA multiprocessor. ACM SIGMETRICS Perform Eval Rev 20(1):23–34
Marathe J, Thakkar V, Mueller F (2010) Feedback-directed page placement for ccNUMA via hardware-generated memory traces. J Parallel Distrib Comput 70(12):1204–1219
Martin MMK, Hill MD, Sorin DJ (2012) Why on-chip cache coherence is here to stay. Commun ACM 55(7):78. https://doi.org/10.1145/2209249.2209269
OpenMP (2013) OpenMP application program interface. Tech. Rep., July
Tolentino M, Cameron K (2012) The optimist, the pessimist, and the global race to exascale in 20 megawatts. IEEE Comput 45(1):95–97
Torrellas J (2009) Architectures for extreme-scale computing. IEEE Comput 42(11):28–35
Verghese B, Devine S, Gupta A, Rosenblum M (1996) OS support for improving data locality on CC-NUMA compute servers. Tech. Rep., February