Introduction

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

Since the beginning of the information era, the demand for computing power has been unstoppable. Whenever technology advances enough to fulfill the needs of the time, new and more complex problems arise, such that the technology again becomes insufficient to solve them. In the past, performance increases came mainly from instruction-level parallelism (ILP), through the introduction of multiple pipeline stages, out-of-order execution, and speculative execution. Raising the clock frequency was also an important way to improve performance. However, the ILP that compilers and architectures can exploit is reaching its limits (Caparros Cabezas and Stanley-Marbell, Parallelism and data movement characterization of contemporary application classes. In: ACM symposium on parallelism in algorithms and architectures (SPAA), 2011). Clock frequency scaling is also reaching its limit because it raises energy consumption, which is an important issue for current and future architectures (Tolentino and Cameron, IEEE Comput 45(1):95–97, 2012).
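
The energy limit of frequency scaling can be made concrete with the standard first-order model of dynamic power in CMOS logic. This is a minimal sketch in the usual textbook notation (switching activity factor \alpha, switched capacitance C, supply voltage V, clock frequency f), not a formula taken from the chapter:

P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} \, f

Since sustaining a higher clock frequency generally requires a higher supply voltage, and power grows quadratically with V, raising f quickly exhausts a chip's power and cooling budget.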

References

  • Azimi R, Tam DK, Soares L, Stumm M (2009) Enhancing Operating system support for multicore processors by using hardware performance monitoring. ACM SIGOPS Oper Syst Rev 43(2):56–65. https://doi.org/10.1145/1531793.1531803

  • Bach M, Charney M, Cohn R, Demikhovsky E, Devor T, Hazelwood K, Jaleel A, Luk CK, Lyons G, Patil H, Tal A (2010) Analyzing parallel programs with Pin. IEEE Comput 43(3):34–41

  • Barrow-Williams N, Fensch C, Moore S (2009) A communication characterisation of SPLASH-2 and PARSEC. In: IEEE international symposium on workload characterization (IISWC), pp 86–97. https://doi.org/10.1109/IISWC.2009.5306792

  • Bienia C, Kumar S, Singh JP, Li K (2008b) The PARSEC benchmark suite: characterization and architectural implications. In: International conference on parallel architectures and compilation techniques (PACT), pp 72–81

  • Borkar S, Chien AA (2011) The future of microprocessors. Commun ACM 54(5):67–77

  • Caparros Cabezas V, Stanley-Marbell P (2011) Parallelism and data movement characterization of contemporary application classes. In: ACM symposium on parallelism in algorithms and architectures (SPAA)

  • Casavant TL, Kuhl JG (1988) A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans Softw Eng 14(2):141–154

  • Chishti Z, Powell MD, Vijaykumar TN (2005) Optimizing replication, communication, and capacity allocation in CMPs. ACM SIGARCH Comput Archit News 33(2):357–368. https://doi.org/10.1145/1080695.1070001

  • Corbet J (2012a) AutoNUMA: the other approach to NUMA scheduling. http://lwn.net/Articles/488709/

  • Corbet J (2012b) Toward better NUMA scheduling. http://lwn.net/Articles/486858/

  • Coteus PW, Knickerbocker JU, Lam CH, Vlasov YA (2011) Technologies for exascale systems. IBM J Res Develop 55(5):14:1–14:12. https://doi.org/10.1147/JRD.2011.2163967

  • Cruz EHM, Diener M, Pilla LL, Navaux POA (2016c) Hardware-assisted thread and data mapping in hierarchical multicore architectures. ACM Trans Archit Code Optim 13(3):1–25. https://doi.org/10.1145/2975587

  • Dashti M, Fedorova A, Funston J, Gaud F, Lachaize R, Lepers B, Quéma V, Roth M (2013) Traffic management: a holistic approach to memory placement on NUMA systems. In: Architectural support for programming languages and operating systems (ASPLOS), pp 381–393

  • Diener M, Cruz EHM, Navaux POA, Busse A, Heiß HU (2014) kMAF: automatic kernel-level management of thread and data affinity. In: International conference on parallel architectures and compilation techniques (PACT), pp 277–288

  • Diener M, Cruz EHM, Navaux POA, Busse A, Heiß HU (2015a) Communication-aware process and thread mapping using online communication detection. Parallel Comput 43(March):43–63

  • Diener M, Cruz EHM, Pilla LL, Dupros F, Navaux POA (2015b) Characterizing communication and page usage of parallel applications for thread and data mapping. Perform Eval 88–89(June):18–36

  • Feliu J, Sahuquillo J, Petit S, Duato J (2012) Understanding cache hierarchy contention in CMPs to improve job scheduling. In: International parallel and distributed processing symposium (IPDPS). https://doi.org/10.1109/IPDPS.2012.54

  • Gabriel E, Fagg GE, Bosilca G, Angskun T, Dongarra JJ, Squyres JM, Sahay V, Kambadur P, Barrett B, Lumsdaine A (2004) Open MPI: goals, concept, and design of a next generation MPI implementation. In: Recent advances in parallel virtual machine and message passing interface

  • Intel (2012) 2nd generation Intel Core processor family. Tech. Rep., September

  • Jin H, Frumkin M, Yan J (1999) The OpenMP implementation of NAS parallel benchmarks and its performance. Tech. Rep., October, NASA

  • LaRowe RP, Holliday MA, Ellis CS (1992) An analysis of dynamic page placement on a NUMA multiprocessor. ACM SIGMETRICS Perform Eval Rev 20(1):23–34

  • Marathe J, Thakkar V, Mueller F (2010) Feedback-directed page placement for ccNUMA via hardware-generated memory traces. J Parallel Distrib Comput 70(12):1204–1219

  • Martin MMK, Hill MD, Sorin DJ (2012) Why on-chip cache coherence is here to stay. Commun ACM 55(7):78. https://doi.org/10.1145/2209249.2209269

  • OpenMP (2013) OpenMP application program interface. Tech. Rep., July

  • Tolentino M, Cameron K (2012) The optimist, the pessimist, and the global race to exascale in 20 megawatts. IEEE Comput 45(1):95–97

  • Torrellas J (2009) Architectures for extreme-scale computing. IEEE Comput 42(11):28–35

  • Verghese B, Devine S, Gupta A, Rosenblum M (1996) OS support for improving data locality on CC-NUMA compute servers. Tech. Rep., February

Copyright information

© 2018 The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Cruz, E.H.M., Diener, M., Navaux, P.O.A. (2018). Introduction. In: Thread and Data Mapping for Multicore Systems. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-91074-1_1

  • DOI: https://doi.org/10.1007/978-3-319-91074-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91073-4

  • Online ISBN: 978-3-319-91074-1

  • eBook Packages: Computer Science, Computer Science (R0)
