research-article

Detection of false sharing using machine learning

Authors: Sanath Jayasena, Saman Amarasinghe, Asanka Abeyweera, Gayashan Amarasinghe, Himeshi De Silva, Sunimal Rathnayake, Yanbin Liu

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No.: 30, Pages 1 - 9

https://doi.org/10.1145/2503210.2503269

Published: 17 November 2013

Abstract

False sharing is a major class of performance bugs in parallel applications. Detecting false sharing is difficult as it does not change the program semantics. We introduce an efficient and effective approach for detecting false sharing based on machine learning.
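
As background, false sharing arises when threads update distinct variables that happen to occupy the same cache line. The sketch below is an illustration only, not code from the paper: it uses Python's `ctypes` to compare two layouts of per-thread counters and reports which cache line each counter would fall on. The 64-byte line size, the four-thread setup, the assumption that the array starts on a line boundary, and the names `PaddedCounter`, `cache_lines` and `shares_a_line` are all assumptions made for this example.

```python
import ctypes

# Assumption: 64-byte cache lines (common on x86-64; check the target CPU).
CACHE_LINE_BYTES = 64


class PaddedCounter(ctypes.Structure):
    """Hypothetical per-thread counter padded to fill one cache line."""
    _fields_ = [
        ("value", ctypes.c_int64),
        ("pad", ctypes.c_char * (CACHE_LINE_BYTES - ctypes.sizeof(ctypes.c_int64))),
    ]


def cache_lines(element_type, n_threads):
    """Cache-line index of each thread's element in a contiguous array.

    Assumes the array itself starts on a cache-line boundary.
    """
    stride = ctypes.sizeof(element_type)
    return [(i * stride) // CACHE_LINE_BYTES for i in range(n_threads)]


def shares_a_line(element_type, n_threads):
    """True if at least two threads' elements land on the same cache line."""
    lines = cache_lines(element_type, n_threads)
    return len(set(lines)) < len(lines)


if __name__ == "__main__":
    # "packed": plain 8-byte counters back to back; "padded": one per line.
    for name, elem in (("packed", ctypes.c_int64), ("padded", PaddedCounter)):
        print(name, cache_lines(elem, 4), "shared line:", shares_a_line(elem, 4))
```

Both layouts hold the same four counters and a program using either one computes the same results; only the placement in memory differs, which is why the problem is invisible at the level of program semantics.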

We develop a set of mini-programs in which false sharing can be turned on and off. We then run the mini-programs both with and without false sharing, collect a set of hardware performance event counts and use the collected data to train a classifier. The trained classifier can then be used to analyze data from arbitrary programs to detect false sharing.
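
The training step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-level decision tree (a decision stump) stands in for the classifier, which the abstract does not name, and the two event columns and every count in `TRAIN_ROWS` are invented placeholders rather than measured data. The function names `train_stump` and `predict` are likewise hypothetical.

```python
# Placeholder training set (NOT measured data). Each row is a pair of
# hypothetical normalized event counts from one mini-program run:
# (event_a, event_b). The label says whether false sharing was switched on.
TRAIN_ROWS = [
    (0.2, 5.1), (0.3, 4.8), (0.1, 6.0),    # runs with false sharing off
    (9.5, 5.3), (12.1, 4.9), (8.7, 5.7),   # runs with false sharing on
]
TRAIN_LABELS = [False, False, False, True, True, True]


def train_stump(rows, labels):
    """Fit a decision stump: the single (feature, threshold) split with the
    fewest training errors. Returns (feature_index, threshold, label_above).
    """
    best = None  # (errors, feature_index, threshold, label_above)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            for label_above in (True, False):
                errors = sum(
                    1
                    for r, y in zip(rows, labels)
                    if (label_above if r[f] > thr else not label_above) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, label_above)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1], best[2], best[3]


def predict(stump, row):
    """Classify one row of event counts with a trained stump."""
    f, thr, label_above = stump
    return label_above if row[f] > thr else not label_above


if __name__ == "__main__":
    stump = train_stump(TRAIN_ROWS, TRAIN_LABELS)
    print("split on feature", stump[0], "at threshold", stump[1])
    # Counts from an "arbitrary program" (again placeholders).
    print("false sharing suspected:", predict(stump, (10.0, 5.0)))
```

In practice each row would hold hardware event counts from one mini-program run, labelled by whether false sharing was switched on, and the trained model would then be applied to counts collected from the program under study.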

Experiments with the PARSEC and Phoenix benchmarks show that our approach is effective: we detect published false sharing regions in the benchmarks with zero false positives, and the performance penalty is less than 2%. We therefore believe that this is an effective and practical method for detecting false sharing.

Index Terms

Detection of false sharing using machine learning
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. General and reference
  1. Cross-computing tools and techniques

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2013

1123 pages

ISBN:9781450323789

DOI:10.1145/2503210

General Chair: William Gropp, University of Illinois at Urbana-Champaign, Urbana, Illinois

Program Chair: Satoshi Matsuoka, Tokyo Institute of Technology, Tokyo, Japan

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

Sponsors: SIGHPC, SIGARCH, IEEE-CS

November 17 - 21, 2013

Denver, Colorado

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

Total Citations: 23

Total Downloads: 477

Downloads (last 12 months): 30

Downloads (last 6 weeks): 9

Reflects downloads up to 02 Oct 2024
