DOI: 10.1145/2503210.2503269
research-article

Detection of false sharing using machine learning

Published: 17 November 2013

Abstract

False sharing is a major class of performance bugs in parallel applications. Detecting false sharing is difficult as it does not change the program semantics. We introduce an efficient and effective approach for detecting false sharing based on machine learning.
We develop a set of mini-programs in which false sharing can be turned on and off. We then run the mini-programs both with and without false sharing, collect a set of hardware performance event counts and use the collected data to train a classifier. We can use the trained classifier to analyze data from arbitrary programs for detection of false sharing.
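One common way to make false sharing switchable in a mini-program is through data layout: per-thread counters are either packed next to each other or padded so that each one occupies its own cache line. The sketch below is an illustration only, not the authors' code. It assumes a 64-byte cache line and a buffer that starts on a cache-line boundary, and the names `CACHE_LINE`, `COUNTER_SIZE`, `counter_offsets` and `shares_cache_line` are hypothetical. It checks the layout arithmetic and does not measure performance.

```python
CACHE_LINE = 64    # assumed cache-line size in bytes (not stated in the abstract)
COUNTER_SIZE = 8   # assumed size of one per-thread counter in bytes


def counter_offsets(num_threads, padded):
    """Return the byte offset of each thread's counter in a shared buffer.

    padded=False packs the counters back to back (false sharing "on");
    padded=True gives each counter its own cache line (false sharing "off").
    The buffer is assumed to start on a cache-line boundary.
    """
    stride = CACHE_LINE if padded else COUNTER_SIZE
    return [t * stride for t in range(num_threads)]


def shares_cache_line(offsets):
    """Return True if at least two counters fall into the same cache line."""
    lines = [off // CACHE_LINE for off in offsets]
    return len(set(lines)) < len(lines)


if __name__ == "__main__":
    for padded in (False, True):
        offsets = counter_offsets(4, padded)
        print(padded, offsets, shares_cache_line(offsets))
```

In a real mini-program, each thread would repeatedly update its own counter at the computed offset, and the same binary would be run with both settings while hardware event counts are recorded.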
Experiments with the PARSEC and Phoenix benchmarks show that our approach is indeed effective. We detect published false sharing regions in the benchmarks with zero false positives. Our performance penalty is less than 2%. Thus, we believe that this is an effective and practical method for detecting false sharing.
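The training step described in the abstract can be pictured with a very small stand-in classifier. The abstract does not say which classifier or which hardware events are used, so the sketch below substitutes a single-threshold rule (a decision stump) and entirely synthetic feature vectors. The function names `train_stump` and `predict`, the two feature columns, and all numbers are made up for illustration and are not results from the paper.

```python
def train_stump(samples):
    """Fit a one-feature threshold rule "x[i] > threshold means false sharing".

    samples is a list of (feature_vector, label) pairs, where label is True
    for runs with false sharing turned on. Returns (feature_index, threshold)
    with the fewest training errors; ties keep the first candidate found.
    """
    best = None
    n_features = len(samples[0][0])
    for i in range(n_features):
        values = sorted({x[i] for x, _ in samples})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            errors = sum((x[i] > thr) != y for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, i, thr)
    if best is None:
        raise ValueError("no feature takes more than one distinct value")
    return best[1], best[2]


def predict(model, x):
    """Apply the trained rule to one feature vector."""
    i, thr = model
    return x[i] > thr


# Synthetic, hypothetical event counts normalized per instruction:
# column 0 stands for a coherence-related event, column 1 for a generic cache-miss event.
TRAINING = [
    ([0.02, 0.10], False),
    ([0.03, 0.40], False),
    ([0.01, 0.25], False),
    ([0.30, 0.15], True),
    ([0.45, 0.35], True),
    ([0.28, 0.20], True),
]

if __name__ == "__main__":
    model = train_stump(TRAINING)
    print(model, predict(model, [0.35, 0.20]))
```

A rule trained this way is applied to counts collected from an arbitrary program in the same manner as `predict` is applied here; a practical setup would use many more events and a stronger learner.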

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair: William Gropp
  • Program Chair: Satoshi Matsuoka

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. false sharing
  2. machine learning
  3. performance events

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2022) Raptor: Mitigating CPU-GPU False Sharing Under Unified Memory Systems. 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC), pp. 1-8. DOI: 10.1109/IGSC55832.2022.9969376. Online publication date: 24-Oct-2022
  • (2022) Optimal Launch Bound Selection in CPU-GPU Hybrid Graph Applications with Deep Learning. 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC), pp. 1-7. DOI: 10.1109/IGSC55832.2022.9969364. Online publication date: 24-Oct-2022
  • (2021) HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing. 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), pp. 69-80. DOI: 10.1109/MLHPC54614.2021.00012. Online publication date: Nov-2021
  • (2019) Huron: hybrid false sharing detection and repair. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 453-468. DOI: 10.1145/3314221.3314644. Online publication date: 8-Jun-2019
  • (2019) Parallelism-centric what-if and differential analyses. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 485-501. DOI: 10.1145/3314221.3314621. Online publication date: 8-Jun-2019
  • (2019) Using Differential Execution Analysis to Identify Thread Interference. IEEE Transactions on Parallel and Distributed Systems, 30:12, pp. 2866-2878. DOI: 10.1109/TPDS.2019.2927481. Online publication date: 1-Dec-2019
  • (2019) PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems. High Performance Computing, pp. 209-226. DOI: 10.1007/978-3-030-20656-7_11. Online publication date: 17-May-2019
  • (2018) Featherlight on-the-fly false-sharing detection. ACM SIGPLAN Notices, 53:1, pp. 152-167. DOI: 10.1145/3200691.3178499. Online publication date: 10-Feb-2018
  • (2018) Featherlight on-the-fly false-sharing detection. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 152-167. DOI: 10.1145/3178487.3178499. Online publication date: 10-Feb-2018
  • (2017) TMI. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 639-650. DOI: 10.1145/3123939.3123947. Online publication date: 14-Oct-2017
