Producing wrong data without doing anything obviously wrong!

Published: 07 March 2009

Abstract

This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment. What appears to be an innocuous aspect in the experimental setup may in fact introduce a significant bias in an evaluation. This phenomenon is called measurement bias in the natural and social sciences.
Our results demonstrate that measurement bias is significant and commonplace in computer system evaluation. By significant we mean that measurement bias can lead to a performance analysis that either over-states an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs in all architectures that we tried (Pentium 4, Core 2, and m5 O3CPU), both compilers that we tried (gcc and Intel's C compiler), and most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we determined that none of the papers with experimental results adequately consider measurement bias.
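As a rough illustration of how a single, seemingly innocuous setup choice can shift measured performance, the Python sketch below times a workload while varying only the size of the UNIX environment, one of the setup factors the paper studies. The `./benchmark` command, the use of wall-clock time instead of hardware performance counters, and the padding sizes are hypothetical placeholders, not the paper's actual experimental setup.

```python
import os
import subprocess
import time

# Hypothetical benchmark command; substitute the workload you actually measure.
BENCHMARK = ["./benchmark"]

def time_run(extra_env_bytes):
    """Run the benchmark once with the environment padded by extra_env_bytes."""
    env = dict(os.environ)
    env["PADDING"] = "x" * extra_env_bytes  # only the environment size changes
    start = time.perf_counter()
    subprocess.run(BENCHMARK, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

# Sweep the padding; if run time tracks the padding size, the "innocuous"
# environment size is biasing the measurement.
for pad in range(0, 4096, 512):
    print(f"env padding {pad:4d} bytes: {time_run(pad):.3f} s")
```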
Inspired by similar problems and their solutions in other sciences, we describe and demonstrate two methods, one for detecting (causal analysis) and one for avoiding (setup randomization) measurement bias.
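Below is a minimal sketch of the setup-randomization idea under the same assumptions as above (hypothetical `./benchmark`, wall-clock timing, environment-size padding as the randomized setup aspect): rather than reporting a number from one fixed setup, each trial perturbs the setup at random and the result is summarized with a confidence interval across trials. This illustrates the general approach only, not the authors' exact protocol, and the causal-analysis method for detecting bias is not shown.

```python
import os
import random
import statistics
import subprocess
import time

BENCHMARK = ["./benchmark"]   # hypothetical benchmark command
TRIALS = 30                   # number of randomized setups to sample

def run_once(env):
    """Time one benchmark run under the given environment."""
    start = time.perf_counter()
    subprocess.run(BENCHMARK, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

samples = []
for _ in range(TRIALS):
    env = dict(os.environ)
    # Randomize one aspect of the setup per trial (here: the environment size),
    # so no single fixed setup determines the reported number.
    env["PADDING"] = "x" * random.randrange(0, 4096)
    samples.append(run_once(env))

mean = statistics.mean(samples)
# Approximate 95% confidence interval for the mean (normal approximation).
half_width = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
print(f"mean {mean:.3f} s, 95% CI [{mean - half_width:.3f}, {mean + half_width:.3f}] s")
```

A wide interval signals that the measured effect is sensitive to the setup and should not be over-interpreted from any single configuration.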

    Published In

    ACM SIGARCH Computer Architecture News, Volume 37, Issue 1 (ASPLOS 2009)
    March 2009, 346 pages
    ISSN: 0163-5964
    DOI: 10.1145/2528521
    • ASPLOS XIV: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems
      March 2009, 358 pages
      ISBN: 9781605584065
      DOI: 10.1145/1508244

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 March 2009
    Published in SIGARCH Volume 37, Issue 1

    Author Tags

    1. bias
    2. measurement
    3. performance

    Qualifiers

    • Research-article

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 391
    • Downloads (Last 6 weeks): 63
    Reflects downloads up to 18 Nov 2024

    Citations

    Cited By

    • (2024) A benchmark suite and performance analysis of user-space provenance collectors. Proceedings of the 2nd ACM Conference on Reproducibility and Replicability, 85-95. DOI: 10.1145/3641525.3663627. Online publication date: 18-Jun-2024
    • (2022) Characterizing Variability in Heterogeneous Edge Systems: A Methodology & Case Study. 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), 107-121. DOI: 10.1109/SEC54971.2022.00016. Online publication date: Dec-2022
    • (2022) A Hybrid Distributed EA Approach for Energy Optimisation on Smartphones. Empirical Software Engineering, 27(6). DOI: 10.1007/s10664-022-10188-5. Online publication date: 1-Nov-2022
    • (2021) Performance Monitoring Guidelines. Companion of the ACM/SPEC International Conference on Performance Engineering, 109-114. DOI: 10.1145/3447545.3451195. Online publication date: 19-Apr-2021
    • (2021) Collective knowledge: organizing research projects as a database of reusable components and portable workflows with common interfaces. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2197). DOI: 10.1098/rsta.2020.0211. Online publication date: 29-Mar-2021
    • (2020) In-situ Extraction of Randomness from Computer Architecture Through Hardware Performance Counters. Smart Card Research and Advanced Applications, 3-19. DOI: 10.1007/978-3-030-42068-0_1. Online publication date: 9-Mar-2020
    • (2019) Reliable benchmarking: requirements and solutions. International Journal on Software Tools for Technology Transfer (STTT), 21(1), 1-29. DOI: 10.1007/s10009-017-0469-y. Online publication date: 6-Feb-2019
    • (2017) The Information Needed for Reproducing Shared Memory Experiments. Euro-Par 2016: Parallel Processing Workshops, 596-608. DOI: 10.1007/978-3-319-58943-5_48. Online publication date: 28-May-2017
    • (2016) Mining performance specifications. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 39-49. DOI: 10.1145/2950290.2950314. Online publication date: 1-Nov-2016
    • (2016) Deepening the separation of concerns in the implementation of multimedia systems. Proceedings of the 31st Annual ACM Symposium on Applied Computing, 1337-1343. DOI: 10.1145/2851613.2851769. Online publication date: 4-Apr-2016
    • Show More Cited By
