Producing wrong data without doing anything obviously wrong!

Published: 07 March 2009

Abstract

This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment. What appears to be an innocuous aspect in the experimental setup may in fact introduce a significant bias in an evaluation. This phenomenon is called measurement bias in the natural and social sciences.
Our results demonstrate that measurement bias is significant and commonplace in computer system evaluation. By significant we mean that measurement bias can lead to a performance analysis that either over-states an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs in all architectures that we tried (Pentium 4, Core 2, and m5 O3CPU), both compilers that we tried (gcc and Intel's C compiler), and most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we determined that none of the papers with experimental results adequately consider measurement bias.
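As a rough illustration of how a single, seemingly innocuous setup choice can shift measured performance, the Python sketch below times a workload while varying only the size of the UNIX environment, one of the setup factors the paper studies. The `./benchmark` command, the use of wall-clock time instead of hardware performance counters, and the padding sizes are hypothetical placeholders, not the paper's actual experimental setup.

```python
import os
import subprocess
import time

# Hypothetical benchmark command; substitute the workload you actually measure.
BENCHMARK = ["./benchmark"]

def time_run(extra_env_bytes):
    """Run the benchmark once with the environment padded by extra_env_bytes."""
    env = dict(os.environ)
    env["PADDING"] = "x" * extra_env_bytes  # only the environment size changes
    start = time.perf_counter()
    subprocess.run(BENCHMARK, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

# Sweep the padding; if run time tracks the padding size, the "innocuous"
# environment size is biasing the measurement.
for pad in range(0, 4096, 512):
    print(f"env padding {pad:4d} bytes: {time_run(pad):.3f} s")
```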
Inspired by similar problems and their solutions in other sciences, we describe and demonstrate two methods, one for detecting (causal analysis) and one for avoiding (setup randomization) measurement bias.
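Below is a minimal sketch of the setup-randomization idea under the same assumptions as above (hypothetical `./benchmark`, wall-clock timing, environment-size padding as the randomized setup aspect): rather than reporting a number from one fixed setup, each trial perturbs the setup at random and the result is summarized with a confidence interval across trials. This illustrates the general approach only, not the authors' exact protocol, and the causal-analysis method for detecting bias is not shown.

```python
import os
import random
import statistics
import subprocess
import time

BENCHMARK = ["./benchmark"]   # hypothetical benchmark command
TRIALS = 30                   # number of randomized setups to sample

def run_once(env):
    """Time one benchmark run under the given environment."""
    start = time.perf_counter()
    subprocess.run(BENCHMARK, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

samples = []
for _ in range(TRIALS):
    env = dict(os.environ)
    # Randomize one aspect of the setup per trial (here: the environment size),
    # so no single fixed setup determines the reported number.
    env["PADDING"] = "x" * random.randrange(0, 4096)
    samples.append(run_once(env))

mean = statistics.mean(samples)
# Approximate 95% confidence interval for the mean (normal approximation).
half_width = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
print(f"mean {mean:.3f} s, 95% CI [{mean - half_width:.3f}, {mean + half_width:.3f}] s")
```

A wide interval signals that the measured effect is sensitive to the setup and should not be over-interpreted from any single configuration.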

    Published In

    ACM SIGARCH Computer Architecture News, Volume 37, Issue 1 (ASPLOS 2009)
    March 2009, 346 pages
    ISSN: 0163-5964
    DOI: 10.1145/2528521
    • ASPLOS XIV: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems
      March 2009, 358 pages
      ISBN: 9781605584065
      DOI: 10.1145/1508244

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 March 2009
    Published in SIGARCH Volume 37, Issue 1

    Author Tags

    1. bias
    2. measurement
    3. performance

    Qualifiers

    • Research-article

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 391
    • Downloads (Last 6 weeks): 63
    Reflects downloads up to 18 Nov 2024

    Citations

    Cited By

    • (2024) A benchmark suite and performance analysis of user-space provenance collectors. Proceedings of the 2nd ACM Conference on Reproducibility and Replicability, 85-95. DOI: 10.1145/3641525.3663627. Online publication date: 18-Jun-2024
    • (2022) Characterizing Variability in Heterogeneous Edge Systems: A Methodology & Case Study. 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), 107-121. DOI: 10.1109/SEC54971.2022.00016. Online publication date: Dec-2022
    • (2022) A Hybrid Distributed EA Approach for Energy Optimisation on Smartphones. Empirical Software Engineering, 27(6). DOI: 10.1007/s10664-022-10188-5. Online publication date: 1-Nov-2022
    • (2021) Performance Monitoring Guidelines. Companion of the ACM/SPEC International Conference on Performance Engineering, 109-114. DOI: 10.1145/3447545.3451195. Online publication date: 19-Apr-2021
    • (2021) Collective knowledge: organizing research projects as a database of reusable components and portable workflows with common interfaces. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2197). DOI: 10.1098/rsta.2020.0211. Online publication date: 29-Mar-2021
    • (2020) In-situ Extraction of Randomness from Computer Architecture Through Hardware Performance Counters. Smart Card Research and Advanced Applications, 3-19. DOI: 10.1007/978-3-030-42068-0_1. Online publication date: 9-Mar-2020
    • (2019) Reliable benchmarking: requirements and solutions. International Journal on Software Tools for Technology Transfer (STTT), 21(1), 1-29. DOI: 10.1007/s10009-017-0469-y. Online publication date: 6-Feb-2019
    • (2017) The Information Needed for Reproducing Shared Memory Experiments. Euro-Par 2016: Parallel Processing Workshops, 596-608. DOI: 10.1007/978-3-319-58943-5_48. Online publication date: 28-May-2017
    • (2016) Mining performance specifications. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 39-49. DOI: 10.1145/2950290.2950314. Online publication date: 1-Nov-2016
    • (2016) Deepening the separation of concerns in the implementation of multimedia systems. Proceedings of the 31st Annual ACM Symposium on Applied Computing, 1337-1343. DOI: 10.1145/2851613.2851769. Online publication date: 4-Apr-2016
    • Show More Cited By
