Assessing Human Error Against a Benchmark of Perfection

Published: 27 July 2017

Abstract

An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors.
To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging for even the best players in the world.
We organize our analysis around three categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are significantly more powerful than features based on skill or time.
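
To make the setup concrete, the sketch below (not the authors' code) shows how an endgame tablebase can serve as a benchmark of perfection: a human move in a tablebase position is labeled an error if it worsens the game-theoretic (win/draw/loss) value of the position, and a simple model then predicts errors from the three feature groups named above. It assumes the python-chess and scikit-learn libraries, Syzygy tablebase files in a local ./syzygy directory (the paper itself worked with complete tablebases such as Nalimov and Lomonosov), Elo rating as the skill feature, remaining clock time as the time feature, and a crude difficulty proxy; all of these are illustrative assumptions, not the paper's exact methodology.

# A minimal, hypothetical sketch of the pipeline described in the abstract;
# the paths, feature choices, and difficulty proxy are illustrative assumptions.

import chess
import chess.syzygy
import numpy as np
from sklearn.linear_model import LogisticRegression


def is_error(board: chess.Board, move: chess.Move, tb: chess.syzygy.Tablebase) -> bool:
    """A move is an error if it worsens the tablebase win/draw/loss value
    of the position for the player making it."""
    wdl_before = tb.probe_wdl(board)      # value from the mover's perspective
    board.push(move)
    try:
        wdl_after = -tb.probe_wdl(board)  # negate: the tablebase reports for the side to move
    finally:
        board.pop()
    return wdl_after < wdl_before


def feature_vector(elo, seconds_left, board, tb):
    """Three feature groups from the paper: skill (Elo), time (clock), and
    difficulty (here crudely proxied by the fraction of legal moves that lose value)."""
    legal = list(board.legal_moves)
    n_errors = sum(is_error(board, m, tb) for m in legal)
    difficulty = n_errors / len(legal)
    return [elo, seconds_left, difficulty]


if __name__ == "__main__":
    # Hypothetical path to local Syzygy tablebase files.
    with chess.syzygy.open_tablebase("./syzygy") as tb:
        board = chess.Board("8/8/8/8/8/2k5/2P5/2K5 w - - 0 1")  # 3-piece endgame
        print(feature_vector(1800, 30.0, board, tb))

    # Given many (features, did-the-player-err) examples, a simple classifier
    # estimates error probability; three made-up rows stand in for real data.
    X = np.array([[1800, 30.0, 0.4], [2400, 120.0, 0.1], [1500, 5.0, 0.7]])
    y = np.array([1, 0, 1])
    model = LogisticRegression().fit(X, y)
    print(model.predict_proba(X)[:, 1])  # estimated probability of an error for each row

In the paper itself, error labels come from positions that computers have completely solved, and the prediction task is evaluated over several million recorded games; this sketch only illustrates the shape of the pipeline.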

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 11, Issue 4: Special Issue on KDD 2016 and Regular Papers. November 2017, 419 pages.
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3119906
    Editor: Jie Tang

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 July 2017
    Accepted: 01 January 2017
    Received: 01 November 2016
    Published in TKDD Volume 11, Issue 4

    Author Tags

    1. Blunder prediction
    2. human decision-making

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Google Research Grant
    • Simons Investigator Award
    • Facebook Faculty Research Grant
    • ARO MURI Grant

    Cited By

    • (2024) Positive and negative explanation effects in human–agent teams. AI and Ethics, 4:1 (47-56). DOI: 10.1007/s43681-023-00396-0. Online publication date: 10-Jan-2024.
    • (2023) Human Satisfaction in Ad Hoc Human-Agent Teams. Artificial Intelligence in HCI (207-219). DOI: 10.1007/978-3-031-35894-4_15. Online publication date: 23-Jul-2023.
    • (2022) Complexity and Choice. SSRN Electronic Journal. DOI: 10.2139/ssrn.4098324. Online publication date: 2022.
    • (2022) Evaluating Human and Agent Task Allocators in Ad Hoc Human-Agent Teams. Coordination, Organizations, Institutions, Norms, and Ethics for Governance of Multi-Agent Systems XV (167-184). DOI: 10.1007/978-3-031-20845-4_11. Online publication date: 9-May-2022.
    • (2020) Behavlet Analytics for Player Profiling and Churn Prediction. HCI International 2020 – Late Breaking Papers: Cognition, Learning and Games (631-643). DOI: 10.1007/978-3-030-60128-7_46. Online publication date: 19-Jul-2020.
    • (2019) How Data Science Workers Work with Data. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (1-15). DOI: 10.1145/3290605.3300356. Online publication date: 2-May-2019.
    • (2017) Dependency Anomaly Detection for Heterogeneous Time Series: A Granger-Lasso Approach. 2017 IEEE International Conference on Data Mining Workshops (ICDMW) (1090-1099). DOI: 10.1109/ICDMW.2017.155. Online publication date: Nov-2017.
    • (undefined) Complexity and Choice. SSRN Electronic Journal. DOI: 10.2139/ssrn.3878469.
