To test whether belief elicitation helps participants make true discoveries, we devised questions to seed ground-truth models. These known models enable us to code participants’ qualitative inferences (specifically, the implied bivariate relationship) for correctness. We began by formulating an initial set of 40 prompt questions, each concerning the relationship between two quantitative variables. The questions were designed to probe common knowledge: half featured variable pairs that we expected to show no relationship, while the other half were expected to exhibit either a positive or a negative correlation. Table 1 illustrates example questions.

To empirically anchor the responses to these questions in common beliefs, we recruited 40 crowdworkers from Amazon Mechanical Turk. Each worker was tasked with providing their belief on all 40 questions using the elicitation device illustrated in Figure 1. In effect, every worker provided two model parameters (μ and σ) in response to each question. We averaged the parameters across all workers (separately for each question), yielding a crowd-wisdom response for every prompt.

We then used the mean slope to select a subset of 16 questions from the initial 40 to serve as stimuli in our experiments. Of those 16 questions, 8 demonstrated a crowd belief of no relationship between the prompt variables, 4 suggested a positive relationship, and 4 a negative relationship. In other words, half the prompts dictated a ‘null’ ground truth while the other half specified a correlation (either positive or negative). This selection was based on the crowd wisdom. Specifically, we considered questions with an average crowd slope of μcrowd > 0.26 to reflect a wisdom of positive correlation, and accordingly set μ in the ground-truth model for those questions to 0.5. Conversely, we considered a mean μcrowd < −0.26 to indicate a negative relationship and set the corresponding ground-truth slope to μ = −0.5. Finally, we considered questions with an average slope of −0.12 < μcrowd < 0.12 to indicate no expected relationship between the two variables (i.e., a null model), and set the ground-truth slope to zero. Across all questions, workers ascribed very similar uncertainty levels to their beliefs (σcrowd). We therefore set σ in all ground-truth models to 0.29, the observed mean slope uncertainty.
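The selection rule above can be sketched as follows. The thresholds (0.26, 0.12) and ground-truth values (±0.5, 0, σ = 0.29) come from the text; all function and variable names are our own illustrative choices, not from the study's materials.

```python
# Sketch of the question-selection rule: average elicited (μ, σ) pairs
# per question, then map the mean crowd slope to a ground-truth model.
from statistics import mean

SIGMA_GT = 0.29  # shared ground-truth uncertainty (mean observed σcrowd)

def crowd_params(responses):
    """Average the (μ, σ) pairs elicited from all workers for one question."""
    mus, sigmas = zip(*responses)
    return mean(mus), mean(sigmas)

def ground_truth(mu_crowd):
    """Map a mean crowd slope to a ground-truth (μ, σ) model, or None
    when the slope falls between thresholds and the question is excluded."""
    if mu_crowd > 0.26:          # crowd expects a positive relationship
        return (0.5, SIGMA_GT)
    if mu_crowd < -0.26:         # crowd expects a negative relationship
        return (-0.5, SIGMA_GT)
    if -0.12 < mu_crowd < 0.12:  # crowd expects no relationship (null model)
        return (0.0, SIGMA_GT)
    return None                  # ambiguous slope: not among the 16 stimuli
```

Note that slopes in the gaps (e.g., 0.12 ≤ μcrowd ≤ 0.26) map to no ground-truth model, which is consistent with only 16 of the 40 questions being retained.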