Figure 1: “Three diagrams are shown describing single-pass, multi-pass, and hierarchical multi-pass routing logic. For single-pass, the full set of labels is given to one worker. For multi-pass, the labels are partitioned into groups (1, 2, ...) and given to separate workers (A, B, ...). For hierarchical multi-pass, the top-level labels are given to one worker, whose annotations determine whether the child labels of each top-level label will be given as a new group to downstream annotators. This example shows the case where the first worker labels TFFTF (true for labels 1 and 4, false for labels 2, 3, and 5), so two downstream tasks are set up for new workers to label the children of label 1 and label 4, respectively.”
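The three routing schemes in Figure 1 can be sketched in Python. This is a minimal illustration under stated assumptions: the label hierarchy, the `Task` type, and the function names `single_pass`, `multi_pass`, and `route_children` are all hypothetical, and the sketch only shows how tasks are created, not how workers are recruited or how their answers are collected.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One labeling task: a group of labels shown to a single worker."""
    labels: list[str]


# Hypothetical two-level hierarchy: each top-level label maps to its child labels.
HIERARCHY = {
    "1": ["1.1", "1.2"],
    "2": ["2.1", "2.2"],
    "3": ["3.1"],
    "4": ["4.1", "4.2", "4.3"],
    "5": ["5.1"],
}


def single_pass(labels: list[str]) -> list[Task]:
    """Single-pass: the full label set goes to one worker."""
    return [Task(labels=list(labels))]


def multi_pass(groups: list[list[str]]) -> list[Task]:
    """Multi-pass: each pre-defined label group goes to a separate worker."""
    return [Task(labels=list(group)) for group in groups]


def route_children(parent_answers: str) -> list[Task]:
    """Hierarchical multi-pass: given one worker's T/F answers for the
    top-level labels (in hierarchy order), create one downstream task
    containing the child labels of every parent marked true."""
    parents = list(HIERARCHY)
    if len(parent_answers) != len(parents):
        raise ValueError("expected one T/F answer per top-level label")
    return [
        Task(labels=HIERARCHY[parent])
        for parent, answer in zip(parents, parent_answers)
        if answer == "T"
    ]


for task in route_children("TFFTF"):
    print(task.labels)
# ['1.1', '1.2']
# ['4.1', '4.2', '4.3']
```

Replaying the figure's example answer `TFFTF` yields exactly two downstream tasks, one for the children of label 1 and one for the children of label 4.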
Figure 2: “The figure shows an example passage that reads: ‘The minister of fear (the CDC) was working overtime peddling doom and gloom, knowing that frightened people do not make rational decisions — nothing sells vaccines like panic.’”
Figure 3: “Four diagrams are shown side by side. Each diagram contains a set of checkboxes or radio buttons showing how the labels will be presented to the user. Binary label (the leftmost diagram) contains a simple question, ‘1.1?’, with radio buttons reading ‘yes’ and ‘no’ below it. Multi-label contains a simple list of checkboxes labeled ‘1.1, 1.2, ...’. Hierarchical multi-label v1 contains staggered checkboxes: the leftmost checkboxes read ‘1, 2’, and indented directly beneath each one are its child checkboxes, reading ‘1.1, ...’ for the parent label ‘1’ and ‘2.1, ...’ for the parent label ‘2’. Hierarchical multi-label v2 combines the radio buttons from the binary-label diagram with the multi-label checkboxes placed underneath them.”
Figure 4: “A table is shown with column headers reading ‘interface design’, ‘greater than or equal to 1 tutorial Q’, ‘greater than or equal to full tutorial’, ‘greater than or equal to took exam’, and ‘greater than or equal to 1 datapoint’. The table shows values for all 6 labeling schemes.”
Figure 5: “A table is shown with values for precision, recall, and F1 score for each of the 6 labeling schemes, with a random baseline shown at the bottom. The F1 score for hrchl-pass multi is in bold, indicating that it is the highest value in that column: 0.56.”
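As a reminder of how the three metrics in Figure 5 relate, here is a minimal Python sketch. It assumes, hypothetically, that the metrics are micro-averaged over (passage, label) pairs compared against gold labels; the paper's exact averaging may differ, and the passage IDs and labels below are invented for illustration.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over (passage_id, label) pairs.

    `predicted` holds the pairs a worker (or aggregation) marked positive;
    `gold` holds the reference positive pairs.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: two of three predictions are correct,
# and two of three gold labels are found.
gold = {("p1", "1.1"), ("p1", "4.2"), ("p2", "2.1")}
predicted = {("p1", "1.1"), ("p2", "2.1"), ("p2", "3.1")}
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# precision=0.67 recall=0.67 f1=0.67
```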
Figure 6: “A two-part table is shown with column headers ‘model factor’, ‘estimate’, ‘95% CI’, ‘SE’, and ‘p-value’. The top part of the table has the subheading ‘labeling_scheme (baseline=multi-pass random multi-label)’, and the bottom part has the subheading ‘additional numerical factors’. The top part includes 5 of the 6 labeling schemes, while the bottom part includes additional factors such as ‘time_started’, ‘percentage_easy’, and ‘true_positive_freq’. Some values in the table are marked with ‘*’, which indicates a p-value below 0.001.”
Figure 7: “The figure shows an example passage that reads: ‘Pregnant women given vaccine have babies with more health problems’”
Figure 8: “A bar chart shows worker F1 score on the X axis, ranging from 0 to 1, against difficulty on the Y axis. The Y-axis labels, from top to bottom, read ‘immediate author agreement’, ‘author agreement after providing rationale’, and ‘author agreement after discussion’.”
Figure 9: “A scatter plot shows orange and blue dots that generally follow a positive linear relationship. The orange dots are labeled ‘hrchl-pass’ and the blue dots ‘multi-pass’. The X axis reads ‘Frequency of True Positives shown to workers’, and the Y axis reads ‘Worker F1’. Each blue dot is connected to an orange dot by an arrow pointing toward the orange dot.”
Figure 10: “A line plot is shown with dashed lines connecting three dots, aligned with tick marks labeled ‘sensitive’, ‘majority’, and ‘unanimous’. This is repeated for all 6 labeling schemes, giving 6 dashed lines in different colors. Every line follows an inverted V shape, peaking at the ‘majority’ tick mark.”
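The three aggregation settings on the X axis of Figure 10 can be illustrated with a small Python function. The definitions here are assumptions, not taken from the figure: ‘sensitive’ is treated as accepting a label if at least one worker selects it, ‘majority’ as accepting it if more than half of the workers select it, and ‘unanimous’ as accepting it only if every worker selects it. The function name `aggregate` is hypothetical.

```python
from collections.abc import Sequence

RULES = ("sensitive", "majority", "unanimous")


def aggregate(votes: Sequence[bool], rule: str) -> bool:
    """Combine several workers' binary votes on one label.

    The rule definitions are illustrative assumptions:
      sensitive -> at least one worker selected the label
      majority  -> more than half of the workers selected it
      unanimous -> every worker selected it
    """
    if not votes:
        raise ValueError("need at least one vote")
    positives = sum(votes)  # True counts as 1, False as 0
    if rule == "sensitive":
        return positives >= 1
    if rule == "majority":
        return positives > len(votes) / 2
    if rule == "unanimous":
        return positives == len(votes)
    raise ValueError(f"unknown rule: {rule}")


votes = [True, True, False]  # hypothetical: 2 of 3 workers selected the label
for rule in RULES:
    print(rule, aggregate(votes, rule))
# sensitive True
# majority True
# unanimous False
```

Under these assumed definitions, each rule accepts a subset of what the previous one accepts, so moving from ‘sensitive’ to ‘unanimous’ can only keep or lower recall while generally aiming to raise precision.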