
Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement

2022, Collabra: Psychology

In this pre-registered study on a representative Polish sample (n = 1751), we aimed to test two potential critical issues with the Ascent of Humans scale. First, we tested whether scores may be influenced by peripheral and previously undiscussed properties of the measurement: the position of the slider-scale dot and the pattern of the groups' display. Second, we tested whether participation in the Ascent of Humans measurement may influence attitudes towards out-groups, making participants more prejudiced. All our predictions were conclusively disconfirmed. Additionally, we explored the distribution of Ascent of Humans scores, discovering a large inflation of scores indicating the absence of dehumanisation. We discuss the implications of our findings for improving the theoretical grounds of dehumanisation and its measurement.

Izydorczak, K., Grzyb, T., & Dolinski, D. (2022). Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement. Collabra: Psychology, 8(1). https://doi.org/10.1525/collabra.33297

Kamil Izydorczak, Tomasz Grzyb, Dariusz Dolinski
Faculty of Psychology in Wrocław, SWPS University of Social Sciences and Humanities, Wrocław, Poland

Keywords: dehumanisation, prejudice, Ascent of Humans, blatant dehumanisation, methodology, ethics, measurement, validity

Introduction

Since the Ascent of Humans (AoH) scale was introduced in 2015, it has been used in 16 published studies and mentioned in 389 articles (based on Google Scholar citations of Kteily et al., 2015 as of August 2, 2021). Findings based on these methods have been cited by the Washington Post (Kteily & Bruneau, 2015) and numerous online media sources. Considering its impact, novelty, and unorthodox approach to measuring dehumanisation, a critical analysis of this method by an independent research team could be a valuable contribution, as no such analysis has been published yet. This study investigates whether results obtained with this scale could be biased and whether the measurement could impact views toward an out-group, rather than simply measuring them.

Dehumanisation and Its Measurement

Defining and measuring the degree of humanity attributed to groups and individuals is a goal of social and scientific importance. Categorising individuals as 'human beings' is a predicate of their inclusion in a circle of moral consideration (Leyens et al., 2003) and in a group of privileged legal status (Bastian et al., 2011).
The dynamics of humanisation and dehumanisation could also shape state policy regarding the expansion or limitation of rights and inclusion/exclusion from mainstream society and culture (Esses et al., 2008; Tileagă, 2007). Researchers' interest in dehumanisation is also sparked by its historical importance. It is evident that dehumanisation accompanies the horrors of intergroup and international conflicts that we most certainly strive to avoid. Research often invokes examples of the Tutsi and Hutu or the German Nazis (Haslam, 2006), or more recent cases, such as the ongoing Israeli-Palestinian conflict (Bruneau & Kteily, 2017; Kteily et al., 2015). Although it is still unknown whether dehumanisation leads to aggression or vice versa, the co-occurrence is clear. Therefore, researchers hope that examining intergroup dehumanisation will lead to the understanding and prevention of intergroup atrocities.

In summary, there are many reasons why researchers seek to measure dehumanisation. Nonetheless, addressing the question of how to do it is complicated, and the history of such endeavours is brief: the field of social psychology has been empirically measuring dehumanisation for less than two decades (Castano & Kofta, 2009). When discussing the measurement of dehumanisation, two distinct approaches (indirect and direct) can be distinguished, each of which comes with benefits and risks.

The indirect approach appeared first. The pioneering and influential work of Leyens and colleagues (2000) on emotional infrahumanisation established the field of empirical studies and measurements. In infrahumanisation, the degree of humanness is defined through differences in the attribution of secondary emotions between the in-group and the out-group (Leyens et al., 2007).
A subsequent indirect approach was introduced in the concepts of mechanistic Correspondence concerning this this article should be addressed to Kamil Izydorczak, SWPS University of Social Sciences and Humanities Faculty of Psychology in Wroclaw, Aleksandra Ostrowskiego 30b, 50-505 Wroclaw, Poland. Contact: kizydorczak@swps.edu.pl Downloaded from http://online.ucpress.edu/collabra/article-pdf/8/1/33297/498131/collabra_2022_8_1_33297.pdf by guest on 23 March 2022 In this pre-registered study on a representative Polish sample (n = 1751), we aimed to test two potential critical issues with the Ascent of Humans scale. First, we tested whether the scores may be influenced by peripheral and previously undiscussed properties of the measurement: position of the slider-scale dot and the pattern of groups’ display. Second, we tested whether participation in Ascent of Humans measurement may influence the attitudes towards out-groups, making participants more prejudiced. All our predictions were conclusively disconfirmed. Additionally, we explored the distribution of Ascent of Humans, discovering large inflation of scores indicating the absence of dehumanisation. We discuss implications of our findings for improving theoretical grounds of dehumanisation and its measurement. Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement Ascent of Humans—Methodological and Ethical Aspects The AoH measurement is preferable over subtle measurements because researchers are not forced to make arbitrary decisions about what makes someone ‘human’. Moreover, the measurement provides an opportunity to examine previously under-researched, overt forms of dehumanisation. However, it has limitations. By allowing the humanness to be freely interpreted by the respondents, researchers limit their possibility of understanding, what is the exact substance of the attitude which respondents express. 
This poses a particular problem in the case of dehumanisation measurement since ‘humanness’ is especially prone to distinct interpretations (GinerSorolla et al., 2021). This leads to questioning how results generated by AoH should be interpreted. It is assumed that results reflect existing and consciously held beliefs about lesser degrees of humanness. However, the possibility that besides respondents’ beliefs, the social situation of the measurement along with its features can impact the results, remains unexamined. According to the tacit, but fundamental, assumption of classical test theory (Novick, 1966), the measurement process does not influence the measured variable. It reflects the ‘real result’ with a smaller or larger margin of error, but it does not make the real result itself, smaller or larger. Unfortunately, in the domain of psychological questionnaires, such consequences cannot be excluded. When asked about certain matters, respondents form an opinion even though they have no real interest or knowledge of the topic, and such opinions may easily shift (Sigelman & Thomas, 1984). Furthermore, they may also express ‘opinions on non-existent topics’, a phenomenon known in political science and consumer research as ‘pseudo-opinions’ (Bishop et al., 1980). This does not mean that participants draw their responses from a vacuum. They base them on general convictions or political stances (Sturgis & Smith, 2010). Questionnaires that produce pseudo-opinions do not measure ‘nothing,’ nor do they measure what they overtly inquire about. Another problem with measurements in psychological research is the dependence of results on circumstantial variables created by the measurement situation itself. Measurement, just like any other research procedure, is a social situation in which people do not simply express their inner, authentic, and spontaneous tendencies. Each time people are asked about something, they do not merely respond to the stated question. 
They also respond to imagined or actual expectations of social situations (Rosenthal, 1963). Although the researcher or developer of the method may strive to avoid suggesting the hypotheses or expressing any expectations, participants may subjectively perceive them and act accordingly. Another means by which measurement can result in much more than just capturing an existing state is the anchoring mechanism. When we are asked to make a statement or guess about something, our judgments are unconsciously affected by the subtle clues provided in the question (Tversky & Kahneman, 1974). Most typically a cue can be an initial reference point ‘X’ given in a question such as: ‘Is it more than “X” or less than “X”?’ There is a great deal of evidence indicating that people tend to evaluate close to ‘X,’ even if ‘X’ is markedly distant from the true value (Furnham & Boo, 2011). Anchors can also be more subtle, even subliminal (Re- Collabra: Psychology 2 Downloaded from http://online.ucpress.edu/collabra/article-pdf/8/1/33297/498131/collabra_2022_8_1_33297.pdf by guest on 23 March 2022 and animalistic dehumanisation (Haslam, 2006), where the degree of humanness was defined by traits the general public believes to be ‘uniquely human’ (not shared with animals) and characteristic of ‘human nature’ (absent in automata). Under the indirect approach, respondents do not explicitly evaluate how human-like an individual or group seems. Instead, researchers identify and develop a list of traits they believe are qualities of human beings. Respondents are then asked to rate individuals based on the degree they believe someone possesses them. Researchers are able to understand exactly what concept of humanity respondents are invoking as it is the same one that the researchers developed. This makes the measurement more reliable and valid. Nonetheless, there is a major drawback: it is up to the researcher to establish what it means to be human. 
There is a possibility that responding to the listed traits or properties does not equate to concluding humanness as a whole. Even if certain participants evaluate a group to be very low on each of the qualities, they might disagree if asked whether they considered a group non-human. The AoH scale (Kteily et al., 2015) is the latest development in measuring dehumanisation and represents a direct approach. Researchers allow respondents to formulate their own definition of humanity and directly ask them how human they think a given group is. The AoH measurement was introduced in response to the need to investigate the most blatant forms of dehumanisation. While straightforward, aggressive forms of dehumanisation spark interest in the topic, most studies investigate its subtle forms (Kteily et al., 2015). Subtle measurements are valid, reliable, and theoretically well-grounded, however, they miss a crucial element in intergroup hostility: overtly thinking about others as animals. To address this gap, Kteily and colleagues (2015) proposed a one-item scale. It includes a direct question about the degree of humanity/animality. Responses are indicated using a slider scale located below a schematic illustration of human evolution. The proposed method is ‘brief, face-valid and intuitive and it theoretically (…) captures a number of important characteristics of blatant dehumanization’ (Kteily et al., 2015, p. 4) Extensive research, with some garnering increased public attention, following the AoH approach, has demonstrated that the method addresses a theoretically and socially salient issue. As it turns out, blatant dehumanisation not only remains prevalent among many societies but also predicts violent attitudes better than subtle measurements (Kteily et al., 2015). Multiple studies have demonstrated a correlation between results of the AoH scale and theoretically expected beliefs, opinions, and traits (e.g. 
Kteily et al., 2015, Kteily & Bruneau, 2017, Bruneau et al., 2018). Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement can make them more cognitively available, which may affect subsequent processes of judgement. We assume that anchoring, implicit assumptions, and associative priming may impact results of questionnaires because respondents are subjected to the immanent processes of social and cognitive information processing, not because they are directly affected by the researchers’ intentions. All these features may not be consciously or intentionally introduced by researchers, however as they are subjectively perceived, they play an important role. Research Problem We argue that the peripheral features of the AoH scale, which are not theoretically justified, may substantially affect results. If this is the case, it could be problematic to identify the degree to which results generated by the measurement reflect the ‘real level’ of a latent value, and to what degree they are a by-product of a complex measurement situation encompassing cognitive and social features. First, we would like to note the issue of the initial placement of the dot on the slider scale below the AoH illustration. According to the illustration in the paper introducing the method (Kteily et al., 2015), the dot is placed on the extreme left, under the picture of the least developed creature – a quadrupedal monkey. The same dot position was used in the questionnaire file for online research, which was shared with us by courtesy of Nour S. Kteily (private correspondence, 2018), and in many subsequent illustrations from papers using the AoH scale. While the authors of the first paper describing the method discuss some peripheral elements of the measurement (such as instruction), they do discuss to the position of the dot, which may also be an important feature. 
We theorise that the choice of initial dot position may have nontrivial, theoretically important consequences for the measurement through changes in the implicit premises about the level of humanity and changes in the meaning of moving the dot. Placing the dot at the extreme right would create a default ‘100%’ level of humanity, which could reflect the premise that all groups are biologically complete human beings. In this case, moving the dot would mean diminishing the initial full humanity of the group, ergo dehumanising it. Placing the dot on the extreme left, chosen by the authors, sets the default level of humanity as “0%”, which could suggest a different theoretical assumption (e.g. that humanity is a “hard to earn” status). In this case, the respondent decides how much humanity to add above the initial ‘zero’, and therefore moving the dot means humanisation. It can be argued that the dot should be placed in the middle, as this gives respondents the same degree of choice when moving left and right, or that there should be no dot on the screen at all, which seems the most theoreticallyneutral option. Whatever position is chosen, this property of the measurement could benefit from theoretical reflection and justification. Moreover, important empirical consequences of the extreme left position could be suspected. Through anchoring mechanisms, such a placement could lower the Collabra: Psychology 3 Downloaded from http://online.ucpress.edu/collabra/article-pdf/8/1/33297/498131/collabra_2022_8_1_33297.pdf by guest on 23 March 2022 itsma-van Rooijen & L. Daamen, 2006); thus, it is reasonable to suspect that the type or presentation of the research topic can form a reference point that helps people to find ‘the right answer’ (Strack et al., 2016). For the AoH measurement, the following questions could be posed: What is the influence of the initial position indicator on the slider scale? 
What is the influence of the combined display of an in-group on out-groups on a single screen? Moreover, we would like to challenge the implicit assumption that asking whether people are fully human is harmless and morally neutral. This issue is most important from an ethical perspective. It is possible that, at least partly, awareness of social norms is what keeps people from endorsing and expressing prejudice. When these norms are dismantled, for example, through the influence of an authority figure or a shift in political discourse, prejudice intensifies among members of a given society, and they re-evaluate their views. When norms about prejudice seem to be more permissive, individuals think of themselves as less prejudiced, as they compare themselves to a more bigoted ‘average citizen’ (Crandall et al., 2018). We argue that posing a question about the degree of humanity can signal norms, as it provides clear permission to think about others in a blatantly dehumanising way. By asking this question, the questioner establishes a premise that differences in humanness may exist, and that expressing views about them is reasonable and appropriate. Notably, the AoH scale does not provide the respondent with an opportunity to become aware of this premise and respond to it (e.g. in the form of a pre-question ‘Do you believe that there are differences in the level of humanness among groups of people?’). Instead, the scale follows the default implicit assumption that the respondent subscribes to the notion of varying degrees of humanness. Theoretically, respondents can express a view indicating no differences in the degree of humanness, but the presented default assumption may lead them away from this view. The influence of ‘defaults’ has been demonstrated in critically important decisions with real-life consequences, such as organ donation (Johnson & Goldstein, 2004). 
Similar patterns are expected in less engaging situations, such as the anonymous completion of an online questionnaire. Furthermore, as mentioned earlier, it has been empirically demonstrated that people can act in accordance with implicit assumptions of questionnaires, for example, by stating opinions about non-existent topics or presenting knowledge about matters they have previously declared a lack of knowledge. Another reason why we believe that the AoH measurement can affect respondents’ views on an out-group is the phenomenon of associative/context priming (DeCoster & Claypool, 2004; Zeelenberg et al., 2003). It has been demonstrated that when two stimuli are presented simultaneously, one can prime associations with the other. The associations between derogated out-groups and different animals are common. They are constrained by social norms, but individuals can easily encounter them outside the mainstream media, even if they may not endorse them. Henceforth, animal-out-group associations are present in the memory and displaying a visual that links them Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement To test this hypothesis, we introduce two conditions. In the control condition, the display pattern from the original study is retained, which means that the groups are presented simultaneously, one below the other, in random order. In the experimental condition, the random sequence of groups is retained, but every group is displayed separately with no possibility of seeing previously given scores. Third, we examine the impact of participating in the AoH measurement on attitudes toward out-groups. We suppose that participating in the AoH scale can shift beliefs about an out-group, such that after responding to the scale, individuals may hold more dehumanising views (H4) and more prejudice (H5) toward the groups which they were asked about. 
To test these hypotheses, we measure the level of prejudice and infrahumanisation at the end of all AoH trials. Scores for prejudice and infrahumanisation are compared after completing the AoH scale with the results of the control group, who will respond to a bogus questionnaire of similar length and structure, free of intergroup and human/ animal connotations. In addition to the third research problem, we address how the impact of the AoH scale can be compared to the impact of a similar prejudice-related scale. If the AoH can influence attitudes toward groups, can the same be said about other, similar measurements? To test this, we introduce another condition with a ‘Feeling thermometer’ scale. The ‘Feeling thermometer’ is similar to the AoH scale. It utilises a slider scale and encompasses a metaphorical way of expressing a positive or negative attitude. It differs in that it does not lift any social taboo, and neither image nor instruction contains any suggestion of generic, essential differences between social groups. Therefore, we suppose that infrahumanisation of out-groups would be greater after responding to the AoH than the ‘Feeling thermometer’ scale (H6). The results of this study are valuable, regardless of whether hypotheses were confirmed. Every instance in which the hypotheses are proven wrong could be interpreted in favour of robustness and ethical feasibility of the method. Note that if the measure proves to be unaffected by the anchoring effect or by cognitive clues suggesting the researchers’ expectations, it could be treated as evidence in favour of both the reality of blatant dehumanisation and the reliability of the method. If all hypotheses are proven wrong, it could mean that the AoH measurement follows assumptions of the classical test theory in the sense that it does not influence the measured variable. 
It could also mean that the measured disposition towards a group is generally well established so that it manifests itself in the same way regardless of changes in the measurement situation. Method To test the hypotheses, we conducted an experimental study involving participants via an online panel. The analysis was performed using the Bayesian approach, with all hypotheses pre-registered via the Open-Science Framework using the template by van’t Veer and Giner-Sorolla (2016). All materials and data are freely available through an online repository (https://osf.io/c5k8q/). Collabra: Psychology 4 Downloaded from http://online.ucpress.edu/collabra/article-pdf/8/1/33297/498131/collabra_2022_8_1_33297.pdf by guest on 23 March 2022 score, as the initial position of the dot can serve as an anchoring point in the evaluation process (Furnham & Boo, 2011; Reitsma-van Rooijen & L. Daamen, 2006; Tversky & Kahneman, 1974). Another potential issue is the display of the groups. In the original method, all evaluated groups were displayed on the same page. This feature of the measurement situation has also been left undiscussed, while we argue that it may be important for results. Considering measurement as a social situation in which participants may seek to guess hidden expectations and rules, we argue that displaying the groups together along with the instructions which read: ‘Some people think that people can vary in how human-like they seem (…)’ can result in the impression that the expectation of the task is to indicate the differences. First, such instruction can serve as social proof for the validity of the idea that people present different levels of humanness. Second, when all the groups are presented together, participants can more easily diversify their responses, without remembering them. 
Summing up, the display pattern where respondents could easily see all their answers, along with instructions encouraging to indicate differences, could result in increased variability of scores. Considering all these reasons, we argue that participating in the AoH measurement can affect views about others. By removing a social taboo, introducing the premise that differences in degrees of humanness exist, and strengthening and invoking associative primes between humans and animals, the AoH measurement could induce dehumanisation rather than just measure it. To address these concerns, we investigate three research problems. First, we evaluate whether the initial placement of the dot affects scores obtained by the AoH scale. To do so, we manipulate the dot’s position, creating three conditions. In the control condition, the dot is placed where it appears originally, on the extreme left. In the two experimental conditions, the dot is placed in the middle and extreme right. We hypothesise that because of the anchoring-adjustment heuristic (Furnham & Boo, 2011; Tversky & Kahneman, 1974), the middle position will result in substantially higher scores than the left position (H1), while the right position will yield higher results than the middle position (H2). We suppose that this effect will occur only with respect to highly derogated out-groups because of the ceiling effect— scores for a favoured out-group may be too high to be heightened further. From recent public opinion polls, we conclude that the most disregarded out-groups for the intended population are Muslim refugees, Arabs, Roma, and Russians (Omyła-Rudzka, 2019; Stefaniak et al., 2017). Therefore, we propose the first two hypotheses with respect to them. Second, we investigate the role of the display pattern of groups in creating variability among results for different groups. 
Due to the perceived social expectation mechanism and cognitive availability, combined with anchoring heuristics, we expect that the mean within-subject variance will be higher when groups are displayed all at once. We hypothesise that the scores for different groups will be differentiated when groups are displayed together (H3). Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement Deviations from Pre-registered Protocol Regarding the missing data handling, we decided to deviate from pre-registered protocol. It turned out that our pre-registered criteria for data exclusion proved inadequate to meaningfully detect the low-effort and suspicious responses and there are better alternatives possible. Here are lists of changes along with their justifications: Participants and Data Gathering Measurements Participants constitute a sample of the Polish population, representative of age, gender, and educational attainment. The population structure was sourced from the government’s statistical office and representativeness was obtained through targeted sampling. Participants were recruited via online panel (‘Ariadna’). All participants received reward points from the panel and provided informed consent. The sample composition and recruitment method reflect the design in Bruneau et al. (2018). The desired sample was estimated using Bayes factor design analysis with fixed ‘n,’ described by Schönbrodt and Wagenmakers (2017). We planned to recruit 200 partici- We used three questionnaires: AoH, Infrahumanisation and Feeling thermometer. These methods were used to evaluate eight groups: Poles (in-group), Germans, Russians, Roma, Arabs, Muslim refugees, Czechs, and Americans. Additionally, we created a bogus measurement which was intended to serve as a control condition task in place of the AoH scale. Ascent of humans. The measurement of blatant dehumanisation was first introduced in a study by Kteily et al. (2015). 
Since then, it has been used in various forms and under different names. Originally the scale was dubbed the ‘Ascent of man’, although most recent papers refer to it as Collabra: Psychology 5 Downloaded from http://online.ucpress.edu/collabra/article-pdf/8/1/33297/498131/collabra_2022_8_1_33297.pdf by guest on 23 March 2022 1. Instead of using open questions to screen-out suspicious responses, we used a quality-check tool, provided by Qualtrics - “ExpertReview”. This tool analyses re-captcha scores, time of completion, duplicate responses and pattern of missing responses to identify low-quality data. We decided that this automatic tool would serve our goal much better than our arbitrary, qualitative analysis. At the time of pre-registration, we were not aware of this tool being available. 2. We decided to drop the initial idea of “forcing” responses because of the panels’ recommendation against such measures. Instead, we opted for “requesting response” - if the participant left some item unanswered, they saw a completion request. The respondent could ignore the message and proceed, consciously leaving some questions unanswered. We decided that in such a case, responses could be reasonably treated as low-effort and dropped from the analysis. 3. We decided to drop the exclusion criteria regarding “(…) participants whose time of completing the questionnaire is extremely above or below typical (under and above 3 SD)”. After inspecting our results, we found around 50, unevenly distributed outliers, some of them very extreme, which clearly indicated breaks in the survey completion. The standard deviation proved to be so high, that it could not form meaningful cut-off points. Furthermore, we discovered no unrealistically fast answers, and extremely long answers had not differed in quality as judged by other criteria (missing answers, ExpertReview). 
We concluded that since breaks do not indicate low-quality answers and cut-off criteria would be either meaningless (3 SD) or too arbitrary (alternative method chosen after datainspection), we should not use time-based criteria for data exclusion at all. We included completion time in our database to allow independent evaluation if desired. pants for each of the seven conditions. The hypotheses tests were assumed to be conclusive when BF ≥ 6. This value was chosen as it is commonly interpreted as moderate support for a hypothesis (van Doorn et al., 2019), which we find to be conclusive enough to achieve the scientific goals of the study. To compute the probability of obtaining compelling evidence given BF = 6, n = 200, and ES = 0.4, we performed a Monte Carlo simulation using the R-package ‘BFDA’ (Schönbrodt & Stefan, 2018). The simulation was repeated 10,000 times, with the default Cauchy prior (zero-centred, r = 0.707). We chose an effect size of 0.4 because the mean effect size of the difference between the ingroup and outgroups in the Bruneau et al. (2018) study was ES = 0.61. We decided that detecting an effect of peripheral properties that were more than half the size of the effect of the focal test would be a significant finding from a theoretical and practical perspective. Under H1, the probability of a false negative result was < 0.01%, while that of inconclusive results was 8.7%. Under H0, the probability of a false-positive result was 0.5%, while that of inconclusive results was 31.8%. Note that the actual n was higher for testing hypotheses 1–3, as we used two or three conditions per side, resulting in 400/400 and 600/600 comparisons (see Figures 2 and 3). Our final sample was larger than we planned because of the additional volume added from the research panel. We decided to include additional participants to maximise the utility of the used resources. We excluded 49 participants with missing answers in non-demographic questions. 
Additionally, we excluded one participant with suspicious ID which did not match the pattern of the Panel’s ID. Qualtrics ExpertReview quality detection system indicated eight possible records from bots, but these records contained missing answers as well, so no respondents were excluded solely on this particular criterion. The final sample consisted of 1751 participants (927 females, 810 males, 14 missing answers, Mage = 42.65, SDage = 14.13, ranging from 17 to 85, 14 missing answers). The participants’ levels of education were: primary – 11.5%, vocational – 19.9%, secondary – 33.5% , higher – 34.3%, 14 missing answers. The participants’ places of residence were: village – 39.7%, small city (up to 20k residents) – 9.3%, medium city (20k-99k residents) – 18.3%, large city (100k or more) – 31.9%, 14 missing answers. Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement Figure 1. Illustration above the slider scale in “Ascent of Humans” measurement. in-group. The particular version of the method used in this study follows the prejudice measurement from Bruneau, Kteily, and Laustsen (2018). Infrahumanisation. Infrahumanisation was measured by the list of emotions originally developed by Demoulin et al. (2004) and adapted and normalised by Bilewicz, Mikołajczak, Kumagai, and Castano (2010). Based on ratings given by the respondents in the adaptation study, the research team, assisted by expert judges, chose 20 emotions, with 5 for each category: high humanity/low desirability, high humanity/high desirability, low humanity/low desirability, and low humanity/high desirability. The list was chosen with consideration to humanity/desirability scores, but also so that it does not contain redundant or obscure words. Respondents rated the extent to which they believed the members of the group ‘X’ are, in general, likely to feel the given emotion, on a seven-point scale. 
The full list of emotions and the list chosen for this study are available at https://osf.io/c5k8q/. Bogus scale. To draw conclusions about the influence of evaluating groups via the AoH scale, the participants in the control condition needed to be engaged in a task similar to AoH, but free of in-group/out-group and low/high-humanity associations. In the control condition, participants were asked to evaluate eight different brands of mobile phones (Samsung, Apple, Huawei, LG, Alcatel, HTC, Sony Ericsson, Motorola) in terms of how innovative and modern they seemed. The instructions read: ‘Some people think that brands of mobile phones vary in how innovative and modern they seem. According to this view, some brands seem highly innovative, whereas others seem to be derivative and archaic. Using the sliders below, indicate how innovative you consider the brand to be’. Participants then saw an image of five mobile phones, presented from the oldest model to the most contemporary smartphone, and were asked to evaluate the eight mobile phone brands (see Supplementary Materials or OSF repository: https://osf.io/c5k8q/).

Research Design

We randomly assigned participants to one of the eight experimental conditions. In six (3×2) conditions, participants first completed the AoH scale with one of three dot positions (left, middle, right) combined with one of two display patterns (joint, separated). Subsequently, participants completed the ‘feeling thermometer’ and ‘infrahumanisation’ measurements. We refer to the scale as AoH, following recommendations from Kteily and Bruneau (2017) to make the name more inclusive. Using the Google Scholar, EBSCO, ResearchGate, and Mendeley search engines, we identified 16 works published between 2015 and 2019 using a version of the AoH scale.
Of these, 12 were peer-reviewed papers, one was a doctoral dissertation, one was a working paper, one was a research report from an academic research centre, and one was a conference paper announced as scheduled for publication in a peer-reviewed journal (a list of the considered works can be found in the Supplementary Materials and OSF repository: https://osf.io/c5k8q/). After reviewing the sources, we concluded that the studies varied in the details of the measurement. For instance, some used reference points underneath the slider scale, while others did not. Differences were also found in the instructions presented. Most often, none of the measurement properties were directly described in full detail. They had to be deduced from presented pictures, examined from uploaded research materials, or confirmed via contact with the authors. To reach conclusions about what the most ‘standard’ method would look like, we combined our insights from the source review with information from direct contact and the obtained study materials. We concluded that although there is no precise, full consensus regarding the design of the Ascent of Humans scale, the most common features have been: lack of a reference point underneath the slider scale, initial position of the dot at the extreme left, multiple groups displayed per screen, randomised group display order, and instructions which read: ‘Some people think that people can vary in how humanlike they seem. According to this view, some people seem highly evolved, whereas others seem no different than lower animals. Using the sliders below, indicate how evolved you consider the group of people to be.’ What remains unchanged throughout all investigated studies is the picture used.
To the best of our knowledge, it has always been the same black-and-white graphic, depicting five silhouettes ranging from a quadrupedal monkey to an anatomically contemporary human (see Figure 1). Based on these facts, we established the AoH scale with all of the properties described above as our reference point for experimental manipulations. In our analysis, we used two types of AoH scores. The relative AoH score (AoHrel) was computed by subtracting the score of the out-group from that of the in-group. A higher AoHrel value indicates stronger dehumanisation. The absolute score (AoHabs) is the degree of humanity attributed to the group, and it can assume values from 0 to 100 (full humanity). Prejudice. Prejudice was assessed using a feeling thermometer, a commonly used method in which participants are asked ‘How warm (favourable) or cold (unfavourable) do you feel towards the following groups?’ Answers are given on a 5-point scale (with two presented anchors: 1 = very unfavourable, 5 = very favourable; Haddock et al., 1993). Relative prejudice toward each group was computed by subtracting the score of an out-group from the score of the in-group. The ‘feeling thermometer’ and ‘infrahumanisation’ measurements followed in a randomised order. In the seventh condition, participants first completed a bogus scale measurement followed by the ‘feeling thermometer’ and ‘infrahumanisation’ scales in a randomised order. In the eighth condition, participants first completed the ‘feeling thermometer’ scale followed by the ‘infrahumanisation’ scale. The order of groups was randomised across all conditions and scales.
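The two scoring rules described above (AoHabs as the raw slider rating, AoHrel as in-group minus out-group) can be sketched as follows; the ratings and group labels are illustrative only, not data from the study.

```python
# Hypothetical AoH slider ratings (0-100) for one respondent;
# the group names and values are illustrative only.
ratings = {"Poles": 100, "Germans": 100, "Roma": 85, "Muslim refugees": 70}

IN_GROUP = "Poles"

def aoh_scores(ratings, in_group=IN_GROUP):
    """Return absolute and relative AoH scores per out-group.
    AoHabs is the raw 0-100 rating; AoHrel = in-group minus out-group,
    so higher AoHrel indicates stronger relative dehumanisation."""
    return {
        group: {"AoHabs": score, "AoHrel": ratings[in_group] - score}
        for group, score in ratings.items()
        if group != in_group
    }

print(aoh_scores(ratings))
# e.g. Muslim refugees -> AoHabs = 70, AoHrel = 30
```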
The number of participants in each group was: AoH joint display/left dot (n = 239), AoH joint display/middle dot (n = 221), AoH joint display/right dot (n = 222), AoH separate display/left dot (n = 223), AoH separate display/middle dot (n = 217), AoH separate display/right dot (n = 225), ‘bogus scale’ (n = 225), ‘feeling thermometer’ (n = 229). The research plan for each group is summarised in Figure 2.

Data Analysis

Data analysis was conducted using the Bayesian approach. Due to the absence of previous related studies, we used default priors with a zero-centred Cauchy distribution, r = .707. As previously mentioned, a Bayes factor of 6 in favour of either the null or the alternative hypothesis was considered conclusive. See Figure 3 for a detailed list of the statistical procedures and key variables for all hypotheses. No outliers were identified in terms of response time, nor any otherwise suspicious answers. 49 respondents were removed due to missing answers in dependent-variable measures; one respondent was removed from the database due to an atypical respondent ID and an unusual order of question display, which suggested an error in the Qualtrics engine or the online panel software.

Results

Here, we present the analyses of the pre-registered hypotheses along with non-pre-registered exploratory analyses. All analyses of pre-registered hypotheses are supplemented with a Bayes factor robustness check – a method that allows testing the sensitivity of the Bayes factor to different widths of prior distributions. Plots for these checks can be found in the OSF repository (https://osf.io/c5k8q/).

Pre-registered Analyses

All the pre-registered analyses were Bayesian Mann-Whitney U tests for independent samples (van Doorn et al., 2019). In accordance with the pre-registered plan, we decided to use ‘U’ tests due to discrepancies between the distributions of all dependent variables and the normal distribution.
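The Bayesian Mann-Whitney U test of van Doorn et al. (2019) is implemented in JASP and R. As a rough frequentist analogue of the same comparison, a rank-based test can be sketched in Python with scipy; the data below are synthetic stand-ins, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for AoHabs scores in two conditions (left vs
# middle starting dot); real scores are 0-100 and heavily
# ceiling-inflated, which is why a rank-based test is appropriate.
left = np.clip(rng.normal(80, 25, 452), 0, 100)
middle = np.clip(rng.normal(80, 25, 419), 0, 100)

# Frequentist analogue of the paper's Bayesian Mann-Whitney U test;
# the one-sided alternative mirrors H1 (left-dot scores lower).
u, p = stats.mannwhitneyu(left, middle, alternative="less")
print(f"U = {u:.1f}, p = {p:.3f}")
```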
Specifically, all distributions were extremely left-skewed, with the mode equal to the maximum score of the scale (100). In Figure 4, we present the combined distribution of AoHabs scores for all four tested groups (Arabs, Muslim refugees, Roma, Russians). The distributions of AoHabs scores for each group followed roughly the same shape. To formally confirm or reject hypotheses, we used the pre-registered criterion of BF > 6. The prior probability is a zero-centred Cauchy distribution with a scale parameter of .707 in all cases. Sliders’ scale dot position and the AoH score. We hypothesised (H1) that the AoHabs score for the left dot position (n = 452) would be substantially lower than that for the middle (n = 419). The null hypothesis was δ = 0, and the alternative was directional: δ < 0. We obtained the following results:

• Inconclusive results for Roma: BF01 = 2.44, posterior effect size distribution was centred around Glass’s δ = -.11, 95% CI [-.24, -.01]

Figure 2. Diagram of experimental conditions and procedure sequence.

Figure 3. Summary of hypotheses with corresponding groups, variables and planned analyses.
• Inconclusive results for Russians: BF01 = 4.04, posterior effect size distribution was centred around Glass’s δ = -.09, 95% CI [-.22, -.01]
• Data in favour of H0 for Arabs: BF01 = 7.15, posterior effect size distribution was centred around Glass’s δ = -.07, 95% CI [-.20, -.01]
• Data in favour of H0 for Muslim refugees: BF01 = 10.17, posterior effect size distribution was centred around Glass’s δ = -.06, 95% CI [-.17, -<.01]

Analogously, we expected (H2) that the AoHabs score for the middle dot (n = 419) would be substantially lower than that for the right dot (n = 421). The null hypothesis was δ = 0, and the alternative hypothesis was directional: δ < 0. We obtained the following results:

• Data in favour of H0 for Roma: BF01 = 19.89, posterior effect size distribution was centred around Glass’s δ = -.03, 95% CI [-.13, -<.01]
• Data in favour of H0 for Russians: BF01 = 23.61, posterior effect size distribution was centred around Glass’s δ = -.03, 95% CI [-.12, -<.01]
• Data in favour of H0 for Arabs: BF01 = 18.11, posterior effect size distribution was centred around Glass’s δ = -.04, 95% CI [-.14, -<.01]
• Data in favour of H0 for Muslim refugees: BF01 = 15.79, posterior effect size distribution was centred around Glass’s δ = -.04, 95% CI [-.14, -<.01]

Given our criteria, both hypotheses regarding the influence of dot position were either disconfirmed or inconclusive. Group display pattern and the within-subject variance of the AoHabs score. We verified the hypothesis (H3) that when groups are displayed on a single screen, one below the other, the AoHabs scores will be more varied than when groups are displayed on separate screens. To test this, we computed the within-subject variance for all the groups’ scores and then tested the difference in variances between the joint-display (n = 651) and separate-display (n = 649) groups.

Figure 4. Distribution of absolute AoH score for all groups combined.
The null hypothesis was δ = 0, and the alternative was directional: δ < 0. The data were strongly in favour of the null hypothesis: BF01 = 51.84, posterior effect size distribution was centred around Glass’s δ = .02, 95% CI [.00, .05]. Impact of participating in the AoH measurement on attitudes toward out-groups. With respect to the second problem, we verified three hypotheses:

Table 1. Bayesian Mann-Whitney U test for comparison of the AoH and bogus groups on infrahumanisation (H4).

Group             BF₊₀   BF₀₊    W          Rhat   Posterior median δ   Lower 95% CI   Upper 95% CI
Arabs             0.13   7.47    25571.50   1.00   0.08                 < 0.01         0.24
Roma              0.33   3.04    27217.50   1.00   0.12                 0.01           0.30
Russians          0.15   6.74    26018.00   1.00   0.08                 < 0.01         0.24
Muslim refugees   0.23   4.40    27103.00   1.00   0.10                 0.01           0.28

Note. For all tests, the alternative hypothesis specifies that group AoH is greater than group bogus. Note. Results based on a data augmentation algorithm with 5 chains of 1000 iterations.

Table 2. Bayesian Mann-Whitney U test for comparison of the AoH and bogus groups on the feeling thermometer (H5).

Group             BF₊₀   BF₀₊    W          Rhat   Posterior median δ   Lower 95% CI   Upper 95% CI
Arabs             0.06   17.80   24337.50   1.00   0.04                 < 0.01         0.16
Roma              0.14   7.14    26240.00   1.00   0.08                 < 0.01         0.24
Russians          0.09   11.05   24956.00   1.00   0.06                 < 0.01         0.20
Muslim refugees   0.08   11.71   25161.50   1.00   0.05                 < 0.01         0.19

Note. For all tests, the alternative hypothesis specifies that group AoH is greater than group bogus. Note. Results based on a data augmentation algorithm with 5 chains of 1000 iterations.

• H4: Participating in the AoH measurement will result in higher infrahumanisation scores toward out-groups when compared with participating in the bogus scale measurement.
• H5: Participating in the AoH measurement will result in higher feeling thermometer scores toward out-groups when compared with participating in the bogus scale measurement (note that the feeling thermometer score here is the relative score, so a higher value indicates more prejudice toward the out-group).
• H6: Participating in the AoH measurement will result in higher infrahumanisation scores toward out-groups when compared with participating in the feeling thermometer measurement.

We tested a group of participants previously engaged in the standard AoH measurement (left dot, joint display) versus the groups who completed the bogus scale (see Figure 2) or the feeling thermometer scale. For all three hypotheses, the null hypothesis was δ = 0, and the alternative was δ > 0. Infrahumanisation scores for all four out-groups proved to be marginally influenced by or independent of prior engagement in the AoH measurement. The Bayes factor in favour of the null hypothesis ranged from BF01 = 3.04 for Roma to BF01 = 7.47 for Arabs. This indicates that evidence from the data ranged from inconclusiveness to moderate support for the null hypothesis (Table 1). Feeling thermometer scores were also unaffected by prior engagement in the AoH versus the bogus scale. The Bayes factor in favour of the null hypothesis ranged from BF01 = 7.14 for Roma to BF01 = 17.80 for Arabs, which provides moderate to strong support for the null hypothesis (Lee & Wagenmakers, 2013; Table 2). The last pre-registered hypothesis stated that participating in the AoH measurement would have a stronger influence on out-group derogation than participating in a somewhat similar slider-based measurement: the feeling thermometer. The null hypothesis was δ = 0 and the alternative was δ > 0. For all four tested out-groups, the Bayes factor favoured the null hypothesis, but in only two of them did BF reach the conclusiveness threshold (BF01 = 8.79 for Muslim refugees and BF01 = 13.01 for Roma). The Bayes factors for Russians and Arabs were inconclusive.
In summary, the evidence suggests that we should shift our beliefs towards the notion that participants previously engaged in the AoH measurement are just as likely to infrahumanise as those who responded to the feeling thermometer scale (Table 3). Notably, owing to the sample plan analysis (see section Participants and Data Gathering), we know that inconclusiveness is substantially more probable under a true null hypothesis than under the alternative. Another plausible interpretation of the inconclusive results is that some effects may exist, but their sizes are below the minimum effect of interest.

Exploratory Analyses

In addition to the pre-registered analyses, we decided to explore the database in search of additional valuable insights and inspiration for future research. We explored three areas: (1) relationships between AoH, prejudice, and infrahumanisation, (2) the prevalence of blatant dehumanisation of various out-groups, and (3) the distribution of AoH scores.

Table 3. Bayesian Mann-Whitney U test for comparison of the AoH and ‘thermo’ groups on infrahumanisation.

Group             BF₊₀   BF₀₊    W          Rhat   Posterior median δ   Lower 95% CI   Upper 95% CI
Arabs             0.99   1.01    28428.00   1.01   0.17                 0.02           0.35
Roma              0.08   13.01   25712.50   1.00   0.05                 0.00           0.19
Russians          0.19   5.18    27060.00   1.00   0.09                 0.01           0.26
Muslim refugees   0.11   8.97    26468.00   1.00   0.06                 0.00           0.22

Note. For all tests, the alternative hypothesis specifies that group AoH is greater than group ‘thermo’. Note. Results based on a data augmentation algorithm with 5 chains of 1000 iterations.

Table 4. Mean relative AoH scores in the current study versus in the study by Bruneau et al. (2018).
Relationship between Blatant Dehumanisation, Infrahumanisation and Prejudice. Measures of blatant dehumanisation, infrahumanisation, and prejudice proved to be interrelated. Due to the highly skewed distributions of all variables, we used non-parametric Kendall’s tau-b coefficients with the default prior distribution (zero-centred, beta = 1). The strongest relationship was between blatant dehumanisation (AoHrel) and prejudice (feeling thermometer). The correlation for all out-groups combined was rτ(9008) = .36, 95% CI [.35, .37], BF10 > 1000. The correlation between AoHrel and infrahumanisation was also significant, but much smaller, rτ(9008) = .06, 95% CI [.05, .07], BF10 > 1000. These results replicate the pattern identified in previous studies, in which AoH scores proved to be highly correlated with measurements of explicit prejudice and mildly correlated with other measurements of dehumanisation (Kteily et al., 2015; Kteily & Bruneau, 2017). Moreover, the infrahumanisation score was correlated with the feeling thermometer scale: rτ(9008) = .11, 95% CI [.10, .12], BF10 > 1000. Interestingly, the more negatively an out-group was perceived, the stronger the association between blatant dehumanisation and prejudice. For the most disfavoured groups, Muslim refugees, Arabs, and Roma, the correlations were rτ(1283) = .40, 95% CI [.37, .44], BF10 > 1000; rτ(1287) = .34, 95% CI [.31, .38], BF10 > 1000; and rτ(1285) = .33, 95% CI [.30, .37], BF10 > 1000, respectively. For the most favourably viewed Americans, this effect was about half the size: rτ(1291) = .18, 95% CI [.14, .21], BF10 > 1000. Prevalence of blatant dehumanisation of various out-groups and distribution of scores. Our choice of out-groups, population, and measurement methods was based on the study by Bruneau et al. (2018). Thus, we compare our results with those of this work.
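The tau-b correlations reported above were computed in a Bayesian framework; their frequentist point estimates can be sketched with scipy on synthetic stand-in data (the built-in positive association below is illustrative only, not the study's data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins: relative dehumanisation (AoHrel) and relative
# prejudice scores, generated with an artificial positive association.
n = 1000
dehumanisation = rng.normal(0, 1, n)
prejudice = 0.5 * dehumanisation + rng.normal(0, 1, n)

# scipy's kendalltau computes tau-b by default, handling the heavy
# ties that slider scores produce; the Bayesian version used in the
# paper adds a prior over tau, but the point estimate is analogous.
tau, p = stats.kendalltau(dehumanisation, prejudice)
print(f"tau-b = {tau:.2f}")
```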
We present two types of AoH scores: the relative score (AoHrel; in-group minus out-group, with higher values indicating stronger dehumanisation) and the absolute score (AoHabs; 0 to 100, with 100 indicating full humanity).

Mean AoHrel                                      Germans   Muslim Refugees   Roma    Russians
Current study (Poland)                           -2.07     18.57             13.41   8.69
Bruneau et al., 2018, Study 1 (Czech Republic)   .5        37.5              38.7    11.8
Bruneau et al., 2018, Study 2 (Hungary)          0.0       26.0              27.6    --

In accordance with our expectations, the four groups that we assumed to be negatively perceived stood out from the other groups in AoHrel scores. Similar to the results obtained by Bruneau et al. (2018) on Central European samples (Hungary and the Czech Republic), Muslim refugees (M = 18.6, SD = 28.86) and Roma (M = 13.46, SD = 25.16) proved to be the most blatantly dehumanised. However, the degree of dehumanisation was smaller than that in the original study (Table 4). Regarding the groups which we assumed to be positively perceived (Czechs, Germans, and Americans), we found no substantial evidence for widespread dehumanisation. Moreover, Germans and Czechs were estimated to be even slightly more human than the in-group (AoHrel for Germans: M = -2.07, SD = 20.01; Czechs: M = -.15, SD = 19.24). We examined the average scores, but a quick glimpse at the distribution plots led us to the conclusion that the mean, or any other measure of central tendency, neglects important information. Figure 5 shows the distribution of AoHabs scores. The panels are sorted in descending order of the mean AoHabs. The top-left panel displays the distribution for the most humanised group (Germans) and the bottom-right, the least humanised (Muslim refugees). Most noteworthy, we observed extreme inflation of the ‘100’ and adjacent scores for each group. Even for the most dehumanised group (Muslim refugees), 29.84% of all scores equalled 100. For the in-group (Poles), 48.74% of scores equalled 100, and for the most humanised group (Germans), 51.44%.
Besides the highly inflated peak at ‘100’, the distribution was close to uniform, with some small peaks at the values ‘0’, ‘25’, ‘50’ and ‘75’. In summary, we can identify three distinctive features of the AoHabs distribution:

1. The scores are always strongly concentrated on the highest possible value.
2. Lower values are distributed along minimally left-skewed, almost horizontal lines.
3. There are small peaks at four evenly spaced areas.

We suppose that these peaks are caused by the silhouettes above those areas (see Figure 4). These pictures may serve as distinct, visible cues. After all, the anchoring mechanism may have been in play, but the anchors turned out to be the pictures rather than the slider dots. Figure 6 shows the violin plots of the distribution of the AoHrel scores, sorted by increasing mean AoHrel score. The plots do not resemble ‘violins’, because they represent a peculiar distribution. What is striking is the completely different shape of the distribution for positively perceived (Germans, Czechs, and Americans) and negatively perceived out-groups (Russians, Arabs, Roma, and Muslim refugees). For the first three out-groups, we can see a massive concentration of the results around ‘0’. These ‘disks’ in the middle represent a large portion of scores showing virtually no relative dehumanisation. When it comes to the four negatively perceived out-groups, we can see that AoHrel = 0 is only mildly dominant, and scores slightly below and above zero are quite common as well. Furthermore, one can notice that even in the case of the highest mean AoHrel score (represented by the dots), the cluster of central-tendency scores remains in the same place (around 0).
It is the shape of this cluster and the small number of above-central-tendency scores that make the difference in the mean score. What theoretical insights can be obtained from this visual analysis? The first and most important is that a low average AoH score for an out-group does not indicate a general consensus about their lower degree of “humanity”; it indicates less universal agreement that they are fully human. While full humanity was always the most common score, the difference between the more and less dehumanised out-groups was due to the proportion of in-group members who do not express this dominant view. The second insight is that the complete lack of discrimination against out-groups is not uncommon. Even in the case of the most unfavourably viewed groups, there is still a significant proportion of people who do not dehumanise them. Furthermore, the in-group is also subjected to absolute dehumanisation (more than 50% of the respondents viewed their in-group as less than fully human).

Discussion

This study aimed to address the methodological and ethical issues associated with the AoH measurement through a transparent, pre-registered experimental procedure. The results of these tests overwhelmingly disconfirmed our concerns. First, we hypothesised that the raw score of the AoH measurement can be substantially influenced by the slider-scale dot position or by the pattern of the group display. Had our hypotheses been confirmed, we would have concluded that the AoH score may create a specific impression rather than capture pre-existing beliefs. Consequently, we interpret the falsification of our hypotheses as a reason to shift our beliefs toward the notion that the results of the AoH measurement stem from sources other than the peripheral properties of the measurement. Overall, these results should be interpreted as evidence against the notion that AoH scores are just artefacts of a particular measurement method.
Second, and perhaps more importantly, we found a strong, conclusive disconfirmation of our ethical and methodological concerns regarding the influence of participating in the AoH measurement. We hypothesised that participating in the AoH measurement could strengthen prejudice, resulting in a more negative perception of the out-group in the subsequent measurements. Had this hypothesis been confirmed, it would have posed serious ethical concerns and cast doubt on the pre-existing body of theoretical validity evidence. After filling out the AoH questionnaire, respondents did not express a more negative and dehumanising view of the out-groups. This discovery weakens our main ethical concern: that by giving such questionnaires to the public, we might induce prejudice. Furthermore, this study provides more confidence that AoH scores are a good predictor of multiple negative attitudes toward out-groups. Our results indicate that correlations between AoH scores and other prejudice-related measurements do not stem from an uncontrolled causal effect of the measurement, but rather from underlying relationships. In addition to our main pre-registered hypotheses, we share novel insights into many characteristics of the measurement. Above all, we were able to systematically evaluate the prevalence of blatant dehumanisation in a given population. We conclude that despite dehumanisation being visible in the mean scores for out-groups, a substantial fraction of the respondents did not dehumanise out-groups at all. After inspecting the distribution of results, it may be observed that scores indicating full humanity were massively inflated.

Figure 5. Distribution of absolute AoH scores for all groups.
Such a point-inflated distribution indicates a dual mechanism of responses: one mechanism accounts for the difference between the inflated score and the rest of the distribution, and a second mechanism underlies the variability within the rest of the distribution. For instance, investigating cigarette smoking habits by asking ‘how many cigarettes do you smoke weekly?’ would yield a technically continuous variable; however, analysing it only as such would be incomplete. The difference between ‘0’ and ‘1’ is the difference between a non-smoker and a regular smoker, and a massive inflation of ‘0’ scores in the population may be observed. The best approach would be to treat the difference between ‘0’ and ‘1’ and the variance in the rest of the scale as two separate phenomena. This would allow us to include qualitative differences between dehumanising and non-dehumanising individuals (analogous to ‘smokers’ and ‘non-smokers’), which would not only reflect AoH scores more accurately but also provide better insight into the relationships with other variables. There are statistical techniques that allow the modelling of such variables in this dual way (e.g., hurdle models or zero-inflated Poisson regression; see Green, 2021). Apart from the methodological aspects, the distribution of the scores provides valuable theoretical information. The percentage of respondents displaying no out-group derogation was substantially higher with the AoH measurement than with other measurements from this domain. This implies that this prejudicial view is comparatively rare. Perhaps the central claim behind the development of the AoH scale – that blatant dehumanisation is still prevalent in contemporary society – needs an important complement. Blatant dehumanisation is present, yes, but it is not universal, and not nearly as common as more subtle prejudice. We believe that this may be the reason why AoH is a better predictor of out-group aggression or discrimination.
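The two-part ("hurdle") logic suggested above, separating the ceiling indicator from the sub-ceiling variability, can be sketched on synthetic, illustrative data; the mixing proportion and score range below are assumptions, not the study's observed values.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic AoHabs scores mimicking the observed shape: a large point
# mass at 100 ("full humanity") plus a broad spread below it.
scores = np.where(rng.random(2000) < 0.45,
                  100,
                  rng.integers(0, 100, 2000))  # sub-ceiling draws: 0..99

# Two-part ("hurdle") summary: model the ceiling indicator and the
# sub-ceiling scores separately, as with smokers vs. non-smokers.
at_ceiling = scores == 100
p_full_humanity = at_ceiling.mean()            # part 1: binary component
sub_ceiling_mean = scores[~at_ceiling].mean()  # part 2: degree, given < 100

print(f"P(score = 100) = {p_full_humanity:.2f}")
print(f"mean score | score < 100 = {sub_ceiling_mean:.1f}")
```

A full hurdle or zero-inflated regression (Green, 2021) would replace these two summaries with a logistic model for the ceiling indicator and a count or continuous model for the remainder.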
Out of all widely used methods, AoH may be the best at capturing a firm, consciously held prejudice. In that respect, AoH may bridge an important gap by examining blatant dehumanisation. Recent research on prejudice is often said to concentrate too much on subtle, unconscious biases at the expense of overtly hurtful, self-conscious, and active racism, sexism, etc., which are still an important social issue.

Figure 6. Violin plots of relative AoH score for all tested out-groups. Out-groups are presented in the order of ascending mean score.

Contributions

Contributed to conception and design: KI, TG, DD. Contributed to acquisition of data: KI. Contributed to analysis and interpretation of data: KI. Drafted and/or revised the article: KI, TG, DD. Approved the submitted version for publication: KI, TG, DD.

Competing Interests

The authors declare no competing interests regarding the presented work.

Acknowledgements

We would like to thank Michał Bilewicz for providing valuable insights and suggestions regarding the design of the study and the measurements we could use.

Funding Information

The study was funded by SWPS University of Social Sciences and Humanities, Faculty of Psychology (grant competition no. 1/2019/2020) from the subvention of the Ministry of Science and Higher Education, Republic of Poland.

Data Accessibility Statement

All data, reproducible files for data analyses, and experimental materials are publicly accessible via the Open Science Framework (https://osf.io/c5k8q/).

Limitations and Future Directions

Currently, the line of research on dehumanisation has been questioned (Over, 2021).
The main concerns are theoretical: How exactly is dehumanisation defined? To what extent could it drive inter-group violence? Are comparisons to animals universally derogative and specifically attributed to out-groups? Over (2021) argues that the proponents of dehumanisation research do not provide enough evidence to support the notion that dehumanisation was a driving factor for violence and discrimination, or that historically persecuted out-groups were consequently perceived as less human. Over (2021) suggests that the main driving force behind inter-group atrocities is an extremely negative out-group perception, often focused on the arcs which make sense only when applied to human beings (traitors, schemers). Over (2021) argues that comparisons to animals are present only when they serve to enhance and consolidate these negative connotations. Consequently, when individuals associate certain out-groups with animals, it may not necessarily mean that they think of the members as less human. It may mean that they hold strong, negative views about these out-groups and have often come across messages that embed these views in animal metaphors, which have now become part of an associative network around the out-group. Therefore, does the AoH measurement provide evidence that a substantial portion of individuals think of others as not fully biological humans? We believe that this is not necessarily the case. Our findings refute a critical point whose confirmation would have indicated that AoH scores and their correlations with related concepts are largely artefacts. In this sense, we have provided evidence that AoH scores represent a certain psychological reality. However, the question remains as to what exactly this method measures.
The first paper by Kteily and colleagues (2015) examined only convergent and predictive validity, and to the best of our knowledge, no published, peer-reviewed work since the method’s introduction has addressed measurement validity and reliability. Our work has significant limitations when examining the accuracy of the AoH scale as well. First, we used only a self-report questionnaire and did not control for or mitigate the social desirability of the responses. Secondly, other possible problems and important questions about the scale were not addressed, e.g. could it confuse perceptions of humanity with perceptions of ‘ape-ness’ or masculinity? (the pictures only depict human males, and being human is directly juxtaposed with being an ape). Another limitation of the conclusions of our study is the dependent variables used. To maintain comparability, we chose two methods (feeling thermometer and infrahumanisation) that have been widely used in conjunction with the AoH. However, these methods also have their limitations. The validity of the ‘feelings thermometer’ as a measure of prejudice is not a topic widely discussed in the literature - it is much more often used to validate other scales than in the context of testing its own validity. The infrahumanisation index on the other hand has been shown to have moderately low test-retest reliability (r = .46, Kteily et al. 2015, p. 910). This latter point may not be crucial in the context of our results, as we were more interested in infrahumanisation as a state than a trait, but it may limit the interpretation of the infrahumanisation score as a measure of entrenched attitudes towards outgroups. Summing up, the next important topic regarding Ascent of Humans scale is establishing whether it examines actual views of non-metaphorical, biological inferiority, or is it a well-calibrated, one-item measurement of extreme prejudice. 
In both cases, the method may be a valuable tool, but we believe that more research is needed to establish whether results can be interpreted at face value. One such crucial research could be testing the predictive, discriminant validity of the blatant dehumanisation construct. If this theoretical construct is substantially different from negative attitudes, it should be possible to name an outcome that is different for highly dehumanised outgroups than for extremely negatively perceived ones. Such a study, especially with pre-registered plans and predictions, could be an important input to the current discussion regarding dehumanisation. Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement Framework. (https://osf.io/c5k8q/) Submitted: September 29, 2021 PDT, Accepted: March 04, 2022 PDT Ethics Approval Statement Study was approved by the SWPS University of Social Sciences and Humanities ethics review board. Downloaded from http://online.ucpress.edu/collabra/article-pdf/8/1/33297/498131/collabra_2022_8_1_33297.pdf by guest on 23 March 2022 This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CCBY-4.0). View this license’s legal deed at http://creativecommons.org/licenses/by/4.0 and legal code at http://creativecommons.org/licenses/by/4.0/legalcode for more information. Collabra: Psychology 14 Ascent of Humans: Investigating Methodological and Ethical Concerns About the Measurement References Green, J. A. (2021). Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression. Health Psychology and Behavioral Medicine, 9(1), 436–455. https://doi.org/10.1080/21642850.202 1.1920416 Haddock, G., Zanna, M. P., & Esses, V. M. (1993). Assessing the structure of prejudicial attitudes: The case of attitudes toward homosexuals. Journal of Personality and Social Psychology, 65(6), 1105–1118. 
https://doi.org/10.1037/0022-3514.65.6.1105

Bastian, B., Laham, S. M., Wilson, S., Haslam, N., & Koval, P. (2011). Blaming, praising, and protecting our humanity: The implications of everyday dehumanization for judgments of moral status. British Journal of Social Psychology, 50(3), 469–483. https://doi.org/10.1348/014466610x521383

Bilewicz, M., Mikołajczak, M., Kumagai, T., & Castano, E. (2010). Which emotions are uniquely human? Understanding of emotion words across three cultures. In B. Bokus (Ed.), Studies in the Psychology of Language and Communication (pp. 275–285). Matrix.

Bishop, G. F., Oldendick, R. W., Tuchfarber, A. J., & Bennett, S. E. (1980). Pseudo-Opinions on Public Affairs. Public Opinion Quarterly, 44(2), 198–209. https://doi.org/10.1086/268584

Bruneau, E., & Kteily, N. (2017). The enemy as animal: Symmetric dehumanization during asymmetric warfare. PLOS ONE, 12(7), e0181422. https://doi.org/10.1371/journal.pone.0181422

Bruneau, E., Kteily, N., & Laustsen, L. (2018). The unique effects of blatant dehumanization on attitudes and behavior towards Muslim refugees during the European 'refugee crisis' across four countries. European Journal of Social Psychology, 48(5), 645–662. https://doi.org/10.1002/ejsp.2357

Castano, E., & Kofta, M. (2009). Dehumanization: Humanity and its Denial. Group Processes & Intergroup Relations, 12(6), 695–697. https://doi.org/10.1177/1368430209350265

Crandall, C. S., Miller, J. M., & White, M. H., II. (2018). Changing Norms Following the 2016 U.S. Presidential Election: The Trump Effect on Prejudice. Social Psychological and Personality Science, 9(2), 186–192. https://doi.org/10.1177/1948550617750735

DeCoster, J., & Claypool, H. M. (2004). A Meta-Analysis of Priming Effects on Impression Formation Supporting a General Model of Informational Biases. Personality and Social Psychology Review, 8(1), 2–27. https://doi.org/10.1207/s15327957pspr0801_1

Demoulin, S., Leyens, J., Paladino, M., Rodriguez-Torres, R., Rodriguez-Perez, A., & Dovidio, J. (2004). Dimensions of "uniquely" and "non-uniquely" human emotions. Cognition & Emotion, 18(1), 71–96. https://doi.org/10.1080/02699930244000444

Esses, V. M., Veenvliet, S., Hodson, G., & Mihic, L. (2008). Justice, Morality, and the Dehumanization of Refugees. Social Justice Research, 21(1), 4–25. https://doi.org/10.1007/s11211-007-0058-4

Furnham, A., & Boo, H. C. (2011). A literature review of the anchoring effect. The Journal of Socio-Economics, 40(1), 35–42. https://doi.org/10.1016/j.socec.2010.10.008

Giner-Sorolla, R., Burgmer, P., & Demir, N. (2021). Commentary on Over (2021): Well-Taken Points About Dehumanization, but Exaggeration of Challenges. Perspectives on Psychological Science, 16(1), 24–27. https://doi.org/10.1177/1745691620953788

Haslam, N. (2006). Dehumanization: An Integrative Review. Personality and Social Psychology Review, 10(3), 252–264. https://doi.org/10.1207/s15327957pspr1003_4

Johnson, E. J., & Goldstein, D. G. (2004). Defaults and Donation Decisions. Transplantation, 78(12), 1713–1716. https://doi.org/10.1097/01.tp.0000149788.10382.b2

Kteily, N., & Bruneau, E. (2015, September 18). Americans see Muslims as less than human. No wonder Ahmed was arrested. Washington Post. https://www.washingtonpost.com/posteverything/wp/2015/09/18/americans-see-muslims-as-less-than-human-no-wonder-ahmed-was-arrested/

Kteily, N., & Bruneau, E. (2017). Darker demons of our nature: The need to (Re)focus attention on blatant forms of dehumanization. Current Directions in Psychological Science, 26(6), 487–494. https://doi.org/10.1177/0963721417708230

Kteily, N., Bruneau, E., Waytz, A., & Cotterill, S. (2015). The ascent of man: Theoretical and empirical evidence for blatant dehumanization. Journal of Personality and Social Psychology, 109(5), 901–931. https://doi.org/10.1037/pspp0000048

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course. Cambridge University Press. https://doi.org/10.1017/cbo9781139087759

Leyens, J.-P., Cortes, B., Demoulin, S., Dovidio, J. F., Fiske, S. T., Gaunt, R., Paladino, M.-P., Rodriguez-Perez, A., Rodriguez-Torres, R., & Vaes, J. (2003). Emotional prejudice, essentialism, and nationalism: The 2002 Tajfel Lecture. European Journal of Social Psychology, 33(6), 703–717. https://doi.org/10.1002/ejsp.170

Leyens, J.-P., Demoulin, S., Vaes, J., Gaunt, R., & Paladino, M. P. (2007). Infra-humanization: The Wall of Group Differences. Social Issues and Policy Review, 1(1), 139–172. https://doi.org/10.1111/j.1751-2409.2007.00006.x

Leyens, J.-P., Paladino, P. M., Rodriguez-Torres, R., Vaes, J., Demoulin, S., Rodriguez-Perez, A., & Gaunt, R. (2000). The Emotional Side of Prejudice: The Attribution of Secondary Emotions to Ingroups and Outgroups. Personality and Social Psychology Review, 4(2), 186–197. https://doi.org/10.1207/s15327957pspr0402_06

Novick, M. R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3(1), 1–18. https://doi.org/10.1016/0022-2496(66)90002-2

Omyła-Rudzka, M. (2019). Stosunek do innych narodów. Komunikat z badań [Attitude towards other nations. Research report] (No. 17/2019). Centrum Badania Opinii Społecznej. https://cbos.pl/SPISKOM.POL/2019/K_017_19.PDF

Over, H. (2021). Seven Challenges for the Dehumanization Hypothesis. Perspectives on Psychological Science, 16(1), 3–13. https://doi.org/10.1177/1745691620902133

Reitsma-van Rooijen, M., & Daamen, D. D. L. (2006). Subliminal anchoring: The effects of subliminally presented numbers on probability estimates. Journal of Experimental Social Psychology, 42(3), 380–387. https://doi.org/10.1016/j.jesp.2005.05.001

Rosenthal, R. (1963). On the social psychology of the psychological experiment: The experimenter's hypothesis as unintended determinant of experimental results. American Scientist, 51(2), 268–283. http://www.jstor.org/stable/27838693

Schönbrodt, F. D., & Stefan, A. M. (2018). BFDA: An R package for Bayes factor design analysis (version 0.3). https://github.com/nicebread/BFDA

Schönbrodt, F. D., & Wagenmakers, E.-J. (2017). Bayes Factor Design Analysis: Planning for compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142. https://doi.org/10.3758/s13423-017-1230-y

Sigelman, L., & Thomas, D. (1984). Opinion Leadership & the Crystallization of Nonattitudes: Some Experimental Results. Polity, 16(3), 484–493. https://doi.org/10.2307/3234561

Stefaniak, A., Malinowska, K., & Witkowska, M. (2017). Kontakt międzygrupowy i dystans społeczny w Polskim Sondażu Uprzedzeń [Intergroup contact and social distance in the Polish Prejudice Survey], 3, 25. http://cbu.psychologia.pl

Strack, F., Bahník, Š., & Mussweiler, T. (2016). Anchoring: Accessibility as a cause of judgmental assimilation. Current Opinion in Psychology, 12, 67–70. https://doi.org/10.1016/j.copsyc.2016.06.005

Sturgis, P., & Smith, P. (2010). Fictitious Issues Revisited: Political Interest, Knowledge and the Generation of Nonattitudes. Political Studies, 58(1), 66–84. https://doi.org/10.1111/j.1467-9248.2008.00773.x

Tileagă, C. (2007). Ideologies of moral exclusion: A critical discursive reframing of depersonalization, delegitimization and dehumanization. British Journal of Social Psychology, 46(4), 717–737. https://doi.org/10.1348/014466607x186894

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131. https://doi.org/10.1126/science.185.4157.1124

van Doorn, J., van den Bergh, D., Bohm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Haaf, J. M., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Raj, A., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E.-J. (2019). The JASP Guidelines for Conducting and Reporting a Bayesian Analysis. Preprint. https://doi.org/10.31234/osf.io/yqxfr

van 't Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology—A discussion and suggested template. Journal of Experimental Social Psychology, 67, 2–12. https://doi.org/10.1016/j.jesp.2016.03.004

Zeelenberg, R., Pecher, D., & Raaijmakers, J. G. W. (2003). Associative repetition priming: A selective review and theoretical implications. In J. S. Bowers & C. J. Marsolek (Eds.), Rethinking implicit memory (pp. 261–283). https://dare.uva.nl

Supplementary Materials

S1. The List of Considered Works Using Ascent of Humans
Download: https://collabra.scholasticahq.com/article/33297-ascent-of-humans-investigating-methodological-and-ethical-concerns-about-the-measurement/attachment/84708.docx?auth_token=CxKZL3LLQoufT42MuhaD

S2. The Illustration of the Bogus Scale (Evolution of Mobile Phones)
Download: https://collabra.scholasticahq.com/article/33297-ascent-of-humans-investigating-methodological-and-ethical-concerns-about-the-measurement/attachment/85022.jpg?auth_token=CxKZL3LLQoufT42MuhaD

Peer Review History
Download: https://collabra.scholasticahq.com/article/33297-ascent-of-humans-investigating-methodological-and-ethical-concerns-about-the-measurement/attachment/85023.docx?auth_token=CxKZL3LLQoufT42MuhaD