Audiovisual Correlates of Interrogativity: A Comparative Analysis of Catalan and Dutch

J Nonverbal Behav
DOI 10.1007/s10919-013-0162-0
ORIGINAL PAPER

Joan Borràs-Comes • Constantijn Kaland • Pilar Prieto • Marc Swerts

Springer Science+Business Media New York 2013

Abstract Languages employ different strategies to mark an utterance as a polar (yes–no) question, including syntax, intonation and gestures. This study analyzes the production and perception of information-seeking questions and broad focus statements in Dutch and Catalan. These languages use intonation for marking questionhood, but Dutch also exploits syntactic variation for this purpose. A production task revealed the expected language-specific auditory differences, but also showed that gaze and eyebrow-raising are used in this distinction. A follow-up perception experiment revealed that perceivers relied greatly on auditory information in determining whether an utterance is a question or a statement, but accuracy was further enhanced when visual information was added. Finally, the study demonstrates that the concentration of several response-mobilizing cues in a sentence is positively correlated with the perceivers’ ratings of these utterances as interrogatives.

Keywords Catalan · Information-seeking questions · Speech perception · Gaze · Dutch

J. Borràs-Comes (corresponding author) · P. Prieto
Departament de Traducció i Ciències del Llenguatge, Campus de la Comunicació, Poblenou, Universitat Pompeu Fabra, C/Roc Boronat, 138, 08018 Barcelona, Spain
e-mail: joan.borras@upf.edu

P. Prieto
e-mail: pilar.prieto@upf.edu

C. Kaland · M. Swerts
Tilburg University, Tilburg, The Netherlands
e-mail: c.c.l.kaland@uvt.nl

M. Swerts
e-mail: m.g.j.swerts@uvt.nl

P. Prieto
Institució Catalana de Recerca i Estudis Avançats (ICREA), Universitat Pompeu Fabra, Barcelona, Spain

Introduction

The world’s languages have different grammatical means to mark an utterance as a polar (yes–no) question (e.g., Are you hungry?
Does the shop open on Saturday?), including the use of different lexical items or morphemes, changes in the syntactic structure, or prosodic and gestural marking. While declaratives are considered to be the unmarked sentence type, primarily used to convey information with no special illocutionary force markers (Levinson 2010, p. 2742), questions are primarily used to seek information. Cross-linguistically, morphosyntactic marking is a common way of identifying polar questions. These strategies include question particles (est-ce que in French, ne in Latin), a specific interrogative word order (as in most Germanic languages), or a combination of such strategies. As Dryer (2008) states, most languages using these morphosyntactic strategies also employ a distinct intonation pattern, though some do not (e.g., Imbabura Quechua, spoken in Ecuador). Prosody is also a very common resource for signaling polar questions across languages. It can be used to assign question status to a sentence with declarative form (Stivers and Rossano 2010, for Italian), even in languages that also use morphosyntactic strategies (e.g., declarative questions; Englert 2010, for Dutch). Bolinger (1989) argued that the presence of high pitch in questions may even be considered a linguistic universal (i.e., the average pitch in questions tends to be higher than the average pitch in non-questions). Moreover, Cruttenden (1981) suggested that the universal dichotomy between falling and rising tunes may be associated with the abstract notions of closed (for falls) vs. open status (for rises).[1] However, some recent descriptive studies, such as Englert (2010), have pointed out that rising intonation is not exclusively tied to interrogativity, but is also a common device for signaling continuation in statements or, at the level of discourse, both turn-giving and turn-keeping.
In contrast with Bolinger’s claim, Rialland’s (2007) analysis of 78 Central African languages showed that question prosodies without any high-pitched correlates are widespread; they include falling intonation or low tones, lengthening, breathy termination, and open vowels. Though the morphosyntactic and prosodic markers of polar questions have received considerable attention in the linguistics literature, less is known about the relevance of nonverbal cues. Nonetheless, various studies over the last three decades have considered the potential importance of eye gaze and certain facial and manual gestures. In fact, backchannel signals such as facial expressions, head movements, and gaze seem to be critically linked to listeners’ attention, perception, and comprehension (Peters et al. 2005; Lysander and Horton 2012). Argyle and Cook (1976) argued that gaze serves three main purposes during face-to-face communication: seeking information, receiving signals that accompany the speech, and controlling the flow of the conversation. Cosnier’s (1991) study of French spontaneous speech revealed that the gestural traits that characterize information-seeking questions are those that normally accompany informative verbal expressions, namely, eye gaze to the interlocutor, head elevation, an optional suspended hand gesture facing the interlocutor, and a variety of facial expressions which are then frozen while the speaker awaits a response. Cosnier in fact argued that gaze is as important as intonation and pauses for question marking and turn-taking. As Vilhjalmsson pointed out (1997, pp. 21–22), since the primary function of the eyes is to gather sensory input, the most obvious function of gaze is perhaps information-seeking, because the speaker will at least look at the listener when feedback is expected.

[1] As Snow and Balog (2002, p. 1031) suggest, closed would encompass “specific meanings like final, complete, or the intention of statements”, whereas open may have “the grammatical meaning of nonfinal in utterance-internal contexts and the pragmatic meaning of a yes/no question in utterance-final contexts”.

Eyebrow movements have also been associated with questioning, though the results appear to be somewhat inconclusive. For instance, Srinivasan and Massaro (2003) made use of “talking heads” (synthetic representations of a human face) in which they varied specific auditory and visual characteristics to investigate whether these could differentiate statements from declarative questions in English. They found that both eyebrow raising and head tilting could increase the perceivers’ detection of a question, though participants tended to rely more on auditory cues. However, Flecha-García’s (2010) analysis of English spontaneous speech materials found that speakers do not use eyebrow raises in questions more often than in other types of utterances. Incidentally, she also suggested that eyebrow raises may add a questioning meaning to any utterance (somewhat like adding a tag question such as isn’t it? at the end), even if the utterance does not express a question or request, whether verbally or prosodically (Flecha-García 2010, p. 553). In line with this variation, recent studies prefer to look at question marking as the set of features that contribute to response mobilization (Stivers and Rossano 2010, p. 29; Haan 2002). Stivers and Rossano (2010) found for both English and Italian that no single feature is present in all cases and thus conclude that no such feature appears to be intrinsic to the action of requesting information (p. 8).
They state that if an assessment is accompanied by several response-mobilizing features, this increases the response relevance of the action (p. 28). From a cross-linguistic point of view, even though speakers of different languages rely on different question-marking correlates, the same response-mobilizing resources (gaze, lexico-morphosyntax, prosody, and contextual epistemic asymmetry) seem to be available across languages, ethnicities, and cultures (Stivers and Rossano 2010, p. 29). More generally, Rossano (2010) observed a trade-off relationship between mobilizing cues: Italian speakers tend to look at recipients more often when their utterances lack clear intonational marking. In addition, he found that speakers looked more at recipients during polar and alternative questions than during wh-questions, which may be linked to the fact that wh-questions make greater use of verbal interrogative cues (i.e., wh-words) than the other two question types. Moreover, Levinson (2010; see also Stivers 2010) has shown that pragmatic inference is a cross-linguistic cue for interrogativity detection and can even be the main question marker in a language (Levinson 2010, for Yélî Dnye). If the speaker makes a statement about something of which the recipient has greater knowledge, this routinely attracts the recipient’s response (Labov and Fanshel 1977; Pomerantz 1980). To our knowledge, no controlled experimental studies have been undertaken to explore what role verbal and nonverbal cues play in the production and perception of questions and whether there exists a trade-off relationship between different mobilizing correlates.
To date, the majority of descriptions have been based on the analysis of controlled or natural corpora, and some perception studies have assessed the audiovisual identification of ‘biased’ questions (i.e., those conveying, for instance, counter-expectation, incredulity, or surprise), most of them using synthetic materials (Borràs-Comes and Prieto 2011; Borràs-Comes et al. 2011; Crespo-Sendra et al. 2013; House 2002; Srinivasan and Massaro 2003). A number of open questions have not yet received a complete answer, such as: Can we differentiate an information-seeking polar question from a broad focus statement by means of visual information alone? How does visual information contribute to question identification when added to auditory information? Does the simultaneous use of several questioning cues increase the perceiver’s identification of an utterance as a question? Do nonverbal cues play a major role in languages in which intonation and syntactic cues do not play a defining role? The present study aims to compare interrogativity-marking strategies in Dutch and Catalan, two European languages that have been argued to rely on different resources for this distinction. On the one hand, Dutch polar interrogatives are characterized by subject–verb inversion, without the auxiliary verb that English requires (Dut. Heeft hij een snor?, lit. ‘Has he a moustache?’, ‘Does he have a moustache?’; Englert 2010, p. 2668). On the other hand, grammatical subjects are generally not produced in Catalan interrogatives (Cat. Té bigoti?, lit. ‘Has moustache?’, ‘Does he have a moustache?’). In terms of prosody, speakers of Dutch appear to draw on the overall set of phonological devices of their language for question marking, though certain configurations, such as rising intonational contours, are more likely to occur in questions than in statements (Haan 2002, p. 214).
By contrast, most Catalan dialects have been claimed to use a specific intonational contour to convey information-seeking polar questions, consisting of a low pitch accent followed by a rising boundary tone (Prieto et al. 2013).[2] Drawing on Rossano’s (2010) hypothesis, we expect that Catalan speakers will use a greater number of prosodic and gestural cues than Dutch speakers, since the latter use an additional syntactic strategy to mark questions (see also Geluykens 1988). The present study has two related goals. First, we aim to describe the combination of syntactic, prosodic, and gestural cues used by Dutch and Catalan speakers to mark broad focus statements and information-seeking polar questions. To collect a series of such statements and questions for our perception experiment, we conducted a production task using two variants of the Guess Who game. As Ahmad et al. (2011) point out, the dynamic nature of games makes them a good tool for investigating human communication in different experimental setups, especially if the outcome of a game can be controlled systematically. The second goal of the study is to test whether and how listeners of the two languages differentiate questions from statements, and to evaluate the relative importance of the different cues used in production and perception. A set of the utterances obtained in the production task was therefore used as stimulus materials for a test in which participants had to judge whether an utterance was a statement or a question. Participants were presented with materials in three perceptual conditions: one in which only the auditory information was available, one in which only the visual information was available, and one in which the full auditory and visual information of the actual recordings was presented simultaneously.
This identification test allowed us to assess the relevance of the various features and their potential interaction effects.

[2] Catalan can also mark interrogativity through the expletive particle que (cf. est-ce que in French and é que in Portuguese), which is found especially in Central, Balearic, and North-western Catalan in confirmation-seeking questions (i.e., not in information-seeking questions; Prieto and Rigau 2007).

Method for Recording Stimuli

Participants

Eighteen Dutch speakers (11 male, 7 female) and sixteen Central Catalan speakers (1 male, 15 female) participated in the production task. Participants played the game in pairs, taking turns in adopting the different available roles. Participants only played the game with other native speakers of their own language. They showed clear signs of engagement in the task, and some of them even wanted to play more rounds than the experimenter had indicated. All subjects were undergraduates at either Tilburg University, The Netherlands, or Universitat Pompeu Fabra in Barcelona, Spain. All participants played both variants of the game.

Procedure

In order to elicit statements and questions in a natural manner, we used two digital variants of the “Guess Who” board game created by Suleman Shahid of Tilburg University and colleagues (see Ahmad et al. 2011). In this game, participants were presented with a board containing 24 colored drawings of human faces. These faces differed in various parameters, such as gender or the color of their skin, hair, and eyes. Some faces were bald, some had beards or moustaches, and some were wearing hats, glasses, or earrings. In the traditional version of “Guess Who”, the purpose of the game is to guess the opponent’s mystery person before he or she guesses yours.[3] Given our need to elicit either statements or questions, we asked participants to play one of two different variations of the game.
In the question-elicitation variation, Participant A had to ask Participant B questions to try to determine the mystery person on B’s face card. Players took turns asking questions about the physical features of their respective “mystery persons” in an effort to eliminate the wrong candidates. Subjects were instructed to be truthful when answering questions about the mystery person. The winner was the player who guessed his/her mystery person first. In the statement-elicitation variation of the game, participants took turns making statements about their mystery person, while the other player listened and eliminated all characters that did not exhibit the stated feature. Again, the first player to guess the identity of their “mystery person” won.[4] Note that both participants within a pair took turns in the course of both variations of the game and therefore both provided examples of questions and statements.

Setup

Participants sat in the same room, facing each other across a table and in front of two laptop computers arranged so that they could not see each other’s screen. Two camcorders were placed so as to record the upper part of each participant’s body. Before the start of each experiment, each camera was raised or lowered according to the participant’s height. Once the participants were seated, the experimenter gave spoken instructions about the game and the procedure to be followed for each variation. Each game lasted approximately 20 min, the time it took for both variants of the game to be played and won (4–6 times each).

[3] This experimental setup provides a clear advantage over naturally occurring situations. As Richardson et al. (2009) state, a question typically implies turn transition, and several studies have shown that gaze is related to turn-giving (Argyle and Cook 1976; Duncan and Fiske 1977; Kendon 1967, 1990). Moreover, Englert (2010) has shown for Dutch that questioners rely overwhelmingly on speaker gaze (90%) for next-speaker selection. Thus, in order to describe the nonverbal patterns that characterize questions, one has to focus on cases in which gaze plays no addressee-selection role, and this is controlled in our study since participants were engaged in dyadic interactions.

[4] In order to increase the number of interactions and the communication flow between participants, and to avoid continuation rises in the intonation patterns they produced, we added a rule to the game: at the end of each turn, players had to try to guess the mystery person’s name. This additional set of questions was not subjected to analysis. As one of the reviewers suggests, this set of guesses might be used in a follow-up study on the auditory and visual aspects of (un)certainty in questions.

Method for General Perception Experiment

Participants

In the perception experiment, twenty Dutch listeners and twenty Catalan listeners (10 males, 30 females) rated a selection of 70 stimuli in their own language as being statements or questions. Though the stimuli were excerpts from recordings made during the production task, none of the production-task participants took part in the perception experiment. The stimuli were presented in three different conditions in a within-subjects design: Auditory-Only (AO), Visual-Only (VO), and AudioVisual (AV). In order to control for a possible learning effect, the AV condition was always presented last, and the order of the two unimodal conditions was counterbalanced across subjects.

Materials

From the production recordings, 35 statements and 35 questions related to gender (e.g., It is a man vs. Is it a man?) were selected for each language for inclusion in the subsequent rating task.
The selection was limited to this type of utterance in order to constrain their semantics, and gender was chosen because gender-related utterances were found to be more frequent overall in the recordings. The final set of materials came from 17 Dutch speakers and 15 Central Catalan speakers. Whenever possible, we ensured that each speaker contributed a similar number of statements and questions. Since some of the participants in the production task did not produce gender-related utterances, they had to be excluded when selecting the materials for the perception task.

Perception Procedure

The 70 target stimuli were presented to each group of same-language participants in a randomized order. Stimuli were presented on a desktop computer equipped with headphones. Subjects were instructed to pay attention to the stimuli and decide which interpretation was more likely for each stimulus by pressing the corresponding computer key for statement or question: ‘A’/‘P’ (afirmació ‘statement’, pregunta ‘question’) for Catalan, and ‘S’/‘V’ (stelling ‘statement’, vraag ‘question’) for Dutch. No feedback was given on the “correctness” of their responses. Participants could take as much time as they wanted to make a decision, but could not return to an earlier stimulus once they had responded to it. The experiment was run with E-Prime version 2.0 (Psychology Software Tools Inc., 2009), which recorded responses automatically. A new stimulus was presented only after a response to the previous one had been given. The experiment took place in a quiet research room at Tilburg University or Universitat Pompeu Fabra, respectively. It lasted approximately 17 min. The total number of responses obtained was 8,400 (70 stimuli × 20 subjects × 3 conditions × 2 languages).
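The total of 8,400 responses follows from the fully crossed design. As an illustration only, the Python sketch below generates one possible presentation schedule consistent with the description above. The function names are hypothetical, and two details are our own assumptions rather than reported procedure: listeners alternate between AO-first and VO-first orders according to their index, and each block receives a fresh random stimulus order. It is not the authors' E-Prime script.

```python
import random
from itertools import product

LANGUAGES = ("Dutch", "Catalan")
N_STIMULI = 70   # 35 statements + 35 questions per language
N_SUBJECTS = 20  # listeners per language group

def condition_order(subject_index):
    """Return the block order for one listener; AV is always last.

    Assumption (hypothetical): even-indexed listeners get AO before VO,
    odd-indexed listeners get VO before AO.
    """
    unimodal = ("AO", "VO") if subject_index % 2 == 0 else ("VO", "AO")
    return unimodal + ("AV",)

def build_schedule(seed=0):
    """One row per trial: (language, subject, block, condition, stimulus)."""
    rng = random.Random(seed)
    schedule = []
    for language, subject in product(LANGUAGES, range(N_SUBJECTS)):
        for block, condition in enumerate(condition_order(subject)):
            stimuli = list(range(N_STIMULI))
            rng.shuffle(stimuli)  # assumed: fresh random order in every block
            for stimulus in stimuli:
                schedule.append((language, subject, block, condition, stimulus))
    return schedule

schedule = build_schedule()
print(len(schedule))  # 70 * 20 * 3 * 2 = 8400
```

Counterbalancing by subject parity is just one simple way of splitting each language group in half; any balanced assignment would preserve the same trial count.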
General Perception Results

Figure 1 shows the mean correct identification rates of the perception experiment broken down by Language (Dutch, Catalan), Condition (AO, VO, AV), and Intention (statement, question). The results show that participants in both languages were able to identify the two categories above chance level in all three presentation conditions. However, materials that included auditory information (i.e., AO and AV) consistently yielded more accurate question identification. A Generalized Linear Mixed Model (GLMM) analysis was run with the correct identification of the utterance category as the dependent variable, with language, condition, intention, and all the possible interactions as fixed factors and Subject and Item (Speaker) as random factors. Main effects were found for Language (F (1, 155) = 6.58, p = .01, r = .044) and Condition (F (2, 8388) = 417.40, p < .001), but not for Intention (F (1, 152) = 0.46, p = .50, r = .037). Two interactions were also significant: Language × Condition (F (2, 8388) = 21.50, p < .001) and Condition × Intention (F (2, 8388) = 33.48, p < .001). Bonferroni post hoc tests were carried out to determine the direction of the significant main effects and interactions. They show an effect of condition such that AV > AO > VO (all paired comparisons, p < .001); that is, even though the difference between the AO and AV conditions may not be clear in the graph, the two differ significantly in the number of correct identifications. This pattern of results suggests that the visual channel provides additional cues that help resolve momentary ambiguities arising from falling intonation or the absence of syntactic marking. Concerning the Language × Condition interaction, Dutch participants were more accurate than Catalan participants only when auditory information was available: AO (p = .002) and AV (p < .001), but not VO (p = .53). Concerning the Condition × Intention interaction, statements were more accurately identified than questions in the VO condition (p = .001), but questions were more accurately identified than statements in the AV condition (p = .006), and no differences were found in the AO condition (p = .19).

Fig. 1 Mean correct identification rate as a function of Language, Condition, and Type of utterance. VO = video only, AO = audio only, AV = audio + video

In sum, the perception results reveal that participants could identify questions and statements above chance level in all conditions. Specifically, participants’ responses were more accurate when auditory information was present, but visual cues also had a beneficial effect when added to the auditory ones. In addition, Dutch participants identified auditory materials more accurately than Catalan participants, whereas the two language groups did not differ significantly with VO materials, which suggests that language differences were most pronounced when the auditory components of the materials were involved. Importantly, our results show that when only visual information is present, statements are better identified than questions (though the converse pattern holds in the AV condition). These issues are further investigated in the next section, where we analyze the perceived materials in terms of their specific auditory and visual features.

Method for Auditory and Visual Cues Experiment

Labeling Procedure

With the aim of assessing the discrimination power of prosodic and gestural cues, the first two authors of the article (native speakers of Catalan and Dutch, respectively, but with some knowledge of each other’s language) independently coded the selected audiovisual materials (a total of 70 utterances) in terms of the following cues (based on Cosnier 1991):

• order of the sentence constituents (SV, VS, V)
• intonation (falling or rising boundary tone; i.e., L% versus H%)[5]
• gaze to interlocutor (presence, absence)
• eyebrow raising movement (presence, absence)

[5] Please note that the rising intonation category includes both Cat_ToBI continuation rises of the type M% and H%, as well as the HH% type, more typical of questions. These three types of rising boundary tones can easily be mistaken for one another when transcribing short and isolated sentences.

A featural analysis of the selected materials was performed in order to ensure that participants were faced with different kinds of syntactic forms and intonation patterns, and with the presence of gaze to the interlocutor and eyebrow raising, which would allow us to assess whether those parameters played any role in subject judgments.[6] The inter-transcriber agreement between the two labelers’ coding was quantified by means of Cohen’s kappa coefficient (Cohen 1960), which gave an overall coefficient of .84, indicating very good agreement (Landis and Koch 1977). The coefficient was .86 for Dutch and .82 for Catalan.
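Cohen's kappa corrects the raw proportion of agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of items labeled identically and p_e is the chance agreement implied by each coder's marginal label frequencies. A minimal Python sketch for a single binary cue follows; the function name and the ten-item label vectors are hypothetical toy values, not the study's coding data.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's (1960) kappa for two coders labeling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of both coders' marginals.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both coders use one single label")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical gaze codes (1 = gaze to interlocutor present) for ten utterances.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(coder_1, coder_2), 2))  # p_o = .9, p_e = .5 -> 0.8
```

Reproducing the reported coefficients (.84 overall, .72 to .91 per cue) would require the two labelers' actual codes, which are not given here.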
For the individual cues, kappa was .72 for the boundary contour, .91 for gaze, and .70 for eyebrow raising. Disagreements were resolved by consensus.

Audiovisual Properties of the Selected Materials

Table 1 presents the number of utterances in the database containing each cue. Regarding Syntax, the subject was omitted in all Catalan sentences, which only displayed the verb and predicate (Cat. És una dona, lit. ‘Is a woman’, ‘It is a woman’). In turn, all Dutch statements presented SV order (Dut. ’t is een vrouw, ‘It is a woman’) and all Dutch questions presented VS order (Dut. is ’t een vrouw?, ‘Is it a woman?’). In terms of Intonation, statements showed the same pattern in the two languages, with a large number of falling tones (mostly L* L% and some H* L%). Rising tones (L* H%) were found more often in Dutch questions than in Dutch statements (though Dutch questions exhibited more falling tones than rising tones; see Geluykens 1988). Catalan, in turn, showed a clear majority of questions produced with a rising tone (L* H%, as in Dutch). Concerning the two visual cues labeled (gaze and eyebrow raising), the two languages showed similar distributions across statements and questions. Crucially, both gaze and eyebrow raising occurred more frequently in questions. Overall, Catalan speakers also seem to use more non-syntactic cues than Dutch speakers.[7]

Auditory and Visual Cues Perception Results

Unimodal Perception of Auditory and Visual Features

The lack of syntactic marking in Catalan (i.e., zero degrees of freedom) makes it impossible for us to compute the interactions involving Language and Syntax (see Table 2 for means).
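The estimability problem can be made concrete with a small design matrix. In the Python/NumPy sketch below, the dummy coding (Language: 0 = Dutch, 1 = Catalan; Syntax: 1 = VS order present) is our own assumption, and each row stands for one Language-by-Syntax cell that actually occurs in the materials. Because VS order never occurs in Catalan, the Language × Syntax column is identically zero, so the matrix has fewer independent columns than parameters; adding repeated rows for individual trials would not change this.

```python
import numpy as np

# Cells that occur in the materials: (language, vs_order).
# language: 0 = Dutch, 1 = Catalan; vs_order: 0 = no inversion (SV or V), 1 = VS.
observed_cells = [(0, 0), (0, 1), (1, 0)]  # no (1, 1): Catalan never shows VS order

def design_matrix(cells):
    """Columns: intercept, Language, Syntax, Language x Syntax (dummy coding)."""
    return np.array([[1, lang, syn, lang * syn] for lang, syn in cells], dtype=float)

X = design_matrix(observed_cells)
# The interaction column is all zeros, so only 3 of the 4 parameters are identifiable.
print(X.shape[1], np.linalg.matrix_rank(X))  # 4 3

# With a hypothetical fully crossed 2 x 2 design, all four columns would be estimable.
full = design_matrix([(0, 0), (0, 1), (1, 0), (1, 1)])
print(np.linalg.matrix_rank(full))  # 4
```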
In order to examine the effects of both syntax and intonation within Dutch, a language-specific GLMM analysis of the AO task was performed, with identification as the dependent variable, Syntax, Contour, and their interaction as fixed effects, and Subject and Speaker as random factors. All factors were significant: Syntax (F (1, 107) = 331.192, p < .001, r = .024), Contour (F (1, 32) = 16.989, p < .001, r = .093), and their interaction (F (1, 59) = 6.087, p = .02). Bonferroni paired contrasts crucially showed that the Syntax × Contour interaction arose because a rising contour led to more question identifications when applied to an SV structure (p < .001), but not when applied to a VS structure (p = .18).

[6] As one of our reviewers suggests, it would be interesting to correlate the presence of listener gaze (and also mutual gaze between speaker and listener) with the presence of speaker eyebrow raises and other gestures. These phenomena could probably be better explored with a different experimental setup (e.g., without screens in between).

[7] The parameter “eye gaze” refers to the speaker’s gaze to the interlocutor. One may argue that the use of a paradigm like the ‘Guess Who’ board game would elicit less mutual gaze than normal face-to-face conversation, because of the presence of the board. However, nothing in the setup forces participants to look at the board or at the other participant in either the statement or the question condition; they can freely choose when to look at each other. Interestingly, despite these conditions, we found differences in gaze to the interlocutor between the two intentions.

Table 1 Number of utterances containing the four labeled cues, for each sentence meaning, in Dutch and Catalan

                     Dutch                     Catalan
                     Statements   Questions    Statements   Questions
VS order             0            35           0            0
Rising intonation    4            13           4            33
Eye gaze             9            21           12           24
Eyebrow raising      5            9            6            16

Total number of utterances per cell = 35

Table 2 Proportion of sentences identified as questions by presence or absence of each audiovisual cue

                     Dutch                     Catalan
                     Absence      Presence     Absence      Presence
VS order             .14 (.34)    .85 (.36)    .48 (.50)    –
Rising intonation    .42 (.49)    .70 (.46)    .13 (.33)    .80 (.40)
Eye gaze             .33 (.47)    .71 (.45)    .35 (.48)    .60 (.49)
Eyebrow raising      .46 (.50)    .63 (.48)    .38 (.49)    .70 (.48)

As for the perception of intonation differences, a GLMM analysis was conducted on the results of the AO task, with identification as the dependent variable, Language, Contour, and their interaction as fixed effects, and Subject and Speaker as random factors. There were main effects for Language (F (1, 26) = 11.67, p = .002, r = .116), Contour (F (1, 2796) = 601.41, p < .001, r = .136), and their interaction (F (1, 2796) = 79.25, p < .001). The significant interaction is due to the fact that Catalan listeners rated more falling contours as statements than Dutch listeners did (p < .001), but this difference does not hold for rising contours (p = .33), suggesting that rising contours are perceived as question-conveyors equally often by both language groups. This is consistent with the patterns found in production. Another GLMM analysis was conducted on the results of the VO task, with identification as the dependent variable, and Subject and Speaker as random factors.
The fixed effects were Language, Gaze, Eyebrow, and all their possible interactions. Main effects were found for Gaze (F(1, 2080) = 283.04, p < .001, r = .068), Eyebrow (F(1, 2792) = 21.04, p = .004, r = .059), and Language (F(1, 37) = 8.88, p = .005, r = .026). Two interactions were also significant: Gaze × Eyebrow (F(1, 2792) = 16.09, p < .001) and the three-way interaction Gaze × Eyebrow × Language (F(1, 2792) = 4.43, p = .04). The main effects of Gaze and Eyebrow mirror the patterns observed in production: the presence of these cues increased ‘question’ responses. The main effect of Language indicates that Dutch participants gave more ‘question’ responses overall than Catalan participants. As for the Gaze × Eyebrow interaction, Eyebrow had a significant effect on ‘question’ identification in the presence of gaze (p < .001) but not in its absence (p = .68). Regarding the three-way interaction, Dutch participants provided more ‘question’ responses than Catalan participants when Gaze (p = .003) or Eyebrow (p = .006) appeared alone in the perceived materials, but not when the two cues co-occurred (p = .33) or were both absent (p = .06).

In sum, both the presence of syntactic question marking and the presence of rising intonational contours affected the subjects’ decisions. Moreover, syntactic form and intonation interacted: rising contours triggered more question identifications when applied to utterances with an unmarked syntactic structure (i.e., SV sentences, which lack an explicit question structure).
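The Syntax × Contour interaction just summarized can be illustrated descriptively by comparing the effect of contour (rising minus falling) within each syntactic structure. The Python sketch below is not the GLMM the authors fitted: it ignores the Subject and Speaker random factors and performs no significance testing. The counts, field names, and helper functions are hypothetical, chosen only to mimic the qualitative pattern reported for Dutch (a rising contour matters for SV but hardly for VS).

```python
from collections import defaultdict

def make_trials(counts):
    """Expand {(syntax, contour): (n_question, n_total)} into 0/1 trial records."""
    trials = []
    for (syntax, contour), (n_question, n_total) in counts.items():
        for i in range(n_total):
            trials.append({"syntax": syntax, "contour": contour,
                           "question": 1 if i < n_question else 0})
    return trials

def cell_proportions(trials, factor_a, factor_b):
    """Proportion of 'question' responses in each (factor_a, factor_b) cell."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t in trials:
        cell = (t[factor_a], t[factor_b])
        totals[cell] += 1
        hits[cell] += t["question"]
    return {cell: hits[cell] / totals[cell] for cell in totals}

def simple_effects(props, levels_a, levels_b):
    """Effect of factor B (second level minus first) at each level of factor A."""
    low_b, high_b = levels_b
    return {a: props[(a, high_b)] - props[(a, low_b)] for a in levels_a}


# Invented counts (not the study's data): (question responses, total trials).
counts = {("SV", "fall"): (2, 20), ("SV", "rise"): (10, 20),
          ("VS", "fall"): (16, 20), ("VS", "rise"): (17, 20)}
props = cell_proportions(make_trials(counts), "syntax", "contour")
effects = simple_effects(props, ("SV", "VS"), ("fall", "rise"))
# An interaction shows up as unequal simple effects across syntactic structures.
print({k: round(v, 2) for k, v in effects.items()})  # {'SV': 0.4, 'VS': 0.05}
```

The same helpers apply unchanged to the Gaze × Eyebrow pattern in the VO task, with gaze as factor A and eyebrow raising as factor B.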
As for nonverbal parameters, subjects rated utterances as questions significantly more often when the speaker looked at his/her interlocutor and when he/she raised the eyebrows, with an interaction between the two factors such that eyebrow raising had a significant effect on ‘question’ identification only in the presence of gaze.

Auditory and Visual Features Combined

A main question related to cue interaction is whether the co-occurrence of several cues to questioning significantly increases the detection of questions. To this end, we created a new variable containing the sum of the cues to questioning found in both languages (i.e., VS syntax, rising intonation contour, presence of gaze, and eyebrow raising). A two-tailed Pearson correlation was computed between the number of interrogative cues and the identification responses. It revealed a positive correlation of .736 for Dutch and of .709 for Catalan (p < .001 in both cases); that is, in both languages, the more cues to questioning an utterance contained, the more often participants responded ‘question’.

General Discussion and Conclusions

The first goal of the present study was to describe the syntactic, prosodic, and gestural strategies used by Dutch and Catalan speakers to mark information-seeking polar questions and broad focus statements. These two languages have been argued to mark interrogativity in two different ways. Whereas Dutch polar questions are characterized by a syntactic verb-fronting strategy and optional intonational marking (e.g., Hij heeft een baard vs. Heeft hij een baard?, lit. ‘He has a beard’ vs. ‘Has he a beard?’), the main strategy for marking polar questions in Catalan is the use of specific intonational patterns on the same syntactic structure (e.g., Té barba vs. Té barba?, lit. ‘Has beard’ vs. ‘Has beard?’).
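The cue-summation analysis reported in the Results (a two-tailed Pearson correlation between the number of interrogative cues in an utterance and listeners’ identification responses) can be sketched as follows. This minimal Python version assumes one record per trial, with four boolean cue fields and a response coded 1 for ‘question’ and 0 for ‘statement’; the field names, functions, and data are hypothetical, and no p-value is computed. With a 0/1 response variable, Pearson’s r is the point-biserial correlation.

```python
import math

# Hypothetical field names for the four interrogative cues.
CUES = ("vs_order", "rising", "gaze", "eyebrow")

def cue_count(trial):
    """Number of interrogative cues (0-4) present in the utterance of one trial."""
    return sum(1 for cue in CUES if trial[cue])

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented trials (not the study's data): more cues, more 'question' responses.
trials = [
    {"vs_order": False, "rising": False, "gaze": False, "eyebrow": False, "question": 0},
    {"vs_order": False, "rising": False, "gaze": True,  "eyebrow": False, "question": 0},
    {"vs_order": False, "rising": True,  "gaze": True,  "eyebrow": False, "question": 1},
    {"vs_order": True,  "rising": True,  "gaze": True,  "eyebrow": False, "question": 1},
    {"vs_order": True,  "rising": True,  "gaze": True,  "eyebrow": True,  "question": 1},
]
counts = [cue_count(t) for t in trials]
responses = [t["question"] for t in trials]
print(counts, round(pearson_r(counts, responses), 3))  # prints: [0, 1, 2, 3, 4] 0.866
```

For Catalan data, the `vs_order` field would simply always be false, so the cue count ranges over the three non-syntactic cues.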
On the one hand, the results of our production task confirmed that Dutch has the systematic syntactic strategy described in the literature. As for prosody, both languages used rising tones more often in questions than in statements, though Catalan (which lacks any lexico-morphosyntactic distinction) showed a stronger effect of intonation for interrogativity marking. Concerning gestures, both languages showed similar distributions of gaze and eyebrow raising, which were found mainly in questions.

The second and main goal of this investigation was to test whether listeners of the two languages could differentiate questions from statements in the different presentation conditions (AO, VO, AV), and to evaluate the relevance of the different cues in perception. The results of our perception experiment with 20 Dutch listeners and 20 Catalan listeners confirmed that participants can identify questions and statements above chance level in all conditions. Importantly, perceivers relied heavily on auditory information, but the results also showed that (a) visual-only utterances were classified above chance, and (b) accuracy improved when visual information was added to auditory information. This result confirms the importance of nonverbal cues in listeners’ identification of pragmatic intentions, but also suggests that auditory cues carry greater weight in the perception of interrogativity. In the auditory-only condition, Dutch participants were more accurate than Catalan participants, which can be linked to the fact that Dutch uses an unambiguous syntactic strategy. Regarding the perceptual importance of syntax and intonation in Dutch, an analysis of the Dutch listeners’ perception of AO information revealed that both factors were significant.
Moreover, there was an interaction between the two, in that rising contours led to more ‘question’ identification responses only when applied to an unmarked (SV) syntactic structure. This indicates that, when both cues are available, syntax carries more weight than intonation. In the visual-only condition, gaze played an especially strong role in ‘question’ identification responses in both languages. This is in line with Rossano’s (2010) production results for Italian, which showed that the occurrence of speaker gaze towards the recipient in dyadic interactions increases the likelihood of obtaining a response. Eyebrow raising played a secondary role: it increased ‘question’ responses only in the presence of gaze. More crucially, in the AV presentation, we found a positive correlation, for both languages, between the number of response-mobilizing cues in a sentence and its rating as an interrogative utterance. This result is especially relevant for the theory of response relevance put forward by Stivers and Rossano (2010) and is consistent with previous findings from analyses of end-of-turn cues (see Duncan and Fiske 1977). Stivers and Rossano propose four main response-mobilizing features (interrogative lexico-morphosyntax, interrogative prosody, recipient-directed speaker gaze, and recipient-tilted epistemic asymmetry) and argue that the inclusion of multiple response-mobilizing features leads to higher response relevance than the inclusion of fewer or no features. In their own words, “a request (or an offer or information request) is high in response relevance, but a request designed ‘directly’ (e.g., with interrogative morphosyntax and/or prosody) would be still higher. Similarly, an assessment (or a noticing or announcement) would be low in response relevance.
However, if it were designed with multiple response-mobilizing features, this would increase the response relevance of the action” (Stivers and Rossano 2010, pp. 27–28). In our data, a higher concentration of lexico-morphosyntactic, prosodic, and gestural cues increases the chances that utterances will be perceived as questions.

To our knowledge, the present study provides the first results of a controlled investigation of the perception of information-seeking polar questions compared with broad focus statements in two typologically different languages. First, we have found that auditory information has a greater effect on question identification than visual information (auditory cues > visual cues). Second, we have empirically shown that both auditory and visual cues play a role in this distinction in both Catalan and Dutch. Specifically, adding nonverbal cues to auditory cues enhances the perception of information-seeking questions, and a visual-only presentation of the materials also led to successful interrogativity detection. In terms of perceptual relevance, gaze had a greater effect than eyebrow raising. At least for Dutch and Catalan, this pattern of results suggests a cue value scale for interrogativity marking such that Syntax > Intonation > Gaze > Eyebrow.

In conclusion, this study shows how several verbal and nonverbal cues are systematically used in the production of interrogativity and how they crucially interact in its perception.

Acknowledgments

We thank Suleman Shahid for his help in setting up the recording sessions, and Igor Jauk for his help in labeling part of the Catalan corpus. We also thank the audiences of the 13th Conference on Laboratory Phonology and the 5th European Conference on Tone and Intonation.
This research has been funded by two research grants awarded by the Spanish Ministerio de Ciencia e Innovación (HUM2006-13295-C02-01/FILO and FFI2009-07648/FILO), by the Consolider-Ingenio 2010 Program (CSD200700012), by a grant awarded by the Generalitat de Catalunya to the Grup d’Estudis de Prosòdia (2009SGR701), and by a “Study abroad scholarship for research outside of Catalunya” (2010 BE1 00207), awarded by the Generalitat de Catalunya.

References

Ahmad, M. I., Tariq, H., Saeed, M., Shahid, S., & Krahmer, E. (2011). Guess who? An interactive and entertaining game-like platform for investigating human emotions. In J. A. Jacko (Ed.), Human–computer interaction. Towards mobile and intelligent interaction environments (Vol. 3, pp. 543–551). Lecture Notes in Computer Science, 6763. Berlin: Springer.
Argyle, M., & Cook, M. (1976). Gaze and mutual gaze. Cambridge: Cambridge University Press.
Bolinger, D. L. (1989). Intonation and its uses: Melody in grammar and discourse. London: Edward Arnold.
Borràs-Comes, J., & Prieto, P. (2011). ‘Seeing tunes’. The role of visual gestures in tune interpretation. Journal of Laboratory Phonology, 2(2), 355–380.
Borràs-Comes, J., Puglesi, C., & Prieto, P. (2011). Audiovisual competition in the perception of counterexpectational questions. In G. Salvi, J. Beskow, O. Engwall, & S. Al Moubayed (Eds.), Proceedings of the 11th international conference on auditory-visual speech processing (pp. 43–46). Volterra, Italy: KTH Royal Institute of Technology.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cosnier, J. (1991). Les gestes de la question. In C. Kerbrat-Orecchioni (Ed.), La question (pp. 163–171). Lyon: Presses Universitaires de Lyon.
Crespo-Sendra, V., Kaland, C., Swerts, M., & Prieto, P. (2013). Perceiving incredulity: The role of intonation and facial gestures. Journal of Pragmatics, 47, 1–13.
Cruttenden, A. (1981). Falls and rises: Meanings and universals. Journal of Linguistics, 17(1), 77–91.
Dryer, M. S. (2008). Polar questions. In M. Haspelmath, M. S. Dryer, D. Gil, & B. Comrie (Eds.), The world atlas of language structures online (Chapter 116). Munich: Max Planck Digital Library. Retrieved from http://wals.info/chapter/116.
Duncan, S., & Fiske, D. W. (1977). Face-to-face interaction: Research, methods, and theory. New York: Wiley.
Englert, C. (2010). Questions and responses in Dutch conversations. Journal of Pragmatics, 42(10), 2666–2684.
Flecha-García, M. L. (2010). Eyebrow raises in dialogue and their relation to discourse structure, utterance function and pitch accents in English. Speech Communication, 52, 542–554.
Geluykens, R. (1988). On the myth of rising intonation in polar questions. Journal of Pragmatics, 12, 467–485.
Haan, J. (2002). Speaking of questions. An exploration of Dutch question intonation. LOT Dissertation Series, 52. Utrecht: LOT.
House, D. (2002). Perception of question intonation and facial gestures. Fonetik, 44(1), 41–44.
Kendon, A. (1967). Some functions of gaze direction in social interaction. Acta Psychologica, 26, 22–63.
Kendon, A. (1990). Conducting interaction: Patterns of behavior in focused encounters. New York: Cambridge University Press.
Labov, W., & Fanshel, D. (1977). Therapeutic discourse. New York: Academic Press.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Levinson, S. C. (2010). Questions and responses in Yélî Dnye, the Papuan language of Rossel Island. Journal of Pragmatics, 42(10), 2741–2755.
Lysander, K., & Horton, W. S. (2012). Conversational grounding in younger and older adults: The effect of partner visibility and referent abstractness in task-oriented dialogue. Discourse Processes, 49(1), 29–60.
Peters, C., Pelachaud, C., Bevacqua, E., Mancini, M., & Poggi, I. (2005). A model of attention and interest using gaze behavior. In T. Panayiotopoulos, J. Gratch, R. Aylett, D. Ballin, P. Olivier, & T. Rist (Eds.), Intelligent virtual agents (Vol. 3661, pp. 229–240). Lecture Notes in Computer Science. London: Springer.
Pomerantz, A. M. (1980). Telling my side: “Limited access” as a “fishing” device. Sociological Inquiry, 50, 186–198.
Prieto, P., Borràs-Comes, J., Crespo-Sendra, V., Roseano, P., Sichel-Bazin, R., & Vanrell, M. M. (2013). Intonational phonology of Catalan and its dialectal varieties. In S. Frota & P. Prieto (Eds.), Intonational variation in Romance. Oxford: Oxford University Press.
Prieto, P., & Rigau, G. (2007). The syntax-prosody interface: Catalan interrogative sentences headed by que. Journal of Portuguese Linguistics, 6(2), 29–59.
Psychology Software Tools Inc. (2009). E-Prime (version 2.0). Computer Program.
Rialland, A. (2007). Question prosody: An African perspective. In C. Gussenhoven & T. Riad (Eds.), Tones and tunes (Vol. 2, pp. 35–62). Berlin: Mouton.
Richardson, D. C., Dale, R., & Tomlinson, J. M. (2009). Conversation, gaze coordination, and beliefs about visual context. Cognitive Science, 33(8), 1468–1482.
Rossano, F. (2010). Questioning and responding in Italian. Journal of Pragmatics, 42(10), 2756–2771.
Snow, D., & Balog, H. L. (2002). Do children produce the melody before the words? A review of developmental intonation research. Lingua, 112, 1025–1058.
Srinivasan, R. J., & Massaro, D. W. (2003). Perceiving prosody from the face and voice. Distinguishing statements from echoic questions in English. Language and Speech, 46, 1–22.
Stivers, T. (2010). An overview of the question-response system in American English conversation. Journal of Pragmatics, 42(10), 2772–2781.
Stivers, T., & Rossano, F. (2010). Mobilizing response. Research on Language and Social Interaction, 43(1), 1–31.
Vilhjalmsson, H. H. (1997). Autonomous communicative behaviors in avatars. Unpublished Master’s thesis. Massachusetts Institute of Technology, Cambridge, MA.