
Julia Hirschberg

Also published as: Julia B. Hirschberg


2024

EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Haozhe Chen | Run Chen | Julia Hirschberg
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

While recent advances in Text-to-Speech (TTS) technology produce natural and expressive speech, they lack the option for users to select emotion and control intensity. We propose EmoKnob, a framework that allows fine-grained emotion control in speech synthesis with few-shot demonstrative samples of arbitrary emotion. Our framework leverages the expressive speaker representation space made possible by recent advances in foundation voice cloning models. Based on the few-shot capability of our emotion control framework, we propose two methods to apply emotion control on emotions described by open-ended text, enabling an intuitive interface for controlling a diverse array of nuanced emotions. To facilitate a more systematic emotional speech synthesis field, we introduce a set of evaluation metrics designed to rigorously assess the faithfulness and recognizability of emotion control frameworks. Through objective and subjective evaluations, we show that our emotion control framework effectively embeds emotions into speech and surpasses emotion expressiveness of commercial TTS services.

Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension
Lin Ai | Zheng Hui | Zizhou Liu | Julia Hirschberg
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Defending Against Social Engineering Attacks in the Age of LLMs
Lin Ai | Tharindu Sandaruwan Kumarage | Amrita Bhattacharjee | Zizhou Liu | Zheng Hui | Michael S. Davinroy | James Cook | Laura Cassani | Kirill Trapeznikov | Matthias Kirchner | Arslan Basharat | Anthony Hoogs | Joshua Garland | Huan Liu | Julia Hirschberg
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

EDEN: Empathetic Dialogues for English Learning
Siyan Li | Teresa Shao | Zhou Yu | Julia Hirschberg
Findings of the Association for Computational Linguistics: EMNLP 2024

Dialogue systems have been used as conversation partners in English learning, but few have studied whether these systems improve learning outcomes. Student passion and perseverance, or grit, has been associated with language learning success. Recent work establishes that as students perceive their English teachers to be more supportive, their grit improves. Hypothesizing that the same pattern applies to English-teaching chatbots, we create EDEN, a robust open-domain chatbot for spoken conversation practice that provides empathetic feedback. To construct EDEN, we first train a specialized spoken utterance grammar correction model and a high-quality social chit-chat conversation model. We then conduct a preliminary user study with a variety of strategies for empathetic feedback. Our experiment suggests that using adaptive empathetic feedback leads to higher perceived affective support. Furthermore, elements of perceived affective support positively correlate with student grit.

A Survey on Open Information Extraction from Rule-based Model to Large Language Model
Liu Pai | Wenyang Gao | Wenjie Dong | Lin Ai | Ziwei Gong | Songfang Huang | Li Zongsheng | Ehsan Hoque | Julia Hirschberg | Yue Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024

Open Information Extraction (OpenIE) represents a crucial NLP task aimed at deriving structured information from unstructured text, unrestricted by relation type or domain. This survey paper provides an overview of OpenIE technologies spanning from 2007 to 2024, emphasizing a chronological perspective absent in prior surveys. It examines the evolution of task settings in OpenIE to align with the advances in recent technologies. The paper categorizes OpenIE approaches into rule-based, neural, and pre-trained large language models, discussing each within a chronological framework. Additionally, it highlights prevalent datasets and evaluation metrics currently in use. Building on this extensive review, this paper systematically reviews the evolution of task settings, data, evaluation metrics, and methodologies in the era of large language models, highlighting their mutual influence, comparing their capabilities, and examining their implications for open challenges and future research directions.

From Pixels to Personas: Investigating and Modeling Self-Anthropomorphism in Human-Robot Dialogues
Yu Li | Devamanyu Hazarika | Di Jin | Julia Hirschberg | Yang Liu
Findings of the Association for Computational Linguistics: EMNLP 2024

Self-anthropomorphism in robots manifests itself through their display of human-like characteristics in dialogue, such as expressing preferences and emotions. Our study systematically analyzes self-anthropomorphic expression within various dialogue datasets, outlining the contrasts between self-anthropomorphic and non-self-anthropomorphic responses in dialogue systems. We show significant differences in these two types of responses and propose transitioning from one type to the other. We also introduce Pix2Persona, a novel dataset aimed at developing ethical and engaging AI systems in various embodiments. This dataset preserves the original dialogues from existing corpora and enhances them with paired responses: self-anthropomorphic and non-self-anthropomorphic for each original bot response. Our work not only uncovers a new category of bot responses that were previously under-explored but also lays the groundwork for future studies about dynamically adjusting self-anthropomorphism levels in AI systems to align with ethical standards and user expectations.

A Mapping on Current Classifying Categories of Emotions Used in Multimodal Models for Emotion Recognition
Ziwei Gong | Muyin Yao | Xinyi Hu | Xiaoning Zhu | Julia Hirschberg
Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)

In Emotion Detection within Natural Language Processing and related multimodal research, the growth of datasets and models has led to a challenge: disparities in emotion classification methods. The lack of commonly agreed upon conventions on the classification of emotions creates boundaries for model comparisons and dataset adaptation. In this paper, we compare the current classification methods in recent models and datasets and propose a valid method to combine different emotion categories. Our proposal arises from experiments across models, psychological theories, and human evaluations, and we examine the effect of the proposed mapping on models.

Measuring Entrainment in Spontaneous Code-switched Speech
Debasmita Bhattacharya | Siying Ding | Alayna Nguyen | Julia Hirschberg
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

It is well-known that speakers who entrain to one another have more successful conversations than those who do not. Previous research has shown that interlocutors entrain on linguistic features in both written and spoken monolingual domains. More recent work on code-switched communication has also shown preliminary evidence of entrainment on certain aspects of code-switching (CSW). However, such studies of entrainment in code-switched domains have been extremely few and restricted to human-machine textual interactions. Our work studies code-switched spontaneous speech between humans, finding that (1) patterns of written and spoken entrainment in monolingual settings largely generalize to code-switched settings, and (2) some patterns of entrainment on code-switching in dialogue agent-generated text generalize to spontaneous code-switched speech. Our findings give rise to important implications for the potentially “universal” nature of entrainment as a communication phenomenon, and potential applications in inclusive and interactive speech technology.

Multimodal Multi-loss Fusion Network for Sentiment Analysis
Zehui Wu | Ziwei Gong | Jaywon Koo | Julia Hirschberg
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

This paper investigates the optimal selection and fusion of feature encoders across multiple modalities and combines these in one neural network to improve sentiment detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network, identifying surprisingly important findings relating to subnet performance. We have also found that integrating context significantly enhances model performance. Our best model achieves state-of-the-art performance for three datasets (CMU-MOSI, CMU-MOSEI and CH-SIMS). These results suggest a roadmap toward an optimized feature selection and fusion approach for enhancing sentiment detection in neural networks.

Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes
Yu-Wen Chen | Julia Hirschberg
Proceedings of the 6th Clinical Natural Language Processing Workshop

Summarizing medical conversations poses unique challenges due to the specialized domain and the difficulty of collecting in-domain training data. In this study, we investigate the performance of state-of-the-art doctor-patient conversation generative summarization models on out-of-domain data. We divide the summarization model of doctor-patient conversation into two configurations: (1) a general model, without specifying subjective (S), objective (O), assessment (A), and plan (P) notes; (2) a SOAP-oriented model that generates a summary with SOAP sections. We analyzed the limitations and strengths of the fine-tuning language model-based methods and GPTs on both configurations. We also conducted a Linguistic Inquiry and Word Count analysis to compare the SOAP notes from different datasets. The results exhibit a strong correlation for reference notes across different datasets, indicating that format mismatch (i.e., discrepancies in word distribution) is not the main cause of performance decline on out-of-domain data. Lastly, a detailed analysis of SOAP notes is included to provide insights into missing information and hallucinations introduced by the models.

Using Adaptive Empathetic Responses for Teaching English
Li Siyan | Teresa Shao | Julia Hirschberg | Zhou Yu
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Existing English-teaching chatbots rarely incorporate empathy explicitly in their feedback, but empathetic feedback could help keep students engaged and reduce learner anxiety. Toward this end, we propose the task of negative emotion detection via audio, for recognizing empathetic feedback opportunities in language learning. We then build the first spoken English-teaching chatbot with adaptive, empathetic feedback. This feedback is synthesized through automatic prompt optimization of ChatGPT and is evaluated with English learners. We demonstrate the effectiveness of our system through a preliminary user study.

2023

DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines
Prakhar Gupta | Yang Liu | Di Jin | Behnam Hedayatnia | Spandana Gella | Sijia Liu | Patrick Lange | Julia Hirschberg | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: EMNLP 2023

Dialogue models are able to generate coherent and fluent responses, but they can still be challenging to control and may produce non-engaging, unsafe results. This unpredictability diminishes user trust and can hinder the use of the models in the real world. To address this, we introduce DialGuide, a novel framework for controlling dialogue model behavior using natural language rules, or guidelines. These guidelines provide information about the context they are applicable to and what should be included in the response, allowing the models to generate responses that are more closely aligned with the developer’s expectations and intent. We evaluate DialGuide on three tasks in open-domain dialogue response generation: guideline selection, response generation, and response entailment verification. Our dataset contains 10,737 positive and 15,467 negative dialogue context-response-guideline triplets across two domains - chit-chat and safety. We provide baseline models for the tasks and benchmark their performance. We also demonstrate that DialGuide is effective in the dialogue safety domain, producing safe and engaging responses that follow developer guidelines.

2022

Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)
Jack FitzGerald | Kay Rottmann | Julia Hirschberg | Mohit Bansal | Anna Rumshisky | Charith Peris | Christopher Hench
Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)

2021

“Talk to me with left, right, and angles”: Lexical entrainment in spoken Hebrew dialogue
Andreas Weise | Vered Silber-Varod | Anat Lerner | Julia Hirschberg | Rivka Levitan
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

It has been well-documented for several languages that human interlocutors tend to adapt their linguistic productions to become more similar to each other. This behavior, known as entrainment, affects lexical choice as well, both with regard to specific words, such as referring expressions, and overall style. We offer what we believe to be the first investigation of such lexical entrainment in Hebrew. Using two existing measures, we analyze Hebrew speakers interacting in a Map Task, a popular experimental setup, and find rich evidence of lexical entrainment. Analyzing speaker pairs by the combination of their genders as well as speakers by their individual gender, we find no clear pattern of differences. We do, however, find that speakers in a position of less power entrain more than those with greater power, which matches theoretical accounts. Overall, our results mostly accord with those for American English, with a lack of entrainment on hedge words being the main difference.

CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users
Zixiaofan Yang | Shayan Hooshmand | Julia Hirschberg
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Humor detection has gained attention in recent years due to the desire to understand user-generated content with figurative language. However, substantial individual and cultural differences in humor perception make it very difficult to collect a large-scale humor dataset with reliable humor labels. We propose CHoRaL, a framework to generate perceived humor labels on Facebook posts, using the naturally available user reactions to these posts with no manual annotation needed. CHoRaL provides both binary labels and continuous scores of humor and non-humor. We present the largest dataset to date with labeled humor on 785K posts related to COVID-19. Additionally, we analyze the expression of COVID-related humor in social media by extracting lexico-semantic and affective features from the posts, and build humor detection models with performance similar to humans. CHoRaL enables the development of large-scale humor detection models on any topic and opens a new path to the study of humor on social media.

Automatic Detection and Prediction of Psychiatric Hospitalizations From Social Media Posts
Zhengping Jiang | Jonathan Zomick | Sarah Ita Levitan | Mark Serper | Julia Hirschberg
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access

We address the problem of predicting psychiatric hospitalizations using linguistic features drawn from social media posts. We formulate this novel task and develop an approach to automatically extract time spans of self-reported psychiatric hospitalizations. Using this dataset, we build predictive models of psychiatric hospitalization, comparing feature sets, user vs. post classification, and comparing model performance using a varying time window of posts. Our best model achieves an F1 of .718 using 7 days of posts. Our results suggest that this is a useful framework for collecting hospitalization data, and that social media data can be leveraged to predict acute psychiatric crises before they occur, potentially saving lives and improving outcomes for individuals with mental illness.

2020

Detection of Mental Health from Reddit via Deep Contextualized Representations
Zhengping Jiang | Sarah Ita Levitan | Jonathan Zomick | Julia Hirschberg
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

We address the problem of automatic detection of psychiatric disorders from the linguistic content of social media posts. We build a large scale dataset of Reddit posts from users with eight disorders and a control user group. We extract and analyze linguistic characteristics of posts and identify differences between diagnostic groups. We build strong classification models based on deep contextualized word representations and show that they outperform previously applied statistical models with simple linguistic features by large margins. We compare user-level and post-level classification performance, as well as an ensembled multiclass model.

Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies
Xi (Leslie) Chen | Sarah Ita Levitan | Michelle Levine | Marko Mandic | Julia Hirschberg
Transactions of the Association for Computational Linguistics, Volume 8

Humans rarely perform better than chance at lie detection. To better understand human perception of deception, we created a game framework, LieCatcher, to collect ratings of perceived deception using a large corpus of deceptive and truthful interviews. We analyzed the acoustic-prosodic and linguistic characteristics of language trusted and mistrusted by raters and compared these to characteristics of actual truthful and deceptive language to understand how perception aligns with reality. With this data we built classifiers to automatically distinguish trusted from mistrusted speech, achieving an F1 of 66.1%. We next evaluated whether the strategies raters said they used to discriminate between truthful and deceptive responses were in fact useful. Our results show that, although several prosodic and lexical features were consistently perceived as trustworthy, they were not reliable cues. Also, the strategies that judges reported using in deception detection were not helpful for the task. Our work sheds light on the nature of trusted language and provides insight into the challenging problem of human deception detection.

A Novel Methodology for Developing Automatic Harassment Classifiers for Twitter
Ishaan Arora | Julia Guo | Sarah Ita Levitan | Susan McGregor | Julia Hirschberg
Proceedings of the Fourth Workshop on Online Abuse and Harms

Most efforts at identifying abusive speech online rely on public corpora that have been scraped from websites using keyword-based queries or released by site or platform owners for research purposes. These are typically labeled by crowd-sourced annotators – not the targets of the abuse themselves. While this method of data collection supports fast development of machine learning classifiers, the models built on them often fail in the context of real-world harassment and abuse, which contain nuances less easily identified by non-targets. Here, we present a mixed-methods approach to create classifiers for abuse and harassment which leverages direct engagement with the target group in order to achieve high quality and ecological validity of data sets and labels, and to generate deeper insights into the key tactics of bad actors. We use women journalists’ experience on Twitter as an initial community of focus. We identify several structural mechanisms of abuse that we believe will generalize to other target communities.

2019

SpatialNet: A Declarative Resource for Spatial Relations
Morgan Ulinski | Bob Coyne | Julia Hirschberg
Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP)

This paper introduces SpatialNet, a novel resource which links linguistic expressions to actual spatial configurations. SpatialNet is based on FrameNet (Ruppenhofer et al., 2016) and VigNet (Coyne et al., 2011), two resources which use frame semantics to encode lexical meaning. SpatialNet uses a deep semantic representation of spatial relations to provide a formal description of how a language expresses spatial information. This formal representation of the lexical semantics of spatial language also provides a consistent way to represent spatial meaning across multiple languages. In this paper, we describe the structure of SpatialNet, with examples from English and German. We also show how SpatialNet can be combined with other existing NLP tools to create a text-to-scene system for a language.

Crowdsourced Hedge Term Disambiguation
Morgan Ulinski | Julia Hirschberg
Proceedings of the 13th Linguistic Annotation Workshop

We address the issue of acquiring quality annotations of hedging words and phrases, linguistic phenomena in which words, sounds, or other constructions are used to express ambiguity or uncertainty. Due to the limited availability of existing corpora annotated for hedging, linguists and other language scientists have been constrained as to the extent they can study this phenomenon. In this paper, we introduce a new method of acquiring hedging annotations via crowdsourcing, based on reformulating the task of labeling hedges as a simple word sense disambiguation task. We also introduce a new hedging corpus we have constructed by applying this method, a collection of forum posts annotated using Amazon Mechanical Turk. We found that the crowdsourced judgments we obtained had an inter-annotator agreement of 92.89% (Fleiss’ Kappa=0.751) and, when comparing a subset of these annotations to an expert-annotated gold standard, an accuracy of 96.65%.
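
For readers unfamiliar with the agreement statistic cited in the abstract above, the following is a minimal sketch of the standard Fleiss' kappa formula. It is not the authors' code; the function name `fleiss_kappa` and the toy count matrices are illustrative assumptions, and the sketch assumes every item received the same number of ratings.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (hypothetical helper)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two ratings per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_cats = len(counts[0])

    # Per-item agreement: share of rater pairs that agree on the item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items  # mean observed agreement

    # Chance agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(n_cats)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    # Undefined (division by zero) when all ratings fall in one category.
    return (p_bar - p_e) / (1 - p_e)


# Toy data (made up): 2 items, 3 raters, 2 categories.
perfect = fleiss_kappa([[3, 0], [0, 3]])
mixed = fleiss_kappa([[2, 1], [1, 2]])
```

Kappa discounts the agreement expected by chance, which is why it is typically lower than a raw percent-agreement figure computed on the same judgments.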

2018

Linguistic Cues to Deception and Perceived Deception in Interview Dialogues
Sarah Ita Levitan | Angel Maredia | Julia Hirschberg
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We explore deception detection in interview dialogues. We analyze a set of linguistic features in both truthful and deceptive responses to interview questions. We also study the perception of deception, identifying characteristics of statements that are perceived as truthful or deceptive by interviewers. Our analysis shows significant differences between truthful and deceptive question responses, as well as variations in deception patterns across gender and native language. This analysis motivated our selection of features for machine learning experiments aimed at classifying globally deceptive speech. Our best classification performance is 72.74% F1-Score (about 17% better than human performance), which is achieved using a combination of linguistic features and individual traits.

Collecting Code-Switched Data from Social Media
Gideon Mendels | Victor Soto | Aaron Jaech | Julia Hirschberg
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Evaluating the WordsEye Text-to-Scene System: Imaginative and Realistic Sentences
Morgan Ulinski | Bob Coyne | Julia Hirschberg
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Using Hedge Detection to Improve Committed Belief Tagging
Morgan Ulinski | Seth Benjamin | Julia Hirschberg
Proceedings of the Workshop on Computational Semantics beyond Events and Roles

We describe a novel method for identifying hedge terms using a set of manually constructed rules. We present experiments adding hedge features to a committed belief system to improve classification. We compare performance of this system (a) without hedging features, (b) with dictionary-based features, and (c) with rule-based features. We find that using hedge features improves performance of the committed belief system, particularly in identifying instances of non-committed belief and reported belief.

Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
Gustavo Aguilar | Fahad AlGhamdi | Victor Soto | Thamar Solorio | Mona Diab | Julia Hirschberg
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

Joint Part-of-Speech and Language ID Tagging for Code-Switched Data
Victor Soto | Julia Hirschberg
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

Code-switching is the fluent alternation between two or more languages in conversation between bilinguals. Large populations of speakers code-switch during communication, but little effort has been made to develop tools for code-switching, including part-of-speech taggers. In this paper, we propose an approach to POS tagging of code-switched English-Spanish data based on recurrent neural networks. We test our model on known monolingual benchmarks to demonstrate that our neural POS tagging model is on par with state-of-the-art methods. We next test our code-switched methods on the Miami Bangor corpus of English-Spanish conversation, focusing on two types of experiments: POS tagging alone, for which we achieve 96.34% accuracy, and joint part-of-speech and language ID tagging, which achieves similar POS tagging accuracy (96.39%) and very high language ID accuracy (98.78%). Finally, we show that our proposed models outperform other state-of-the-art code-switched taggers.

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task
Gustavo Aguilar | Fahad AlGhamdi | Victor Soto | Mona Diab | Julia Hirschberg | Thamar Solorio
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data. We divide the shared task into two competitions based on the English-Spanish (ENG-SPA) and Modern Standard Arabic-Egyptian (MSA-EGY) language pairs. We use Twitter data and 9 entity types to establish a new dataset for code-switched NER benchmarks. In addition to the CS phenomenon, the diversity of the entities and the social media challenges make the task considerably hard to process. As a result, the best scores of the competitions are 63.76% and 71.61% for ENG-SPA and MSA-EGY, respectively. We present the scores of 9 participants and discuss the most common challenges among submissions.

2017

Comparing Approaches for Automatic Question Identification
Angel Maredia | Kara Schechtman | Sarah Ita Levitan | Julia Hirschberg
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Collecting spontaneous speech corpora that are open-ended, yet topically constrained, is increasingly popular for research in spoken dialogue systems and speaker state, inter alia. Typically, these corpora are labeled by human annotators, either in the lab or through crowd-sourcing; however, this is cumbersome and time-consuming for large corpora. We present four different approaches to automatically tagging a corpus when general topics of the conversations are known. We develop these approaches on the Columbia X-Cultural Deception corpus and find accuracy that significantly exceeds the baseline. Finally, we conduct a cross-corpus evaluation by testing the best performing approach on the Columbia/SRI/Colorado corpus.

2016

Identifying Individual Differences in Gender, Ethnicity, and Personality from Dialogue for Deception Detection
Sarah Ita Levitan | Yocheved Levitan | Guozhen An | Michelle Levine | Rivka Levitan | Andrew Rosenberg | Julia Hirschberg
Proceedings of the Second Workshop on Computational Approaches to Deception Detection

Babler - Data Collection from the Web to Support Speech Recognition and Keyword Search
Gideon Mendels | Erica Cooper | Julia Hirschberg
Proceedings of the 10th Web as Corpus Workshop

Proceedings of the Second Workshop on Computational Approaches to Code Switching
Mona Diab | Pascale Fung | Mahmoud Ghoneim | Julia Hirschberg | Thamar Solorio
Proceedings of the Second Workshop on Computational Approaches to Code Switching

Part of Speech Tagging for Code Switched Data
Fahad AlGhamdi | Giovanni Molina | Mona Diab | Thamar Solorio | Abdelati Hawwari | Victor Soto | Julia Hirschberg
Proceedings of the Second Workshop on Computational Approaches to Code Switching

Incrementally Learning a Dependency Parser to Support Language Documentation in Field Linguistics
Morgan Ulinski | Julia Hirschberg | Owen Rambow
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present experiments in incrementally learning a dependency parser. The parser will be used in the WordsEye Linguistics Tools (WELT) (Ulinski et al., 2014) which supports field linguists documenting a language’s syntax and semantics. Our goal is to make syntactic annotation faster for field linguists. We have created a new parallel corpus of descriptions of spatial relations and motion events, based on pictures and video clips used by field linguists for elicitation of language from native speaker informants. We collected descriptions for each picture and video from native speakers in English, Spanish, German, and Egyptian Arabic. We compare the performance of MSTParser (McDonald et al., 2006) and MaltParser (Nivre et al., 2006) when trained on small amounts of this data. We find that MaltParser achieves the best performance. We also present the results of experiments using the parser to assist with annotation. We find that even when the parser is trained on a single sentence from the corpus, annotation time significantly decreases.

2015

A New Dataset and Evaluation for Belief/Factuality
Vinodkumar Prabhakaran | Tomas By | Julia Hirschberg | Owen Rambow | Samira Shaikh | Tomek Strzalkowski | Jennifer Tracey | Michael Arrigo | Rupayan Basu | Micah Clark | Adam Dalton | Mona Diab | Louise Guthrie | Anna Prokofieva | Stephanie Strassel | Gregory Werner | Yorick Wilks | Janyce Wiebe
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

Acoustic-prosodic entrainment in Slovak, Spanish, English and Chinese: A cross-linguistic comparison
Rivka Levitan | Štefan Beňuš | Agustín Gravano | Julia Hirschberg
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Towards simultaneous interpreting: the timing of incremental machine translation and speech synthesis
Timo Baumann | Srinivas Bangalore | Julia Hirschberg
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

In simultaneous interpreting, human experts incrementally construct and extend partial hypotheses about the source speaker’s message, and start to verbalize a corresponding message in the target language, based on a partial translation – which may have to be corrected occasionally. They commence the target utterance in the hope that they will be able to finish understanding the source speaker’s message and determine its translation in time for the unfolding delivery. Of course, both incremental understanding and translation by humans can be garden-pathed, although experts are able to optimize their delivery so as to balance the goals of minimal latency, translation quality and high speech fluency with few corrections. We investigate the temporal properties of both translation input and output to evaluate the tradeoff between low latency and translation quality. In addition, we estimate the improvements that can be gained with a tempo-elastic speech synthesizer.

pdf bib
Teenage and adult speech in school context: building and processing a corpus of European Portuguese
Ana Isabel Mata | Helena Moniz | Fernando Batista | Julia Hirschberg
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a corpus of European Portuguese spoken by teenagers and adults in school context, CPE-FACES, with an overview of the differential characteristics of high school oral presentations and the challenges this data poses to automatic speech processing. The CPE-FACES corpus has been created with two main goals: to provide a resource for the study of prosodic patterns in both spontaneous and prepared unscripted speech, and to capture inter-speaker and speaking style variations common at school, for research on oral presentations. Research on speaking styles is still largely based on adult speech. References to teenagers are sparse and cross-analyses of speech types comparing teenagers and adults are rare. We expect CPE-FACES, currently a unique resource in this domain, will contribute to filling this gap in European Portuguese. Focusing on disfluencies and phrase-final phonetic-phonological processes we show the impact of teenage speech on the automatic segmentation of oral presentations. Analyzing fluent final intonation contours in declarative utterances, we also show that communicative situation specificities, speaker status and cross-gender differences are key factors in speaking style variation at school.

pdf bib
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages
Jeff Good | Julia Hirschberg | Owen Rambow
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

pdf bib
Documenting Endangered Languages with the WordsEye Linguistics Tool
Morgan Ulinski | Anusha Balakrishnan | Daniel Bauer | Bob Coyne | Julia Hirschberg | Owen Rambow
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

pdf bib
Proceedings of the First Workshop on Computational Approaches to Code Switching
Mona Diab | Julia Hirschberg | Pascale Fung | Thamar Solorio
Proceedings of the First Workshop on Computational Approaches to Code Switching

pdf bib
Overview for the First Shared Task on Language Identification in Code-Switched Data
Thamar Solorio | Elizabeth Blair | Suraj Maharjan | Steven Bethard | Mona Diab | Mahmoud Ghoneim | Abdelati Hawwari | Fahad AlGhamdi | Julia Hirschberg | Alison Chang | Pascale Fung
Proceedings of the First Workshop on Computational Approaches to Code Switching

pdf bib
Detecting Inappropriate Clarification Requests in Spoken Dialogue Systems
Alex Liu | Rose Sloan | Mei-Vern Then | Svetlana Stoyanchev | Julia Hirschberg | Elizabeth Shriberg
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

pdf bib
WELT: Using Graphics Generation in Linguistic Fieldwork
Morgan Ulinski | Anusha Balakrishnan | Bob Coyne | Julia Hirschberg | Owen Rambow
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2013

pdf bib
Exploring Features For Localized Detection of Speech Recognition Errors
Eli Pincus | Svetlana Stoyanchev | Julia Hirschberg
Proceedings of the SIGDIAL 2013 Conference

pdf bib
Modelling Human Clarification Strategies
Svetlana Stoyanchev | Alex Liu | Julia Hirschberg
Proceedings of the SIGDIAL 2013 Conference

2012

pdf bib
Acoustic-Prosodic Entrainment and Social Behavior
Rivka Levitan | Agustín Gravano | Laura Willson | Štefan Beňuš | Julia Hirschberg | Ani Nenkova
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Affirmative Cue Words in Task-Oriented Dialogue
Agustín Gravano | Julia Hirschberg | Štefan Beňuš
Computational Linguistics, Volume 38, Issue 1 - March 2012

pdf bib
Detecting Hate Speech on the World Wide Web
William Warner | Julia Hirschberg
Proceedings of the Second Workshop on Language in Social Media

2011

pdf bib
Entrainment in Speech Preceding Backchannels
Rivka Levitan | Agustín Gravano | Julia Hirschberg
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages
Ani Nenkova | Julia Hirschberg | Yang Liu
Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages

pdf bib
Detecting Levels of Interest from Spoken Dialog with Multistream Prediction Feedback and Similarity Based Hierarchical Fusion Learning
William Yang Wang | Julia Hirschberg
Proceedings of the SIGDIAL 2011 Conference

2009

pdf bib
Improving the Arabic Pronunciation Dictionary for Phone and Word Recognition with Linguistically-Based Pronunciation Rules
Fadi Biadsy | Nizar Habash | Julia Hirschberg
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Detecting Pitch Accents at the Word, Syllable and Vowel Level
Andrew Rosenberg | Julia Hirschberg
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Spoken Arabic Dialect Identification Using Phonotactic Modeling
Fadi Biadsy | Julia Hirschberg | Nizar Habash
Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages

pdf bib
Turn-Yielding Cues in Task-Oriented Dialogue
Agustín Gravano | Julia Hirschberg
Proceedings of the SIGDIAL 2009 Conference

2008

pdf bib
Speaking More Like You: Lexical, Acoustic/Prosodic, and Discourse Entrainment in Spoken Dialogue Systems
Julia Hirschberg
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

pdf bib
An Unsupervised Approach to Biography Production Using Wikipedia
Fadi Biadsy | Julia Hirschberg | Elena Filatova
Proceedings of ACL-08: HLT

pdf bib
High Frequency Word Entrainment in Spoken Dialogue
Ani Nenkova | Agustín Gravano | Julia Hirschberg
Proceedings of ACL-08: HLT, Short Papers

2007

pdf bib
On the role of context and prosody in the interpretation of ‘okay’
Agustín Gravano | Stefan Benus | Héctor Chávez | Julia Hirschberg | Lauren Wilcox
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure
Andrew Rosenberg | Julia Hirschberg
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
Characterizing and Predicting Corrections in Spoken Dialogue Systems
Diane Litman | Marc Swerts | Julia Hirschberg
Computational Linguistics, Volume 32, Number 3, September 2006

pdf bib
Summarizing Speech Without Text Using Hidden Markov Models
Sameer Maskey | Julia Hirschberg
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Story Segmentation of Broadcast News in English, Mandarin and Arabic
Andrew Rosenberg | Julia Hirschberg
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2004

pdf bib
Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies
Michel Galley | Kathleen McKeown | Julia Hirschberg | Elizabeth Shriberg
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2001

pdf bib
Identifying User Corrections Automatically in Spoken Dialogue Systems
Julia Hirschberg | Diane Litman | Marc Swerts
Second Meeting of the North American Chapter of the Association for Computational Linguistics

pdf bib
Labeling Corrections and Aware Sites in Spoken Dialogue Systems
Julia Hirschberg | Marc Swerts | Diane Litman
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue

pdf bib
Computational Linguistics, Volume 27, Number 1, March 2001
Julia Hirschberg
Computational Linguistics, Volume 27, Number 1, March 2001

pdf bib
Computational Linguistics, Volume 27, Number 2, June 2001
Julia Hirschberg
Computational Linguistics, Volume 27, Number 2, June 2001

pdf bib
Computational Linguistics, Volume 27, Number 3, September 2001
Julia Hirschberg
Computational Linguistics, Volume 27, Number 3, September 2001

pdf bib
Computational Linguistics, Volume 27, Number 4, December 2001
Julia Hirschberg
Computational Linguistics, Volume 27, Number 4, December 2001

pdf bib
SCANMail: Audio Navigation in the Voicemail Domain
Michiel Bacchiani | Julia Hirschberg | Aaron Rosenberg | Steve Whittaker | Donald Hindle | Phil Isenhour | Mark Jones | Litza Stark | Gary Zamchick
Proceedings of the First International Conference on Human Language Technology Research

pdf bib
Predicting User Reactions to System Error
Diane Litman | Julia Hirschberg | Marc Swerts
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf bib
Predicting Automatic Speech Recognition Performance Using Prosodic Cues
Diane J. Litman | Julia B. Hirschberg | Marc Swerts
1st Meeting of the North American Chapter of the Association for Computational Linguistics

pdf bib
Computational Linguistics, Volume 26, Number 1, March 2000
Julia Hirschberg
Computational Linguistics, Volume 26, Number 1, March 2000

pdf bib
Modeling Local Context for Pitch Accent Prediction
Shimei Pan | Julia Hirschberg
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

pdf bib
Computational Linguistics, Volume 25, Number 1, March 1999
Julia Hirschberg
Computational Linguistics, Volume 25, Number 1, March 1999

pdf bib
Computational Linguistics, Volume 25, Number 2, June 1999
Julia Hirschberg
Computational Linguistics, Volume 25, Number 2, June 1999

pdf bib
Computational Linguistics, Volume 25, Number 3, September 1999
Julia Hirschberg
Computational Linguistics, Volume 25, Number 3, September 1999

pdf bib
Computational Linguistics, Volume 25, Number 4, December 1999
Julia Hirschberg
Computational Linguistics, Volume 25, Number 4, December 1999

1998

pdf bib
Computational Linguistics, Volume 24, Number 1, March 1998 - Special Issue on Word Sense Disambiguation
Julia Hirschberg
Computational Linguistics, Volume 24, Number 1, March 1998 - Special Issue on Word Sense Disambiguation

pdf bib
Computational Linguistics, Volume 24, Number 2, June 1998
Julia Hirschberg
Computational Linguistics, Volume 24, Number 2, June 1998

pdf bib
Computational Linguistics, Volume 24, Number 3, September 1998
Julia Hirschberg
Computational Linguistics, Volume 24, Number 3, September 1998

pdf bib
Computational Linguistics, Volume 24, Number 4, December 1998
Julia Hirschberg
Computational Linguistics, Volume 24, Number 4, December 1998

pdf bib
“I just played that a minute ago!:” Designing User Interfaces for Audio Navigation
Julia Hirschberg | John Choi | Christine Nakatani | Steve Whittaker
Content Visualization and Intermedia Representations (CVIR’98)

1997

pdf bib
Computational Linguistics, Volume 23, Number 1, March 1997
Julia Hirschberg
Computational Linguistics, Volume 23, Number 1, March 1997

pdf bib
Computational Linguistics, Volume 23, Number 2, June 1997
Julia Hirschberg
Computational Linguistics, Volume 23, Number 2, June 1997

pdf bib
Computational Linguistics, Volume 23, Number 3, September 1997
Julia Hirschberg
Computational Linguistics, Volume 23, Number 3, September 1997

pdf bib
Computational Linguistics, Volume 23, Number 4, December 1997
Julia Hirschberg
Computational Linguistics, Volume 23, Number 4, December 1997

1996

pdf bib
A Prosodic Analysis of Discourse Segments in Direction-Giving Monologues
Julia Hirschberg | Christine H. Nakatani
34th Annual Meeting of the Association for Computational Linguistics

pdf bib
Computational Linguistics, Volume 22, Number 1, March 1996
Julia Hirschberg
Computational Linguistics, Volume 22, Number 1, March 1996

pdf bib
Computational Linguistics, Volume 22, Number 2, June 1996
Julia Hirschberg
Computational Linguistics, Volume 22, Number 2, June 1996

pdf bib
Computational Linguistics, Volume 22, Number 3, September 1996
Julia Hirschberg
Computational Linguistics, Volume 22, Number 3, September 1996

pdf bib
Computational Linguistics, Volume 22, Number 4, December 1996
Julia Hirschberg
Computational Linguistics, Volume 22, Number 4, December 1996

1995

pdf bib
Computational Linguistics, Volume 21, Number 1, March 1995
Julia Hirschberg
Computational Linguistics, Volume 21, Number 1, March 1995

pdf bib
Computational Linguistics, Volume 21, Number 2, June 1995
Julia Hirschberg
Computational Linguistics, Volume 21, Number 2, June 1995

pdf bib
Computational Linguistics, Volume 21, Number 3, September 1995
Julia Hirschberg
Computational Linguistics, Volume 21, Number 3, September 1995

pdf bib
Computational Linguistics, Volume 21, Number 4, December 1995
Julia Hirschberg
Computational Linguistics, Volume 21, Number 4, December 1995

1994

pdf bib
Computational Linguistics, Volume 20, Number 1, March 1994
Julia Hirschberg
Computational Linguistics, Volume 20, Number 1, March 1994

pdf bib
Computational Linguistics, Volume 20, Number 2, June 1994
Julia Hirschberg
Computational Linguistics, Volume 20, Number 2, June 1994

pdf bib
Computational Linguistics, Volume 20, Number 3, September 1994
Julia Hirschberg
Computational Linguistics, Volume 20, Number 3, September 1994

pdf bib
Computational Linguistics, Volume 20, Number 4, December 1994
Julia Hirschberg
Computational Linguistics, Volume 20, Number 4, December 1994

1993

pdf bib
Computational Linguistics, Volume 19, Number 1, March 1993, Special Issue on Using Large Corpora: I
Julia Hirschberg
Computational Linguistics, Volume 19, Number 1, March 1993, Special Issue on Using Large Corpora: I

pdf bib
Computational Linguistics, Volume 19, Number 2, June 1993, Special Issue on Using Large Corpora: II
Julia Hirschberg
Computational Linguistics, Volume 19, Number 2, June 1993, Special Issue on Using Large Corpora: II

pdf bib
Computational Linguistics, Volume 19, Number 3, September 1993
Julia Hirschberg
Computational Linguistics, Volume 19, Number 3, September 1993

pdf bib
Empirical Studies on the Disambiguation of Cue Phrases
Julia Hirschberg | Diane Litman
Computational Linguistics, Volume 19, Number 3, September 1993

pdf bib
Computational Linguistics, Volume 19, Number 4, December 1993
Julia Hirschberg
Computational Linguistics, Volume 19, Number 4, December 1993

pdf bib
A Speech-First Model for Repair Detection and Correction
Christine Nakatani | Julia Hirschberg
31st Annual Meeting of the Association for Computational Linguistics

pdf bib
A Speech-First Model for Repair Detection and Correction
Christine Nakatani | Julia Hirschberg
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

1992

pdf bib
Session 13: Prosody
Patti Price | Julia Hirschberg
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

pdf bib
Intonational Features of Local and Global Discourse Structure
Julia Hirschberg | Barbara Grosz
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

1991

pdf bib
Predicting Intonational Boundaries Automatically from Text: The ATIS Domain
Michelle Q. Wang | Julia Hirschberg
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

1990

pdf bib
Disambiguating Cue Phrases in Text and Speech
Diane Litman | Julia Hirschberg
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

1989

pdf bib
Distinguishing Questions by Contour in Speech Recognition Tasks
Julia Hirschberg
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

1988

pdf bib
Assigning Intonational Features in Synthesized Spoken Directions
James Raymond Davis | Julia Hirschberg
26th Annual Meeting of the Association for Computational Linguistics

1987

pdf bib
Now Let’s Talk About Now; Identifying Cue Phrases Intonationally
Julia Hirschberg | Diane Litman
25th Annual Meeting of the Association for Computational Linguistics

pdf bib
NO TITLE
Julia Hirschberg
Theoretical Issues in Natural Language Processing 3

1986

pdf bib
The Intonational Structuring of Discourse
Julia Hirschberg | Janet Pierrehumbert
24th Annual Meeting of the Association for Computational Linguistics

1984

pdf bib
Toward a Redefinition of Yes/No Questions
Julia Hirschberg
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics
