Michael Carl
Srinivas Bangalore
Moritz Schaeffer
Editors
New Directions in Empirical Translation Process Research
Exploring the CRITT TPR-DB
Series editor
Defeng Li
Centre for Translation Studies, SOAS, University of London, London,
United Kingdom
Centre for Studies of Translation, Interpreting and Cognition, University of Macau,
Macau SAR
Editors
Michael Carl
Center for Research and Innovation in
Translation and Translation Technology
Department of International
Business Communication
Copenhagen Business School
Frederiksberg, Denmark
Srinivas Bangalore
Interactions Corporation
New Providence
New Jersey, USA
Moritz Schaeffer
Center for Research and Innovation in
Translation and Translation Technology
Department of International
Business Communication
Copenhagen Business School
Frederiksberg, Denmark
ISSN 2197-8689
ISSN 2197-8697 (electronic)
New Frontiers in Translation Studies
ISBN 978-3-319-20357-7
ISBN 978-3-319-20358-4 (eBook)
DOI 10.1007/978-3-319-20358-4
Library of Congress Control Number: 2015945979
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media
(www.springer.com)
New Frontiers in Translation Studies, as its name suggests, is a Series which focuses
on new and emerging themes in Translation Studies. The last four decades have
witnessed a rapid growth of this fledgling discipline. This Series intends to publish
and promote these developments and provide readers with theories and methods
they need to carry out their own translation studies projects.
Translation Studies is now expanding into new or underexplored areas both
in theories and research methods. One recent development is the keen interest
in translation theories that transcend Eurocentrism. Translation Studies has for
decades been dominated by Western modes of understanding and theorizing about
translation and closed to models of other traditions. This is due to, as many have
argued, the unavailability of reliable data and systematic analysis of translation
activities in non-European cultures (Hung and Wakabayashi 2005). So in the past
few years, some scholars have attempted to make available literature on translation
from non-European traditions (Cheung 2006). Several conferences have been held
with themes devoted to Asian translation traditions. In addition, rather than developing
translation theories by shifting the focus to non-Eurocentric approaches, efforts
have been directed towards investigating translation universals applicable across all
languages, cultures, and traditions.
Modern Translation Studies has adopted an interdisciplinary approach from its
inception. Besides tapping into theories and concepts of neighboring disciplines,
such as linguistics, anthropology, education, sociology, and literary studies, it
has also borrowed research models and methods from other disciplines. In the
late 1970s, German translation scholars applied Think-aloud Protocols (TAPs) of
cognitive psychology in their investigation of translators' mental processes, and
more recently, process researchers have incorporated into their research designs lab
methods such as eye-tracking, EEG, and fMRI. In the early 1990s, computational
and corpus linguistics was introduced into Translation Studies, which has since
generated a proliferation of studies on the so-called translation universals, translator
style, and features of translated language. Studies on interpreting and translation
education have also taken a data-based empirical approach and yielded interesting
and useful results.
References
Cheung, M. (2006). An anthology of Chinese discourse on translation, volume one: From earliest
times to the Buddhist project. Manchester/Kinderhook: St. Jerome Publishing.
Hung, E., & Wakabayashi, J. (2005). Asian translation traditions. Manchester/Northampton: St.
Jerome.
Foreword
The appearance of the present volume coincides with the 10th anniversary of
CRITT, the Center for Research and Innovation in Translation and Translation
Technology, which was inaugurated on 10 June 2005. As it happens, the publication
of the book also coincides with the 20th anniversary of the development of the first
version of Translog, which my son Lasse Schou programmed for me towards the
end of 1995 (when he was 15). The idea of the program came to me because I had
become interested in knowing about the mental processes involved in translating
and had learnt elementary programming. I had become somewhat frustrated with my
own attempts at analyzing verbal data from think-aloud experiments, which was the
dominant methodological paradigm at the time following the publication of Ericsson
and Simon's influential Protocol Analysis: Verbal Reports as Data (1984; 2nd ed.
1993). Therefore I was trying to think of a way of getting harder, less subjective data
as a corrective to, or control on, inferences based on think-aloud data. I first intended
Translog only as an instrument I would use to log timed keystrokes in my personal
research, but the program quickly generated broader interest and soon colleagues
were contributing ideas for additional features, primarily in the CBS TRAP project
(1996-2002) and in the international Translation Expertise group of researchers
generously funded by the University of Oslo (1999-2005). One important outcome
of meetings and publications in the context of the Translation Expertise group
activities was the idea of the CRITT center at CBS. Another major outcome was
our successful application for the EU Eye-to-IT project (2006-2009), which made
it possible to thoroughly re-program Translog (2006) so that it would accept UTF-8-encoded characters, present output in XML, accept data from an eye-tracker via a
gaze-to-word mapping program developed at the University of Tampere, and have
many other new features. The Eye-to-IT project also made it possible for CRITT
to recruit researchers, among them Michael Carl (in 2008), the present director of
the Center. With his machine translation background and his blend of computational
expertise and complete dedication to research, he gave the Center's research a new
direction and was the main driver in working out the successful EU CASMACAT
project proposal. In the context of this project (2011-2014), apart from developing
an interactive post-editing workbench and contributing the promised deliverables,
he oversaw the development of Translog II, made sure that key and gaze data were
properly integrated and recorded, and organized the construction of the TPR-DB,
now probably the largest database anywhere of key and gaze process data from
translation and post-editing sessions, so that both legacy recordings and recent
recordings would all have the same data formats. All of the contributions in the
present book are based on recordings stored in the TPR-DB and are testimony to its
huge value as a TPR resource.
For the past 5 years, CRITT has offered a 1-week summer course for PhD
students with an interest in TPR. This has been enormously gratifying on both a
professional and a personal level. Some students have come back; more have stayed
in contact and now constitute a network ranging from Brazil to Canada and from
China and India to most of the countries in Western Europe from north to south.
Some of the participants in the early days (2011) have returned as co-instructors. It
is truly gratifying to see that at least one of the authors of each of the 14 contributions
to this book has attended one of the TPR summer courses.
Following a chance meeting at a conference in India between Michael Carl and
Srinivas Bangalore, then at AT&T in New Jersey, they agreed to run an 8-week
workshop called SEECAT at CBS in 2013. Most of the participants came from
leading universities and IT institutions in India. The aim of the workshop was to
implement voice recognition as well as gaze control of certain screen operations in
a translation workbench solution. In the course of 8 weeks, this aim was achieved
and prototypes recognizing not just English but Hindi and (less successfully) Danish
were produced.
A 4-week follow-up workshop aimed at developing a new TPR subdiscipline
to be called Translation Data Analytics (TDA) was run by Michael Carl, Srinivas
Bangalore, and Moritz Schaeffer in July-August 2014. Here, participants worked in
teams on developing the appropriate computational, statistical, and other analytical
tools that would constitute TDA and make it possible, by applying TDA to
large-scale process data of the kind stored in the TPR-DB, to produce reliable
descriptions of and predictions about, e.g., translator profiles, the use of default
translation strategies, and the occurrence of revision patterns and of predictable
delays triggered by syntactic and word order rearrangements, all of which will
contribute to generating a process-oriented model of human translation.
All of this, in much more detail, is what the reader will find in this most welcome
celebration of 10 years of CRITT.
Frederiksberg, Denmark
List of Contributors
Barbara Dragsted Center for Research and Innovation in Translation and Translation Technology, Department of International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
Norma Fonseca Linguistic Studies Department, Federal University of Minas
Gerais (UFMG), Belo Horizonte, Brazil
Ulrich Germann Machine Translation Group, School of Informatics, University
of Edinburgh, Edinburgh, UK
Maheshwar Ghankot Indian Space Research Organisation, Hassan, India
José Luiz Gonçalves Universidade Federal de Ouro Preto (UFOP), and Laboratory
of Experimentation in Translation (LETRA/UFMG), Universidade Federal de
Minas Gerais, Belo Horizonte, Brazil
Jesús González-Rubio Unbabel Lda, Samora Correia, Portugal
Robert Hartsuiker Department of Experimental Psychology, University of Ghent,
Ghent, Belgium
Arndt Heilmann English Linguistics Department, RWTH Aachen, Aachen,
Germany
Kristian Tangsgaard Hvelplund Department of English, Germanic and Romance
Studies, University of Copenhagen, Copenhagen, Denmark
Arnt Lykke Jakobsen Center for Research and Innovation in Translation and
Translation Technology, Department of International Business Communication,
Copenhagen Business School, Frederiksberg, Denmark
Arlene Koglin Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Samuel Läubli Machine Translation Group, School of Informatics, University of
Edinburgh, Edinburgh, UK
Ana L. V. Leal Department of Portuguese, University of Macau, Macau, China
Lieve Macken Department of Translation, Interpreting and Communication,
Ghent University, Ghent, Belgium
Mercedes García Martínez Computer Laboratory, University of Maine, Le Mans,
France
Bartolomé Mesa-Lao Center for Research and Innovation in Translation and
Translation Technology, Department of International Business Communication,
Copenhagen Business School, Frederiksberg, Denmark
Jean Nitzke Department for Language, Culture and Translation Studies in Germersheim (FTSK), University of Mainz, Mainz, Germany
Dagmara Płońska
University of Social Sciences and Humanities, Warsaw, Poland
Paulo Quaresma Department of Informatics, Universidade de Évora, Évora,
Portugal
Germán Sanchis-Trilles Sciling SL, Valencia, Spain
Moritz Schaeffer Center for Research and Innovation in Translation and Translation Technology, Department of International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
Institute for Language, Cognition and Computation University of Edinburgh,
Edinburgh, UK
Márcia Schmaltz Department of Portuguese, University of Macau, Macau, China
Kyoko Sekino Laboratory for Experimentation in Translation (LETRA), Federal
University of Minas Gerais (UFMG), Belo Horizonte, Brazil
Annegret Sturm University of Geneva, 36, rue Prévost-Martin, 1205 Geneva
Karina Sarto Szpak Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
Sonia Vandepitte Department of Translation, Interpreting and Communication,
Ghent University, Ghent, Belgium
Derek F. Wong Department of Computer and Information Science, University of
Macau, Macau, China
Julian Zapata School of Translation and Interpretation, University of Ottawa,
Ottawa, Canada
Part I
Empirical TPR
Chapter 1
Abstract New Directions in Empirical Translation Process Research is a continuation of the development which originates in descriptive translation studies as
conceived by Holmes (1972) and Toury (1995). This introduction shows how this
volume documents a technological development which makes it possible
for translation research to go beyond description. As the various chapters in
this volume argue, the analysis of records from keyloggers and eye-trackers enables
us to "explain and predict" (Holmes 1972: 71) translators' behaviour on various
levels of granularity. All contributions are centered around the CRITT TPR-DB, a
unique resource of more than 500 h of recorded translation process data augmented
with over 200 different annotations. The chapters describe aspects of computational,
statistical and psycholinguistic models of the translation process that are facilitated
by the TPR-DB. This chapter gives an overview of the contributions and provides a
background for the work reported in the volume.
Keywords Predictive translation process studies • Computational • Statistical
and psycholinguistic modelling of the translation process
M. Carl ()
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
e-mail: mc.ibc@cbs.dk
S. Bangalore
Interactions Corporation, New Providence, NJ, USA
M. Schaeffer
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
Institute for Language, Cognition and Computation, University of Edinburgh, Edinburgh, UK
© Springer International Publishing Switzerland 2016
M. Carl et al. (eds.), New Directions in Empirical Translation Process Research,
New Frontiers in Translation Studies, DOI 10.1007/978-3-319-20358-4_1
and theorists after Horace were equally prescriptive in their writing on translation
(cf. Robinson 1997).
In 1972, Holmes (1972) made the case for research on translations to be
descriptive. He produced a map of what is now called translation studies and
applied standard scientific methods to the study of translations by arguing that
"… translation studies thus has two main objectives: (1) to describe the phenomena
of translating and translation(s) …, and (2) to establish general principles by means
of which these phenomena can be explained and predicted" (71). Following Holmes,
Toury (1995) turned scholars' attention away from questions regarding whether a
translation is equivalent or not by setting out the methods and theoretical framework
for what became known as descriptive translation studies. One of Toury's central
hypotheses was that translation is a norm-governed activity, and with the availability
of large quantities of translated text in electronic form, corpus-based translation
studies set out to find empirical evidence for these norms (e.g. Mauranen and
Kujamäki 2004). While corpus-based translation studies has been prolific in the
production of hypotheses regarding norms found in target texts, it has not been easy
to draw inferences from these regarding the translation process.
We are now at a stage in the development where translation research becomes
predictive. The records from keylogging software and eye-trackers make it possible to address Holmes' (1972) second main objective, to explain and predict
translators' behaviour: at present, we have all the necessary tools to address the
challenge of building a model of human translation which makes specific, falsifiable
predictions regarding the process and the product of translation. Perhaps the most
fundamental question in this regard is to determine the mechanisms underlying
the production of translations which are common to all translators. This babelian
question attempts to find, on the one hand, the cognitive processes which are shared
among all translators during the translation of diverse language combinations; it
is the quest for linguistic and cognitive universals of translation. On the other
hand, from a utilitarian viewpoint, having a model which can predict translators'
behaviour makes it possible to design translator assistance just when it is needed.
It will allow us to automate those aspects of the translation process where mechanical
effort can be saved, so that the translator can dedicate their full attention to those
aspects which cannot be automatized.
Some 20 years ago it was very difficult to base any investigation of human
translation processes on empirical observations. Any finding regarding cognitive
processes during translation was either based on an analysis of the final product,
i.e. the target text itself, or on Think Aloud Protocols (TAPs) (Krings 1986;
Lörscher 1991). In TAPs, participants are asked to verbalize their thoughts during
a concurrent task such as translation. While studies using TAPs have been highly
valuable in the investigation of the cognitive processes during translation, the
very act of verbalizing thoughts has been shown to have a considerable effect on
the cognitive processes during translation (Jakobsen 2003). However, given the
technology used in modern Translation Process Research (TPR), and as exemplified
in this volume, it is possible to have "… a structured record of the exact temporal
succession of translators' eye and hand activity …" (Jakobsen 2011: 47), and it is
therefore possible to "… ask old questions in a new way and begin to formulate
tentative answers to them …" (ibid.).
An important landmark for empirical TPR was set up in 1995 by a group of
researchers at the Copenhagen Business School when developing a data-acquisition
software, Translog (Jakobsen and Schou 1999), with which translators' keystrokes
could be recorded, replayed and analysed. In contrast to previous TAP elicitation
methods, a keylogger runs in the background so as not to interfere with the writing
or translation process. In a replay mode the translation processes can be visualized
and analysed. Since 2009, this program has been extended with an eye-tracker
interface, so that gaze activities can also be logged (Carl 2012). If connected to an
eye-tracker, Translog-II records gaze-sample points, computes gaze fixations and
maps the fixations to the closest character on the screen. The Translog tool and the
emerging research activities around it have given rise to the foundation of the Center
for Research in Translation and Translation Technology (CRITT) in 2005, and have
resulted in considerable research which has been reported, amongst others, in a
number of edited volumes published within the Copenhagen Studies in Language
series, in volumes 24, 27, 35-39 and 41 (Hansen 1999, 2002; Pöchhacker et al.
2007; Göpferich and Jakobsen 2008; Göpferich et al. 2010; Mees et al. 2010a, b;
Sharp et al. 2011).
Since then, three developments have given rise to the research reported in this
volume. The first development is the extension of Translog for languages
with different scripts and a tighter integration of eye-trackers; the second is the
application of empirical TPR methods to investigate and predict processes of
human-machine interaction in computer-aided translation; and the third is the
collection of a large amount of translation process data in a translation process
research database (TPR-DB), so as to arrive at generalizable results. The large set
of language combinations and the multiple translation modes in the TPR-DB have
made it possible to arrive at statistically reliable results. To this end, a consistent
and transparent representation for logging the diverse input modalities across
different languages and scripts was needed.
Within Translog-II the first requirement was addressed by replacing Translog's
initial keyboard logging method with a text-diff logging1 method that records differences in the emerging text, rather than memorizing the pressed keystrokes. For
languages written in the Latin script, there is an isomorphism between the produced
keystrokes and the modifications in the text, which does not exist for some other
scripts, such as, e.g., Chinese or Japanese. These logographic scripts make use of
special input methods, such as e.g. SoGou (see Chap. 11), with the effect that the
relation between the pressed keys and the characters that appear on the screen cannot
be reproduced from the keystroke log only. Switching from keystroke logging to
text-diff logging in Translog-II was triggered by the requirement for language and
script independence, so that now, irrespective of the script, Translog-II encodes the
text modifications in UTF-8 and stores them in an XML file. At the same time a tight
1 Most papers which use the TPR-DB, not only in this volume, still refer to this as keylogging, even
though, strictly speaking, this is actually not correct.
integration with eye-trackers was achieved, which is now functional for Tobii, SMI
and EyeLink eye-trackers. As a consequence, data records are compatible and can be
compared across different languages.
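To make the idea of text-diff logging concrete, the following minimal sketch compares two successive states of a target-text buffer and keeps only the changed spans. It is an illustration only: the function name text_diff and the use of Python's difflib are our choices, and the actual Translog-II encoding (UTF-8 text modifications stored in XML) is more detailed than this.

```python
# Minimal sketch of text-diff logging: compare two successive buffer states
# and record only what was deleted/inserted, rather than the raw keystrokes.
import difflib

def text_diff(before, after):
    """Return (position, deleted, inserted) tuples describing the change."""
    matcher = difflib.SequenceMatcher(None, before, after)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            edits.append((i1, before[i1:i2], after[j1:j2]))
    return edits

# Example: a span is replaced in the emerging translation.
print(text_diff("de medicinas para dormir", "de tranquilizantes"))
```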
The second development concerns the increasing interest of TPR in studying the
interaction of computer assisted translation and human translation processes. The
importance of human-computer interaction in translation has been acknowledged
since the early days. The ALPAC report (ALPAC 1966) suggested that studies and
computer applications should be supported for speeding up the human translation
process and for the production of adequate reference works for the translator,
including the adaptation of glossaries … (ALPAC 1966: 34). In 1980, concrete
suggestions were made as to how such systems could be implemented (Kay 1998),
although, until recently, the investigation of cognitive processes in computer-assisted translation has not been a topic of concern for TPR. As some of the
chapters in this volume describe, TPR has practical implications when investigating
how translation assistance is used in translators' everyday applications, and what
technologies are suitable to support the underlying cognitive processes, a field of
research labelled translator-computer interaction (TCI), or, as proposed in Chap. 7,
translator-information interaction (TII).
In order to study cognitive processes underlying the task of post-editing machine
translation, the Translog-II system, which was originally designed to investigate
reading, writing and translation processes, was extended with an operation mode
to record sessions of post-editing machine translation. The machine-translated text
would appear in an editable text box, which a post-editor would edit to create the
final translation of the text. Text modifications would be recorded, in addition to
the gaze data, if an eye-tracker were to be connected. However, Translog-II does not
provide an experimental environment similar to real working conditions. Translog-II
presents two running texts in a source and target window, while modern translation
aids, such as translation memories, segment the texts into fragments and present
each source segment with its translation in a more structured manner.
In order to obtain a more realistic picture of professional translators' working
styles and to assess how to support their translation processes with advanced
machine translation technology, the CASMACAT project (see Chaps. 3-8, but also
Sanchis-Trilles et al. 2014; Alabau et al. 2013; Koehn et al. 2013) has implemented
an advanced state-of-the-art, browser-based post-editing environment and combines
this with Translog-II-style keyboard logging and eye-tracking possibilities. In
this way, detailed empirical data can be collected from a realistic translation
environment, with the hope that the assessment of this data would lead to a more
complete picture and better predictive models of human cognitive processes during
computer-aided translation.
The third development concerns the creation of a large database of TPR data.
Given the compatible representation of Translog-II in its various languages, scripts
and operation modes, it became possible to collect data from different studies into
one single repository and to process them in a generic and consistent manner (see
Chap. 2). The TPR-DB stores Translog-II data from reading, writing, translation,
copying and post-editing experiments, as well as CASMACAT translation sessions
in a single format, with common metrics, which make it possible to analyse the data
from different language combinations and translation modes consistently. The TPR-DB is therefore ideally suited as a resource to answer questions regarding cognitive
processes during translation and post-editing, reading and copying which are shared
across different individuals and different language combinations. It facilitates the
generation and validation of hypotheses regarding translation processes across
different language combinations and different translation modes. Since the database
contains a large number of different languages and many language-agnostic features, it is now possible to verify these predictions, as illustrated by the range of
studies reported in this volume.
This volume is, hence, centered around the CRITT TPR-DB, a unique resource
of more than 500 h of recorded translation process data, augmented with over 200
different rich annotations. Chapter 2 introduces the CRITT TPR Database, which
is a publicly available database of recorded text production (writing, copying,
translation) sessions for TPR. It contains user activity data (UAD) of translators',
editors', post-editors' and authors' behaviour recorded with Translog-II and with the
CASMACAT workbench. In addition to the raw logging data, the TPR-DB consists
of tables with a rich feature set that can be easily processed by various visualization
and analysis tools.
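Since the chapters that follow all work with these tables, a minimal sketch of how such a table might be loaded for analysis is given below. It assumes that a study has been downloaded from the TPR-DB site, that the segment-level table of a session is a tab-separated text file, and that it carries column names such as STseg, Dur, HTra and CrossS as shown in Chap. 2; the file name BML12/P01_T1.sg is purely illustrative and may differ in the actual release.

```python
# Minimal sketch: load one segment-level TPR-DB table and inspect a few
# of the features discussed in this volume (assumed tab-separated format).
import pandas as pd

seg = pd.read_csv("BML12/P01_T1.sg", sep="\t")        # hypothetical file name
print(seg[["STseg", "Dur", "HTra", "CrossS"]].head())  # columns shown in Chap. 2
print(seg["Dur"].corr(seg["HTra"]))                    # e.g. duration vs. entropy
```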
The remaining 12 chapters make up part II and part III of this book, which
describe the diverse directions in translation process research, including computational, statistical and psycholinguistic modelling that is facilitated by the TPR
data. The second part of this book is dedicated to the CASMACAT post-editing
workbench, outlining implementation details and usability issues of interactive
machine translation, the usage of external resources and translator-information
interaction. The third part contains studies modeling the human translation process.
Chapter 3 describes the integration of online and active learning techniques
in the CASMACAT workbench. The foundations of the current phrase-based statistical machine
translation (SMT) model, the mathematical basis for interactive translation prediction (ITP), and the use of online and active learning for translation are discussed
in this chapter. During online learning (OL), modifications by the translators are
immediately learned by the system with the aim of preventing the same errors in
the machine-generated translations. During active learning (AL), only the subset of
machine-generated translations with the worst quality is post-edited, the SMT
model is re-trained with the new translation example, and finally, the improved
SMT system returns the remaining (presumably correct) translations. The chapter
also presents a pilot evaluation with translators using the system. Results showed
that the ITP system incorporating OL required less typing effort, and post-editing
speed increased for 60 % of the translators.
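As a pointer to the mathematical basis referred to above, the standard formulation of interactive translation prediction found in the ITP literature can be sketched as follows; the notation is ours and not necessarily that of Chap. 3. Given a source sentence f and a prefix e_p already validated by the translator, the system searches for the most probable suffix e_s:

```latex
% Standard ITP suffix search (notation ours); Chap. 3 derives the
% phrase-based SMT instantiation of this search problem.
\hat{e}_s = \arg\max_{e_s} \Pr(e_s \mid e_p, f)
          = \arg\max_{e_s} \Pr(e_p, e_s \mid f)
```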
Chapter 4 compares the CASMACAT ITP post-editing mode with traditional
MT post-editing (PE) for the language pair English → Brazilian Portuguese, using
metrics to quantify the temporal, technical and cognitive post-editing effort. Two
medical texts from the EMEA corpus2 were post-edited by 16 participants with
recordings of their gaze and keyboard activity. The measured effort was correlated
with an objectively computed score, Translation Edit Rate (TER), which was designed
to compare translations of a text. While the authors found that the technical effort
is higher for ITP than in the PE mode, the cognitive effort in ITP is lower than for
post-editing due to shorter fixation durations.
2 http://opus.lingfil.uu.se/EMEA.php
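For readers unfamiliar with the metric mentioned above, TER is commonly defined as the minimum number of edits needed to turn the system output into a reference translation, normalized by the reference length. The formula below follows this common definition and is not taken from Chap. 4:

```latex
% Common definition of Translation Edit Rate (TER); edits comprise
% insertions, deletions, substitutions and phrase shifts.
\mathrm{TER} = \frac{\#\text{insertions} + \#\text{deletions} + \#\text{substitutions} + \#\text{shifts}}
                    {\text{average number of reference words}}
```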
Based on the assumption that interactive post-editing (ITP) is a new technology
that post-editors need to get acquainted with, Chap. 5 compares the CASMACAT
ITP and traditional post-editing modes (PE) in a longitudinal study (LS14), to
investigate whether and how the performance of professional post-editors improved
over time when working with ITP. Five post-editors used both modes over a period
of 6 weeks in which their activity data was recorded. In a second experiment
(CFT14), the translators' learned behaviour was compared with that of a control group of
post-editors who did not have experience with ITP. It was found that the technical
post-editing effort, as measured by the ratio of coherent production time divided by
the overall post-editing time, was lower after the 6-week period of using ITP
than the technical effort measured in the control group of the CFT14 study, who had not
worked with ITP before.
Chapters 6 and 7 highlight the use of external resources during translation, post-editing and post-editing with online learning. As these contributions show, usage of
external resources is an important aspect, which can account for more than 50 % of
the total translation time (CITE).
Chapter 6 discusses the effectiveness of consulting external resources during
translation and post-editing of general text types by analysing 40 from-scratch
translation sessions and 40 post-editing sessions of 10 master's-level translation
students, using the CASMACAT workbench. The usage of external resources was
recorded with Inputlog and merged into the CASMACAT logfile. In this way, the
authors were able to go beyond previous studies which were restricted to manual
assessment of external resource usage or to only one type of external resource.
The study found that translation students spend significantly more time in external
resources when translating from scratch, compared to post-editing. No statistically
confirmative evidence was found to suggest that different types of resources were
used during translation compared to post-editing. However, longer consultation of
external resources during from-scratch translation correlated with higher translation
quality, while consultation of external resources during post-editing correlated with
lower translation quality.
Chapter 7 concludes the second part of this volume with a broader view on
translator-information interaction (TII), that is, translators' interaction with (digital) information and information tools. The study is based on the CFT14 data,
mentioned in Chap. 5, and investigates the interaction of post-editors with the
CASMACAT BiConc tool (biconcordancer). On the basis of screen recordings and a
total of 55 instances of BiConc usage, it was found that four of the seven participants
in this study did not use BiConc. Participants who used BiConc also used other
Internet resources, such as term banks, dictionaries and corpora, to complement their
information retrieval efforts, and those who did not use the CASMACAT BiConc
also used fewer external resources overall. Factors such as relevance and trust seem
to play an important role in the usage of external resources, since only 47 % of the
CASMACAT BiConc searches were adopted by participants.
The third part of the volume is concerned with cognitive and statistical modeling
of human translation processes, the investigation of multilingual co-activation
and priming effects at the lexical, syntactic and discourse levels of granularity,
translation literality and syntactic annotation schemata.
Chapter 8 starts with the assumption that there are three human translation
processes (HTPs) during post-editing of machine translation output: orientation,
revision and pausing. Since these processes are not directly observable in the
logging data, the authors conceptualize the recognition of these phases as a Hidden
Markov process. The logging data is automatically segmented into fragments of
3-10 s and transformed into vectors of observations O. The observations are
automatically clustered, and Hidden Markov models trained with the observations
where the cluster labels serve as output symbols of the Hidden Markov models. The
aim of the model is to yield the most probable HTP for each observation o in O,
taking into account (1) the feature values (dimensions) of the current observation
and (2) the HTPs assigned to the preceding observations o1, o2, …, on. In a final
step the cluster labels are mapped on the three HTPs: orientation, revision and pause.
The authors show that, in predicting the times spent on orientation, revision and pause,
the system reaches an accuracy as high as that of some of the human annotators.
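The following is a minimal sketch of the decoding step described above, not the authors' implementation: windows of logging data are assumed to have been clustered into discrete symbols already, and a three-state model (orientation, revision, pause) is decoded with the Viterbi algorithm. All probabilities and the example symbol sequence are invented for illustration; in Chap. 8 the models are trained from annotated data.

```python
# Toy three-state HMM decoding over clustered observation windows.
import numpy as np

STATES = ["orientation", "revision", "pause"]

# Hypothetical transition, emission and start probabilities (illustration only).
trans = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4]])
emit = np.array([[0.6, 0.3, 0.1],    # emission of cluster symbols 0..2
                 [0.1, 0.7, 0.2],
                 [0.2, 0.2, 0.6]])
start = np.array([0.8, 0.1, 0.1])

def viterbi(symbols):
    """Return the most probable HTP sequence for a list of cluster symbols."""
    n, k = len(symbols), len(STATES)
    logp = np.full((n, k), -np.inf)
    back = np.zeros((n, k), dtype=int)
    logp[0] = np.log(start) + np.log(emit[:, symbols[0]])
    for t in range(1, n):
        for j in range(k):
            scores = logp[t - 1] + np.log(trans[:, j])
            back[t, j] = np.argmax(scores)
            logp[t, j] = scores[back[t, j]] + np.log(emit[j, symbols[t]])
    path = [int(np.argmax(logp[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]

print(viterbi([0, 0, 1, 1, 2, 1]))
```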
There has been a long tradition of studying priming effects in comprehension and
production models of human sentence processing. More recently, effects of lexical
priming in translation tasks have been observed.
Chapter 9 shows that translators are primed in terms of semantics and syntax
already during very early stages of the reading process. Two features of the TPR-DB, i.e., relative word order (Cross) and word translation entropy (HTra), are used
to predict first fixation durations, among other early eye movement measures. A
first fixation duration is the time a reader spends on a word before either re-fixating that same word or moving the gaze to a different word. This chapter
shows that reading of a source text word leads to the automatic activation of shared
semantic and structural representations. This chapter further shows that these primed
representations serve as the basis for later, conscious processes during which the
source text is regenerated in the target language. The results presented in this chapter
further suggest that word recognition is essentially non-selective, i.e., during the
early stages of reading, the reader makes no distinction regarding the language to
which a word belongs and both linguistic systems are co-activated. Implications for
models of the bilingual lexicon are discussed.
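As an illustration of the word translation entropy (HTra) feature used above, the sketch below computes the entropy of the translation choices observed for one source word across several translators. It follows the usual entropy definition; the probability estimate (relative frequency of each alternative) and the example choices are our assumptions, not the exact TPR-DB computation.

```python
# Word translation entropy: HTra(s) = -sum_t p(t|s) * log2 p(t|s),
# with p(t|s) estimated as the relative frequency of each alternative.
from collections import Counter
from math import log2

def htra(translation_choices):
    counts = Counter(translation_choices)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical alternatives produced by five translators for one source item:
choices = ["tranquilizantes", "somníferos", "somníferos",
           "medicinas para dormir", "tranquilizantes"]
print(round(htra(choices), 2))  # ≈ 1.52
```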
In Chap. 10 the authors provide evidence of priming at the level of syntactic
structure. By introducing a concept of syntactic entropy, a measure of the uncertainty
a translator faces in picking syntactic structures for a target sentence given a source
sentence, the authors correlate syntactic entropy with the observable measurements
found in the TPR database, such as the time spent reading either the source or the
target text, and typing speed. They demonstrate positive correlations between syntactic
entropy and the durations for translation activities, in translation tasks across a few
language pairs. In a monolingual copy task these correlations between syntactic
entropy and behavioural measures are not observed, lending support to the claim
that not only the lexicon but also syntactic structures might be co-activated for the
two languages.
Chapter 11 investigates translation and post-editing processes of cohesive chains
in translations from Portuguese to Chinese. One group of participants translated
and another group of participants post-edited the same text. Eye movements and
keyboard activity for two cohesive chains were analysed. Establishing a semantic
relationship between the words in one of these chains relied on the general lexicon
of the language, while doing the same for the other chain required text-local
relationships. It was hypothesized that establishing text-local semantic relationships
was more difficult than establishing semantic relationships on the basis of the
general lexicon. The authors find that the type of chain has an effect on eye
movements on the target text and on keyboard activity, suggesting that cohesion
is established mainly during target text production. The task had no effect on
the processing of cohesive chains, suggesting that cohesive chains are processed
similarly in post-editing and translation.
Typing Chinese texts involves using a graphical user interface which converts
sequences of alphabetic letters into Chinese characters. The chapter also describes
how process data from this different text input method is captured.
Chapters 12, 13 and 14 reconsider and underpin, on different levels of granularity
and on the level of the translation product and the translation process data, some of
the basic units and annotations that were assumed in the previous chapters, such as
the notion of activity units and of the basic human translation processes (HTPs)
orientation, revision and pausing used during post-editing of machine translation output:
Chapter 12 discusses the merits of three possible ways of operationalizing
restructuring of source material in the target text. The first of these possibilities
is the one reported in the context of Chap. 10. The author points to the fact
that the annotations which were used for the analysis in Chap. 10 were relatively
shallow and, by analysing a small number of examples in minute detail, suggests
how the annotation could be improved in order to better capture the variation in the
alternative translations. In addition, the author discusses the merits of an annotation
system used in a large product-based corpus and argues that this annotation system
would most likely capture more fine-grained details which are not covered by
the annotation system used for the analyses presented in Chap. 10. However, this
corpus does not contain process data. Finally, the most promising annotation schema
which might best capture restructuring effort is discussed in the final sections of
the chapter. This annotation schema uses relevance theoretical notions applied to
translation.
Chapter 13 presents an experiment which investigates the claim that novices
translate more literally than professionals. Previous research suggests that novices
translate more literally than professional translators, because novices focus less on
a representation of the whole text at a discourse level than professionals do, who
rely less on linguistic equivalence and take into account more world knowledge
and pragmatic considerations. Three groups of twenty (non-professional bilinguals,
student translators and professional translators) took part in the experiment which
had two conditions: translating after a first reading of the source text in addition to
producing a summary of the source text in the target language versus translating
straight away without a first reading and summary. Results showed that students
translated more freely than professionals, but initial reading and summary of the source
text had a different effect on professionals and students: students translated more
literally after a first reading and professionals translated more freely. The definition of
literality used in this chapter is different to the one used in Chap. 9. The target texts
in Chap. 13 are annotated manually, while the definition of literality used in Chap.
9 is generated automatically. A comparison between these two measures shows,
however, that they are significantly correlated.
Chapter 14 introduces an alternative annotation of the user activity data and
suggests methods that provide visualizations which may ease visual analytics
of the translation process data. The chapter goes on to discuss and quantify
the differences between translation-from-scratch and post-editing activities for
general-purpose texts as compared to domain-specific texts. As might be expected,
the time for post-editing is shorter than for translation-from-scratch, independently
of the domain of the texts; keystroke activity is lower, and gaze on the target text is
greater, when post-editing domain-specific texts.
The volume assembles a number of studies that explore possibilities for predictive modelling of human translation processes, which, we believe, opens up prospects for new directions in empirical translation process research.
References
Alabau, V., Bonk, R., Buck, C., Carl, M., Casacuberta, F., García-Martínez, M., et al. (2013).
CASMACAT: An open source workbench for advanced computer aided translation. The Prague
Bulletin of Mathematical Linguistics, 100, 101-112.
ALPAC. (1966). Languages and machines: Computers in translation and linguistics. A report by
the Automatic Language Processing Advisory Committee, Division of Behavioral Sciences,
National Academy of Sciences, National Research Council (124 pp.). Washington, D.C.:
National Academy of Sciences, National Research Council (Publication 1416).
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and
writing research. In Proceedings of the 8th international conference on language resources and
evaluation (LREC) (pp. 4108-4112), Istanbul, Turkey.
Göpferich, S., & Jakobsen, A. L. (Eds.). (2008). Looking at eyes (Copenhagen studies in language,
Vol. 36). Frederiksberg: Samfundslitteratur.
Göpferich, S., Jakobsen, A. L., & Mees, I. M. (Eds.). (2010). Behind the mind (Copenhagen studies
in language, Vol. 37). Frederiksberg: Samfundslitteratur.
Hansen, G. (Ed.). (1999). Probing the process in translation: Methods and results (Copenhagen
studies in language, Vol. 24). Denmark: Samfundslitteratur.
Hansen, G. (Ed.). (2002). Empirical translation studies: Process and product (Copenhagen studies
in language, Vol. 27). Denmark: Samfundslitteratur.
Hardison, O. B., & Golden, L. (1995). Horace for students of literature. The Ars Poetica and its
tradition. Miami: University Press of Florida.
Holmes, J. S. (1972). The name and nature of translation studies. In Translation section of the third
international congress of applied linguistics, August 21-26 (pp. 66-79). Copenhagen.
Jakobsen, A. (2003). Effects of think aloud on translation speed, revision and segmentation. In F.
Alves (Ed.), Triangulating translation: Perspectives in process oriented research (pp. 69-95).
Amsterdam: Benjamins.
Jakobsen, A., & Schou, L. (1999). Translog documentation. In G. Hansen (Ed.), Probing the
process in translation: Methods and results (pp. 1-36). Frederiksberg: Samfundslitteratur.
Jakobsen, A. L. (2011). Tracking translators' keystrokes and eye movements with Translog. In
C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative
approaches in translation studies (pp. 37-55). Amsterdam: John Benjamins Publishing.
Kay, M. (1998). The proper place of men and machines in language translation. In Readings in
machine translation (pp. 221-232). MIT Press.
Koehn, P., Carl, M., Casacuberta, F., & Marcos, E. (2013). CASMACAT: Cognitive analysis
and statistical methods for advanced computer aided translation. In A. Way, K. Sima'an,
M. L. Forcada, D. Grasmick, & H. Depraetere (Eds.), Proceedings of the XIV Machine
Translation Summit (p. 411). Allschwil: European Association for Machine Translation.
Krings, H. P. (1986). Was in den Köpfen von Übersetzern vorgeht: Eine empirische Untersuchung
zur Struktur des Übersetzungsprozesses an fortgeschrittenen Französischlernern. Tübingen:
Gunter Narr Verlag.
Lörscher, W. (1991). Translation performance, translation process, and translation strategies. A
psycholinguistic investigation. Tübingen: Gunter Narr Verlag.
Mauranen, A., & Kujamäki, P. (2004). Translation universals: Do they exist? Amsterdam;
Philadelphia: John Benjamins.
Mees, I. M., Alves, F., & Gpferich, S. (Eds.). (2010a). Methodology, technology and innovation
in translation process research (Copenhagen studies in language, Vol. 38). Frederiksberg:
Samfundslitteratur.
Mees, I., Gpferich, S., & Alves, F. (Eds.). (2010b). New approaches in translation process
research (Copenhagen studies in language, Vol. 39). Frederiksberg: Samfundslitteratur.
Pöchhacker, F., Jakobsen, A. L., & Mees, I. M. (Eds.). (2007). Interpreting studies and beyond. A
tribute to Miriam Shlesinger (Copenhagen studies in language, Vol. 35). Denmark: Samfundslitteratur.
Robinson, D. (1997). Western translation theory: From Herodotus to Nietzsche. Manchester: St.
Jerome Publishing.
Sanchis-Trilles, G., Alabau, V., Buck, C., Carl, M., Casacuberta, F., & Martinez, M. G. (2014).
Interactive translation prediction versus conventional post-editing in practice: A study with the
CasMaCat workbench. Machine Translation, 28(3-4), 217-235.
Sharp, B., Zock, M., Carl, M., & Jakobsen, A. L. (Eds.). (2011). Human-machine interaction in
translation (Copenhagen studies in language, Vol. 41). Frederiksberg: Samfundslitteratur.
Toury, G. (1995). Descriptive translation studies and beyond. Amsterdam; Philadelphia: John
Benjamins.
Chapter 2
Abstract Since it came into existence 10 years ago, the Center for Research and Innovation
in Translation and Translation Technology (CRITT) at the Copenhagen Business
School has been involved in Translation Process Research (TPR). TPR data was
initially collected by the Translog tool and released in 2012 as a Translation
Process Research Database (TPR-DB). Since 2012 many more experiments have
been conducted and more data has been added to the TPR-DB. In particular, within
the CASMACAT (Sanchis-Trilles et al. 2014) project a large amount of TPR data
for post-editing machine translation was recorded and the TPR-DB has been made
publicly available under a creative commons license. At the time of this writing, the
TPR-DB contains almost 30 studies of translation, post-editing, revision, authoring
and copying tasks, recorded with Translog and with the CASMACAT workbench.
Each study consists of between 8 and more than 100 recording sessions, involving
more than 300 translators. Currently, the data amounts to more than 500 h of
text production time gathered in more than 1400 sessions with more than 600,000
translated words in more than 10 different target languages.
This chapter describes the features and visualization options of the TPR-DB. This
database contains recorded logging data, as well as derived and annotated information assembled in seven kinds of simple and compound process and product units
which are suited to investigate human and computer-assisted translation processes
and advanced user modelling.
Keywords Empirical translation process research • Translation process research
database
M. Carl ()
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
e-mail: mc.ibc@cbs.dk
M. Schaeffer
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
Institute for Language, Cognition and Computation, University of Edinburgh, Edinburgh, UK
S. Bangalore
Interactions Corporation, New Providence, NJ, USA
© Springer International Publishing Switzerland 2016
M. Carl et al. (eds.), New Directions in Empirical Translation Process Research,
New Frontiers in Translation Studies, DOI 10.1007/978-3-319-20358-4_2
2.1 Introduction
Empirical translation process research requires the availability of suitable process
data. Thus, in order to allow for empirically grounded translation process research,
Jakobsen and Schou (1999) devised, in 1995, a keyboard logging tool,
Translog, with which translation sessions could be recorded, the data visualized
and statistically analyzed. Since then, Translog, the data acquisition tool, and the
format and representation of the collected process data have undergone a number of
changes (Jakobsen 2011) so as to allow for more powerful analyses of the data: The
current Translog-II (Carl 2012a) has been complemented with the CASMACAT
workbench (Sanchis-Trilles et al. 2014, see also Chap. 3 in this volume) as a
browser-based machine translation post-editing tool and the raw logging data
gathered at the output of the recorded translation sessions can be enriched with
annotations and converted into a Translation Process Research Database (TPR-DB).
As of now, the TPR-DB has accumulated a large amount of process data, with the
aim to:
1. Represent activity data for TPR in a consistent manner, so as to facilitate
research across hundreds of translation sessions, different languages and different
translation modes.
2. Implement and make available a large number of features across the entire
collected dataset which would be difficult or nearly impossible to compute
individually for each session separately.
The aim of the TPR-DB is thus to stimulate and lower the barrier of entry for
large-scale translation process research facilitated by a consistent database format
and a well-defined set of features.1
The TPR-DB is organized in studies and sessions. As described in various
chapters in this volume, a study is a collection of sessions that are conducted in
the same experimental context. Translog and CASMACAT generate a single log file
for each session. This raw logging data is subsequently annotated and processed
into a set of tables that contain a rich set of features and attributes.
This chapter describes the tables and the features that are extracted from logged
and annotated data. Section 2.2 provides an overview of the TPR-DB; it describes
the process of annotating the data logged from a translation session, their mapping
into the TPR-DB, and gives an overview over the TPR-DB tables. Sections 2.3,
2.4 and 2.5 describe the tables in more detail. Section 2.3 tackles the tables that
encode single keystrokes and fixations. Section 2.4 illustrates tables of production
and fixation units. A special property of those units is parallel and alternating
reading and typing behavior that indicates the workload of a translator. Section 2.5
describes the tables of translation product units, i.e. units that are derived from
the final translation product: source tokens, target tokens and alignment units.
Section 2.6 shows visualization possibilities of the process data and Sect. 2.7 points
to possibilities for adding externally generated features to the TPR-DB. Three
appendices complement this chapter: Appendices 1 and 2 give an overview of the
studies in the TPR-DB, and an exhaustive list of features is given in Appendix 3.
1 The database is freely available under a creative commons license, and can be downloaded free
of charge from https://sites.google.com/site/centretranslationinnovation/tpr-db
While the Translog-II and CASMACAT logged UAD is slightly different, the structure of the
generated tables is identical.
process data, such as manual gaze-to-word re-mapping. Further, a data integration
step computes keystroke-to-token and fixation-to-token mappings, as described in
(Carl 2012a). Finally, a number of different tables are produced, which contain a
large number of features, describing various kinds of product and process units, as
described in Sect. 2.2.3.
The TPR-DB compilation process is fully automatic, but provides a GUI in which
word alignments can be semi-automatically adjusted. Figure 2.2 shows the YAWAT
Table 2.1

Task  Part  Sess  Texts  SL  TL  FDur   KDur  PDur  SLen  TLen
R     4     14    14     en  es   4.54  1.21  0.28  2901  40,614
P     7      7     2     en  es  16.51  7.90  3.41  2901  20,273
PIO   7      7     2     en  es  15.68  7.98  3.49  2901  20,341

The example is taken from a CASMACAT study. Tasks are R: revision, P: post-editing and PIO:
interactive post-editing with online learning. A full list of task descriptions is in Appendix 1.
Seven participants produced seven translations (sessions) for each of the P and the
PIO tasks, and 4 participants subsequently reviewed the 14 post-edited texts.
The total production time is given in terms of FDur, KDur, and PDur, which
represent the sum of the durations for all sessions, excluding pauses before the
first keystroke and after the last keystroke, as well as pauses between successive
keystrokes, depending on the pause length.
In Table 2.1, the total duration (FDur) for post-editing (P) of the 7 texts by the 7
post-editors was 16.51 h. Two additional duration values indicate typing durations.
According to the KDur value, post-editors were typing roughly 50 % of that time
(7.90 h), while based on PDur it was only 3.41 h, or approximately 20 % of the
post-editing time.4 Table 2.1 also shows the average source text length (SLen) and the
total number of produced target language words (TLen).
4 A large number of different pause thresholds have been suggested and are used. Vandepitte et al.
(2015) segment keystroke sequences at 200 ms, while Lacruz and Shreve (2014: 250) find that
complete editing events are separated by long pauses (5 s or more). They normally contain short
pauses (more than 0.5 s, but less than 2 s), and more effortful complete editing events will often
include multiple short pauses. Post-editors may make intermediate duration pauses (more than 2 s,
but less than 5 s) during a complete editing event. Jakobsen (2005) suggests 2.4 s for his definition
of key performance.
5 The letters in brackets in the list represent the file extensions in the TPR-DB. The section in italics
points to the section where the table is described in more detail.
Table 2.2 General session information including length of the source and the target text in terms
of tokens and characters

Study   Session  SL  TL  Part  Text  TokS  LenS  TokT  LenT  ...
BML12   P01_E5   en  es  P01   5     139   788   153   840   ...
BML12   P01_P4   en  es  P01   4     110   668   131   763   ...
BML12   P01_T1   en  es  P01   1     160   838   180   964   ...

Table 2.3 Session duration information

Session  Dur      TimeD    TimeR    Pause  Fdur     Kdur     Pdur     Pnum  ...
P01_E5   310,234  114,140  232,656  0      167,110   80,374   23,366  29    ...
P01_P4   268,328   71,234  264,765  0      193,531   29,407   14,485  15    ...
P01_T1   757,281   92,016  290,391  0      654,812  314,378  210,415  72    ...

Table 2.4 Session processing information (keystroke and gaze behaviour)

Session  FixS  TrtS     FixT  TrtT    Scatter  Mins  Mdel  Ains  Adel
P01_E5      3      167  661   68,214  17         85    93  0     0
P01_P4    551   78,224  236   18,668   9         77    62  0     0
P01_T1   1122  115,692  392   26,605  30       1152   186  0     0
keystroke, which coincides with the end of the orientation phase. TimeR
indicates the time when the drafting phase ended and the revision phase
started. This is defined as the end of the first micro unit (see below) in which
the last token of the source text was translated (cf. Jakobsen 2002).
The durations FDur, KDur, and PDur were already discussed previously.
The PDur interval fragments the UAD into production units (PUs), which
will be discussed in Sect. 2.5. Pnum provides the number of PUs within a
session.
(c) Session processing information provides keystrokes and gazing behaviour:
FixS and FixT are the number of fixations on the source token(s) and on the
target token(s), while TrtS and TrtT represents the total reading time, i.e. the
sum of all fixation durations on the source and target text respectively.
Mins and Mdel are the number of manually inserted and deleted characters,
while Ains and Adel are the automatically inserted and deleted characters.
Ains and Adel account for post-editing in CASMACAT where the edited text
can be programmatically changed in the interactivity mode.
The Scatter feature indicates how often the typing was not in a sequential
order, i.e. how often the translator or editor typed successive keystrokes
which were part of two or more different words.
Tables 2.2, 2.3, and 2.4 show three sessions from the BML12 study, conducted by
participant P01. Text 5 was edited (Task = E), text 4 was post-edited (Task = P) and
text 1 was translated from scratch (Task = T). Translation took longest in terms of all
available duration measures, Dur, FDur, KDur and PDur, whereas post-editing was
quicker than editing with respect to Dur, KDur and Pdur, but slower with respect to
FDur. Note that editing was a more scattered activity than post-editing, as many more
PUs were produced.
STseg  TTseg  Study  Session  Dur     ...  Scatter  Literal  HTra  HSeg  CrossS  CrossT
1      1      BML12  P01_T1   20,028  ...  2        27.93    2.16  1.18  2       1.29
2      2      BML12  P01_T1   38,951  ...  5        48.24    1.23  0.67  2       1.29
3      3      BML12  P01_T1   83,452  ...  5        67.41    1.7   0.95  1.57    1.08
4      4      BML12  P01_T1   73,292  ...  4        29.45    1.74  0.8   1       1.29
5      5      BML12  P01_T1   24,373  ...  3        31.67    1.84  0.79  1.14    1.5
6      6      BML12  P01_T1   14,030  ...  2        33.3     2.43  1.36  1.3     1.09
7      7      BML12  P01_T1   58,966  ...  4        19.65    0.97  0.46  1.47    0.94
8      8      BML12  P01_T1   40,779  ...  4        151.9    2.9   1.59  2.94    1.19
9      9      BML12  P01_T1   32,812  ...  1        31.6     1.38  0.72  1.21    1.1
10+11  10     BML12  P01_T1   61,326  ...  6        29.11    1.24  0.61  1.67    1.28
The Nedit attribute indicates how often the segment was revised. A number >1
indicates that the translator first drafted the translation and then came back later to
revise it. For instance, segment 4 was drafted and then revised three times, whereas
only STseg 9 was not revised during the translation process.
The features Literal, HTra, HSeg, CrossS and CrossT will be discussed in detail
in Sects. 2.4.4 and 2.4.6 and there are many examples of their application throughout
this volume. CrossS and CrossT measure the amount of syntactic similarity between
the source and the target text. HTra and HSeg give the average word translation and
average segmentation entropy while Literal is the sum of the product of HTra and
CrossS.
Tables 2.6–2.8 Alignment units AU44 and AU45 from session P01_T1 of study BML12

Study  Session  SL  TL  Task  STseg  TTseg  Part  SAU                TAU              SAUnbr  TAUnbr  Del  Dur     FixS  FPDurS  TrtS  FixT  FPDurT  TrtT  InEff  Munit  Edit
BML12  P01_T1   en  es  T     3      3      P01   of                 de               1       1       21   11,407  2     167     167   18    50      1232  15     2      de_medicinas_para_dormir[rimrod_arap_sanicidem]
BML12  P01_T1   en  es  T     3      3      P01   sleeping_medicine  tranquilizantes  2       1       0    1610    27    631     1896  8     465     615   0.94   1      tranquilizantes
Tables 2.6, 2.7, and 2.8 show the English → Spanish translation in the two alignment units AU44
and AU45, of ↔ de and sleeping medicine ↔ tranquilizantes. As indicated in
the columns SAUnbr and TAUnbr, AU44 is a one-to-one correspondence, whereas
AU45 is a two-to-one correspondence. The Edit column traces the sequence of
keystrokes which were typed to produce the translation. It shows for AU44 that
first de medicinas para dormir was typed but later medicinas para dormir was
deleted, so that only de remained from that initial typing activity, while for AU45 ,
the translation tranquilizantes was typed with no revision. The table shows the
overall number of keystrokes produced: for AU44 there were 24 insertions, of which
21 characters (the string in square brackets) were later deleted. Note that deletions
are to be read in the reverse direction, so that reading [rimrod_arap_sanicidem]
from right-to-left results in the deleted string. Even though medicinas para dormir
and tranquilizantes are paraphrases, the former deleted string is part of AU44 ,
while the latter is part of AU45 . The assignment of multi-word deletions to words
in the final text to which they contribute can only be approximated, so that an
error margin to neighboring words should be expected. In line with Alves and Vale
(2011), we refer to these revisions as micro units that will be discussed in Sect. 2.4.2.
The time needed to type the translation is given by the duration feature (Dur).
In the example above, more than 11 s (11,407 ms) were needed for all the typing
activities in AU44, while 1610 ms were needed to type the AU45 translation tranquilizantes.
Table 2.8 shows the total reading time (TrtS and TrtT) and number of fixations
(FixS and FixT) on the source token(s) and on the target token(s). According to this
information, the SAU word of in AU44 was fixated twice with a total reading time
of 167 ms, while the translation de was fixated 12 times with a total reading time
of 1232 ms. The source string in AU45 was fixated 27 times with a TrtS of 1896 ms
and the target string received 8 fixations with a TrtT of 615 ms.
InEff = (number of inserted characters + number of deleted characters) / length of the final word (including the following white space)    (2.1)
In most cases, the length of a word equals the number of character insertions
minus character deletions + 1. We add 1 since the white space following the word is
counted as being part of it. However, in some cases no white space follows a word,
in which case the InEff value may be smaller than 1. Thus, for AU44 in Table 2.8
the number of insertion and deletion keystrokes amounts to 45 which, divided
by the length 3 of the final word of (including a white space), results in an editing
inefficiency of 15, while the number of keystrokes needed to produce tranquilizantes
in AU45 equals the length of the final translation, and thus the editing inefficiency is
0.94. Note that for post-editing the InEff can be 0 if an MT proposal was accepted
without any modifications, while it would be 2 if the word was deleted and another
word of identical length was retyped.
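As a small illustration of Eq. (2.1), the following sketch computes InEff directly from the TPR-DB Edit notation, in which deleted substrings appear reversed inside square brackets. It is a simplification that only considers manual insertions and deletions.

```python
import re

def ineff(edit, final_token):
    """Editing inefficiency, cf. Eq. (2.1): (insertions + deletions) divided by the
    length of the final token plus one (the following white space)."""
    deletions = sum(len(d) for d in re.findall(r'\[([^\]]*)\]', edit))   # bracketed, reversed deletions
    insertions = len(re.sub(r'\[[^\]]*\]', '', edit))                    # everything else was inserted
    return (insertions + deletions) / (len(final_token) + 1)

print(ineff("de_medicinas_para_dormir[rimrod_arap_sanicidem]", "de"))   # (24 + 21) / 3 = 15.0
print(ineff("tranquilizantes", "tranquilizantes"))                      # 15 / 16 ≈ 0.94
```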
position leading to a Cross value of 2, etc. In this way, the TT Cross values indicate
the relative reordering of ST words to arrive at the TT translation.
A Cross value is also computed for the source text. The ST Cross values assume
that the ST text is the output text and the TT text is the input. Accordingly ST Cross
indicates the relative reordering of TT words to arrive at the ST.
Languages with similar word order will have low average Cross values. In a
monotonous 1-to-1 translation all Cross values are 1. The more syntactic reordering
takes place between source and target text, the higher the average Cross value. See
also Chap. 8, Sect. 2.3 for an extended discussion of the Cross feature.
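The exact Cross computation is defined in Sect. 2.4.6; as a rough illustration only, the sketch below derives a Cross-like value per target token from a hypothetical 1-to-1 alignment by measuring how far the aligned source position jumps from one target token to the next. It reproduces the property stated above that a monotonous 1-to-1 translation yields Cross values of 1, but it is not the TPR-DB algorithm itself.

```python
# Rough illustration (not the exact TPR-DB algorithm): Cross-like reordering values
# from a 1-to-1 alignment; alignment[j] = source position (1-based) of target token j.
def cross_values(alignment):
    values, prev = [], 0
    for src_pos in alignment:
        values.append(src_pos - prev)   # jump in aligned source positions
        prev = src_pos
    return values

print(cross_values([1, 2, 3, 4]))   # monotonous translation -> [1, 1, 1, 1]
print(cross_values([1, 3, 2, 4]))   # local reordering       -> [1, 2, -1, 2]
```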
STid  STseg  Study  Session  Lemma     Prob1   Prob2   PoS  TToken     TTid
5     1      BML12  P01_T1   four      3.4339  50      CD   cuatro     5
6     1      BML12  P01_T1   life      3.3508  50      NN   perpetuas  7
4     1      BML12  P01_T1   sentence  4.64    6.1669  NNS  cadenas    6
7 We follow suggestions as proposed by the Kertz Lab, as in https://wiki.brown.edu/confluence/display/kertzlab/Eye-Tracking+While+Reading
STid  Study  Session  ...  FFTime  RPDur  Regr  FixS  FPDurS  TrtS  FixT  FPDurT  TrtT
4     BML12  P01_T1   ...  280     1317   1     9     50      1567  3     133     183
5     BML12  P01_T1   ...  1843    650    0     8     650     1600  24    149     1945
6     BML12  P01_T1   ...  2515    866    1     12    666     1498  0     0       0
computed as the ratio of the number of alignments s → t_i counted in the TTs over the
total number of observed TT tokens, as in Eq. (2.2):

p(s → t_i) = count(s → t_i) / #translations    (2.2)

H(s) = − Σ_{i=1}^{n} p(s → t_i) · log2 p(s → t_i)    (2.3)
Word translation entropy H(s) is the sum over all observed word translation probabilities (i.e. expectations) of a given ST word s into TT words t_1 ... t_n, multiplied with
their information content. It represents the average amount of information contained
in a translation choice. Thus, if a given source word s has only one possible
translation t in a given context, its word translation probability is p(s → t) = 1, its
information I(p(s → t)) = 0 bit, and thus the entropy H(s) = 0 is minimal. The more
different equally probable translations a source word has, the higher is its word
translation entropy H(s). Chapter 10, Sect. 10.2 in this volume gives a more in-depth
background on word translation entropy.
Perplexity (PP) is related to entropy H as an exponential function, as shown in
Eq. (2.4):

PP(s) = 2^H(s)    (2.4)
The higher the perplexity, the more similarly likely choices exist and hence the more
difficult it is to make a decision.
The ST tables provide some of this information: CountT represents the number
of observed SToken → TToken_i alignments count(s → t_i), and AltT the number of
different TTokens_i. ProbT is the probability of that token and HTra is the word
translation entropy of the SToken. For instance, consider STid 4 in Table 2.13. The
translation four → cuatro occurred 25 times in the corpus with a probability of
0.8. With this we can reconstruct the total number of translations in the corpus to be
31 ≈ 25/0.8, and the remaining six translations (31 − 25) were distributed over three
different word forms.
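The following sketch implements Eqs. (2.2)–(2.4) over a list of observed translations of a single source word. The distribution used in the example is hypothetical (loosely modelled on the counts for four in Table 2.13) and is not taken from the corpus itself.

```python
from collections import Counter
from math import log2

def word_translation_entropy(translations):
    """HTra for one source word: entropy over its observed translations (Eqs. 2.2-2.3),
    plus the corresponding perplexity (Eq. 2.4)."""
    counts = Counter(translations)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    h = -sum(p * log2(p) for p in probs)
    return h, 2 ** h

# Hypothetical observations: 25 x "cuatro" plus six other translations over three word forms.
obs = ["cuatro"] * 25 + ["4"] * 3 + ["las cuatro"] * 2 + ["cuatros"]
print(word_translation_entropy(obs))
```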
HSeg indicates the entropy of the word alignment segmentation. For instance,
an expression like life sentences could be aligned as a multi-word unit, or
compositionally as two different units. The number of source and target language
words of the alignment unit (AU) of which life is part is reflected in the
SAUnbr and TAUnbr values, respectively. The HSeg attribute takes this alignment
segmentation context into account, and is calculated in a similar way as HTra, with the
difference that it relies on counting identical TAUnbr values instead of TTokens.
Table 2.13

STid  Study  Session  ...  SToken     SAUnbr  TAUnbr  AltT  CountT  ProbT   HTra    HSeg
4     BML12  P01_T1   ...  four       1       1       4     25      0.8065  0.9511  0.7088
5     BML12  P01_T1   ...  life       1       1       8     17      0.5484  1.9385  0.6595
6     BML12  P01_T1   ...  sentences  1       1       8     18      0.5806  1.899   0.4587
Time: the delay in time (ms) after which the keystroke is produced
Type: whether the keystroke is an insertion or a deletion
Cursor: at which offset in the target text the keystroke is produced
Char: which character (UTF8) is produced (inserted or deleted)
TTseg: the target segment (sentence) that is being produced
STid: the source text word id of which the produced target word is a translation
TTid: the id of the target text word that is being produced by the keystroke
The example in Table 2.14 shows the processed keylog data for the production of
the two Spanish words El enfere[e]mero, as a translation of source word STid 2. These
are the first two words of the first segment in the translation. The table records only
text-modifying keystrokes, i.e. insertions and deletions; navigation information such
as mouse clicks is ignored. Insertions and deletions can be produced manually
(Mins and Mdel) or automatically (Ains and Adel). An example of a manual deletion
is in line 9 in Table 2.14.
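A small sketch of how such a keystroke log can be replayed to rebuild the produced text, assuming each event carries its type, cursor offset and character as in Table 2.14 (automatic insertions and deletions would be handled the same way).

```python
def replay(keystrokes):
    """Rebuild the target text from a list of (type, cursor, char) keystroke events."""
    text = []
    for typ, cursor, char in keystrokes:
        if typ.endswith("ins"):
            text.insert(cursor, char)   # Mins / Ains: insert the character at the cursor offset
        elif typ.endswith("del"):
            del text[cursor]            # Mdel / Adel: delete the character at the cursor offset
    return "".join(text)

# The 15 events of Table 2.14 ('_' stands for a space in the TPR-DB notation)
log = [("Mins", 0, "E"), ("Mins", 1, "l"), ("Mins", 2, "_"), ("Mins", 3, "e"),
       ("Mins", 4, "n"), ("Mins", 5, "f"), ("Mins", 6, "e"), ("Mins", 7, "r"),
       ("Mins", 8, "e"), ("Mdel", 8, "e"), ("Mins", 8, "m"), ("Mins", 9, "e"),
       ("Mins", 10, "r"), ("Mins", 11, "o"), ("Mins", 12, "_")]
print(replay(log))   # El_enfermero_
```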
Table 2.14 Keystroke information as extracted from session P01_T1 of study BML12

KDid  Time    Type  Cursor  Char  TTseg  STid  TTid
0     92,016  Mins  0       E     1      2     1
1     92,172  Mins  1       l     1      2     1
2     92,313  Mins  2       _     1      2     1
3     92,375  Mins  3       e     1      2     2
4     92,563  Mins  4       n     1      2     2
5     92,828  Mins  5       f     1      2     2
6     92,938  Mins  6       e     1      2     2
7     93,047  Mins  7       r     1      2     2
8     93,266  Mins  8       e     1      2     2
9     93,610  Mdel  8       e     1      2     2
10    93,797  Mins  8       m     1      2     2
11    93,875  Mins  9       e     1      2     2
12    93,938  Mins  10      r     1      2     2
13    94,078  Mins  11      o     1      2     2
14    94,203  Mins  12      _     1      2     2
3. Win: source window (1) or target window (2) in which the fixation is observed
4. Cursor: mapping of the fixation center on the closest character in the window
5. STid: id of the source text token that is being looked at
6. TTid: id of the target text word that is being looked at
7. Seg: segment id of the source text word (STid) that is being looked at
8. ParalK: amount of concurrent keyboard activity, i.e. production unit (PU, see Sect. 2.5.3)
9. Edit: character(s) that have been typed during the fixation
10. EDid: the target segment id that is being produced by the typed characters
Table 2.15 shows a sequence of 13 fixations, FDid 507–519, which are part of
the P01_T1 session introduced above. All fixations take place in window 1, on the
first segment and on the STid tokens 4, 6, 3 and 5, which are translated into TTids 5, 6, 4 and
7, respectively. Some of the fixations show concurrent typing activity: as the amount
of parallel keyboard activity (ParalK) equals the fixation duration (Dur), the
first seven fixations (FDid 507–513) overlap to 100 % with text production. No
keyboard activity took place during fixations FDid 515–517, and a partial overlap
of 16 % (124 ms/750 ms) of typing activity is recorded for fixation FDid 518. During
fixations 507–510, for instance, the sequence eno was typed (Edit), which is part
of the production of asesino. The column EDid indicates the STid of the produced
translation, i.e. asesino is a translation of STid 3. In Sect. 2.5.3, we show that the
keyboard sequence is part of one production unit, PU0, while the fixations are part of
FU14. Section 2.6 visualizes the data in a larger context.
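One way to approximate the ParalK value is sketched below: the concurrent keyboard activity during a fixation is estimated as the amount of typing time (inter-key intervals) that falls inside the fixation interval. The keystroke times are hypothetical and the exact TPR-DB computation may differ.

```python
def paralk(fix_start, fix_dur, key_times):
    """Rough sketch of concurrent keyboard activity (ms) during one fixation."""
    fix_end = fix_start + fix_dur
    overlap = 0
    for prev, curr in zip(key_times, key_times[1:]):
        start, end = max(prev, fix_start), min(curr, fix_end)   # clip typing interval to the fixation
        if end > start:
            overlap += end - start
    return overlap

# Hypothetical keystroke times covering the whole fixation -> overlap equals the fixation duration
print(paralk(94530, 150, [94500, 94600, 94700]))   # 150
```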
Table 2.15 Fixation data (FDid 507–519) from session P01_T1 of study BML12

FDid  Time     Dur  Win  Cursor  Seg  STid  TTid  ParalK
507   94,530   150  1    25      1    4     5     150
508   94,749   67   1    24      1    4     5     67
509   95,077   67   1    25      1    4     5     67
510   95,218   67   1    26      1    4     5     67
511   98,952   50   1    36      1    6     6     50
512   99,015   167  1    37      1    6     6     167
513   99,202   50   1    36      1    6     6     50
514   99,265   83   1    25      1    4     5     1
515   99,499   100  1    16      1    3     4     0
516   99,624   83   1    16      1    3     4     0
517   99,718   50   1    17      1    3     4     0
518   99,780   750  1    24      1    4     5     124
519   100,546  250  1    30      1    5     7     250

Edit (sparse column): e, n, o, i, b; EDid (sparse column): 3+, 3+, 3+, 4+, 4+, 4+, 4+
Table 2.16 Three production units from session P01_T1 of study BML12

PUid  Study  Session  Time     Dur   Pause   Ins  Del  Edit
0     BML12  P01_T1   92,016   7250  92,016  34   7    El_enfere[e]mero_asesiono_re[er_ono]no_recibe
1     BML12  P01_T1   100,406  1313  1140    8    0    _cuatro_
2     BML12  P01_T1   103,594  4187  1875    23   3    sentencias_de_vida.__[__.]__
(2.5)
started at time 92,016 and was typed within 7250 ms, with no inter-key delay of
more than 1000 ms. It was preceded by a pause of 92,016 ms. The next PU1 starts
with a Pause of 1140 ms. Following this pause, the typing sequence starts at Time
100,406 ms and lasts for 1313 ms. Table 2.16 indicates the number of insertions and
deletions of the PUs. PU0 contains 34 insertions (Ins) and 7 deletions (Del). The
latter are within square brackets in the Edit column and must be read in the reverse
direction. Thus, the substring [er_ono] actually reflects the deletion ono_re, as
shown in example (2.6):

asesiono_re → asesino_recibe    (2.6)
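A simplified sketch of the PU segmentation described above: keystrokes are grouped into production units whenever the inter-key delay exceeds 1000 ms. The event list is hypothetical and the sketch ignores deletions, pauses before the session start and other details of the TPR-DB implementation.

```python
def production_units(events, pause_ms=1000):
    """Split (time, char) keystroke events into PUs at inter-key delays > pause_ms."""
    units, current = [], []
    for t, char in events:
        if current and t - current[-1][0] > pause_ms:   # long pause -> close the current PU
            units.append(current)
            current = []
        current.append((t, char))
    if current:
        units.append(current)
    return units

events = [(92016, "E"), (92172, "l"), (92313, "_"), (100406, "_"), (100500, "c")]
for pu in production_units(events):
    print(pu[0][0], "".join(ch for _, ch in pu))
# 92016 El_
# 100406 _c
```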
Table 2.17 Three production units from session P01_T1 of study BML12

STseg  TTseg  STid   TTid     FixS  ParalS  FixT  ParalT  Scatter  CrossS  CrossT  PosS         PosT
1      1      1+2+3  1+2+3+4  10    735     0     0       0        2.67    1.25    NNP+VBP+NNS  ART+NC+NC+VLfin
1      1      3+4    4+5      4     504     0     0       0        2       1.5     NNS+CD       VLfin+CARD
1      1      4+5    5+7      4     216     0     0       1        1.5     1       CD+NN        CARD+ADJ

Path: 1:4 + 1:4 + 1:4 + 1:4 + 1:6 + 1:6 + 1:6 + 1:4 + 1:3 + 1:3 + 1:3 + 1:4 + 1:5 + 1:5 + 1:5 + 1:6 + 1:6 + 1:6 + 1:5 + 1:5
(2.7)
Figure 2.5 shows boundaries of successive CUs and their labels: the first CU,
with time stamp 231,500–232,500 ms, is a source text reading activity of 894 ms,
followed by an idle unit (Type 8) of 1107 ms in which no activities were recorded.
Then follows a typing CU (Type 7) at time stamp 233,500 ms of 4640 ms in
which concurrent ST reading and TT reading can be observed. During this time
span medicinas para dormir is produced. This is followed by a target text
reading activity (Type 2, Duration 484 ms) in which the just typed word (dormir) is
monitored. The figure represents a translation progression graph (TPG), which will
be discussed in Sect. 2.6.
The graph shows when segments were loaded into the target buffer, when and where
translators read the source and the target segments, and when the text was modified.
TPGs are thus a useful means to assess the TPR-DB data qualitatively.
Fig. 2.6 A translation progression graph plotting keystroke and gazing information

Fig. 2.7 The progression graph shows product and process information from Tables 2.14, 2.15, 2.16, 2.17, and 2.18

Another TPG is shown in Fig. 2.7. This graph puts into relation the translation
product on the source text (vertical axis, left) and the target text (vertical axis,
right) and the translation process data on a time line on the horizontal axis. It
visualizes how the translation emerges in time. Insertions are represented in black
letters, deletions are red, and fixations are blue dots in rectangular boxes that stretch
their duration in time. The TPG in Fig. 2.7 plots the keystroke data of Table 2.14,
the fixation data from Table 2.15, as well as the three FUs from Table 2.18 and
three PUs of Tables 2.16 and 2.17. The red horizontally striped boxes indicate PUs
while the green boxes represent FUs. The first part (approx. Time 92,000 ms to
94,000 ms) reproduces the production of words 1 and 2 (El enfermero) as plotted
in Table 2.14.
As discussed in Sects. 2.5.3 and 2.5.4, reading and writing activity can occur
in parallel. For instance, FU14 around time stamp 95,000 takes place while the
translator produces asesino, the translation of Killer, while FU15 and FU16
at time stamps 99,000 and 101,500 respectively overlap only partially with the two
adjacent PUs, PU0 and PU1. Progression graphs illustrate in a graphical manner the
relation between reading and writing activities.
together with the actual key pressed and the time stamp of a keystroke (or mouse
movement) in its IDFX log file.
A script,8 InfuseIDFX.pl, can be used to integrate Inputlog IDFX logging data
into Translog-II files. The InfuseIDFX.pl script first synchronizes the Inputlog and
the Translog-II logging data based on common keystrokes and then inserts the data
that was collected outside the Translog-II (or CASMACAT) GUIs into the Translog-II
log file. The TPR-DB compilation process subsequently generates an EX table
indicating usage of the external resources.
For instance, in a browser-based application, Inputlog knows which window is
in focus. Successive keystrokes can accordingly be associated with the web page in
focus. In this way web searches can be tracked and reconstructed. On the one hand,
Inputlog is universally deployable in different Windows-based applications. On the
other hand, Inputlog has no possibility to know where the typed characters occur in
a text. From Inputlog we know which keystrokes were pressed, but not necessarily
which characters are produced or which characters are deleted, and we also do not
know where in a text these operations would take place, unless produced within
MS Word.
As an example, Table 2.19 plots an excerpt from a converted Inputlog table
showing that Google Chrome was used as the main external resource in a Translog-II
session. At the time instant 33,453 ms, an application with the name TASKBAR
was activated for approximately 0.5 s, followed by a search query in Google
Chrome, which lasted slightly more than 32 s. The user then went via an application
Menu Iniciar back to the Translog-II User program, which he left again after
14,297 ms. The Edit column contains the concatenation of the typed keystrokes that
occur during the time in focus. It is empty if no keystroke was produced. Examples
of the EDIT string in the 32 s between time stamp 34,000 and 66,818, when Google
Chrome was in focus, are:

1. bring
2. emit[..]otional trra[..]aduo
3. in [.] the arr[.]ticle / traduo / presented

A query is usually terminated by a return key press (encoded as a special token in the
EDIT string), and deletions are in square brackets [..]. Thus in (1) the translator
typed bring and then pressed the return key. In (2) the translator twice deleted
two characters in the input string. From the Inputlog IDFX file we do not know
which characters were deleted, but it is most likely that first it and then ra were
deleted using backspace so as to produce the search string emotional traduo.
In example (3) three search strings were produced: in the article, traduo and
presented. While we can reconstruct the queries which the translator produced
in the external resource, we do not know what the results of these queries were.
However, we can trace the translator's reaction from within Translog-II.

8 The InfuseIDFX.pl script is part of the TPR-DB and can be downloaded from the TPR-DB website,
https://sites.google.com/site/centretranslationinnovation/tpr-db
Table 2.19 Excerpt from a converted Inputlog table (study N1, session P02_P1)

EXid  Study  Session  Focus             Time    STsegN  STsegL  STidN  STidL  KDidN  KDidL  EDIT
0     N1     P02_P1   TASKBAR           33,453  2       1       18     8      27     26
1     N1     P02_P1   Google Chrome     34,000  2       1       18     8      27     26
2     N1     P02_P1   Menu Iniciar      66,813  2       1       18     8      27     26
3     N1     P02_P1   Translog-II User  67,625  2       1       18     8      27     26
The attribute KDidL indicates the last keystroke (KDid) before Translog-II was left, and KDidN gives
the next keystroke after s/he came back. Similarly, STidL and STidN indicate the
source word id of the translation of those keystrokes, and STsegL and STsegN the
source segments. Thus, the last keystroke before the translator left Translog-II at time
33,453 was KDidL = 26 and the first keystroke after coming back into Translog-II
User was KDidN = 27. These two keystrokes are part of the production of the
translation of STidL = 8 and STidN = 18, which belong to the two successive segments
1 and 2. While we thus do not know what exactly a translator may have learned
from visiting the external resource, we have a means of re-constructing the effect
by investigating the behavior that precedes and follows its consultation. A usage of
this tool is described in Chap. 6 in this volume.
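A small sketch of how such bracketing keystrokes around a visit to an external resource could be looked up in the keystroke log. The keystroke times are hypothetical; the KDid of a keystroke is taken to be its index in the time-sorted log.

```python
import bisect

def bracket(event_time, key_times):
    """Indices (KDid) of the last keystroke before and the first keystroke after event_time."""
    i = bisect.bisect_left(key_times, event_time)
    before = i - 1 if i > 0 else None
    after = i if i < len(key_times) else None
    return before, after

times = [31000, 32500, 33400, 68000, 68200]   # hypothetical keystroke times, index = KDid
kdid_l, _ = bracket(33453, times)             # leaving Translog-II at 33,453 ms -> KDidL = 2
_, kdid_n = bracket(67625, times)             # back in Translog-II at 67,625 ms -> KDidN = 3
print(kdid_l, kdid_n)
```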
Study  Session    STseg    SynH   STriplet  TTriplet     PrimeDiff  PrimeProb
BML12  default    default  0                             DIFF       0
BML12  P03_P1.sg  2        0      TPI       TPI          PRIME      1
BML12  P06_P1.sg  2        0      TPI       TPI          PRIME      1
BML12  P03_P1.sg  3        0.721  TAI_DAD   TAI_DAD_IAD  DIFF       0.2
BML12  P06_P1.sg  3        0.721  TAI_DAD   TAI_DAD_IAD  DIFF       0.2
BML12  P28_P1.sg  3        0.721  TAI_DAD   TAI_DAD      PRIME      0.2
BML12  P32_P1.sg  3        0.721  TAI_DAD   TAI_DAD      PRIME      0.2
BML12  P03_P1.sg  4        0      TPI_TAD   MPI          DIFF       0
9 The script AddExtColumns.pl can be downloaded from https://svn.code.sf.net/p/tprdb/svn/ and
called with the parameters AddExtColumns.pl -C ExtraColumnsFile -S Study_name
Acknowledgement This work was supported by the CASMACAT project funded by the European
Commission (7th Framework Programme). We are grateful to all contributors to the database and
for allowing us to use their data.
Appendix 1
Overall the TPR-DB contains more than 580 h of text production time in terms
of Fdur duration. The 1689 sessions involved 132 different translators who produced
altogether more than 660,000 words in 9 different languages.
The language pair en → es is by far the largest represented in the
TPR-DB, with 660 sessions, 500,000 target words and more than 320 h of Fdur
production time. The second most represented language pair is en → hi with 161
sessions, more than 20,000 tokens in the Hindi translations and more than 46 h
of Fdur production time. The third language pair is en → de with 146 sessions,
more than 24,000 tokens in the German translations and more than 24 h of Fdur
production time, followed by en → da with 127 sessions, more than
18,000 tokens in the Danish translations and 12 h of Fdur production time. The rest
of the language pairs in the TPR-DB involve more than 20 translation directions with
7 different source and 16 target languages (this includes language directions not
shown in Table 2.21). Please consult the TPR-DB website for an updated version of
the database contents.
Each study in the TPR-DB was conducted with a (set of) research question(s) in
mind, which can be roughly summarized as follows:
(A) The TPR-DB contains ten studies conducted with the three different CASMACAT workbenches as follows:
1. ALG14: This study compares professional translators and bilinguals while
post-editing with the third prototype of the CASMACAT workbench
featuring visualization of word alignments.
2. CEMPT13: This study contains post-editing recordings with the second
prototype of the CASMACAT workbench, featuring interactive machine
translation.
3. CFT12: This study contains data of the first CASMACAT field trial from
June 2012, comparing post-editing with from-scratch translation.
4. CFT13: This study contains data of the second CASMACAT field trial
from June 2013, comparing post-editing and interactive machine translation.
5. CFT14: This study contains data of the second CASMACAT field trial
from June 2014, comparing interactive machine translation and online
learning.
6. EFT14: This study compares active and online learning during interactive
translation prediction.
Table 2.21

Sess  SL  TL  Task  Texts  Part  Fdur     Kdur     Pdur     Stok    Ttok
30    en  da  T     4      17    4.6776   2.9704   1.9332   5085    5075
30    en  en  C     4      17    2.0436   1.8316   1.6013   5099    5109
8     en  es  P     2      8     2.6018   0.4854   0.1747   4460    4807
8     en  es  PA    2      8     2.7954   0.4437   0.1692   4460    4801
10    en  da  T     1      10    1.4575   0.7493   0.448    1100    1056
8     en  da  T     2      8     0.8079   0.5368   0.3213   786     751
10    en  da  P     2      10    0.4412   0.1074   0.0569   970     1014
64    en  es  P     6      32    4.6394   0.9079   0.4418   9012    10,216
63    en  es  T     6      32    9.8032   5.9308   3.8062   8936    10,102
60    en  es  E     6      30    3.7009   0.9657   0.4729   8468    9594
20    en  pt  PIA   2      20    6.634    1.823    0.5387   6706    6840
20    en  pt  P     2      20    5.5943   1.2678   0.5732   6494    6585
27    en  es  R     26     4     8.3388   0.9733   0.4413   26,919  28,738
27    en  es  PI    9      9     30.0923  10.2351  3.3044   31,752  33,871
27    en  es  P     9      9     28.167   8.0677   3.51     31,294  33,770
27    en  es  PIA   9      9     35.5658  11.2626  3.9125   31,838  34,047
7     en  es  RE    7      3     3.8435   0.2465   0.0586   20,341  22,015
7     en  es  R     7      4     3.2497   0.3687   0.1485   20,273  22,251
7     en  es  P     2      7     16.8321  7.9316   3.418    20,273  22,067
7     en  es  PIO   2      7     15.8297  8.1574   3.4917   20,341  22,284
60    fr  pl  T     2      60    33.8564  17.5784  11.2075  25,380  20,329
11    en  es  PIVO  3      11    10.221   5.2041   2.2521   12,437  13,549
11    en  es  PI    3      11    11.9495  6.8647   3.2755   12,437  13,696
10    en  es  PIVA  3      10    10.7885  5.1993   2.3594   11,327  12,472
8     es  en  P     4      4     4        4        2.1901   0.3586  0.1909
15    en  et  T     3      5     2.5457   1.1214   0.673    1535    1186
18    en  zh  S     1      18    2.0227   0.2641   0.0455   1947    1728
18    en  zh  P     1      18    4.5318   0.8192   0.1442   1998    1845
17    en  zh  R     1      17    2.594    0.3451   0.0567   1946    1833
10    en  pt  T     3      5     5.6048   2.1218   1.2302   2577    2781
10    pt  en  T     3      5     5.6391   2.0787   1.1718   2611    2621
4     en  de  PIA   2      4     2.7428   0.7284   0.2735   2590    2668
4     en  de  P     2      4     2.3311   0.6374   0.2189   2590    2571
69    en  da  T     3      24    7.4469   5.6824   3.8183   10,571  10,667
60    en  es  PI    24     5     53.3764  22.0971  9.5166   72,109  80,278
60    en  es  P     24     5     51.7256  17.3211  7.4178   72,126  80,454
40    da  en  T     3      18    3.7061   2.8926   2.0511   5652    6206
19    en  zh  P     6      11    2.6953   0.4817   0.0497   2708    2562
15    en  zh  T     5      10    3.7369   1.0512   0.1088   2061    1916
10    en  zh  E     5      8     0.7714   0.1564   0.0183   1295    1203
(continued)
Table 2.21 (continued)

Study     Sess  SL  TL  Task  Texts  Part  Fdur     Kdur      Pdur      Stok     Ttok
MS13      16    zh  pt  P     2      16    2.7139   0.9211    0.4443    1410     1648
MS13      16    pt  zh  T     2      16    2.3327   0.7687    0.1161    1386     1378
MS13      22    zh  pt  T     2      22    4.1631   2.1803    1.2265    1938     2216
MS13      18    pt  zh  P     2      18    2.555    0.6698    0.0934    1555     1507
NJ12      39    en  hi  T     6      20    14.4697  7.5368    3.3156    5505     5784
NJ12      61    en  hi  P     6      20    17.4402  6.8654    3.0615    8581     9365
PFT13     9     en  es  P     1      9     2.0861   0.3154    0.1406    3035     3144
PFT13     19    en  es  PI    1      19    5.2058   1.5351    0.4267    6689     7437
PFT13     16    en  es  PIC   3      16    2.7853   0.744     0.1518    5396     5147
PFT13     15    en  es  PIO   3      15    2.4784   0.4741    0.0669    4611     4666
PFT13     16    en  es  PIL   3      16    2.7226   0.6761    0.1511    5572     5344
PFT14     3     en  es  PIVO  2      3     2.1558   0.6775    0.1622    3245     3150
PFT14     2     en  es  PIVA  1      2     2.0228   0.7255    0.1843    2286     2184
PFT14     2     en  es  PIV   2      2     1.987    0.7667    0.1905    2161     2077
RH12      2     es  es  A     2      2     2.9849   0.9786    0.6398    1207     1207
ROBOT14   40    en  nl  P     8      10    10.8706  3.2467    1.5417    7375     7527
ROBOT14   40    en  nl  T     8      10    12.2457  5.1006    3.1753    7375     7329
SG12      46    en  de  E     6      23    7.0716   1.8571    0.9342    6522     6741
SG12      45    en  de  P     6      23    8.027    1.9976    1.055     6352     6470
SG12      47    en  de  T     6      24    11.7259  4.7344    2.9421    6632     6777
TDA14     48    en  en  C     6      8     3.8335   3.5653    2.6617    6792     6779
WARDHA13  34    en  hi  T     6      18    15.2298  3.6917    0.5553    4832     4790
WARDHA13  31    hi  hi  C     6      18    11.49    5.3097    0.7569    4365     4104
WARDHA13  27    en  hi  P     6      15    8.0582   1.9611    0.4418    3780     4016
ZHPT12    12    zh  pt  T     1      12    3.5244   1.4856    0.851     1104     1603
Total     1689  7   9   15    132    418   586.769  217.2386  100.2227  702,701  660,595
The table shows summary information of the TPR-DB for each session type: task, language direction, number
of different texts, number of different participants, production duration (Fdur, Kdur, Pdur) as well as total
source text length (Stok) and total produced target text length (Ttok) in words (tokens)
7. JN13: This study is recorded with the second prototype of the CASMACAT workbench featuring interactive machine translation and word
alignments.
8. LS14: This study investigates learning effects with interactive post-editing
over a period of 6 weeks (longitudinal study) with the third prototype of
the CASMACAT workbench.
9. PFT13: This study is a pre-field trial test prior to the second CASMACAT
field trial.
10. PFT14: This study is a pre-field trial test prior to the third CASMACAT
field trial.
(B) The aim of the MultiLingual experiment is to compare from-scratch translation
(T), post-editing (P) and monolingual post-editing (E), for different translators
and for different languages. The six English source texts were translated by
student and experienced translators; three texts (1–3) are news texts, three texts (4–6)
are sociological texts from an encyclopedia. Texts were permuted in a systematic
manner so as to make sure that each text was translated by every translator and
every translator translated two different texts in each translation mode.
11. BML12: This study contains translating, post-editing and editing data of
six texts from English into Spanish.
12. KTHJ08: This study contains only translation data for the news texts 1–3.
13. MS12: This study contains translating, post-editing and editing of the six
texts from English into Chinese.
14. NJ12: This study contains translating, post-editing and editing of the six
texts from English into Hindi by professional translators.
15. SG12: This study contains translating, post-editing and editing of the six
texts from English into German.
16. TDA14: In this study participants were asked to copy the six English
texts.
17. WARDHA13: This study contains translating, post-editing and editing of
the six texts from English into Hindi by students.
(C) In addition, the TPR-DB contains a few individual experiments that were
conducted with Translog-II:
18. ACS08: This study explores the way in which translators process the meaning of non-literal expressions by investigating the gaze times associated
with these expressions.
19. BD08: This study involves Danish professional translators working from
English into Danish.
20. BD13: This study involves secondary school students translating and post-editing from English into Danish.
21. DG01: The study compares students, professional and non-professional
translators with and without a representation of the text.
22. GS12: This study contains post-editing data of four pieces of news from
Spanish into English.
23. HLR13: This is a translation study from English into Estonian (5 participants translating 3 different texts).
24. JLG10: This study investigates L1 and L2 translations from/to English and
Brazilian Portuguese.
25. LWB09: This study reports on an eye tracking experiment in which
professional translators were asked to translate two texts from L1 Danish
into L2 English.
26. MS13: This study is an investigation of translators' behaviour when
translating and post-editing Portuguese and Chinese in both language
directions.
27. RH12: This is an authoring study for the production of news by two
Spanish journalists.
28. ROBOT14: This study investigates usage of external resources during
translation and post-editing.
29. ZHPT12: This study investigates translators' behaviour when translating journalistic texts. The specific aim is to explore translation process
research while processing non-literal (metaphoric) expressions.
Appendix 2
During each session a particular Task is conducted, as follows:
A: Authoring of a journalistic text. Source and target languages are identical.
C: Copying a text (manually) from the source window into the target window.
Source and target languages are identical.
E: Editing, i.e. post-editing of MT output without access to the source text
(monolingual post-editing).
P: Traditional post-editing of MT output (no additional help is provided during
the process).
R: Review of post-edited text.
T: Translation from-scratch.
Within the CASMACAT context, a large number of different post-editing settings
were investigated:
PA: Traditional post-editing visualizing source (ST) and target (TT) alignment
links (triggered by mouse or cursor).
PI: Advanced post-editing through interactive translation prediction (ITP) /
interactive machine translation.
PIA: Advanced post-editing through ITP showing ST-TT alignments (visualization option).
PIC: Advanced post-editing through ITP showing ST-TT alignments (visualization option).
PIO: Advanced post-editing through ITP and online learning techniques.
PIL: Advanced post-editing through ITP showing the post-edited text (suffix) in
grey (visualization option).
PIV: Advanced post-editing through ITP showing Search&Replace bar, alignments and mouse-triggered alternative ITP options.
PIVA: Advanced post-editing through ITP and active learning techniques.
PIVO: Advanced post-editing through ITP and online learning techniques.
Appendix 3
This appendix lists all features that are used in the TPR-DB v2 to describe the
unit tables. There are in total 275 features and 111 different features describing 11
different unit tables discussed in this chapter. These features are clustered here into
3. Tokens: information concerning source and target text tokens in the translation
product
STId: unique identifier of source text token (FD, KD, PU, ST, TT)
TTId: unique identifier of target text token (FD, KD, PU, ST, TT)
SAU: Source text segment string (AU)
TAU: Target text segment string (AU)
SAUnbr: Number of tokens in source side of alignment unit (AU, ST, TT)
TAUnbr: Number of tokens in target side of alignment unit (AU, ST, TT)
SToken: Source text token (ST, TT)
TToken: Target text token (ST, TT)
Lemma: Lemma of token (ST, TT)
PoS: Part-of-Speech of token (ST, TT)
PosS: Part-of-Speech of source token sequence (PU)
PosT: Part-of-Speech of target token sequence (PU)
References
Alves, F., & Vale, D. C. (2011). On drafting and revision in translation: A corpus linguistics
oriented analysis of translation process data. Translation: Corpora, Computation, Cognition.
Special Issue on Parallel Corpora: Annotation, Exploitation, Evaluation, 1(1), 105–122. http://www.t-c3.org/.
Carl, M. (2012a). Translog-II: A program for recording user activity data for empirical reading and
writing research. In The eighth international conference on language resources and evaluation
(pp. 2–6). May 21–27, 2012, Istanbul, Turkey. Department of International Language Studies
and Computational Linguistics.
Carl, M. (2012b). The CRITT TPR-DB 1.0: A database for empirical human translation process
research. In S. O'Brien, M. Simard, & L. Specia (Eds.), Proceedings of the AMTA 2012
workshop on post-editing technology and practice (WPTP 2012) (pp. 9–18). Stroudsburg, PA:
Association for Machine Translation in the Americas (AMTA).
Carl, M., & Kay, M. (2011). Gazing and typing activities during translation: A comparative study
of translation units of professional and student translators. Meta, 56(4), 952–975.
Jakobsen, A. L. (2002). Translation drafting by professional translators and by translation students.
In G. Hansen (Ed.), Empirical translation studies: Process and product (pp. 191–204).
Copenhagen: Samfundslitteratur.
Jakobsen, A. L. (2011). Tracking translators' keystrokes and eye movements with Translog.
In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research:
Integrative approaches in translation studies (Benjamins translation library, Vol. 94, pp. 37–55).
Amsterdam: John Benjamins.
Jakobsen, A. L., & Schou, L. (1999). Translog documentation. In G. Hansen (Ed.), Probing the
process in translation methods and results (pp. 1–36). Copenhagen: Samfundslitteratur.
Part II
Chapter 3
Abstract This chapter describes a pilot study aiming at testing the integration of
online and active learning features into the computer-assisted translation workbench
developed within the CASMACAT project. These features can be used to take
advantage of the new knowledge implicitly provided by human experts when
they generate new translations. Online learning (OL) allows the system to learn
from user feedback in real time by incrementally adapting the parameters of the
statistical models involved in the translation process. On the other hand, active
learning (AL) determines those sentences that need to be supervised by the user
so as to maximize the final translation quality while minimizing user effort and, at the
same time, improving the statistical model parameters. We investigate the effect
of these features on translation productivity, using interactive translation prediction
(ITP) as a baseline. ITP is a computer assisted translation approach where the user
interactively collaborates with a statistical machine translation system to generate
high quality translations. User activity data was collected from ten translators using
key-logging and eye-tracking. We found that ITP with OL performs better than
standard ITP, especially in terms of typing effort required from the user to generate
correct translations. Additionally, ITP with AL provides better translation quality
than standard ITP for the same levels of user effort.
Keywords: Active learning · Computer assisted translation · Interactive translation prediction · Online learning · Post-editing · Statistical machine translation
3.1 Introduction
The use of machine translation (MT) systems for the production of post-editing
drafts has become a widespread practice in the industry. Many language service
providers use post-editing workflows due to a greater availability of resources and
tools for the development of MT systems, as well as a successful integration of MT
systems in well-established computer-assisted translation (CAT) workbenches.
This chapter reports on the CAT workbench developed within the CASMACAT
project.1 This study is focused on one of the different features implemented
in this workbench, more specifically, the interactive translation prediction (ITP)
approach (Langlais and Lapalme 2002; Casacuberta et al. 2009; Barrachina et al.
2009). Within the ITP framework (see Sect. 3.2 for more details), the user collaborates with a statistical machine translation (SMT) system so as to generate high
quality translations with less effort.
Conventional translation systems are not able to learn from user feedback,
repeating the same errors when translating the same or similar sentences contained
in a given document. One of the main goals of the CASMACAT project is to design
and implement techniques to effectively deal with this problem. For this purpose,
the ITP approach is extended by introducing two new features, namely, online and
active learning. These two new features (see Sect. 3.3 for more details) are designed
to allow the system to incrementally update the model parameters in real time from
the target translations validated by the user. After the models have been updated
for a specific sentence, the system will generate better translations not only for that
sentence but for similar ones, improving the productivity of the users. Despite the
strong potential of these features to improve the user experience (Ortiz-Martínez
et al. 2010; González-Rubio et al. 2012; Bertoldi et al. 2013; Denkowski et al.
2014), they are still not widely implemented in CAT systems. To the best of our
knowledge, the only exception is Ortiz-Martínez et al. (2011), where the authors
describe the implementation of online learning within an ITP system.
This chapter reports the results obtained during an evaluation of the CASMACAT
workbench with human users under three different conditions2: (1) basic ITP, (2)
ITP with online learning, and (3) ITP with active learning (see Sects. 3.4 and 3.5).
The ultimate aim of testing these different configurations was to assess their
potential in real world post-editing scenarios for the benefit of the human translator.
1 CASMACAT: Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation. Project co-funded by the European Union under the Seventh Framework Programme, Project
287576 (ICT-2011.4.2).
2 The logging data for this study can be found in the TPR-DB as the EFT14 study: https://sites.google.com/site/centretranslationinnovation/tpr-db.
3.2 Background
In this section, we briefly describe the statistical approach to machine translation, as
well as its application to ITP.
t̂ = arg max_t Pr(t | s)    (3.1)

t̂ = arg max_t Pr(t) · Pr(s | t)    (3.2)

The terms in Eq. (3.2) are the language model probability Pr(t), which represents
the well-formedness of t, and the translation model Pr(s | t), which represents the
relationship between the source sentence and its translation. The reader should note
that, if we had perfect models, the use of Eq. (3.1) would suffice. Given that we
have only approximations, the use of Eq. (3.2) allows the language model to correct
deficiencies in the translation model.
However, in practice we often estimate Pr(t | s) directly by combining all these
models (and possibly others) into a log-linear model (Och and Ney 2002):

t̂ = arg max_t { Σ_{n=1}^{N} λ_n · log(f_n(t, s)) }    (3.3)

where f_n(t, s) can be any model that represents an important feature for the
translation process, N is the number of models (or features), and λ_n is the weight of
the nth model in the log-linear combination.
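To illustrate Eq. (3.3), the following sketch scores candidate translations with a weighted sum of log feature values. The feature functions, weights and candidate strings are invented stand-ins, not the models used in the experiments.

```python
from math import log

def loglinear_score(t, s, features, weights):
    """Log-linear score of Eq. (3.3): weighted sum of log feature values."""
    return sum(w * log(f(t, s)) for f, w in zip(features, weights))

# Toy feature functions standing in for a language model and a translation model
lm = lambda t, s: 0.02 if t == "to view a listing" else 0.001
tm = lambda t, s: 0.1 if "listing" in t else 0.05

best = max(["to view a listing", "to view a list"],
           key=lambda t: loglinear_score(t, "ver un listado", [lm, tm], [1.0, 0.8]))
print(best)   # "to view a listing"
```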
Currently, most popular MT systems are based on the use of n-gram3 models (see
for instance Chen and Goodman 1996) to implement language models and phrase-based
models (Koehn et al. 2003) as translation models. The so-called n-gram
models assign probabilities to individual words of the target language taking into
account the last n − 1 words. On the other hand, the basic idea of phrase-based
translation is to segment the source sentence into phrases, then to translate each
source phrase into a target phrase, and finally to reorder the translated target phrases
in order to compose the target sentence. If we summarize all the decisions made
Σ_{K, ã_1^K} Pr(s̃_1^K, ã_1^K | t̃_1^K)    (3.4)

where each ã_k ∈ {1 ... K} denotes the index of the target phrase t̃ that is aligned
with the k-th source phrase s̃_k, assuming a segmentation of length K.
Fig. 3.1 ITP session to translate a Spanish sentence into English. The desired translation is
the translation the human user wants to obtain. At iteration zero (IT-0), the system suggests a
translation (ts ). At IT-1, the user moves the mouse to accept the first eight characters To view and
presses the a key (k), then the system suggests completing the sentence with list of resources (a
new ts ). Iterations 2 and 3 are similar. In the final iteration, the user accepts the current translation
Depending on the system or the user's preferences, the user might type the full
next word, or only some letters of it (in our example, the user types the single
next character a). Then, the system suggests a new suffix t_s = list of resources
that completes the user-validated prefix and the input the user has just typed (p =
To view a). The interaction continues with a new prefix validation followed,
if necessary, by new input from the user. This process continues until the user
considers the translation to be satisfactory.
The crucial step of the process is the production of the suffix. Again, decision
theory tells us to maximize the probability of the suffix given the available
information. Formally, the best suffix of a given length will be:
Ots D arg max Pr.ts j s; p/
(3.5)
ts
(3.6)
ts
(3.7)
ts
Note that, since p t_s = t, this equation is very similar to Eq. (3.2). The main
difference is that now the search process is restricted to those target sentences t that
contain p as prefix. This implies that we can use the same MT models (including
the log-linear approach) if the search procedures are adequately modified (Och et al.
2003). Finally, it should be noted that the statistical models are usually defined at
a word level, while the ITP process described in this section works at a character
level. To deal with this problem, during the search process it is necessary to verify
the compatibility between t and p at a character level.
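As a toy illustration of this prefix-constrained search, the sketch below picks the best-scoring suffix among a small set of pre-scored full translations that are character-level compatible with the validated prefix. A real ITP system searches a word graph and falls back to error correction instead; the n-best list and scores here are invented.

```python
def best_suffix(prefix, candidates):
    """Sketch of Eq. (3.5): best suffix among scored translations compatible with the prefix."""
    compatible = [(t, score) for t, score in candidates if t.startswith(prefix)]
    if not compatible:
        return None   # a real system would apply the error-correction techniques described below
    best, _ = max(compatible, key=lambda x: x[1])
    return best[len(prefix):]

nbest = [("To view a list of resources", -4.1), ("To view a listing of resources", -3.7)]
print(best_suffix("To view a", nbest))   # " listing of resources"
```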
3.2.3 Search
In conventional SMT, the best translation for a given source sentence is produced by
incrementally generating the target sentence from left to right. This problem can be
solved by means of dynamic programming (DP) techniques (Bellman 1957). Due
to the great complexity of the search process in SMT, DP-based search is typically
restricted by introducing the beam-search heuristic (Jelinek 1997).
Due to the demanding temporal constraints inherent to any interactive environment, performing beam-search each time the user validates a new prefix is
unfeasible. The usual approach is to rely on a certain representation of the search
space that includes the most probable translations of the source sentence. The
computational cost of this approach is much lower, since for each source sentence,
Fig. 3.2 Example of a word graph encoding different English translations for the Spanish source
sentence Transferir documentos explorados a otro directorio
the translation representation can be generated only once when obtaining the initial
translation and reused for further completion requests.
The representation usually chosen to implement ITP is known as word graph. A
word graph is a weighted directed acyclic graph, in which each node represents
a partial translation hypothesis and each edge is labeled with a word of the
target sentence and is weighted according to the language and translation model
scores. Ueffing et al. (2002) give a detailed description of word graphs and how
to produce them easily as a sub-product of the SMT search process. An example of
a word graph is displayed in Fig. 3.2. During the interaction process, the system makes
use of this word graph in order to complete the prefixes accepted by the human
translator. First, the system looks for a node in the word graph that represents the
partial translation validated by the user. Then, the system follows the most probable
path from such a node to an end node, and returns the target language suffix defined
by this path.
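The sketch below illustrates the completion step on a tiny hypothetical word graph in the spirit of Fig. 3.2. For simplicity it greedily follows the locally most probable edge from the node representing the validated prefix, whereas a real implementation follows the globally most probable path and adds the error-correction fallback discussed in the next paragraph.

```python
def complete(graph, final_nodes, node):
    """Greedy sketch: follow the most probable outgoing edge until a final node is reached
    and return the target-language suffix spelled out along the way.
    graph[node] = list of (next_node, word, weight) edges."""
    words = []
    while node not in final_nodes:
        node, word, _ = max(graph[node], key=lambda edge: edge[2])
        words.append(word)
    return " ".join(words)

# Tiny hypothetical word graph; node 1 is the node reached after accepting "Transfer"
graph = {0: [(1, "Move", 0.4), (1, "Transfer", 0.6)],
         1: [(2, "scanned", 0.7), (2, "explored", 0.3)],
         2: [(3, "documents", 0.8), (3, "files", 0.2)]}
print(complete(graph, final_nodes={3}, node=1))   # "scanned documents"
```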
For a fixed source sentence, if no pruning is applied, the word graph represents all
possible sequences of target words for which the posterior probability is greater than
zero according to the models. In practice, however, the pruning needed to render
the problem computationally feasible implies that the resulting word graphs only
represent a subset of the possible translations. Therefore, it may happen that the
user sets a prefix not encoded in the word graph. To circumvent this problem, we
introduce the use of error correction techniques. First, we look for the node that
represents a partial translation with minimum edit distance (Levenshtein 1966) to
the prefix. Then, we select the completion path which starts with the last word
of the prefix and has the best backward score.4 This scoring mechanism based
on edit distance can be introduced in the statistical formalization of the system
by using probabilistic finite-state machines (see Ortiz-Martnez 2011 for a detailed
description).
This is the score associated with a path going from the node representing the partial translation
with minimum edit distance to the final node.
models to generate word alignment matrices for the sentence pairs contained in the
training corpus (see Koehn et al. 2003). The parameters of such alignment models
are obtained by means of the incremental version of the EM algorithm.
65
In the experimentation, we used the incremental version of the EM algorithm (Neal and Hinton 1999) to update the word-to-word translation probability
model with the new sentence pairs available. We thus maintain an updated version
of the probability distribution over translations so that the user is not repeatedly
asked to supervise translations that provide similar information.
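The chapter's systems use an incremental EM update of the alignment models; as a much cruder stand-in, the sketch below merely accumulates word co-occurrence counts from each newly validated sentence pair and renormalizes them into word-to-word translation probabilities. It only illustrates the general idea of updating the models online from user-validated translations.

```python
from collections import defaultdict

# Crude stand-in for online adaptation (NOT the incremental EM algorithm of the chapter).
counts = defaultdict(lambda: defaultdict(float))

def update(src_sentence, tgt_sentence):
    """Add fractional co-occurrence counts from one newly validated sentence pair."""
    tgt_words = tgt_sentence.split()
    for s in src_sentence.split():
        for t in tgt_words:
            counts[s][t] += 1.0 / len(tgt_words)

def p(t, s):
    """Current word-to-word translation probability estimate."""
    total = sum(counts[s].values())
    return counts[s][t] / total if total else 0.0

update("documentos explorados", "scanned documents")
update("otros documentos", "other documents")
print(round(p("documents", "documentos"), 2))   # 0.5
```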
3.4.2 Corpus
Our experiments were based on ITP systems with models initialized using the well
known Europarl corpus (Koehn 2005). Europarl is extracted from the proceedings
of the European Parliament, which are written in the different languages of the
https://sites.google.com/site/centretranslationinnovation/tpr-db.
Fig. 3.3 Screenshot of the CASMACAT workbench showing the user's point of gaze in yellow
Table 3.1 Main figures of the corpora used in the experiments

Corpus    Set          Segments  Words (En/Es)
Europarl  Training     1.9 M     51.3 M / 53.7 M
Europarl  Development  3003      73.0 K / 78.8 K
EMEA      Test d1      53        989
EMEA      Test d2      55        958
EMEA      Test d3      55        979
European Union. In our experiments we have used the version that was created
for the shared task of the ACL 2013 Workshop on Statistical Machine Translation (Bojar et al. 2013). This version includes a training set, used to estimate the
parameters of the language and translation models, as well as a development corpus
that has been used to adjust the weights of the log-linear model underlying our ITP
systems.
The test texts involved in this experimentation were documents from the
European Medicines Agency as compiled in the EMEA corpus (Tiedemann 2009).
From the English-Spanish (En-Es) partition of EMEA, we created three different
documents (d1, d2 and d3) containing consecutive sentences and being roughly
of the same size. Table 3.1 shows the main figures of the corpora used in the
experiments.
3.4.4 Participants
We conducted our experiments in cooperation with Celer Soluciones, a language
service provider (LSP) based in Madrid, Spain. The experiments involved ten
freelance translators, all native speakers of Spanish offering translation and post-editing services on a regular basis for this LSP.
In an attempt to unify post-editing criteria among participants, all of them were
instructed to follow the same post-editing guidelines aiming at a final high-quality
target text (publishable quality). The post-editing guidelines distributed in hard copy
were6:
- Retain as much raw MT as possible.
- Do not introduce stylistic changes.
- Make corrections only where absolutely necessary, i.e. correct words and phrases
that are clearly wrong, inadequate or ambiguous according to Spanish grammar.
- Make sure that there are no mistranslations with regard to the English source text.
- Publishable quality is expected.
Additionally, before starting their tasks, participants were introduced to the
CASMACAT workbench and the ITP post-editing protocol. They were given time
to familiarize themselves with the workbench, and were asked to start the translation tasks
only after they considered themselves comfortable working with the tool.
6 Similar instructions were also used in other CASMACAT studies (Chaps. 4, 5, 7, 8 and 13).
Table 3.2 Task assignments in the experiments

      ITP  PIVO  PIVA
P1    d1   d2    d3
P2    d3   d1    d2
P3    d2   d3    d1
P4    d1   d2    d3
P5    d3   d1    d2
P6    d2   d3    d1
P7    d1   d2    d3
P8    d3   d1    d2
P9    d2   d3    d1
P10   d1   d2    d3
Each user translated the three test documents (d1, d2 and d3)
in the three different conditions analyzed in this study: ITP,
ITP with online learning (PIVO) and ITP with active learning
(PIVA). The number contained in the document name indicates
the order in which the participant translated such document
3.4.5 Methodology
Three different setups of the CASMACAT workbench were evaluated in the experiments: conventional ITP (ITP), ITP with online learning (denoted PIVO according
to the TPR-DB naming conventions), and ITP with active learning (PIVA). Each
participant translated one test document in each of the conditions. The assignment
between documents and conditions was randomized for each participant; Table 3.2
gives an overview of these assignments. The number in the document name indicates
the order in which the participant translated that document. Note that both the
assignment between document and condition, and the order of the conditions were
randomized for the ten participants. Keyboard and mouse activity was logged for
each individual task.
In our experiments, we used the well-known BLEU score (Papineni et al. 2002), which
measures the quality of a candidate translation by comparing it against a reference
translation. Specifically, BLEU is calculated as the precision of n-grams weighted
by a brevity penalty designed to penalize short translations. BLEU results vary
between zero and one, although they are usually represented as a percentage where 100 %
denotes a perfect translation.
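A hedged usage example of computing such a score with the sacrebleu package (assumed to be installed; any BLEU implementation would serve the same purpose). The hypothesis and reference sentences are invented.

```python
# Score a (toy) post-edited output against a reference translation with sacrebleu.
import sacrebleu

hypotheses = ["the patient should take one tablet per day"]
references = [["the patient should take one tablet a day"]]   # one reference per hypothesis
print(sacrebleu.corpus_bleu(hypotheses, references).score)    # percentage, 100 = identical
```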
3.5 Results
The results of the experiments carried out are presented in the following two
subsections. First, we present results related to online learning and its impact in the
performance of ITP. Then, we present the corresponding results for active learning.
Table 3.3 Effort required from the user measured in terms of number of edits per each source
word to be translated for conventional ITP and ITP with online learning (best results are shown in
bold face)

       P1    P2   P3    P4    P5   P6    P7   P8    P9   P10  Avg
ITP    10.6  9.1  11.8  10.9  8.5  11.2  5.4  10.1  8.6  5.3  9.1
PIVO   6.3   8.0  7.3   10.5  7.9  9.1   4.5  4.8   4.8  4.7  6.7
Table 3.4 Typing speed measured in translated source words per second for conventional ITP and
ITP with online learning (best results are shown in bold face)

       P1    P2    P3    P4    P5    P6    P7    P8    P9    P10   Avg
ITP    0.37  0.18  0.21  0.34  0.39  0.24  0.37  0.26  0.17  0.46  0.29
PIVO   0.51  0.21  0.28  0.30  0.36  0.30  0.49  0.18  0.24  0.34  0.32
with respect to conventional ITP was worse. Although surprising at first, these results
were coherent with previous research. In fact, in Alabau et al. (2014) it was found
that post-editors tend to spend more time outside the CASMACAT workbench when
working with the online learning approach. The authors of that work hypothesized
that the participants felt the necessity to do Internet searches so as to double check
correct translations generated by the ITP system with OL. This hypothesis was
confirmed by inspecting the recorded videos of the translation process of selected
source segments, obtaining significant gains in translation speed when the time spent
by the participants making Internet searches was removed from the study. This same
explanation can be applied to our experiments.
Note that the leftmost and rightmost points in each plot are equivalent to the SMT and PIVO
scenarios respectively.
Fig. 3.4 Quality of the final translations (BLEU) generated by participants 1–5 (one per row) as a
function of the translation effort (left) or the translation speed (right)
Sect. 3.3.2. We used the whole post-edited document generated by each participant
as the reference translation of this participant for that document. Workload was
measured both in terms of typing effort (left column) and translation speed (right
Fig. 3.5 Quality of the final translations (BLEU) generated by participants 6–10 (one per row) as
a function of the translation effort (left) or the translation speed (right)
column). The difference between the translation quality of ITP and PIVA with zero
workload can be explained by the different document translated, e.g. participant P1
translated document d1 with ITP and document d3 with PIVA, and the different
order in which translations were performed.
Results for the different participants were quite varied and noisy. As a general
result, we can say that there was a quite clear tendency of PIVA obtaining better
translation quality than ITP at the same level of typing effort (left column).
Nevertheless, results varied greatly between participants, as can be seen when
comparing, for example, the plots of participants P1 (slight improvement), P3 (clear
improvement), and P4 (slight deterioration). This tendency was less clear when we
measured post-editing workload in terms of translation speed (right column). In this
case, there are a number of participants (P4, P7, P8, and P10) who post-edited at a
lower speed using PIVA.
In order to achieve a more robust conclusion, we grouped together post-edit
results for all participants and documents. Figure 3.6 displays the quality of the
translated documents as a function of the typing effort (top), or the translation speed
(bottom) of the post-edit process. Additionally, we also display the least-squares fit
for the results of the ITP and PIVA scenarios. These fitted lines show the tendencies
of the data that were obscured by the noisy results of the individual participants.
Fig. 3.6 Quality of the final translations for all participants (dots) as a function of the translation
effort (top) or the translation speed (bottom). We also include least-squares fits (solid lines) to show
average tendencies of the data

When measuring the post-edit workload by the number of edits performed by
the participant (typing effort), results indicate that AL improved the
performance of the ITP framework. That is, participants employing the same effort
generated translations of higher quality when working with the PIVA approach.
Similar results were obtained for post-edit speed. However, in this case, differences
tend to be smaller between ITP and ITP with AL.
As we have said before, this difference between the post-edit effort and post-edit speed measures can be explained by the tendency of the users to double
check translations proposed by re-trained SMT systems. However, we consider an
additional complementary explanation. The uncertainty measure used to implement
active learning (see Sect. 3.3.2) is based on an estimation of the quality of the
individual words in the translation. Since low-quality words are more prone to be
edited by the user (Gonzlez-Rubio et al. 2010), our uncertainty measure is closely
related to the amount of edits required to post-edit the translation. Given these
considerations, it is intuitively clear why results as measured by post-edit effort
may be better than those measured by post-edit speed.
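A schematic sketch of this selection strategy is given below; it assumes a word_confidence() scoring function supplied by the underlying SMT system and illustrates confidence-based uncertainty sampling in general, not the authors' implementation.

```python
# Schematic sketch of uncertainty-based sentence selection for active learning.
# word_confidence(word, sentence) in [0, 1] is assumed to be provided by the
# underlying SMT system; this is an illustrative assumption, not the authors' code.
def sentence_uncertainty(translation, word_confidence):
    """Uncertainty = 1 minus the average word-level confidence."""
    words = translation.split()
    if not words:
        return 0.0
    avg_conf = sum(word_confidence(w, translation) for w in words) / len(words)
    return 1.0 - avg_conf

def select_for_supervision(translations, word_confidence, k):
    """Return the k most uncertain translations for interactive post-editing."""
    return sorted(translations,
                  key=lambda t: sentence_uncertainty(t, word_confidence),
                  reverse=True)[:k]
```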
3.6 Conclusions
We have presented the results of a pilot study involving real users concerning the
implementation of online and active learning within a CAT workbench with ITP
functionalities. The main goal of the study was to determine whether the use of OL
and AL improves the performance of a conventional ITP system.
For this purpose, the typing effort, measured as the number of edit operations per
source word, and the translation speed, calculated as the number of translated source
words per second, were obtained for ten different users translating a test set extracted
from the EMEA corpus, a real translation task from the medical domain.
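For illustration, the two workload measures can be computed as in the minimal sketch below; the variable names and example figures are hypothetical and are not values from the study.

```python
# Minimal sketch of the two workload measures described above.
# The example figures are hypothetical, not values from the study.
def typing_effort(edit_operations: int, source_words: int) -> float:
    """Edit operations per source word."""
    return edit_operations / source_words

def translation_speed(source_words: int, seconds: float) -> float:
    """Translated source words per second."""
    return source_words / seconds

print(typing_effort(1200, 950))         # ~1.26 edit operations per source word
print(translation_speed(950, 40 * 60))  # ~0.40 source words per second
```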
Results showed that the users of ITP systems incorporating OL consistently
required less typing effort than those using regular ITP. OL also increased the
translation speed for 60 % of the users. The remaining users were faster using the
conventional system, despite requiring a greater typing effort. As explained in
Sect. 3.5.1, a previous study using the CASMACAT workbench with OL capabilities
showed that the participants felt the need to perform Internet searches to
double-check correct translations generated by the ITP system with OL,
which substantially decreased the translation speed. We think that this also
explains the longer translation times with OL observed for particular
users in the work presented here.
On the other hand, the translation quality obtained using ITP with AL was
consistently better than that obtained by means of conventional ITP at the same level
of typing effort. The differences in translation speed between ITP and ITP with AL
were smaller and more dependent on the particular user. Again, we think that these
observations are due to the tendency of the users to double-check the translations
generated by the updated systems, which should diminish as the users' trust in the
system's learning capabilities grows over time.
References
Alabau, V., Bonk, R., Buck, C., Carl, M., Casacuberta, F., García-Martínez, M., et al. (2013).
CASMACAT: An open source workbench for advanced computer aided translation. The Prague
Bulletin of Mathematical Linguistics, 100, 101–112.
Alabau, V., Carl, M., García-Martínez, M., González-Rubio, J., Mesa-Lao, B., Ortiz-Martínez, D.,
et al. (2014). D6.3: Analysis of the third field trial. Technical report, CasMaCat project.
Barrachina, S., Bender, O., Casacuberta, F., Civera, J., Cubel, E., Khadivi, S., et al. (2009).
Statistical approaches to computer-assisted translation. Computational Linguistics, 35(1), 3–28.
Bellman, R. (1957). Dynamic programming (1st ed.). Princeton, NJ: Princeton University Press.
Bertoldi, N., Cattelan, A., & Federico, M. (2012). Machine translation enhanced computer assisted
translation. First report on lab and field tests. Available from: http://www.matecat.com/wp-content/uploads/2013/01/MateCat-D5.3-V1.2-1.pdf.
Bertoldi, N., Cettolo, M., & Federico, M. (2013). Cache-based online adaptation for machine
translation enhanced computer assisted translation. In Proceedings of the MT Summit (pp. 35–42).
Bojar, O., Buck, C., Callison-Burch, C., Federmann, C., Haddow, B., Koehn, P., et al. (2013).
Findings of the 2013 Workshop on Statistical Machine Translation. In Proceedings of the
Eighth Workshop on Statistical Machine Translation (pp. 1–44). Sofia: Association for
Computational Linguistics.
Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., & Mercer, R. L. (1993). The mathematics of statistical
machine translation: Parameter estimation. Computational Linguistics, 19(2), 263–311.
Casacuberta, F., Civera, J., Cubel, E., Lagarda, A. L., Lapalme, G., Macklovitch, E., et al. (2009).
Human interaction for high quality machine translation. Communications of the ACM, 52(10),
135–138.
Chen, S. F., & Goodman, J. (1996). An empirical study of smoothing techniques for language
modeling. In A. Joshi & M. Palmer (Eds.), Proceedings of the Thirty-Fourth Annual Meeting
of the Association for Computational Linguistics (pp. 310–318). San Francisco: Morgan
Kaufmann.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the
EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38.
Denkowski, M., Dyer, C., & Lavie, A. (2014). Learning from post-editing: Online model
adaptation for statistical machine translation. In Proceedings of the EACL (pp. 395–404).
Gothenburg: Association for Computational Linguistics.
González-Rubio, J., & Casacuberta, F. (2014). Cost-sensitive active learning for computer-assisted
translation. Pattern Recognition Letters, 37, 124–134.
González-Rubio, J., Ortiz-Martínez, D., & Casacuberta, F. (2010). On the use of confidence
measures within an interactive-predictive machine translation system. In Proceedings of the
EAMT.
González-Rubio, J., Ortiz-Martínez, D., & Casacuberta, F. (2012). Active learning for interactive
machine translation. In Proceedings of the EACL (pp. 245–254).
Jelinek, F. (1997). Statistical methods for speech recognition. Cambridge: MIT Press.
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of
MT Summit (pp. 79–86).
Koehn, P., Och, F., & Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of
Human Language Technologies: The 2003 Annual Conference of the North American Chapter
of the Association for Computational Linguistics (pp. 48–54).
Langlais, P., & Lapalme, G. (2002). TransType: Development-evaluation cycles to boost translators' productivity. Machine Translation, 17(2), 77–98.
Levenshtein, V. (1966). Binary codes capable of correcting deletions, insertions and reversals.
Soviet Physics Doklady, 10(8), 707–710.
Lewis, D., & Gale, W. (1994). A sequential algorithm for training text classifiers. In Proceedings of
the ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 3–12).
Neal, R., & Hinton, G. (1999). A view of the EM algorithm that justifies incremental, sparse, and
other variants. In Learning in graphical models (pp. 355–368). MIT Press.
Och, F. J., & Ney, H. (2002). Discriminative training and maximum entropy models for statistical
machine translation. In Proceedings of the ACL (pp. 295–302).
Och, F. J., Zens, R., & Ney, H. (2003). Efficient search for interactive statistical machine
translation. In Proceedings of the European Chapter of the Association for Computational
Linguistics (pp. 387–393).
Ortiz-Martínez, D. (2011). Advances in fully-automatic and interactive phrase-based statistical
machine translation. Ph.D. thesis, Universidad Politécnica de Valencia, Valencia, Spain.
Advisors: F. Casacuberta and I. García-Varea.
Ortiz-Martínez, D. (2015, submitted). Online learning for statistical machine translation.
Ortiz-Martínez, D., & Casacuberta, F. (2014). The new Thot toolkit for fully automatic and
interactive statistical machine translation. In Proceedings of the EACL (pp. 45–48).
Ortiz-Martínez, D., García-Varea, I., & Casacuberta, F. (2010). Online learning for interactive
statistical machine translation. In Proceedings of the NAACL-HLT (pp. 546–554).
Ortiz-Martínez, D., Leiva, L. A., Alabau, V., García-Varea, I., & Casacuberta, F. (2011). An
interactive machine translation system with online learning. In ACL (System Demonstrations)
(pp. 68–73).
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation
of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL '02 (pp. 311–318). Association for Computational Linguistics.
Tiedemann, J. (2009). News from OPUS – A collection of multilingual parallel corpora with tools
and interfaces. In Proceedings of the RANLP (Vol. V, pp. 237–248).
Ueffing, N., Och, F., & Ney, H. (2002). Generation of word graphs in statistical machine translation.
In Proceedings of the EMNLP (pp. 156–163).
Chapter 4
Abstract The combination of temporal, technical and cognitive effort has been
proposed as a metric to evaluate the feasibility of post-editing machine-translation
(MT) output (Krings, 2001). In this study, we investigate the impact of interactive
machine translation on the post-editing effort required to post-edit two specialized
texts under experimental conditions and correlate it with Translation Edit Rate
(TER) scores. Using the CasMaCat workbench as a post-editing tool in conjunction
with a Tobii T60 eye tracker, process data were collected from 16 participants
with some training on post-editing. They were asked to carry out post-editing tasks
under two different conditions: (i) traditional post-editing (MT) and (ii) interactive
post-editing (IMT). In the IMT condition, as the user types, the MT system
suggests alternative target translations which the post-editor can interactively accept
or overwrite, whereas in the traditional MT condition no aids are provided to
the user while editing the raw MT output. Temporal effort is measured by the
total time spent to complete the task whereas technical effort is measured by the
number of keystrokes and mouse events performed by each participant. In turn,
cognitive effort is measured by fixation duration and the number of eye fixations
(fixation count) in each task. Results show that IMT post-editing had significantly
lower fixation duration and fewer fixation counts in comparison to traditional
post-editing.
Keywords Post-editing effort • Interactive post-editing • Traditional post-editing • TER scores • CASMACAT workbench
4.1 Introduction
First investigations of post-editing effort go back to Krings's (2001) seminal work
and relate mainly to his separation and categorization of temporal, technical
and cognitive effort. More recently, however, with the advent of interactive
machine-translation technologies, human-machine interaction has come to
centre stage. Consequently, empirical-experimental research has included
interactivity as a main component of research designs. One might say that
Krings (2001) was ahead of his time. He suggested a strategy that entails a
crucial motivation for carrying out research on post-editing with a focus on the
impact of interactivity on the post-editing process. According to Krings, this strategy
implies that one should benefit from interactive system architectures when he
recommends the

"renunciation of fully automatic machine translation in favour of interactive system architectures where the computer relies on the knowledge of a human translator in specific situations
in order to achieve better machine translation results" (Krings 2001, p. 24)
In this chapter, we have taken Krings's suggestion as the starting point of our
research which aims at achieving two complementary goals, namely, to compare
the impact of interactive machine translation on post-editing effort and to correlate
post-editing effort with Translation Edit Rate (henceforth TER) scores (Snover et al.
2006).
Building on the existing literature, we assume that Krings's distinction between
temporal, technical and cognitive effort still holds true. Thus, if temporal effort
refers to the amount of time needed to post-edit the MT output, a shorter task time
for tasks involving interactive translation prediction (ITP) in post-editing would
be a positive indicator that the ITP condition should be favoured in comparison
with traditional post-editing. In other words, if post-editors spent less time when
carrying out an ITP post-editing task compared to the time spent on traditional MT
of a more efficient form of interaction between humans and machines in the context
of computer-assisted translation is still an open research question.
Traditionally, post-editing workflows only take into account the human component in a serial process (Isabelle and Church 1998). First, the MT system provides
complete translations which are then proofread by a human translator. In such a
serial scenario, there is no actual interaction between the MT system and the human
translator, making it impossible for the MT system to benefit from overall human
translation skills and preventing the human translator from making the most out of
the adaptive ability of some MT systems.
The interactive framework constitutes an alternative to fully automatic MT systems in which the MT system and the human agent interact to generate translations
according to different degrees of quality. The system proposes a new translation
whenever the user edits a word, trying to guess the correct auto-completion for
the text that the user inputs. The user can then accept or partially accept the ITP
proposal.
Our research is focused on a study where the set of data was generated with
the aid of the CASMACAT workbench featuring interactive translation prediction
(ITP). For a description of the CASMACAT system, see Chap. 3 (Langlais and
Lapalme 2002; Casacuberta et al. 2009; Barrachina et al. 2009).
CASMACAT1 was a European project in which cognitive studies of actual
unaltered translator behaviour are carried out based on key-logged and eye-tracking
data. More details of this workbench are described in Chap. 3 of this book.
A screenshot of the CAT workbench can be seen in Fig. 4.1.
CASMACAT: Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation was a project (2011–2014) co-funded by the European Union under the Seventh Framework
Programme, Project 287576 (ICT-2011.4.2).
a monolingual speaker with no knowledge of the source language can correct the
MT output. Secondly, who sets the quality criteria? In general, one could say that
a client who needs a post-edited text establishes the quality criteria for the post-editor to follow. These criteria can be based on one of two forms of post-editing:
partial (also called rapid, light or fast) post-editing and complete (also called conventional or
full) post-editing. These forms are thus distinguished on the basis of the depth of
intervention in the machine translation (Krings 2001, p. 45).
In partial post-editing, as Krings (2001, p. 54) states, only the machine translation's "value as a source
of raw or gist information remains". Therefore, only the most egregious machine
translation errors are repaired. In contrast, in full post-editing, "[t]he machine
translation serves the complete post-edit apparently only as a source of raw material
for an extensive new formulation" (Krings 2001, p. 48). A machine-translated
text that is completely post-edited would therefore probably look like a manually translated
text, because it would have undergone a deep intervention by the post-editor.
Krings (2001) investigates many of these aspects of the post-editing process.
In his study, he assesses machine translation and post-editing objectively and
empirically with different groups of translators as participants. He considers the
cost and the post-editing effort with and without access to the source text and uses
TAPs (Think-Aloud Protocols) as a methodological tool to investigate what is going
on in the subjects' minds. Among those aspects, Krings (2001) highlights that the
question of post-editing effort is "the key issue in the evaluation of the practicality
of machine translation systems" (p. 178) and points out the obvious reason for this:
"As long as fully automatic high quality machine translation remains an unreachable
ideal concept, then the amount of post-editing effort will be the primary determinant
of whether machine translation is worthwhile" (p. 178). But what did Krings (2001)
mean by post-editing effort?
for segments without NTIs, which are measured in words per second, were higher
than for segments with NTIs. These differences were statistically significant.
However, the author asserts that post-editing effort "can sometimes be greater for those
sentences [without NTIs] than for those that contain NTIs" (O'Brien 2007, n.p.).
Thus, she concludes that removing NTIs does not always lead to less temporal effort
in the post-editing task.
As regards technical effort, O'Brien (2007) concludes that the segments
without NTIs require fewer deletions and insertions than the segments with NTIs,
indicating that, on average, both technical effort and the time spent on the post-editing task (temporal effort) are reduced when NTIs are absent. Therefore, in the
translation market it may be worthwhile to evaluate whether effort should be allocated
to removing NTIs before a text is submitted to machine translation.
Carl et al. (2011) also carried out experiments in which they compared human
translation with post-editing. They focused on the time spent on those tasks, on
productivity gains and on the quality of target texts rendered by seven translators
who produced seven versions of three post-edited and three manually translated
texts. The study analyses two kinds of post-editing effort, namely temporal and
technical effort. Although the authors do not explicitly use the term technical effort
in their study, we infer it from what they call "properties" of the manual and the
post-edited translations. Those properties include the number of characters in both
post-edited and manually translated versions, the number of deletions,
insertions, navigation keystrokes and mouse clicks. Carl et al. (2011) found
that, on average, there were more deletions, navigation keystrokes and mouse clicks
in post-editing than in manual translation, while fewer insertions were found in post-editing tasks. Since insertions account for most of the typing, these findings imply that more technical effort is demanded in
manual translation than in post-editing. In their study, Carl et al. (2011) also point
out that their results indicate that translators spend less time in post-editing than in
manual translation and that post-editing presents differences in gaze behaviour, i.e.
the fixations on the ST and TT windows are not as evenly distributed as in
manual translation. Furthermore, both the total reading time and the fixation count
on the TT proved to be significantly higher in the post-editing task compared to manual
translation when Carl et al. (2011) applied an unpaired two-sample t-test (p < 0.01).
As far as cognitive effort is concerned, we have seen that Krings (2001) evaluates
it by analysing think-aloud data. The author assumes that this kind of data permits
"the development of a number of empirical parameters for determining cognitive
post-editing effort" (Krings 2001, p. 179). Krings (2001) therefore analyses these
data assuming that the higher the verbalization effort, the higher the cognitive
effort. He concludes that, among the processes he analyses, verbalization effort was
considerably higher in post-editing without the source text than in post-editing with
the source text. His results also indicate that translation and post-editing
with the source text present almost the same rate of verbalization effort, indicating
similar levels of cognitive effort.
Pause analysis has been a common method for analysing cognitive effort in
translation process research. When applying this methodology specifically to the post-editing process, O'Brien (2006) and Lacruz et al. (2012) found interesting results.
O'Brien (2006) analyses pauses in post-editing and triangulates them with the
Choice Network Analysis method and key-logged data generated with the aid of
the software Translog. According to O'Brien (2006, p. 16), by applying Choice
Network Analysis it is possible to identify those parts of a sentence that are most
changed during post-editing; it is assumed that cognitive effort is required for these
changes to be implemented. Although her results suggest that analysing pauses is a
useful indicator of cognitive effort in post-editing, she asserts that it is very difficult
to correlate cognitive effort with pauses, source text difficulty and target text quality.
This is probably due to the fact that pause duration and frequency are subject to
individual differences, thus justifying the need for supplementary methods to analyse
cognitive effort, such as Choice Network Analysis and keyboard monitoring using
Translog.
Lacruz et al. (2012) complement O'Brien's (2006) study by introducing the average
pause ratio as a metric to establish a relationship between pauses and cognitive effort
in post-editing. Lacruz et al. (2012) assert that the average pause ratio is sensitive
to the number of pauses and to pause duration. The authors state that they assessed
the cognitive effort required to post-edit a segment by using a measure of technical
effort that counts the number of complete editing events. According to the authors:

"We classify post-edited segments as having required more or less cognitive effort on the part
of the post-editor based on a metric that counts the number of complete editing events. In
many circumstances, collections of individual editing actions can be considered to naturally
form part of the same overall action, which is what we label as complete editing event."
(Lacruz et al. 2012, n.p.)
In the same paper, Lacruz et al. (2012) report a case study with a professional
translator with 25 years of experience as a freelancer and no previous
experience in post-editing. They classify the post-edited segments rendered by this
translator as more or less cognitively demanding on the basis of more or fewer complete
editing events in the segment. Their results indicate that the average pause ratio was
higher for cognitively less demanding segments (with two or fewer complete editing events)
than for cognitively more demanding segments (with four or more complete editing
events).
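The sketch below shows one possible operationalization of the average pause ratio; the pause threshold and the exact formula (average time per pause divided by average time per word) are assumptions of this illustration, and Lacruz et al. (2012) should be consulted for the precise definition.

```python
# One possible operationalization of the average pause ratio (APR); the 1-second
# pause threshold and the formula (mean time per pause / mean time per word)
# are assumptions of this sketch, not necessarily the definition of Lacruz et al.
def average_pause_ratio(keystroke_times_ms, n_words, pause_threshold_ms=1000):
    """keystroke_times_ms: sorted timestamps of keystrokes for one segment."""
    if len(keystroke_times_ms) < 2 or n_words == 0:
        return float("nan")
    total_time = keystroke_times_ms[-1] - keystroke_times_ms[0]
    gaps = [b - a for a, b in zip(keystroke_times_ms, keystroke_times_ms[1:])]
    pauses = [g for g in gaps if g >= pause_threshold_ms]
    if not pauses or total_time == 0:
        return float("nan")
    return (sum(pauses) / len(pauses)) / (total_time / n_words)
```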
In line with recent studies in translation process research that have been using
eye-tracking data as part of a methodology designed to investigate cognitive effort,
it is indeed possible to include metrics such as fixation duration and fixation count
for such purposes.
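A minimal sketch of how such metrics can be derived from exported fixation events is shown below; the (start, end) event format is an assumption of this illustration, not the format of any particular eye-tracking software.

```python
# Minimal sketch: fixation count and mean fixation duration from a list of
# fixation events, each assumed to be a (start_ms, end_ms) pair.
def fixation_metrics(fixations):
    durations = [end - start for start, end in fixations]
    count = len(durations)
    mean_duration = sum(durations) / count if count else 0.0
    return count, mean_duration

count, mean_dur = fixation_metrics([(0, 210), (250, 580), (620, 900)])
print(count, round(mean_dur, 1))  # 3 fixations, mean duration ~273.3 ms
```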
from the European Medicines Agency (EMEA).2 One of the selected texts was
about human insulin to treat diabetes (Text 1) whereas the other text was about
an anticancer medicine (Text 2). Each of the two texts consisted of 20 segments
to be post-edited. Care was taken in attempting to choose texts with the same
level of translation difficulty. We would also argue that, despite being specialized
texts, they could be edited by post-editors without specific training in medical
translation. The experiment is contained in the TPR-DB (Chap. 2) under the study
name CEMPT13, and can be downloaded from the TPR-DB website.
4.3.3 Participants
Twenty-one subjects were recruited overall, of whom sixteen produced usable data.
All of the participants had Brazilian Portuguese as their L1 and English as their
L2 and had received academic training on post-editing MT outputs as part of their
undergraduate degree. The average age of the post-editors was 29.5 years (range
2239).
http://opus.lingfil.uu.se/EMEA.php
grammar; (4) Make sure there are no mistranslations with regard to the English
source text; (5) Publishable quality is expected.3
The decision to participate was voluntary and made upon awareness of the details
on this sheet, including the fact that they would have their eye movements and keyboard
activity recorded.
Two different conditions were evaluated: (1) traditional post-editing (PE) and
(2) post-editing with interactive translation prediction (ITP). Each of the two
texts was shown in both conditions across the different participants. Each editor
processed each text once, under one of the two experimental conditions,
at the Laboratory for Experimentation in Translation (LETRA), located at the Federal
University of Minas Gerais, Brazil.
Before starting to carry out their tasks, participants were introduced to the
CASMACAT workbench and to the two different conditions used in the experiment.
They were given time to familiarize themselves with the CASMACAT workbench.
Before starting to perform both tasks, participants were asked to complete an online
questionnaire with information about their translation profile (biodata, working
languages and previous experience in post-editing).
Similar guidelines were also used in the other CASMACAT studies, including the EFT14 study
(Chap. 3), the LS14 study (Chap. 5) and the CFT14 study (Chaps. 7 and 8).
duration, did not deviate significantly from a normal distribution either in traditional post-editing (W = 0.9008, p < 0.08) or interactive post-editing (W = 0.9305,
p < 0.25).
Table 4.1 Total time spent (ms) on PE and ITP, also normalized per source-text character

Participant | Duration PE (ms) | Duration ITP (ms) | PE (ms/char) | ITP (ms/char) | Difference (PE - ITP) | Average (PE + ITP)/2
P02 | 1,231,219 | 1,280,000 | 976  | 801  | -48,781  | 1,255,609
P03 | 1,083,447 | 1,269,000 | 678  | 1006 | -185,553 | 1,176,223
P04 | 1,438,495 | 1,566,000 | 1140 | 979  | -127,505 | 1,502,247
P05 | 2,343,441 | 2,327,000 | 1466 | 1845 | 16,441   | 2,335,220
P06 | 818,592   | 1,179,000 | 649  | 737  | -360,408 | 998,796
P09 | 873,094   | 762,000   | 546  | 604  | 111,094  | 817,547
P10 | 863,664   | 1,427,000 | 684  | 892  | -563,336 | 1,145,332
P11 | 968,101   | 897,000   | 605  | 711  | 71,101   | 932,550
P12 | 930,180   | 979,000   | 737  | 612  | -48,820  | 954,590
P13 | 653,330   | 902,000   | 408  | 715  | -248,670 | 777,665
P14 | 722,673   | 1,220,000 | 573  | 763  | -497,327 | 971,336
P15 | 707,302   | 1,014,000 | 442  | 804  | -306,698 | 860,651
P16 | 1,030,441 | 1,458,000 | 817  | 912  | -427,559 | 1,244,220
P18 | 606,403   | 1,041,000 | 480  | 651  | -434,597 | 823,701
P19 | 1,074,724 | 1,231,000 | 672  | 976  | -156,276 | 1,152,862
P21 | 737,477   | 1,061,000 | 584  | 663  | -323,523 | 899,238
Contrary to what was expected, participants did not become faster in the ITP
condition. In fact, they spent significantly more time with ITP, according to the
results of a Wilcoxon signed-rank test (Z = 10, p = 0.001).4
Table 4.2 Keystrokes and mouse events on PE and ITP, also normalized per source-text character

Participant | Keystrokes PE | Keystrokes ITP | Mouse events PE | Mouse events ITP | Keystrokes/char PE | Keystrokes/char ITP | Mouse events/char PE | Mouse events/char ITP
P02 | 412  | 424  | 71  | 81  | 0.3267 | 0.2653 | 0.0563 | 0.0507
P03 | 642  | 963  | 80  | 112 | 0.4018 | 0.7637 | 0.0501 | 0.0888
P04 | 939  | 1054 | 101 | 175 | 0.7446 | 0.6596 | 0.0801 | 0.1095
P05 | 510  | 799  | 114 | 264 | 0.3191 | 0.6336 | 0.0713 | 0.2094
P06 | 964  | 776  | 71  | 84  | 0.7645 | 0.4856 | 0.0563 | 0.0526
P09 | 1372 | 839  | 14  | 40  | 0.8586 | 0.6653 | 0.0088 | 0.0317
P10 | 694  | 1624 | 66  | 53  | 0.5504 | 1.0163 | 0.0523 | 0.0332
P11 | 537  | 547  | 79  | 66  | 0.3360 | 0.4338 | 0.0494 | 0.0523
P12 | 1273 | 1065 | 50  | 33  | 1.0095 | 0.6665 | 0.0397 | 0.0207
P13 | 315  | 600  | 72  | 80  | 0.1971 | 0.4758 | 0.0451 | 0.0634
P14 | 952  | 856  | 97  | 85  | 0.7550 | 0.5357 | 0.0769 | 0.0532
P15 | 517  | 1023 | 65  | 76  | 0.3235 | 0.8113 | 0.0407 | 0.0603
P16 | 746  | 919  | 71  | 140 | 0.5916 | 0.5751 | 0.0563 | 0.0876
P18 | 1876 | 2472 | 47  | 50  | 1.4877 | 1.5469 | 0.0373 | 0.0313
P19 | 483  | 594  | 92  | 76  | 0.3023 | 0.4711 | 0.0576 | 0.0603
P21 | 709  | 1757 | 121 | 245 | 0.5623 | 1.0995 | 0.0960 | 0.1533
Since this non-parametric test is computed on the median, the median value is given for each
condition (PE and ITP): PE = 901,637 and ITP = 1,199,500.
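For reference, the paired comparison reported above can be reproduced along the following lines with scipy (a minimal sketch, not the authors' analysis script), using the durations listed in Table 4.1; note that scipy reports the smaller sum of signed ranks rather than a z value.

```python
# Minimal sketch (not the authors' script): paired Wilcoxon signed-rank test on
# the PE vs. ITP task durations (ms) from Table 4.1.
from scipy.stats import wilcoxon

pe = [1231219, 1083447, 1438495, 2343441, 818592, 873094, 863664, 968101,
      930180, 653330, 722673, 707302, 1030441, 606403, 1074724, 737477]
itp = [1280000, 1269000, 1566000, 2327000, 1179000, 762000, 1427000, 897000,
       979000, 902000, 1220000, 1014000, 1458000, 1041000, 1231000, 1061000]

stat, p = wilcoxon(pe, itp)  # stat is the smaller sum of signed ranks
print(f"W = {stat}, p = {p:.4f}")
```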
p = 0.05) than ITP post-editing. As for the mouse events, there was no significant
difference (Z = 38, p = 0.12) between the two conditions.
Table 4.3 Fixation count and mean fixation duration for PE and ITP

Participant | Fixation count PE | Fixation count ITP
P02 | 1801 | 4233
P03 | 1336 | 2648
P04 | 1331 | 2038
P05 | 2031 | 4079
P06 | 977  | 3396
P09 | 1151 | 1905
P10 | 710  | 3533
P11 | 724  | 1783
P12 | 1107 | 2454
P13 | 985  | 1976
P14 | 674  | 2471
P15 | 872  | 2024
P16 | 883  | 3151
P18 | 840  | 2813
P19 | 1339 | 2511
P21 | 700  | 4263
Fig. 4.2 Scatterplot correlating TER score and fixation duration for ITP
Fig. 4.3 Scatterplot correlating TER score and fixation duration for PE
Fig. 4.4 Scatterplot correlating TER score and temporal effort in PE
Fig. 4.5 Scatterplot correlating TER score and temporal effort in ITP
segments with TER scores ranging from approximately 0.2 to 0.4 had fixation
durations as high as segments with TER scores above 1.0. However, lower TER scores should have
required fewer edits and therefore shorter fixation durations.
The following scatterplots (Figs. 4.4 and 4.5) summarize the results with respect
to TER scores and temporal effort in PE and ITP.
Figure 4.4 shows that there was no correlation between TER score and temporal
effort in traditional post-editing (rsp = 0.197, p = 0.46). However, Spearman's
correlation coefficient indicates a positive correlation between TER scores and temporal effort in interactive post-editing (rsp = 0.570, p = 0.02). Increases in
the time spent to complete the task were associated with higher TER scores.
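The correlation itself can be computed as in the minimal sketch below; the per-segment values shown are hypothetical placeholders, not the study data.

```python
# Minimal sketch (not the authors' script): Spearman correlation between
# per-segment TER scores and task time. The lists hold hypothetical values.
from scipy.stats import spearmanr

ter_scores = [0.21, 0.35, 0.48, 0.62, 0.80, 1.05]            # hypothetical
times_ms = [310000, 420000, 500000, 610000, 820000, 990000]  # hypothetical

rho, p = spearmanr(ter_scores, times_ms)
print(f"Spearman rho = {rho:.3f}, p = {p:.3f}")
```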
4.5 Discussion
Throughout the analysis developed in this chapter, we have tested two hypotheses:
(1) ITP would contribute to a decrease in post-editing effort, and (2) ITP post-editing
effort would positively correlate with TER scores. By analysing temporal, technical
and cognitive effort in ITP post-editing compared to traditional PE, we were able to
observe whether interactivity reduces the effort dedicated to post-editing.
Our results show that, contrary to what was expected, participants neither became
faster when post-editing in the ITP condition nor showed a reduction in the
number of keystrokes when working in that condition. Interestingly, an opposite and
favourable result emerges when fixations are analysed. Participants had significantly
shorter fixation duration when working with ITP.
Bearing those results in mind, we can speculate about the reasons for the negative results regarding temporal and technical effort. They may be explained by the
participants' lack of familiarity with the interactive tool or their unwillingness
to incorporate the auto-complete changes provided by the CASMACAT workbench.
As described in the methodology section, participants had some training on post-editing but no real professional experience as post-editors. This may have had an
impact on the time spent and the changes made on each task (see Chap. 5 for an
in-depth analysis of the learning effect in ITP).
In addition, analysing the amount of editing performed on each of the
segments during the task, in comparison with the amount of editing predicted by
TER scores, provides interesting results from the industry perspective, since semi-automatic translation edit rate metrics such as TER are used to predict quality
estimates. Therefore, investigating the correlation between actual and predicted
post-editing effort could bring helpful insights for establishing fair and reasonable
price rates for post-editors, Language Service Providers (LSPs) and customers.
Our results show that there was no correlation for traditional post-editing but a
positive correlation between TER scores and temporal effort in ITP post-editing.
As expected, an increase in the time spent to complete the task was associated with
higher TER scores. A correlation between the actual and the predicted effort can
be considered encouraging regarding the use of semi-automatic translation edit rate
metrics for establishing a threshold between "good" and "bad" MT outputs in the context
of ITP post-editing.
Some of our results related to post-editing effort suggest that ITP post-editing
may offer a successful path; however, since only a few post-editors participated
in this data collection, the current study should be considered only as an initial
exploration of interactivity in post-editing processes, particularly for the English-Portuguese language pair. The small sample may also have interfered with some of
the non-significant results, so it would be beneficial to conduct further experiments
with a larger set of participants, as well as to explore some of the qualitative
data related to participants' previous training, experience and willingness to use
interactive machine translation.
scratch. However, we have seen that such evidence is not at all clear from the outset.
Thus, in this chapter we have set out to investigate the effect of interactivity on
the human-machine post-editing process in English-Portuguese translation.
On the one hand, our results have shown that, contrary to what was expected,
participants neither became faster when carrying out ITP-related post-editing tasks
nor showed a reduction in the number of keystrokes when working in that condition.
These negative results raise questions about their possible implications for the industry with respect to gains in temporal and technical
effort in post-editing. They also provide food for thought concerning future research
directions.
On the other hand, however, our results have also indicated that ITP-related tasks
have a positive impact on cognitive effort in post-editing as shown by significantly
shorter fixation duration when participants worked in the ITP condition. Another
positive result relates TER scores with temporal effort in ITP post-editing.
Altogether, evidence from results related to hypotheses 1 and 2 highlights the
relevance of combining a quantitative and a qualitative approach when assessing
different types of effort in post-editing. This combined approach seems to grow
in importance as far as cognitive effort is concerned. After all, gains for both
post-editors and the industry will only be meaningful if less effort also leads to
qualitatively better output.
Our small-scale results seem to indicate that this is what actually happens when
one investigates effort with respect to ITP post-editing. Nevertheless, as we have
stated above, one should carry out larger-scale studies to arrive at more robust
and conclusive evidence. Research in post-editing is still somewhat in its infancy,
particularly for scarcely researched language pairs such as English-Portuguese. With
our results, we hope to have provided elements to expand ongoing research in
post-editing and to have paved the way for further studies which may confirm some of the
exploratory claims that we have made in this chapter.
Acknowledgements The work described in this chapter was carried out within the framework of
the EU project CASMACAT: Cognitive Analysis and Statistical Methods for Advanced Computer
Aided Translation, funded by the European Union 7th Framework Programme Project 287576
(ICT-2011.4.2). Website: http://www.casmacat.eu. Brazilian researchers were funded by CNPq,
the Brazilian Research Council (grant 307964/2011-6), and FAPEMIG, the Research Agency of
the State of Minas Gerais (grant SHA/PPM-00170-14).
References
Barrachina, S., Bender, O., Casacuberta, F., Civera, J., Cubel, E., Khadivi, S., Lagarda, A., Ney,
H., Tomás, J., Vidal, E., & Vilar, J. M. (2009). Statistical approaches to computer-assisted
translation. Computational Linguistics, 35(1), 3–28.
Carl, M., Dragsted, B., Elming, J., Hardt, D., & Jakobsen, A. L. (2011). The process of post-editing:
A pilot study. In B. Sharp, M. Zock, M. Carl, & A. L. Jakobsen (Eds.), Proceedings of the 8th
Natural Language Processing and Cognitive Science Workshop (Copenhagen Studies in Language,
Vol. 41, pp. 131–142).
Casacuberta, F., Civera, J., Cubel, E., Lagarda, A. L., Lapalme, G., Macklovitch, E., & Vidal, E.
(2009). Human interaction for high quality machine translation. Communications of the ACM,
52(10), 135–138.
Duchowski, A. (2007). Eye tracking methodology: Theory and practice. Clemson: Springer.
Federico, M., Cattelan, A., & Trombetti, M. (2012). Measuring user productivity in machine
translation enhanced computer assisted translation. In Proceedings of the Tenth Conference
of the Association for Machine Translation in the Americas (AMTA 2012). Retrieved
October 30, 2014.
Flournoy, R., & Duran, C. (2009). Machine translation and document localization at Adobe: From
pilot to production. In MT Summit XII: Proceedings of the Twelfth Machine Translation Summit.
Green, S., Heer, J., & Manning, C. D. (2013). The efficacy of human post-editing for language
translation. In SIGCHI Conference on Human Factors in Computing Systems (pp. 439–448).
ACM.
Isabelle, P., & Church, K. (1998). Special issue on: New tools for human translators. Machine
Translation, 12(1/2).
Jakobsen, A. L., & Jensen, K. T. H. (2008). Eye movement behaviour across four different types
of reading task. Copenhagen Studies in Language, 36, 103–124.
Kay, M., Gawron, J. M., & Norvig, P. (1994). Verbmobil: A translation system for face-to-face
dialog. Stanford: Center for the Study of Language and Information.
Koehn, P. (2009). A process study of computer-aided translation. Machine Translation, 23(4),
241–263.
Krings, H. (2001). Repairing texts: Empirical investigations of machine translation post-editing
processes (Trans. G. Koby, G. Shreve, K. Mischericow, & S. Litzar). Ohio: Kent State
University Press.
Lacruz, I., Gregory, M. S., & Angelone, E. (2012). Average pause ratio as an indicator of cognitive
effort in post-editing: A case study. In S. O'Brien, M. Simard, & L. Specia (Eds.), Proceedings
of the AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP 2012). Retrieved
from http://amta2012.amtaweb.org/AMTA2012Files/html/2/2_paper
Langlais, P., & Lapalme, G. (2002). TransType: Development-evaluation cycles to boost translators'
productivity. Machine Translation, 17(2), 77–98.
Mesa-Lao, B. (2013). Introduction to post-editing – The CasMaCat GUI. Retrieved from http://
bridge.cbs.dk/projects/seecat/material/hand-out_post-editing_bmesa-lao.pdf
O'Brien, S. (2004). Machine translatability and post-editing effort: How do they relate? In
Translating and the computer. London: Aslib.
O'Brien, S. (2005). Methodologies for measuring the correlations between post-editing effort and
machine translatability. Machine Translation, 19, 37–58.
O'Brien, S. (2006). Pauses as indicators of cognitive effort in post-editing machine translation
output. Across Languages and Cultures, 7(1), 1–21.
O'Brien, S. (2007). An empirical investigation of temporal and technical post-editing effort.
Translation and Interpreting Studies, 2(1), 83–136.
Plitt, M., & Masselot, F. (2010). A productivity test of statistical machine translation post-editing
in a typical localisation context. The Prague Bulletin of Mathematical Linguistics, 93, 7–16.
ISBN 978-80-904175-4-0. doi:10.2478/v10108-010-0010-x.
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples).
Biometrika, 52, 591–611.
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006). A study of translation edit
rate with targeted human annotation. In Proceedings of AMTA-2006 (pp. 223–231).
Chapter 5
Abstract This chapter reports the results of a longitudinal study (LS14) in which
the CASMACAT post-editing workbench was tested with interactive translation
prediction (ITP). Whereas previous studies with the CASMACAT workbench
(Sanchis-Trilles et al., Machine Translation, 2014) or similar systems (Langlais
et al., Machine Translation, 15, 77–98, 2004) tested user interaction only for a
few days, the aim of this study was primarily to find out whether and how the
performance of professional post-editors improved over time when working with
the CASMACAT ITP feature. We were also interested in uncovering any specific
profiles of translators depending on personal factors such as previous experience in
post-editing and typing skills. Finally, the aim was also to collect feedback from
the post-editors in order to know more about their views regarding this type of
technology.
Keywords CASMACAT workbench • Interactive post-editing • Interactive translation prediction • Learning behavior in interactive post-editing • Production time • Typing time
5.1 Introduction
The way texts are produced changes with every technological invention. From paper
and pencil to typewriters and computers, each new technology gives rise to new
types of texts, new styles of authoring, and new ways of how texts are generated
and perceived. Today we are experiencing increased automation of text production,
in particular through the Internet and through novel forms of editing, authoring and
translating digital content.
Within the EU CASMACAT project (see Sanchis-Trilles et al., Machine Translation,
2014, and also Chap. 3 in this volume), we have developed an advanced post-editing platform with an interactive translation prediction mode, i.e. context-dependent
completions during the translation process (Langlais et al. 2004). Even though
this feature was designed to help translators in their translation production, within
a 3-day field study in a professional translation agency1 (Carl et al. 2013) it
seemed to hamper translators rather than help them produce faster translations.
Investigating some of the screen recordings, we hypothesized that post-editors
might need more extended exposure to the CASMACAT workbench, as its
novel editing features might require completely different translation styles and
translation procedures, which would first have to be learned (Sanchis-Trilles et al.
2014). This assumption is in line with experiences gained with a similar translation
prediction system, TRANSTYPE (Langlais et al. 2004), where it was suggested
that "over a longer period [the system] is expected to give a much better picture of
its possibilities".
Accordingly, we conducted a longitudinal study (LS14) which involved five post-editors working alternately with CASMACAT's traditional post-editing mode and
aim was to test whether post-editors become faster when working with ITP as they
become more acquainted with this type of assistive technology, and to investigate
whether exposure to this workbench over a longer period of time has an effect on
editing behaviour.
Field trials of the CASMACAT workbench were carried out at Celer Soluciones SL, Madrid, which
was a partner in the CASMACAT consortium.
The LS14 study took place in May and June 2014. It was followed in July 2014
by the third CASMACAT field trial (CFT14), for which a more detailed description
is contained in Chap. 7 of this volume. The CFT14 study was conducted at the same
translation agency, aiming at assessing whether post-editors profit from ITP online
learning as compared to traditional post-editing.2 Seven post-editors participated in
the CFT14 study, of whom four had also taken part in the previous longitudinal
study (LS14). As a side effect, we can thus investigate what the four post-editors
who participated in both studies have learned, compared to the three post-editors
who only participated in the CFT14 study.
The CFT14 study differs from the LS14 study in two respects:
– The text type: LS14 used general news texts, while CFT14 used a specialized text
from the medical domain, extracted from the EMEA corpus.3
– The number of source text words: LS14 involved 24 source texts of 1000 words each,
while CFT14 involved only two source texts of 4500 words each (texts were much
longer in CFT14, so as to test the online learning effect with tokens that occurred
several times within each text).
Both studies combined add up to around 225,000 source text words which were
translated into 249,000 target text words. The studies are included in the publicly
available TPR-DB.4
Results show that LS14 participants indeed became faster over the period of
6 weeks working with the ITP system and, according to the projection of the data
collected, they could have been even more productive after 6–7 weeks of regular
exposure to this new technology.
A closer look at the way post-editors became acquainted with ITP suggests
that learning to work with this interactive technology requires a different way of
controlling the typing speed. In order to fully benefit from the ITP
suggestions (i.e. the translation auto-completions) provided by the system, post-editors need to check the proposals of the ITP system more frequently. Since all
post-editors in the LS14 study were touch typists, they could only fully benefit
from the ITP suggestions once they gradually learned to avoid overwriting new
suggestions and thus save typing effort.
Section 5.2 introduces the LS14 study. It gives background on the participants,
the experimental design and the results of the study. Section 5.3 compares behavioural
patterns of LS14 participants with those of CFT14 participants, and tries to describe what exactly is being
learned over time. Section 5.4 corroborates these findings with the feedback from
participants, as acquired on the basis of questionnaires.
See also Chap. 3 for a comparison of online learning and active learning in the CASMACAT tool.
http://opus.lingfil.uu.se/EMEA.php.
More specific data on the participants' age, level of experience, professional education, etc., is
available in the CRITT TPR Database (metadata folder).
provided by the CASMACAT server and the participants were asked to work under
the following conditions:
Condition 1: Traditional post-editing (P), i.e. no interaction is provided during
the post-editing process.
Condition 2: Interactive post-editing (PI), i.e. interaction is provided during the
post-editing process in the form of ITP.
Every week, all post-editors worked on the same four source texts, counterbalancing texts and
conditions among participants in order to avoid any possible text/tool-order
effect (two texts in condition 1 and two texts in condition 2). During the first and
the last week of the study, post-editors worked from Celer Soluciones SL while
their eye movements were recorded using an eye-tracker. From week 2 to week
5, post-editors worked from home, as they usually do when completing jobs for
the company. Meeting the participants at the company in the first week was useful to
make sure they understood the assignment before starting to post-edit. Post-editing
guidelines were given, similar to those discussed in Chap. 3, as well as a hands-on tutorial on how ITP works from the user perspective (condition 2). During the
last week of the experiment, participants returned to Celer Soluciones SL so that
a second sample of their eye movements could be recorded and so that we could
gather their feedback and their comments on the technology they had been using.
Each post-editor post-edited 1154 segments, i.e., in total more than 140,000
source text words (half of them in each condition, as shown in Chap. 2, Appendix A).
Presentation of texts and task order were counterbalanced, such that participants
post-edited in the PI condition first and post-edited in the P condition afterwards
half the time. In addition, texts were grouped in two lists: two participants post-edited list A (during their weekly assignments) in condition P and post-edited list B
in condition PI, while the remaining three participants post-edited list A in condition
PI and post-edited list B in condition P.
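One simple way to generate such a counterbalanced assignment is sketched below; the participant IDs and the alternation scheme are illustrative assumptions, not the exact assignment used in the study.

```python
# Illustrative sketch of a counterbalanced assignment of two text lists (A, B)
# to two conditions (P, PI), alternating which condition comes first.
# Participant IDs and the alternation scheme are assumptions of this sketch.
from itertools import cycle

participants = ["P01", "P02", "P03", "P04", "P05"]
schemes = cycle([("A", "P", "B", "PI"), ("A", "PI", "B", "P")])

for pid, (list1, cond1, list2, cond2) in zip(participants, schemes):
    print(f"{pid}: list {list1} under {cond1}, then list {list2} under {cond2}")
```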
5.2.4 Results
In Sect. 5.2.4.1 we provide an overall comparison of the translation durations, in
terms of FdurN, KdurN and PdurN, which show that on average all translators slow
down in the PI mode. Section 5.2.4.2 shows individual differences in post-editing
behaviour: for some of the post-editors total post-editing time can be predicted
by typing durations, while for other types of post-editors typing duration is less
indicative of the total post-editing time (see Sect. 5.3).
Table 5.1 Comparison of the P and PI conditions (Mins, Mdel, FdurN, KdurN, PdurN). Each of the five LS14 post-editors contributes one PI and one P row; the final PI/P pair gives the per-condition mean and the last row the overall mean

Cond      | Mins  | Mdel  | FdurN  | KdurN  | PdurN
PI        | 0.744 | 0.399 | 563.64 | 254.3  | 113.33
P         | 0.595 | 0.545 | 529.71 | 215.86 | 88.51
PI        | 0.5   | 0.223 | 456.53 | 173.06 | 68.24
P         | 0.416 | 0.346 | 439.87 | 157.46 | 68.51
PI        | 0.429 | 0.187 | 623.81 | 223.79 | 85.26
P         | 0.353 | 0.319 | 573.68 | 167.51 | 63.77
PI        | 0.569 | 0.280 | 684.30 | 230.22 | 130.28
P         | 0.362 | 0.329 | 701.46 | 161.53 | 88.32
PI        | 0.447 | 0.181 | 320.72 | 158.18 | 69.99
P         | 0.354 | 0.314 | 284.43 | 138.20 | 54.25
PI (mean) | 0.538 | 0.254 | 529.80 | 207.91 | 93.42
P (mean)  | 0.416 | 0.371 | 505.83 | 168.11 | 72.67
Total     | 0.477 | 0.312 | 517.82 | 188.01 | 83.05
Fig. 5.1 LS14 study: FdurN (horizontal) vs. KdurN (vertical) for all five participants
Fig. 5.2 LS14 study: productivity as reflected in KdurN taking into account 6 weeks
2. The ITP mode involves a great amount of internet traffic: a new translation
prediction is sent over the internet for (almost) every keystroke. This adds to the
traffic from the gaze samples (at a rate of approximately 300 Hz), which are also
sent over the internet to a remote server, so that a delay in response was frequently
observed in the office of the translation agency when using CASMACAT in the
PI setting.
In addition to this, using an eye-tracker involved limited head movement, and
recalibration was sometimes necessary during the process of post-editing. Together,
these aspects may have had a negative effect on participants' productivity in weeks
1 and 6, or, in other words, the data might show a lab effect.
The productivity drop in week 6 under PI may also be attributed to the difficulty
of the texts themselves: TER values were computed for all the texts in LS14, and
they were particularly high for the texts in week 6. We identified text 20 in
week 6 (post-edited under PI by participants P01, P03 and P05) as one of the most
difficult texts to post-edit. Text 20 in LS14 was of a more specialized nature, namely a legal
text. This higher degree of text specialization could explain the lower
MT quality, which in turn required more edits from the post-editors, as reflected in the
higher TER values.
Assuming that working at home and working in the office are two different conditions, we calculated a learning projection based only on the 4 weeks
when post-editors worked from home. Figure 5.3 plots the two conditions in
Fig. 5.3 LS14 study: productivity projection as reflected in Kdur, based only on the data from
weeks 2–5 (working from home)
LS14, showing that post-editing under the PI condition could, theoretically, have become
more efficient after 6 weeks.
The two regression lines in Fig. 5.3 are based on simple linear models and
show the projection of the average post-editing time under the PI and the P
conditions over a hypothetical timeframe of 7 weeks. The grey areas around the
linear regression lines represent the 95 % confidence region for each regression.
According to this projection, it is between weeks 6 and 7 that post-editors would
become more efficient under the PI condition than under the P condition. While
this projection rests on the assumption of a linear relationship between time spent
working on the CASMACAT workbench and Kdur, it clearly shows a
learning effect for the PI condition, which is absent in the P condition.
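The projection can be reproduced schematically as below; the weekly Kdur values are hypothetical placeholders chosen only to illustrate how a crossover week is obtained from the two fitted lines, not the study data.

```python
# Schematic sketch (not the authors' model): fit one linear regression per
# condition over the at-home weeks and solve for the projected crossover week.
# The weekly Kdur values below are hypothetical placeholders.
import numpy as np

weeks = np.array([2, 3, 4, 5])
kdur_pi = np.array([244.0, 236.0, 228.0, 220.0])  # hypothetical PI means per week
kdur_p = np.array([208.0, 208.0, 208.0, 208.0])   # hypothetical P means per week

slope_pi, intercept_pi = np.polyfit(weeks, kdur_pi, 1)
slope_p, intercept_p = np.polyfit(weeks, kdur_p, 1)

# Week w at which intercept_pi + slope_pi * w == intercept_p + slope_p * w
crossover = (intercept_p - intercept_pi) / (slope_pi - slope_p)
print(f"Projected crossover around week {crossover:.1f}")
```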
Activity units are presented and discussed in Chap. 2. For an alternative approach to define activity
microunits, see also Chaps. 8 and 14 in this volume.
Table 5.2 Information about participants in the LS14 and CFT14 studies

Participant | Gender | Years of translator training | Years of professional experience | Post-editing experience | Took part in LS14 study | Took part in CFT14 study
P01  | F | 4  | 8  | Yes | Yes | Yes
P02  | M | 4  | 8  | Yes | Yes | Yes
P03  | F | 0  | 1  | Yes | Yes | Yes
P04  | F | 3  | 3  | No  | Yes | Yes
P05a | M | 14 | 14 | Yes | Yes | No
P05  | F | 5  | 27 | Yes | No  | Yes
P06  | F | 4  | 3  | Yes | No  | Yes
P07  | M | 4  | 11 | Yes | No  | Yes
Table 5.3 Comparing properties of EMEA corpus translations and the news translations

Study | TType | HTra  | CrossS | CrossT | SLen | TLen
LS14  | News  | 0.612 | 1.60   | 1.29   | 25.0 | 27.85
CFT14 | EMEA  | 0.445 | 1.44   | 1.23   | 21.0 | 22.93
A comparison of the PIO mode and active learning is discussed in Chap. 3, this volume.
See also Chaps. 9 and 13 in this volume for a discussion on word translation literality, and how
the Cross and the HTra features are indicators for this end.
Fig. 5.4 Ratio of typing time (Kdur) and production time (Fdur) for C1 and C2 translators
We take the traditional post-editing mode (P) as a baseline and compare the
differences in relative typing effort in the ITP mode across different post-editors
and across the two groups C1 and C2, i.e. with and without extended exposure to CASMACAT.
We find that this measure provides another suitable indicator for translator profiles
and captures the ITP learning effects.
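The measure can be computed per post-editor and condition as in the minimal sketch below; the (Kdur, Fdur) values in the dictionary are hypothetical placeholders, not values from the study.

```python
# Minimal sketch (not the authors' script): Kdur/Fdur ratio per post-editor and
# condition, and the change of the PIO ratio relative to the P baseline.
# The (Kdur, Fdur) values below are hypothetical placeholders.
sessions = {
    "P01": {"P": (216.0, 530.0), "PIO": (180.0, 545.0)},
    "P05": {"P": (138.0, 284.0), "PIO": (158.0, 321.0)},
}

for editor, conds in sessions.items():
    ratios = {c: kdur / fdur for c, (kdur, fdur) in conds.items()}
    delta = ratios["PIO"] - ratios["P"]
    print(f"{editor}: P = {ratios['P']:.2f}, PIO = {ratios['PIO']:.2f}, change = {delta:+.2f}")
```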
Figure 5.4 shows that most of the post-editors in the C1 group ({P01, P02, P03})
have a lower proportion of coherent keystroke activities (Kdur/Fdur) in the PIO
mode than in the P mode. That is, in the interactive ITP mode these post-editors
seem to have learned to accept interactive suggestions, which reduces their coherent
typing time; this is not the case for the translators in the C2 group.
C1 translators seemed to accept the interactive translation suggestions more often
than the new C2 translators, overwriting the ITP proposals less frequently.
Post-editor P04 is an exception in the C1 group, which might be explained by
the fact that she did not have any prior experience with post-editing MT output
and already performed in the most unpredictable way during the LS14 study (see
Fig. 5.1). P05 has the highest KdurN/FdurN ratio, indicating her ability to make
use of her time in the most productive way. Comparing the performance patterns in
Fig. 5.4, and taking into account that P04 is (one of) the least experienced translators
while P05 is the most experienced one, suggests that the KdurN/FdurN measure
captures some important features.
All post-editors self-rated their typing skills as "excellent" in an introductory
questionnaire and, indeed, their typing speed caused many cases of overwriting
behaviour, as they continued typing even though the right suggestions from the ITP
system had already been pasted into the target text. Learning to control this overwriting
behaviour was also reported by the post-editors themselves when providing user
feedback, as described in the next section.
Advanced touch typists need to be aware of the fact that they will only benefit
from ITP when they stop overwriting most of the suggestions offered by the system.
As was also visible in the collected screen recordings, P01 and P02 are the two
participants with the most cases of overwriting behaviour, owing to their fast typing speed.
With respect to the second question, four out of the five post-editors in LS14
answered "It depends" (on the text type, quality of the machine translation, etc.).
P02 was the only one who would always prefer to translate instead of post-edit.
The third question in the questionnaire aimed to explore how likely it was that
translators would adopt the CASMACAT workbench as a professional tool. P02 and
P05a were the only ones who would not use the workbench for further post-editing
The questionnaire used to collect the user feedback presented in this section is available at this
introductory questionnaire.
projects claiming that existing commercial CAT tools already serve this purpose.
P01, P03 and P04 stated that they would adopt this workbench for post-editing
purposes in the future.
When asked about the benefits of ITP, the responses collected were diverse: P05a
stated that he was not able to mention any advantages, and P02 argued that he rarely
benefited from the suggestions provided by the system. The rest of the participants
offered a more positive view of ITP, acknowledging, for instance, that the idea
behind ITP certainly helps to decrease the technical effort (typing). However, they
would have to invest more time in order to increase productivity with this novel
workbench by learning not to overwrite many of the ITP suggestions. In line with
this finding, P01 mentioned: "I have to retrain myself on typing for ITP purposes."
With respect to the disadvantages of ITP, all participants (except P03) mentioned
that it is difficult to become familiar with the fact that the target text is constantly
changing. It is difficult to pay attention to the source text, the target text and, in
addition, to all the suggestions triggered by the ITP. Furthermore, P02 suggested that
another area of the screen could be used to show these predictions, similar to how
translation memory matches are shown in a separate window.
The feedback collected seemed to reveal a clear-cut difference between the
extremely positive attitude towards ITP shown by P03 (the only participant without
translator training and with the least experience) and the negative views offered by P05a (the
participant with the most years of formal training and many years of experience). These
two extremes in terms of experience and formal training certainly played a decisive
role in ITP acceptance.
5.5 Discussion
The aim of this study was to explore the benefits of working with interactive
machine translation combined with online learning techniques for post-editing
purposes. Results from the LS14 study showed that professional translators needed an average of 6 weeks (see Fig. 5.3) to become familiar with interactivity features for post-editing purposes. The crucial factor in obtaining a successful interaction between the post-editor and the ITP featured in CASMACAT is their typing behaviour. Only after post-editors stop overwriting most of the suggestions provided by the system can productivity gains be reached with ITP. Touch typists find this trade-off between typing speed and the suggestions provided by the system somewhat difficult to achieve. This study shows that, after weeks of use, a successful interaction can be achieved. It would be interesting to conduct further studies to explore whether non-touch typists or non-professional translators with slower keyboard activity become more easily acquainted with this technology within a shorter timespan.
Most of the participants reported that they would prefer to work without
interactivity but with online learning, a technique which is described in more detail
in Chaps. 3 and 7 in this volume.
Acknowledgements The work described in this chapter was carried out under the auspices of
the EU project CASMACAT: Cognitive Analysis and Statistical Methods for Advanced Computer
Aided Translation, supported by the European Union 7th Framework Programme Project 287576
(ICT-2011.4.2). Website: http://www.casmacat.eu.
Chapter 6
6.1 Introduction
With the growing need for faster and cheaper translations due to the increasing amount of text to be translated, computer-aided translation has become more and more widespread. While correcting machine translation output by means of post-editing is now a relatively common task, professional translators are still reluctant to do it, and it is still not clear exactly how regular translation differs from post-editing.
A better understanding of the differences between human translation and post-editing can improve the field of translation in numerous ways. On the one hand, this knowledge can be used to improve translation tools to better aid translators in their work, by indicating in which cases a translator should be allowed to work from scratch, or in which cases he can benefit from the presence of machine translation output. On the other hand, insight into these differences can help us understand the reluctance of professional translators to post-edit and can help colleges and universities teach translation students the skill sets required for increasingly technological translation work. Recent studies indicate that certain types of college students would make decent post-editors (Yamada 2015).
In this chapter, we focus on the usage of external resources by student translators
translating and post-editing newspaper articles from English into Dutch. For both
types of activity, we compare the number and type of resources consulted. We also
investigate whether consulting different types of resources and spending more or
less time consulting external resources leads to a decrease or increase in productivity
and/or quality of the final product.
translation, search queries can be related to source text meaning, meaning transfer
or target text production. For post-editing, however, the machine translation output
comes into play as well. Whereas the presence of this MT output is intended to
facilitate and speed up the translation process, professional translators seem to
benefit less from post-editing than translation trainees (Garcia 2011). This could
be caused by insecurity about the quality of the MT output, which leads to a higher
number of consulted resources, which could, in turn, negatively affect productivity.
A better understanding of the usage of external resources during translation and
post-editing is needed to obtain a more profound insight into successful problem-solving strategies with regard to quality and productivity.
External resources are usually registered by means of screen capture software such as Camtasia Studio (Göpferich 2010). The drawback of this software, however, is that the data still needs to be replayed and manually encoded before it can be analyzed automatically, which can be quite time-consuming. Think-aloud protocols (TAPs) can provide some idea of the resources consulted, but participants' utterances are often incomplete and researchers still need to look at the screen recordings in parallel to make sense of their data (Ehrensberger-Dow and Perrin 2009). Some previous research has made use of data gathered with the TransSearch tool to get a better insight into translators' queries (Macklovitch et al. 2008), but these data are limited to one type of resource (TransSearch) and do not take other types of resources into account. The present study attempts to solve these issues by introducing a new method for the analysis of external resources by means of Inputlog (Leijten and Van Waes 2013), a keystroke logging tool originally intended for writing research, which logs all Windows-based applications. In a recent study, Inputlog was used to analyze the external resources used by a professional communication designer when creating a proposal (Leijten et al. 2014). To the best of our knowledge, Inputlog's logging of external resources has not been used for translation research before the present study. We have opted for a combination of CASMACAT and Inputlog to be able to
fully grasp the translation process with external resources. As described in Chap. 2,
Sect. 2.7.1, an extra table for the TPR-DB can be created, which accommodates the
Inputlog data and allows for a more thorough analysis of external resources, adding
an extra layer to the translation process research options the TPR-DB currently
provides.
6.3 Methodology
6.3.1 Participants
Participants were ten master's students of translation, who had passed their English General Translation exam. Eight participants were female, two were male, and ages ranged from 21 to 25. Two participants wore contact lenses and one participant wore glasses, yet the calibration with the eye-tracker was successful for all three participants. Students had no previous experience in post-editing. To prevent exhaustion effects, the experiment was spread over two half-day sessions on different days. Participants received a gift voucher of 50 euros for each half-day session, amounting to 100 euros per participant.
as well. We selected articles from different topics with the highest possible Lexile levels (between 1160 L and 1190 L²), and selected 150–160 words from each article.¹

¹newsela.com
²The authors would like to thank MetaMetrics for their permission to publish Lexile scores in the present chapter. https://www.metametricsinc.com/lexile-framework-reading
Table 6.1 Latin square design, mixed text order and task order

Participant  task1  task2  task3  task4  task5  task6  task7  task8
P1           PE_1   PE_2   HT_3   HT_4   HT_5   HT_6   PE_7   PE_8
P3           PE_8   PE_1   HT_2   HT_3   HT_4   HT_5   PE_6   PE_7
P5           PE_7   PE_8   HT_1   HT_2   HT_3   HT_4   PE_5   PE_6
P7           PE_6   PE_7   HT_8   HT_1   HT_2   HT_3   PE_4   PE_5
P9           PE_5   PE_6   HT_7   HT_8   HT_1   HT_2   PE_3   PE_4
P2           HT_1   HT_2   PE_3   PE_4   PE_5   PE_6   HT_7   HT_8
P4           HT_8   HT_1   PE_2   PE_3   PE_4   PE_5   HT_6   HT_7
P6           HT_7   HT_8   PE_1   PE_2   PE_3   PE_4   HT_5   HT_6
P8           HT_6   HT_7   PE_8   PE_1   PE_2   PE_3   HT_4   HT_5
P10          HT_5   HT_6   PE_7   PE_8   PE_1   PE_2   HT_3   HT_4

Rows are labeled with participant codes (P1 to P10); tasks 1–4 were carried out in Session 1 and tasks 5–8 in Session 2. Cells contain codes for the task type (PE = post-editing, HT = human translation) and text (1 to 8).
The first session consisted of the following steps: first, participants filled out an introductory survey, asking them about their experience with and attitude towards post-editing; second, they performed the LexTALE test (Lemhöfer and Broersma 2012) to measure their English proficiency; third, they copied a text of 150 words, so that they could get used to the keyboard and the chin rest of the eye-tracker; fourth, they translated a text in the CASMACAT interface, consisting of four
segments that were post-edited and four segments that were translated manually, to
get them acquainted with the tool and task; and finally, participants translated two
texts and post-edited two texts. For both types of task, the students were instructed
to make sure the final product was of publishable quality. Each segment in the
CASMACAT interface contained one sentence.
The second session started with another warm-up task within CASMACAT,
consisting of four segments to be post-edited and four segments to be translated
manually, followed by the actual tasks: two texts to be translated manually and
two texts to be post-edited. After these tasks, participants had to look at the texts
again and highlight the most problematic passages for one translation task and one
post-editing task. They were asked to add comments to these passages in a Word
document. At the end of the session, participants had to fill out another survey,
asking them about their experience and their attitude towards post-editing.
6.4 Analysis
The final dataset consisted of CASMACAT and Inputlog data (xml-files) for all 80 sessions. Using the scripts provided with the TPR-DB, the CASMACAT xml-files were prepared for word alignment. A first, automatic alignment was done with Giza++ (Och and Ney 2003), which we then manually corrected with the YAWAT tool (Germann 2008). Data from the aligned files were extracted and
converted to more manageable table formats with another TPR-DB script (see
Chap. 2). From the Inputlog data, we extracted the focus events with the provided
software (focus events contain information on the opened application or screen, time
spent in the application, and keystrokes). We then manually grouped the different
events into categories: dictionary, web search, concordancer, forum, news website,
encyclopedia, etc. Figure 6.1 shows an overview of the most common categories for
human translation and post-editing. As can be seen, most types of external resources
are only sporadically used, with the exception of search engines, concordancers,
dictionaries, and encyclopedias. We therefore limit ourselves to these four categories
for further analysis, and group the other external resources together in a generic
category 'other'.
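A minimal sketch of how such a categorization could be automated is given below (in R, which was also used for the statistical analyses reported later in this chapter); the window-title keywords and category names are illustrative assumptions, not the exact rules applied during the manual grouping:

    # Hedged sketch: assign a resource category to an Inputlog focus event
    # based on keywords in the window title; keyword lists are illustrative.
    categorize_focus <- function(titles) {
      rules <- list(
        MAIN         = "casmacat|\\.xlf",
        NAVIGATION   = "nieuw tabblad|new tab",
        CONCORDANCER = "linguee|reverso context",
        DICTIONARY   = "woordenboek|dictionary|van dale",
        ENCYCLOPEDIA = "wikipedia|encyclopedi",
        SEARCH       = "google search|zoeken|bing"
      )
      category <- rep("OTHER", length(titles))
      for (cat in names(rules)) {
        hit <- grepl(rules[[cat]], titles, ignore.case = TRUE)
        category[hit & category == "OTHER"] <- cat
      }
      category
    }
    # Usage: focus_events$Category <- categorize_focus(focus_events$Focus)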
A next step was to combine the CASMACAT and Inputlog data for subsequent
analysis. Since this is the first study where data from both tools are combined, the
TPR-DB had to be updated to accommodate the new data. An InjectIDFX script was developed to merge the Inputlog data with the CASMACAT xml-files.
CASMACAT only logs the keystrokes and events within the CASMACAT interface.
Fig. 6.1 General overview of resource types used in human translation (HT) and post-editing (PE), expressed in total number of resource hits (left) and total duration (right) over all 80 sessions
The xml-files themselves contain a blur-event whenever a person leaves the CASMACAT interface and a focus-event whenever they return to the CASMACAT
interface, but whatever happens between the blur and the focus-event is unknown.
By adding the Inputlog data to the xml-files, we can analyze what happens when
a person leaves the CASMACAT interface as well. We added an extra table: the
EX-table, containing information on external resources consulted, the time spent in
the resource, and the keystrokes made within the external resource. We also added an extra column to the EX-file containing the categories we had assigned to the various Inputlog events. An extract from an EX-file can be seen in Table 6.2 below.
Looking at the Focus column and corresponding category label in Table 6.2, we see the participant moving from the main document (CASMACAT, EXid 3) to a new tab in Google Chrome (EXid 4), where he types "woorden…" (see the Edit column), leading him to the Dutch spelling website Woordenlijst (EXid 5). He then types "groot-bri" to look up the Dutch spelling of Britain (Groot-Brittannië). After this search, he returns to the CASMACAT interface (EXid 6) for 2 min, after which he again opens a new tab in Google Chrome (EXid 7) for the next search, "linguee", allowing him to go to the Linguee concordancer (EXid 8), where he looks up the translation of "in fact" (EXid 9) before returning to the CASMACAT document once more (EXid 10).
Table 6.2 Extract from an EX-file (selected columns; participant P9, text T5, post-editing condition)

EXid  Focus                                                                      Time     Dur      Edit             Category
…
3     Translate T1_T5_PE_P9.xlf204 - Google Chrome                                53,975                            MAIN
4     Nieuw tabblad - Google Chrome                                               81,778    3,360  woorden[..]nlij  NAVIGATION
5     Woordenlijst Nederlandse Taal - Officiële Spelling - Google Chrome          85,138    3,937  groot-bri        SPELLING
6     Translate T1_T5_PE_P9.xlf204 - Google Chrome                                89,075  123,512                   MAIN
7     Nieuw tabblad - Google Chrome                                              212,587    3,548  linguee          NAVIGATION
8     Linguee | Nederlands-Engels woordenboek (en andere talen) - Google Chrome  216,135    2,718                   CONCORDANCER
9     in fact - Nederlandse vertaling - Linguee woordenboek - Google Chrome      218,853    4,765  n fact           CONCORDANCER
10    Translate T1_T5_PE_P9.xlf204 - Google Chrome                               223,618  264,006  eed              MAIN

Each time the participant switches to another screen or application, a focus event is recorded, with code EXid and a label found in the column Focus. Time is the time in ms since the beginning of the session, Dur is the time in ms spent in a particular focus event. STsegL represents the last segment opened in CASMACAT before leaving the tool, STsegN is the next segment opened after returning to the CASMACAT tool. STidL and STidN represent the last source token before leaving CASMACAT and the next token after returning to CASMACAT. KDidL and KDidN contain the ID of the last keystroke before leaving CASMACAT and the next keystroke after returning to CASMACAT. The actual characters typed within a focus event are shown in the column Edit. Each focus event is given a corresponding category.

It is currently impossible to automatically map external resources to the correct segment. In the data file, there is a column for the last segment that was open before the CASMACAT interface was left, and the first segment to be opened after returning to the CASMACAT interface, but the search itself could be related to either one, or even an entirely different segment. For example, a person can look up a word in a dictionary while translating the first segment of a text. If the person goes back to the CASMACAT interface without closing the screen with the search
query on it, the next time that person opens the search query, it will show up in the data exactly like the search made during the first segment. It would require a lot of extra manual work to label each external resource with the correct segment. In the future, we will try to better map the CASMACAT and Inputlog data by looking at keystrokes or by filtering on the time spent on certain pages. For the moment, however, we grouped the information from the EX-files per session rather than per segment, so as not to incorrectly link resources to segments. This
information was added to the more general SS-file, a table containing an overview
of the different sessions. For the different categories (Dictionary, Concordancer,
Encyclopedia, Search, and Other) we added a column containing the number of
times that resource was consulted in that particular session, and a column containing
the time spent in that resource during the session. To be able to better compare the
data across all sessions, we normalized the counts and durations by dividing them
by the number of source text tokens.
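A sketch of this per-session aggregation and normalization in R is shown below; the data frame and column names (ex, Session, Category, Dur, st_tokens, nTokens) are assumptions for illustration, not the actual TPR-DB scripts:

    # Hedged sketch: per-session counts and durations of external-resource use,
    # normalized by the number of source-text tokens in the session.
    agg <- aggregate(data.frame(Count = 1, Dur = ex$Dur),
                     by = list(Session = ex$Session, Category = ex$Category),
                     FUN = sum)
    agg <- merge(agg, st_tokens, by = "Session")   # st_tokens: Session, nTokens
    agg$CountSource <- agg$Count / agg$nTokens     # consultations per ST token
    agg$DurSource   <- agg$Dur   / agg$nTokens     # ms in resources per ST token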
Fig. 6.2 Effect plot of the relationship between task (HT = human translation, PE = post-editing) and predicted time (in ms) spent in external resources, normalized per ST token. Error bars represent 95 % confidence intervals
conclude that the null model without fixed effects (and AIC value of 1256.8) is not
supported enough, so we drop it in favour of the model with fixed effect (and AIC
value of 1251.3). The model summary further showed that significantly more time
is spent in external resources in human translation, compared to post-editing: about
297 ms ± 105 (standard error). The effect plot obtained with the effects package
(Fox 2003) is depicted in Fig. 6.2 below. This plot indeed confirms that less time
is spent in external resources when post-editing than when translating. Though
the confidence intervals in Fig. 6.2 overlap to some extent, this does not affect
the statistical significance found (Goldstein and Healey 1995). Visual inspection
of normal Q-Q plots indicated right-skewed data, which is presumably due to the natural boundary at zero, an integral part of the data: it is impossible to spend less than 0 s in external resources; fifty per cent of the data points are below 1,000 ms, with very few observations above 2,000 ms.
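The model comparison reported here can be sketched in R with the lme4 and effects packages cited in this chapter; the data frame ss and its column names are assumptions, and the random-effects structure shown is one plausible choice rather than the study's exact specification:

    # Hedged sketch: compare a null model with a model containing Task as fixed
    # effect; both fitted with ML so that their AIC values are comparable.
    library(lme4)
    m0 <- lmer(ExtDur_norm ~ 1    + (1 + Task | Participant), data = ss, REML = FALSE)
    m1 <- lmer(ExtDur_norm ~ Task + (1 + Task | Participant), data = ss, REML = FALSE)
    AIC(m0, m1)      # lower AIC = better-supported model
    anova(m0, m1)    # likelihood-ratio test for the fixed effect of Task
    # Effect plot as in Fig. 6.2:
    # library(effects); plot(effect("Task", m1))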
In addition to the overall comparison of time spent in external resources, we
wanted to check whether the time spent in each type of external resource differed
between both methods of translation. We restructured our data of the session
summary table (cf. Chap. 2, Sect. 2.3) to be able to perform the appropriate analysis.
An excerpt of the new data file can be seen in Table 6.3 below.
Table 6.3 Restructured data for comparative analysis of usage of external resources between human translation and post-editing

Session   Participant  Text  Task  ExternalSource  CountSource  DurSource
P01_P01   P01          T1    P     Dictionary      0.033898305  228.3785311
P01_P01   P01          T1    P     Concordancer    0.084745763  369.7909605
P01_P01   P01          T1    P     Encyclopedia    0            0
P01_P01   P01          T1    P     Search          0.096045198  417.0225989
P01_P01   P01          T1    P     Other           0            0

The column CountSource contains the number of times each resource was consulted during a particular session, normalized per ST token, and the column DurSource contains the time spent in each external resource during a particular session, also normalized per ST token.
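A sketch of this restructuring with base R's reshape() is given below; the wide-format column names are assumptions chosen only to illustrate the wide-to-long conversion:

    # Hedged sketch: convert the session summary from wide format (one count and
    # one duration column per resource type) to long format (one row per
    # session x resource type), as in Table 6.3.
    long <- reshape(ss,
                    direction = "long",
                    varying = list(
                      c("CountDictionary", "CountConcordancer", "CountEncyclopedia",
                        "CountSearch", "CountOther"),
                      c("DurDictionary", "DurConcordancer", "DurEncyclopedia",
                        "DurSearch", "DurOther")),
                    v.names = c("CountSource", "DurSource"),
                    times   = c("Dictionary", "Concordancer", "Encyclopedia",
                                "Search", "Other"),
                    timevar = "ExternalSource",
                    idvar   = c("Session", "Participant", "Text", "Task"))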
Fig. 6.3 Effect plot of predicted time (in ms) spent in each type of external resource, normalized per ST word, for both task types (left: HT = human translation, right: PE = post-editing)
Fig. 6.4 Effect plot of predicted total time (in ms) normalized per ST token for both task types (HT = human translation; PE = post-editing). Error bars represent 95 % confidence intervals
total time, and the time spent in external resources is significantly higher for human
translation than for post-editing, the time spent in external resources is a much better
predictor of overall time than the task type. The model summary shows that every
millisecond spent in external resources per ST token corresponds to an increase in total time per token of 1.348 ms (±0.145; p < 0.001), thus causing us to reject the
hypothesis that the time spent in external resources reduces the overall time needed.
The effect plot can be seen in Fig. 6.5 below. Visual inspection of residual plots did
not reveal any obvious deviations from homoscedasticity or normality.
Fig. 6.5 Effect plot of relationship between time spent in external resources normalized per ST
token and total time normalized per ST token (both in ms)
final text as a good text in the target language and culture, and quality with regards
to the correspondence between source and target text. Acceptability and adequacy
each contain various subcategories (such as, for example, grammar, spelling, style
and lexicon for acceptability; and word sense, deletions and contradictions for
adequacy), allowing for a fine-grained error analysis. Each error category also
receives an error weight from zero to four, indicating the severity of the error for
the specific text type (for example, a contradiction error receives a weight of four,
whereas a capitalization error receives a weight of one). We also provide an overall quality score. The overall score is calculated by summing up the error scores
for acceptability and adequacy and subtracting those acceptability items which were
caused by adequacy errors, so as to not penalize the same problem more than once.
For example, a word sense error (adequacy) can also lead to a logical problem
(acceptability), as is the case in the following situation: the source text contains the verb 'to spend', meaning to spend money (e.g., 'families continue to spend cautiously'), but this is translated as 'doorbrengen' in Dutch, meaning to spend time. The word 'doorbrengen' in this sentence is both a word sense error and a
logical problem in the target text. Rather than summing up both error scores in
these situations, we only count the error score for the word sense error. Two of
the authors highlighted and labeled all errors in the translations, after which we
held a consolidation phase where problematic cases were discussed and resolved.
Our analyses were conducted on data containing only those errors both annotators
agreed on. As with the information on external resources, the error count and score
for each category was added to the session file (SS) and normalized by dividing
by the number of words in the source text.
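As a sketch, and assuming the per-session annotation results are stored in a data frame err with the (hypothetical) columns below, the overall score could be computed as:

    # Hedged sketch: overall error score = acceptability + adequacy errors, minus
    # the weight of acceptability errors already caused by an adequacy error
    # (so the same problem is not penalized twice), normalized per ST token.
    err$Overall      <- err$AcceptabilityScore + err$AdequacyScore - err$CausedByAdequacy
    err$Overall_norm <- err$Overall / err$nTokens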
Fig. 6.6 Effect plot of the predicted relationship between time spent in external resources normalized per ST token and overall error score normalized per ST token, for both types of task (left: HT = human translation, right: PE = post-editing)
those students are not critical enough: the fact that they perceive human translation as being more tiring could indicate that they struggle with human translation, potentially leading to high error scores, and the fact that they perceive post-editing as less tiring could indicate that they trust the machine translation output too much, again leading to higher error scores. These assumptions warrant further investigation in future research.
6.4.3.2 Acceptability
After looking at quality in general, we took a closer look at our two aspects
of quality: acceptability and adequacy, beginning with the former. Inspection of exploratory box plots showed no obvious difference in the acceptability score normalized per ST token between the two tasks, which was confirmed by fitting a simple linear model with acceptability error score as dependent variable and task as predictor variable. In this model, task was not a significant predictor of the acceptability error score (p = 0.35), which is in line with the findings for the overall error score.
We then set out to statistically assess the relationship between time spent in
external resources and acceptability error score. We performed a linear mixed
effects analysis with normalized acceptability error score as dependent variable and
task and normalized time spent in external resources with interaction as predictor
variables. Participant was added as a random effect, with added random slope
for task. This model, however, did not perform significantly better than the null model (p = 0.57). Backward elimination of non-significant effects with the step
function showed that none of the predictor variables significantly added to the
model. Only participant as random effect with random slope for task was retained,
leading us to conclude that neither the overall time spent in external resources nor
task type has a significant effect on the acceptability error score, but acceptability
error score is most likely influenced by between-participant differences. In their 2010 paper, Carl and Buch-Kromann also found no significant relationship between longer translation times and the fluency (which corresponds to our notion of acceptability) of student translators.
The following step was to see whether time spent in specific external resource
types had an effect on acceptability error score. We performed a linear mixed effects
analysis to assess the relationship between the total acceptability error score per ST
token and the time spent in the various external resources per ST token. The full
model contained the duration of all external resources as possible predictor variables
(dictionary, encyclopedia, search, other, concordancer). Text and participant were
added as random factors, with added random slope for task. We used the step
function from the lmerTest package to assess the necessity of each variable through
automatic backward elimination of effects.
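In R, this backward elimination corresponds roughly to the sketch below; the column names for the normalized durations are assumptions for illustration:

    # Hedged sketch: full model with the duration of every resource type as fixed
    # effect, then automatic backward elimination with lmerTest's step().
    library(lmerTest)
    full <- lmer(AcceptScore_norm ~ DurDictionary + DurEncyclopedia + DurSearch +
                   DurOther + DurConcordancer +
                   (1 + Task | Participant) + (1 | Text),
                 data = ss_long, REML = FALSE)
    elim <- step(full)   # drops non-significant fixed and random terms
    elim                 # prints the elimination tables and the retained model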
On the basis of this analysis, we again only retained participant as a random
effect, with random slope for task, and the duration for dictionary as a predictor
variable. This was the only predictor variable found to have an impact on overall
acceptability quality. The final model was tested against a null model without
predictor variable, and was found to provide a significantly better fit (p = 0.01762), reducing the AIC from −384.9 to −388.53.
The effect plot can be seen in Fig. 6.7 below. Residual plots did not reveal any
obvious deviations from homoscedasticity or normality. Each millisecond spent in
dictionaries affects the acceptability error score per ST token by 0.000016 points (±0.000006). So each second spent looking something up in a dictionary can reduce the acceptability error score for that word by approximately 0.016 units. We can
conclude that dictionaries seem to be the only external resource that significantly
reduces the acceptability errors made, making it perhaps the most useful resource
with regard to acceptability issues.

Fig. 6.7 Effect plot of the predicted relationship between time spent in dictionaries normalized per ST token and acceptability error score normalized per ST token
6.4.3.3 Adequacy
A second aspect of quality is adequacy. We again fit a linear model, this time with
normalized adequacy error score as dependent variable and task as predictor variable. As was the case for acceptability, no significant effect was found (p = 0.527).
We then performed a linear mixed effects analysis with normalized adequacy
error score as dependent variable and normalized time spent in external resources as
predictor variable to assess the relationship between time spent in external resources
and adequacy quality. Participant and text were added as random effects, with an added random slope for task. This model, however, did not perform better than a model without fixed effects (p = 0.7), increasing the AIC value from −346.67 to −344.82.
Backward elimination of non-significant effects with the step function from the
lmerTest package showed only text to be a significant random effect, without
slope. We can conclude that the overall time spent in external resources does not
significantly influence the obtained adequacy error score. This finding is in line with
the findings by Carl and Buch-Kromann (2010) that there is no notable correlation
between accuracy (which corresponds to our notion of adequacy) and translation
time.
The next step was to look at the influence of the different types of resources. We
applied the same methodology to assess the relationship between the total adequacy
error score normalized per ST token and the time spent in the various external
resources normalized per ST token. Again, the full model contained the duration
of all external resources as possible predictor variables (dictionary, encyclopedia,
search, concordancer, other), as well as the task predictor variable. Text and
participant were added as random factors, with added random slope for task. We
used the step function from the lmerTest package to assess the necessity of each
variable.
On the basis of this analysis, we only retained task as a random effect, without
random slope. This time, the only predictor that came out of the analysis as
having a significant effect on the overall adequacy error score was the time spent in
encyclopedias. The final model was tested against a null model without predictor
variable, and was found to provide a significantly better fit (p = 0.04182), reducing the AIC from −352.39 to −354.53.
The effect plot can be seen in Fig. 6.8 below. Residual plots did not reveal any
obvious deviations from homoscedasticity or normality. Each millisecond spent in
encyclopedias affects the adequacy error score per ST token by 0.000056 points (±0.000027). So each second spent looking something up in an encyclopedia can increase the adequacy error score for that word by approximately 0.056 units.
Fig. 6.8 Effect plot of the predicted relationship between time spent in encyclopedias normalized
per ST token and adequacy error score normalized per ST token
6.5 Conclusion
We have conducted a balanced experiment comparing the usage of external
resources in human translation and post-editing for general text types, and the
effects on time and quality of a text, using a unique combination of state-of-the-art
keystroke logging tools. We discussed the addition of Inputlog data to the TPR-DB
by means of EX-files (see Chap. 2), containing information on the usage of external
resources in a format that is easy to use with the existing TPR-DB tools. This study
moves beyond the limitations of previous studies, which either had to make do with manual observation of external resources (Göpferich 2010) or looked at data from
within one type of external resource only (Macklovitch et al. 2008).
We found a significant difference in time spent in external resources for both task
types (with translation requiring more time). In contrast with our expectations, we
found no statistical evidence for the hypothesis that translators use different types
of resources, and in different quantities when translating or post-editing, though
there seems to be a trend to spend more time in each resource when translating than
when post-editing. Significantly less time is spent in encyclopedias and other types
of resources compared to dictionaries, concordancers and search engines, for both
types of translation.
The overall time needed to translate a text was significantly higher for translation
than for post-editing, which is in line with previous findings (Plitt and Masselot
2010). We further found that the time spent in external resources significantly
increases the total time needed to translate a word, indicating that even though the
resources might help translators solve translation problems, this comes at the cost of
overall productivity. While participants needed significantly more time to translate
than to post-edit a word, the effect of time spent in external resources was greater
than the effect of the task type.
In a final analysis, we looked at the effect of external resources on the quality
of a text. The overall quality of a translation did not seem to be significantly
influenced by one specific type of resource, but rather by the overall time spent in
external resources, as well as by the task type. When looking at post-editing, longer
consultation of external resources was accompanied by higher overall error scores,
whereas the opposite was true for human translation, where longer consultation of
external resources was accompanied by lower overall error scores. This leads us
to believe that participants are more successful in problem solving by consulting
different resources when translating than when post-editing. This finding is in
line with the suggestion by Yamada (2015) that post-editing requires different skills from human translation. With regard to the acceptability aspect of quality, we
found no significant difference between human translation and post-editing. When
looking at the effect of each type of external resource on acceptability quality, we
found that extra time spent consulting dictionaries does bring about an increase
in acceptability quality, perhaps making it worth the loss in productivity. With
regards to the adequacy aspect of quality, we again found no significant difference
between human translation and post-editing. When looking at the effect of each
type of external resource on adequacy quality, we found that spending more time in
encyclopedias does not bring about a decrease in error score, but rather an increase.
This indicates that longer searches do not necessarily lead to better translations with
regards to adequacy.
In sum, we can conclude that, whereas search strategies during the translation
process are more effective than those used when post-editing, post-editing is still
faster than human translation without negatively affecting the final quality of the
product.
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Alabau, V., Bonk, R., Buck, C., Carl, M., Casacuberta, F., Martínez, M., et al. (2013). CASMACAT: An open source workbench for advanced computer aided translation. The Prague Bulletin of Mathematical Linguistics, 100, 101–112. doi:10.2478/pralin-2013-0016.
Angelone, E. (2010). Uncertainty, uncertainty management and metacognitive problem solving in the translation task. In G. Shreve & E. Angelone (Eds.), Translation and cognition (pp. 17–40). Amsterdam/Philadelphia: Benjamins.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-7. http://CRAN.R-project.org/package=lme4
Burnham, K., & Anderson, D. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33, 261–304.
Carl, M. (2012). The CRITT TPR-DB 1.0: A database for empirical human translation process research. In S. O'Brien, M. Simard, & L. Specia (Eds.), Proceedings of the AMTA 2012 workshop on post-editing technology and practice (WPTP 2012) (pp. 9–18). Stroudsburg, PA: Association for Machine Translation in the Americas (AMTA).
Carl, M., & Buch-Kromann, M. (2010). Correlating translation product and translation process data of professional and student translators. In Proceedings of EAMT, Saint-Raphaël, France.
Daems, J., Macken, L., & Vandepitte, S. (2013). Quality as the sum of its parts: A two-step approach for the identification of translation problems and translation quality assessment for HT and MT+PE. In Proceedings of the MT Summit XIV workshop on post-editing technology and practice (pp. 63–71).
Daems, J., Macken, L., & Vandepitte, S. (2014). On the origin of errors: A fine-grained analysis of MT and PE errors and their relationship. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the ninth international conference on language resources and evaluation (LREC'14) (pp. 62–66). Reykjavik, Iceland: European Language Resources Association (ELRA).
Ehrensberger-Dow, M., & Perrin, D. (2009). Capturing translation processes to access metalinguistic awareness. Across Languages and Cultures, 20(2), 275–288.
Fox, J. (2003). Effect displays in R for generalised linear models. Journal of Statistical Software, 8(15), 1–27. http://www.jstatsoft.org/v08/i15/
Garcia, I. (2011). Translating by post-editing: Is it the way forward? Machine Translation, 25, 217–237.
Germann, U. (2008). Yawat: Yet another word alignment tool. In 46th annual meeting of the Association for Computational Linguistics: Human language technologies; demo session (pp. 20–23). Columbus, OH.
Goldstein, H., & Healey, M. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society, 158, 175–177.
Göpferich, S. (2010). The translation of instructive texts from a cognitive perspective. In F. Alves, S. Göpferich, & I. Mees (Eds.), New approaches in translation process research (pp. 5–65). Frederiksberg: Samfundslitteratur.
Jakobsen, A. (2003). Effects of think aloud on translation speed, revision and segmentation. In F. Alves (Ed.), Triangulating translation: Perspectives in process oriented research (pp. 69–95). Amsterdam: Benjamins.
Jakobsen, A., & Schou, L. (1999). Translog documentation. In G. Hansen (Ed.), Probing the process in translation: Methods and results (pp. 1–36). Frederiksberg: Samfundslitteratur.
Krings, H. (2001). Repairing texts: Empirical investigations of machine translation post-editing processes. Kent, OH: Kent State University Press.
Kuznetsova, A., Brockhoff, P., & Christensen, R. (2014). lmerTest: Tests in linear mixed effects models. R package version 2.0-20. http://CRAN.R-project.org/package=lmerTest
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–392. doi:10.1177/0741088313491692.
Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285–337.
Lemhöfer, K., & Broersma, M. (2012). Introducing LexTALE: A quick and valid lexical test for advanced learners of English. Behavior Research Methods, 44, 325–343.
Macklovitch, E., Lapalme, G., & Gotti, F. (2008). TransSearch: What are translators looking for? In AMTA-2008: MT at work: Proceedings of the eighth conference of the Association for Machine Translation in the Americas (pp. 412–419). Waikiki, Honolulu, Hawaii.
Och, F., & Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1), 19–51.
Plitt, M., & Masselot, F. (2010). A productivity test of statistical machine translation post-editing in a typical localization context. Prague Bulletin of Mathematical Linguistics, 93, 7–16.
R Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/
Yamada, M. (2015). Can college students be post-editors? An investigation into employing language learners in machine translation plus post-editing settings. Machine Translation, 29, 49–67.
Chapter 7
Investigating Translator-Information
Interaction: A Case Study on the Use
of the Prototype Biconcordancer Tool Integrated
in CASMACAT
Julián Zapata
J. Zapata ()
School of Translation and Interpretation, University of Ottawa, Ottawa, ON, Canada
e-mail: jzapa026@uottawa.ca
Keywords: Human-information interaction · Information behaviour · Information retrieval · Information tools · Usability
7.1 Introduction
In our day, translation is essentially both a computer-interaction task and an
information-interaction task. Indeed, throughout history, human translators have
used an array of tools not only to write their translations but also to search and
store information. In the digital age, information and communication technologies
(ICTs),1 and in particular language technologies (LTs),2 are integral parts of the
translation field, and have decidedly had a significant impact on translation research,
practice and teaching.
The current chapter introduces the notion of translator-information interaction
(TII) as the field of study that investigates translators' interaction with (digital) information and information tools. This new notion complements that of translator-computer interaction (TCI), coined by Sharon O'Brien in 2012. TII and TCI represent logical extensions of the fields of human-information interaction (HII) and human-computer interaction (HCI), respectively. Now, although TII and TCI are emerging fields of research, the interaction of translators with computers and digital information is not a recent phenomenon, as O'Brien (2012, pp. 103–104) explains:
Already with the introduction of the electronic typewriter, with only two lines of memory, and the use of dictaphones, translation became a computer-interactive task. This was followed by the introduction of word-processing software […][,] a development that would have required some translators to interact with a computer for the first time. Not long after the mass embracing of word processing, came the introduction of Translation Memory tools [and] terminology management programs, which are […] not restricted to the [parallel] storage of terms [in two languages], but also store phrases and sometimes even sentences or larger chunks of text […].
In sum, in the age of ICTs, translators have adopted different types of computer
tools in an effort to facilitate their work and carry out their tasks effectively
(Austermühl 2001; Bowker 2002). For instance, parallel bilingual resources, as described above by O'Brien, have been built and used for over two decades (Langlois 1996). The present chapter deals in particular with an increasingly popular type of parallel bilingual resource: bilingual concordancers, or biconcordancers. This type of informational tool allows the user to search for a word, or any character string, within a previously aligned bilingual parallel corpus. Hence, the search result consists of a list of segments in a language A containing the searched character string, and their corresponding segments in a language B, as illustrated in Fig. 7.1.

Fig. 7.1 Example of a search result in the prototype biconcordancer tool integrated in the CASMACAT translator's workbench

¹ICTs are defined as the bulk of technological applications based on computing, microelectronics, telecommunications and multimedia, the combination and interconnection of which allow people to search, capture, process and transmit data of different nature (text, audio, image, video, etc.); to interact with each other and with machines; to access information; and to spread and share information (Touré et al. 2009, p. 35).

²LTs are defined in this chapter as the bulk of natural language processing (NLP) applications that facilitate the active or passive use of a natural language. Certain LTs are developed for the general public, while others are developed for language professionals (e.g., writers, translators, terminologists, etc.). LTs may be divided into two categories: spoken-language-based and written-language-based. Each of these categories may be divided into two types: passive applications (e.g., unchangeable information on the web or electronic/online dictionaries and term banks) and active applications (e.g., text processing software, spellcheckers and speech recognition systems).
By way of a literature review and a case study, a new approach in empirical
translation process research (TPR) is proposed in this chapter, that is, the investigation of translators' interaction with (digital) information and informational tools, or TII. More specifically, the chapter reports on translators' interaction with the first
prototype of a biconcordancer (BiConc) integrated in the CASMACAT workbench.
The BiConc tool was introduced in the third CASMACAT field trial (CFT14),
a post-editing experiment conducted with seven English-to-Spanish professional
translators in a Madrid-based translation company.
Several questions motivate TII research: how well do human translators work
with the information and informational tools they currently have at their disposal?
How accurate, rich and relevant is the information they find? How user-friendly
are informational tools for translators? How can the information and the tools be
improved to maximize translators' performance and well-being, and the quality
of their work? As far as the CASMACAT workbench is concerned, how can the
performance of the built-in BiConc tool be assessed and improved? What are the
advantages and drawbacks of integrating an informational tool into a translation
environment, as compared to having an array of external web-based resources?
These are some of the questions that motivated the present investigation, which remains exploratory given the scope and limitations of a pilot study and of the Translation Data Analytics (TDA)³ project; they are partly dealt with in this chapter.
Overall, the chapter argues that human-centered research is essential not only in
the understanding of the cognitive processes involved in translation activity, which
is TPR's ultimate goal (Balling and Carl 2014; Jakobsen and Jensen 2008; Jakobsen 2003, 2011; O'Brien 2009), but also in the development and the improvement of tools intended to better address translators' professional needs (Carl et al. 2011).
Thus, this case study and subsequent investigations in the same vein can be used
to inform the efficient integration of the BiConc tool and other informational
resources into the CASMACAT workbench and other future-generation translation
environments.
³This pilot study was carried out within the framework of the TDA project held in July–August 2014 at the Centre for Research and Innovation in Translation and Translation Technology (CRITT), located at the Copenhagen Business School, in Denmark. The aim of the TDA project was to explore and analyse translator-computer interaction data available in the CRITT TPR-DB in an effort to assess and elaborate methods to produce data-driven user profiles, to investigate differences in communication styles, and to identify patterns of user behavior for more and less successful man–machine communication. The TDA project was supported by the European Union's 7th Framework Programme (FP7/2007-2013) under grant agreement 287576 (CASMACAT).
to (1) improve upon or enhance a given experience or (2) create a quite different experience than before (Harper et al. 2008, p. 58):
In both situations, initial research is conducted by learning more about peoples current
experiences […]. Ethnographic studies, logging of user interaction and surveys are
commonly deployed. Based on the findings gathered, we begin to think about why, what,
and how to design something better. To aid the process, usability and user experience
goals are identified and conceptual models developed. Prototypes are built, evaluated, and
iterated, demonstrating whether the user goals have been met or whether the new user
experience is judged to be enjoyable, pleasurable or valuable by the target group.
Usability studies combining tool use and translation processes are therefore more than necessary in translation research, as O'Brien (2012, pp. 116–117) argues:
[TCI] would likely benefit from an increased focus on ethnographic-style, cognitive
ergonomic studies of both translation tools and the translation process itself […]. More
experimental studies of translator-tool interaction could be carried out using formal usability
research methods such as screen recording, eye tracking, and observation, the results of
which could then be used by translation technology developers to improve the specifications
of tools for the benefit of translators and, ultimately, the end users of those translations.
As stated in the introductory section, the work described in this chapter aims
at proposing a new approach in empirical TPR, that is, the investigation of the
way translators interact with (digital) information and informational tools. Thus,
TII would complement O'Brien's notion of TCI. Furthermore, the idea that TII is
a larger discipline that encompasses TCI is put forward. Indeed, some HCI and
HII researchers argue that HII constitutes a larger discipline, since it looks beyond
computers. It focuses on the interaction between humans and the information in
the environment, in all its complexity, regardless of the tools used to facilitate
such interaction (Fidel 2012; Gershon 1995; Marchionini 2008); the computer just happens to be one of the media that facilitate or mediate the interaction with the
information we need and produce. Humans have always been in constant interaction
with information, be it via machines or not. Our world is based on information
(Kurzweil 2013, pp. 2–3).
The study of the interaction between humans and information is not new.
However, with the advent of ICTs and, in particular, of the Internet, the field of
HII has become particularly popular within the research communities in computer
science and an array of other disciplines (Fidel 2012, pp. 17–21). The massive influx
of mobile, Internet-connected devices has led humans to new ways of accessing
enormous quantities of information and services at any time and from practically
anywhere, making it necessary to investigate HII from every angle and every field,
and to strengthen HII as a multidiscipline.
Two research areas related to HII are particularly well grounded today, and
offer a great potential in empirical TPR: information retrieval (IR) and information
behavior (IB). The former investigates the models and mechanisms of (computer)
systems that allow or facilitate the retrieval of information. The latter examines
information research strategies, information evaluation criteria, and the modalities
and contexts of information use. In other words, while IR focuses on developing and
improving informational tools, IB investigates the ways of browsing the different
⁴According to Fidel (2012, p. 85), context is important because, even before carrying out any search, it is context that shapes the informational needs, since the motivation to search for information is not only cognitive, but also contextual.
⁵The following team of researchers is to be acknowledged for their work on the CASMACAT workbench and, in particular, for running the CFT14 experiment and providing us with the data presented in this section: Vicent Alabau, Michael Carl, Francisco Casacuberta, Mercedes García Martínez, Jesús González-Rubio, Bartolomé Mesa-Lao, Philipp Koehn, Daniel Ortiz-Martínez, and Moritz Schaeffer.
The principal goals of this field trial were: (1) to assess the benefits in terms
of productivity derived from introducing online-learning techniques; (2) to investigate how post-editors use informational tools during the post-editing process, in
particular the built-in BiConc tool; (3) to assess how professional reviewers use the
newly introduced CASMACAT electronic pen functionalities while reviewing post-editors' output; and (4) to collect feedback from reviewers using the electronic pen
as an additional input method for revision (ibid.).
All post-editors were freelance translators recruited by Celer Soluciones SL,
a Madrid, Spain-based translation company. Participants were 35 years old on
average. They were all regular users of language technologies in their day-to-day work. All participants but one had experience post-editing machine-translated texts as a language service.⁶ More detailed data on the participants' age, expertise, education, etc., is available in the CRITT TPR database⁷ (metadata folder; see also
Hvelplund and Carl (2012) for a description).
Participants were all given the time to familiarize themselves with the CASMACAT workbench; some of them were using it for the first time. Likewise, in
order to ensure an equal distribution of texts and conditions across participants,
variables were counterbalanced from participant to participant.
To measure whether participants become faster when post-editing with interactive translation prediction and online learning techniques (goal 1 of this field trial),
task completion times and keystroke activity were measured and analyzed. Time
was measured using FDur, KDur and PDur values (see Chap. 2, Sect. 2.4.6, for a
definition of these values). In order to measure the productivity benefits derived from
introducing online-learning techniques during the post-editing process, the amount
of technical effort (i.e. the number of insertions and deletions needed to correct
the raw SMT output) was calculated for the two conditions. Keystroke activity
was measured by using Mdel values (i.e., number of manually generated deletions)
and Mins values (i.e., number of manually generated insertions). It is important to
make the distinction between manual and automatic insertions and deletions since
the interactive translation prediction functionality triggers a number of automatic
insertions and deletions that do not require any technical effort (i.e. typing activity)
from the post-editor (ibid.). Table 7.1 compiles the keyboard activity and production
time measures across participants.
Now, usability studies such as the CFT14 should take into account the
translation/post-editing process as a whole in order to control for any possible
confounding variables that may have an impact on the data. Results of the CFT14
(see Alabau et al. 2014; also reported in this volume, see Chap. 4) show in
particular that post-editors did not seem to be faster under the PIO condition.
However, a more in-depth qualitative analysis of the process data collected shows
that an explanation for this can be found in the participants' information behaviour.

⁶Only participant 4 (P4) reported that she did not have any experience in post-editing. As will be seen in the Methodology section below, this does not have an impact on the results of the pilot experiment reported in this chapter.

Table 7.1 Keyboard activity and production time measures (P = traditional post-editing; PIO = post-editing with interactive translation prediction and online learning). Each of the first 14 rows corresponds to one post-editing session; the last three rows give the condition means and the overall mean

Cond      Ins/ST char  Del/ST char  Fdur    Kdur    Pdur
P         0.88         0.79         469     290     138
PIO       0.73         0.38         467     245     117
P         0.85         0.70         418     265     129
PIO       0.66         0.25         572     234     105
P         0.45         0.41         420     227      71
PIO       0.47         0.32         579     257      95
P         0.54         0.46         657     217     112
PIO       0.67         0.21         517     261     142
P         0.63         0.53         331     262     132
PIO       0.45         0.31         325     253     120
P         0.51         0.45         704     230      84
PIO       0.40         0.14         433     230      88
P         0.68         0.63         530     197      63
PIO       0.41         0.32         444     217      75
PIO       0.54         0.27         476     242     106
P         0.65         0.57         504     241     104
P + PIO   0.60         0.42         490.43  241.79  105.07
Actually, working with online-learning techniques was observed to have a positive
impact in terms of efficiency gains, but only when the time used by post-editors to
search information is not taken into account (ibid.). Thus, it is evident that overall
task completion times might not be a good indicator of performance when the post-editor needs to conduct informational searches to verify the quality of and improve
the SMT system output. Now, even though participants did not become faster in
terms of task times, their keyboard activity, as reflected in Mins and in particular
in Mdel values, shows that post-editors had to type less when post-editing with
interactivity and online learning techniques (condition PIO) as opposed to doing
traditional post-editing (condition P). This means that online-learning techniques
may help post-editors to save some effort during their work: Participants working
under the P condition inserted 0.65 keystrokes and deleted 0.57 keystrokes on average per source text (ST) character. However, in the PIO condition, they inserted
0.54 keystrokes and deleted 0.27 keystrokes per ST character on average. Thus, a
comparison of keyboard activity in both conditions shows that there was a decrease
in the number of insertions and deletions in the PIO condition. Since both texts
were comparable in size and translation difficulty, this decrease in technical effort
(i.e., typing activity) must be attributed to the expected benefits of online-learning
techniques during the post-editing process. See also Chap. 3 for similar findings.
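A sketch of how this comparison can be computed from a session-level table is given below; the column names (Mins, Mdel, STchars, Cond) are assumptions mirroring the measures described above:

    # Hedged sketch: manual insertions and deletions per source-text character,
    # averaged per condition (P = traditional post-editing, PIO = ITP + online learning).
    ses$InsPerChar <- ses$Mins / ses$STchars
    ses$DelPerChar <- ses$Mdel / ses$STchars
    aggregate(cbind(InsPerChar, DelPerChar) ~ Cond, data = ses, FUN = mean)
    # Expected pattern (cf. Table 7.1): P approx. 0.65 / 0.57, PIO approx. 0.54 / 0.27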
This being said, based on these data alone, one cannot explain the fact that there were no significant benefits in terms of efficiency gains when overall task times are considered. Preliminary observations of screen recordings of all post-editing sessions pointed to the fact that participants often double-checked, in various informational resources, solutions proposed by the SMT system, even when
7.2.2 Methodology
For the purposes of this pilot TII study, only the segments in which CFT14
participants used the CASMACAT BiConc tool were examined. By using the
BiConc, post-editors were able to retrieve information such as term equivalents and
collocations (see Fig. 7.1 in the introduction Sect. 7.1), which would guide them
in making an informed decision while solving a particular translation problem.
The BiConc's search results are sorted by their relative frequencies (i.e., the
most probable translations are shown first) based on the training data available in
CASMACAT.
Using the CFT14 log files (i.e., the event.xml files), a script run in a Cygwin terminal was used to extract data about the post-editing segments where the BiConc tool was used at least once. A total of 55 instances of BiConc use were found. For each one, the script provided us with the following data: Event ID (i.e., information on the participant's identity, the text number and the post-editing condition; e.g., P01_P2, meaning Participant 1, post-editing condition P, text 2); segment ID (e.g., 10804); and token(s) searched (e.g., autonomic). With the segment ID in hand, it was then possible to extract, from the CFT14 log files, the source segment (i.e., the original segment in English), the raw SMT output, and the participant's final target (i.e., the final segment in Spanish after the entire project was saved). An MS Excel spreadsheet was created to store and analyse these data. For each one of the 55 instances found, the data were stored in columns as follows: Event ID, segment ID, token(s) searched, source segment, raw SMT output, and final target segment.
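The extraction step just described can be sketched in a few lines of Python. The element and attribute names used below (event, type, segment, value, the "biconc" event type) and the directory layout are hypothetical placeholders, since the exact schema of the CFT14 event.xml files is not reproduced in this chapter; the original extraction was done with a shell script in a Cygwin terminal.

# Minimal sketch of the extraction step described above. The CFT14 event.xml
# schema is not documented here, so the element and attribute names used below
# (event, @type, @segment, @value, "biconc") are hypothetical placeholders,
# not the actual CASMACAT log format.
import csv
import glob
import xml.etree.ElementTree as ET

rows = []
for path in glob.glob("CFT14/*/event.xml"):          # one log file per session (assumed layout)
    event_id = path.split("/")[-2]                    # e.g. "P01_P2" (participant, condition, text)
    tree = ET.parse(path)
    for ev in tree.iter("event"):                     # hypothetical element name
        if ev.get("type") == "biconc":                # hypothetical event type for BiConc queries
            rows.append({
                "event_id": event_id,
                "segment_id": ev.get("segment"),      # e.g. "10804"
                "query": ev.get("value"),             # token(s) searched, e.g. "autonomic"
            })

# Store one row per BiConc use for later inspection in a spreadsheet.
with open("biconc_uses.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["event_id", "segment_id", "query"])
    writer.writeheader()
    writer.writerows(rows)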
The core of this pilot investigation was the examination of screen-capture
videos.9 Thanks to these videos, it was possible to observe and analyze the
way translators interacted with the BiConc tool (and other external informational
resources) in order to solve a given problem while post-editing those segments.
Additional columns were then added to the Excel spreadsheet to store data
such as information relevance (see Experimental results and analysis below); the
9
The videos are available in .fbr format at the following address: http://bridge.cbs.dk/field_trial3/VIDEO/. While playing the files, it is necessary to forward the video to the specific segment being analyzed. The segment ID can be seen on the left-hand side of the CASMACAT user interface.
external informational resources used, if any; and notes (i.e., other observations and
hypotheses, some of which are reported in Table 7.2 in the following section).
Fig. 7.2 Use count of BiConc tool per participant and per condition
10
By examining the videos, it was possible to notice that the BiConc tool was not accessible to P4 in either condition (i.e., the BiConc tool button did not appear on the CASMACAT interface). The reason for this issue is unknown. Thus, only half of the participants who had access to the tool actually made use of it.
explanation for this difference, it is necessary not only to look into the experimental design (see Sect. 7.2.1) but also to closely examine the screen recordings of P3's post-editing process. P3 post-edited text 1 in PIO making use of the BiConc but with few cases of successful information retrieval (see Fig. 7.3), which seems to have affected her confidence in the BiConc when post-editing the second text (under the P condition): she still made a fair number of searches during the post-editing task, but preferred external resources over the CASMACAT built-in BiConc tool.
In addition to the number of times post-editors actually used the BiConc, it was
also important to investigate the number of times such searches led to successful
cases of information retrieval. This can be associated with the concept of relevance,
extensively discussed in the HII literature.12 As pointed out by Fidel (2012, p.
26), the evaluation process is almost always necessary when retrieving information
(from digital information systems). Indeed, once information is acquired, a person
examines and evaluates that information to discern what is relevant (and what is not)
to the particular problem they are trying to solve.
Determining information relevance has been considered a monumental, complex endeavour, primarily because the judgement of relevance can be both subjective and dynamic (ibid., pp. 27–32). While acknowledging this challenge, the assessment of information relevance in this chapter is based merely on whether or not the information found in the BiConc tool by the post-editor was the information used13 to solve the problem at hand (in other words, whether the information found was the information kept in the final target text when the entire project was approved and saved).
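In the study this judgement was made manually on the basis of the screen recordings. Purely as an illustration of the criterion, a crude string-level approximation might look as follows; the function name and the toy data are invented for this sketch.

# Rough illustration of the relevance criterion described above: information
# found in the BiConc counts as relevant if it was kept in the final target
# segment. In the study this was judged manually from screen recordings; the
# substring test below is only an approximation for illustration.
def is_relevant(biconc_result: str, final_target: str) -> bool:
    """True if the translation retrieved from the BiConc appears in the final target."""
    return biconc_result.casefold() in final_target.casefold()

instances = [
    # (translation found in the BiConc, final target segment) -- invented toy data
    ("equipo", "el equipo de mantenimiento revisó el sistema"),
    ("soporte", "el apoyo técnico resolvió el problema"),
]
relevant = sum(is_relevant(found, target) for found, target in instances)
print(f"relevance rate: {relevant}/{len(instances)}")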
12
While relevance has been mainly associated with the performance evaluation of information
systems, it has also been associated with the human processes that take place when people
determine how relevant a piece of information is, and the elements that shape these processes
(Fidel 2012, p. 27).
13
As can be observed in the screen videos, post-editors may use the information found in different ways: they can copy and paste it, or type it into the post-editing interface, for instance.
A close examination of the search queries and results (and final target segments)
reveals that the relevance rate varies among participants, with participants P1 and P7
having more relevant than non-relevant cases (see Fig. 7.3). On average, only 47 %
of the BiConc searches (26 out of 55) provided participants with satisfying results
(i.e., post-editors kept this information in the final target segment).
Furthermore, a closer look at the screen recordings shows that participants who used the BiConc also used other Internet resources, such as term banks, dictionaries and corpora, to complement their information retrieval efforts. In addition to the CASMACAT BiConc, P1, P3 and P7 also searched for information using Google (a search engine); Linguee (a biconcordancer); WordReference (a bilingual dictionary); and IATE and TermCat (terminology databases).
Remarkably enough, as shown in Fig. 7.4, for cases where the BiConc was used, the most frequently used external resource was another biconcordancer tool (i.e., Linguee), which was also observed to be extensively used by all CFT14 participants throughout the entire experiment.14 These results reveal the extent to which today's professional translators favour this type of parallel bilingual resource over any other type of tool (e.g., dictionaries or term banks), thus complementing the results of some recent studies in the same vein (cf. Simard (2013); Daems et al., Chap. 6 in this volume). Hence, even from this pilot investigation alone, it can be concluded that online biconcordancers need to be constantly enhanced and improved, and integrated more tightly into translation environments.
It is worthwhile noting at this point that quantitative data (e.g., use counts and
relevance measurements) may not be sufficient to investigate the TII process and
14
This observation is based on screen recording examinations (i.e., by looking at the videos it was
possible to observe that this particular resource was extensively used by all post-editors throughout
the experiment). However, no exact figures on the total use of external resources in the CFT14
are available. Logging software such as Inputlog (Leijten and Van Waes 2013) will be included in
future investigations.
Table 7.2 Some information behaviour noted during the BiConc tool use analysis using screen
recordings
1. Having found a useful solution in the BiConc, the translator uses external resources to
double-check information; results turn out to be the same
2. Although a good solution was proposed by the BiConc, the translator opts for a solution
coming arguably from their cognitive background
3. Having searched for a term in both the BiConc and an external resource, with no results, the translator opts to leave the notion implicit
4. The BiConc was used only after a query in an online bilingual dictionary yielded no results
5. When typing a good solution provided by the BiConc, the interactive post-editing system automatically inserted another good solution. The translator opts to keep the latter
6. A good solution was found in the BiConc, but the translator made an adaptation based on the information found
7. The translator copied and pasted a solution from the BiConc. The font format from the BiConc (type, size and colour) was kept in the text field; the translator then opened a text editor to convert the text into plain text, and copied and pasted it again into the CASMACAT environment
8. The translator searched both in the BiConc and in external resources, but the solution provided by the BiConc was preferred
9. The BiConc took several seconds to display results; the translator could not wait and searched in an external resource; when returning to the CASMACAT workbench, the results of the query in the BiConc were being shown and turned out to be the same as those found in the external tool
15
This behaviour can be attributed to one, two or all three participants who made use of the BiConc
tool.
Hornbæk (2006)) (see observations 5, 7 and 9, for instance) and browsing the different sources of information (observation 1), and on why a certain piece of information appears to be relevant or not (observation 8)? Can we learn, through interview questions, why the post-editor prefers some tools and resources over others (observation 4 in Table 7.2, and Fig. 7.4), why they would prefer making an adaptation, or why a piece of information is inappropriate to solve a given problem (observation 6)? Hornbæk (2006) describes how different sources of data, and an analysis of the relations between the different aspects of usability (efficiency, effectiveness and user satisfaction) and between subjective and objective measures, could provide a wider picture of the usability and the quality-in-use of a system or a system's feature.
For the purposes of this TII study, data triangulation would have been ideal,
but was not possible given the scope and limitations of a pilot experiment and of
the TDA project, as stated in the introductory section. In future experiments, these
combined observations and further analyses will inform researchers, for instance,
about certain preferences of individuals or about the cognitive processes involved
in translation and information-retrieval tasks, or about technical problems with the
workbench's user interface (see observation 7 in Table 7.2) or with the system as a
whole (see observation 9).
Lastly, it would be very appropriate, from a usability point of view, to design
and carry out longitudinal studies where the learning effect over a period of time
could be observed. Indeed, a longitudinal study with the CASMACAT workbench
was carried out before the built-in BiConc tool was introduced (see Chap. 5 in this
volume) and showed that over time post-editors become faster when using ITP. It
would be interesting to conduct further studies of this kind to investigate how the
interaction with the BiConc and other information tools can change over time, how
long it takes for a user to get fully acquainted with a given tool or with a given
feature of a tool, or if there is a possible trade-off between different features of a
system (e.g., it would be interesting to observe if a tight integration of information
tools into a translation environment and an acquaintance with the tools by the user
after a certain period of time can increase the benefits of the ITP feature in terms of
efficiency gains).
Having discussed the results of this pilot investigation and formulated a few areas
for future work, let us now point towards new directions in TII research.
16
Cognitive psychology studies have shown that some cognitive functions such as reading, learning
and memorizing are affected by the (intensive) use of the Internet. In fact, people will turn to
a search engine to search answers to even the simplest question. Just knowing that a piece of
information is readily available anywhere and anytime leads humans not to memorize it (Duval
2012).
7.5 Conclusion
In this chapter, the notion of translator-information interaction (TII) was introduced as the field of study that investigates translators' interaction with information, thus complementing Sharon O'Brien's notion of translator-computer interaction (2012). To illustrate TII research, the chapter reported on a pilot study examining translators' interaction with a prototype biconcordancer (BiConc) tool integrated in the CASMACAT workbench during the third CASMACAT field trial (CFT14). A systematic analysis of such interaction was possible through screen recording observations, which made it possible to look well beyond the data provided by the CFT14 log files alone. This investigation was nonetheless of an exploratory nature given the
scope and limitations of a pilot study and, even more importantly, the complexity
inherent to TII research. The study of the interaction between humans and information is complex because it implies considering every element and every aspect of
the informational work: the interaction process and the changes that result from
that interaction at the level of the individuals searching for information and the
tools or systems used to retrieve the information (Marchionini 2008, p. 171). It is
also worth considering a possible interplay between the information provided by
the various tools and the translators cognitive background (i.e., their knowledge).
The translator looks for a given piece of information they do not know or they are
uncertain about. Now, when judging the quality of a suggestion by the system, trust
(i.e., trust in oneself) may also play a significant role. In other words, as observed
in the behaviour described in Table 7.2 in Sect. 7.3, the interaction between the
post-editors cognitive background and the information provided by the tools is
potentially an interaction of trust.
With this chapter, several research questions for future TII research were raised:
how well do human translators work with the information and informational tools
they currently have at their disposal? How accurate, rich and relevant is the information they find? How user-friendly are informational tools for translators? How can
the information and the tools be improved to maximize translators performance
and well-being, and the quality of their work? How can the performance of an
existing tool be assessed and improved? What are the advantages and drawbacks of integrating an informational tool into a translation environment, as compared to having an array of external resources? These questions can only be partly dealt with
in a pilot investigation like the one described here. Only a larger-scale study with
a larger sample size and combining different sources of data can provide a wider,
and potentially better, picture of the TII processes and the usability of information
systems and tools.
From this exposition, it may be concluded that TII studies, however complex they are, will be essential in the development and the improvement of tools intended to better address the needs of translators in the digital age. In the words of Carl et al. (2011),
In the age of translation technology, mobile computing and ubiquitous information, research on TII will become increasingly important in empirical TPR.
Behavioural studies that explore information interaction will play a crucial role
in the design and development of new tools that are user-friendly and adapted
to translators' informational needs and to the changing reality of the translation
industry.
References
Alabau, V., Carl, M., García-Martínez, M., González-Rubio, J., Mesa-Lao, B., Ortiz-Martínez, D., et al. (2014). D6.3: Analysis of the third field trial. Technical report, CasMaCat project.
Aubert, B., Cohendet, P., & Montreuil, B. (2010). L'innovation et les technologies de l'information et des communications. Québec: CEFRIO.
Austermühl, F. (2001). Electronic tools for translators. Manchester: St. Jerome.
Balling, L. W., & Carl, M. (2014). Production time across languages and tasks: A large-scale analysis using the CRITT translation process database. In J. W. Schwieter & A. Ferreira (Eds.), The development of translation competence: Theories and methodologies from psycholinguistics and cognitive science (pp. 239–268). Newcastle Upon Tyne: Cambridge Scholars.
Borja, A. (2008). Corpora for translators in Spain. The CDJ-GITRAD corpus and the GENTT project. In M. Rogers & G. Anderman (Eds.), Incorporating corpora: The linguist and the translator (pp. 243–265). Clevedon: Multilingual Matters.
Bowker, L. (2002). Computer-aided translation technology: A practical introduction. Ottawa: University of Ottawa Press.
Carl, M., Dragsted, B., & Jakobsen, A. L. (2011). On the systematicity of human translation processes. In Actes de la conférence de Tralogy. Retrieved from http://lodel.irevues.inist.fr/tralogy/index.php?id=103
Duval, C. (2012). L'impact du Web en 4 questions. La Recherche, 467(September), 46–50.
Fidel, R. (2012). Human information interaction. An ecological approach to information behaviour. Cambridge, MA: MIT Press.
Gershon, N. (1995). Human information interaction. In 4th international world wide web conference.
Harper, R., Rodden, T., Rogers, Y., & Sellen, A. (2008). Being human: Human-computer interaction in the year 2020. Cambridge: Microsoft Research.
Hornbæk, K. (2006). Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human Computer Studies, 64(2), 79–102.
Hvelplund, K. T., & Carl, M. (2012). User activity metadata for reading, writing and translation research. In V. Arranz, D. Broeder, B. Gaiffe, M. Gavrilidou, M. Monachini, & T. Trippel (Eds.), Proceedings of the eighth international conference on language resources and evaluation. LREC 2012: Workshop: Describing LRs with metadata: Towards flexibility and interoperability in the documentation of LR (pp. 55–59). Paris: ELRA.
17
Knight, A., Pyzark, G., & Green, C. (2007). When two methods are better than one: Combining user study with cognitive modeling. In CHI '07 proceedings (pp. 1783–1788).
Jakobsen, A. L. (2003). Effects of think aloud on translation speed, revision and segmentation. In F. Alves (Ed.), Triangulating translation. Perspectives in process oriented research (pp. 69–95). Amsterdam: Benjamins.
Jakobsen, A. L. (2011). Tracking translators' keystrokes and eye movements with Translog. In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research. Integrative approaches in translation studies (pp. 37–55). Amsterdam: Benjamins.
Jakobsen, A. L., & Jensen, K. T. H. (2008). Eye movement behaviour across four different types of reading task. Copenhagen Studies in Language, 36, 103–124.
Karamanis, N., Luz, S., & Doherty, G. (2011). Translation practice in the workplace: A contextual analysis and implications for machine translation. Machine Translation, 25(1), 35–52.
Kurzweil, R. (2013). How to create a mind. The secret of human thought revealed (p. 336). New York: Penguin.
Langlois, L. (1996). Bilingual concordancers: A new tool for bilingual lexicographers. In Proceedings of the 2nd international conference of the American Machine Translation Association. Retrieved from http://mt-archive.info/AMTA-1996-Langlois.pdf
LeBlanc, M. (2013). Translators on translation memory (TM). Results of an ethnographic study in three translation services and agencies. The International Journal for Translation and Interpreting Research, 5(2), 1–13.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–392.
Marchionini, G. (2008). Human-information interaction research and development. Library and Information Science Research, 30(3), 165–174. Retrieved from http://www.ils.unc.edu/~march/Marchionini_Inf_interact_LISR_2008.pdf
O'Brien, S. (2009). Eye tracking in translation-process research: Methodological challenges and solutions. Copenhagen Studies in Language, 38, 251–266.
O'Brien, S. (2012). Translation as human-computer interaction. Translation Spaces, 1(1), 101–122. doi:10.1075/ts.1.05obr.
Shneiderman, B. (2008). Foreword. In A. Sears & J. A. Jacko (Eds.), The human-computer interaction handbook: Fundamentals, evolving technologies and emerging applications (2nd ed., pp. xix–xx). New York: Lawrence Erlbaum Associates.
Simard, T. (2013). Analyse comparative de la couverture et de l'acceptabilité des solutions d'un dictionnaire bilingue spécialisé, d'une banque de données terminologiques et d'un concordancier en ligne : application au domaine de la traduction médicale. University of Ottawa. Retrieved from http://www.ruor.uottawa.ca/fr/handle/10393/24929
Taravella, A. M., & Villeneuve, A. O. (2013). Acknowledging the needs of computer-assisted translation tools users: The human perspective in human-machine translation. The Journal of Specialised Translation, 19(January), 62–74. Retrieved from http://www.jostrans.org/issue19/art_taravella.pdf
Touré, M. A., Mbangwana, M., & Sène, P. A. (2009). Que sont les TIC : Typologie des outils et des systèmes. In T. Karsenti (Ed.), Intégration pédagogique des TIC en Afrique. Stratégies d'action et pistes de réflexion (pp. 33–56). Ottawa: CRDI.
Part III
Chapter 8
Abstract Advanced translation workbenches with detailed logging and eye-tracking capabilities greatly facilitate the recording of keystrokes, mouse activity, or eye movements of translators and post-editors. The large-scale analysis of the
resulting data logs, however, is still an open problem. In this chapter, we present
and evaluate a statistical method to segment raw keylogging and eye-tracking
data into distinct Human Translation Processes (HTPs), i.e., phases of specific
human translation behavior, such as orientation, revision, or pause. We evaluate the
performance of this automatic method against manual annotation by human experts
with a background in Translation Process Research.
Keywords Computer-aided translation · Computer-assisted translation · Post-editing · Quantitative data analysis · Translation processes · Unsupervised sequence modelling
8.1 Introduction
8.1.1 Background
Krings (2001, p. 24) once described the construction of a machine translation system capable of translating as well as a human being as being more difficult to achieve than man's conquest of the moon. Nevertheless, state-of-the-art
machine translation (MT) systems have nowadays reached a level of quality where
their incorporation into human translation workflows significantly increases the
productivity of professional translators in a post-editing (PE) set-up, where bilingual
experts revise MT output rather than translate from scratch (Green et al. 2013). The
S. Läubli
School of Informatics, The University of Edinburgh, Edinburgh, UK
Autodesk Development Sàrl, Neuchâtel, Switzerland
e-mail: samuel.laubli@autodesk.com
U. Germann
School of Informatics, The University of Edinburgh, Edinburgh, UK
e-mail: ugermann@inf.ed.ac.uk
Springer International Publishing Switzerland 2016
M. Carl et al. (eds.), New Directions in Empirical Translation Process Research,
New Frontiers in Translation Studies, DOI 10.1007/978-3-319-20358-4_8
Fig. 8.1 Illustration of the basic aim of our work: learning sequence classification models based on keylogging and eye tracking data to automatically segment recorded translation sessions into sequences of human translation processes (shaded areas; this illustration is not based on actual data)
sessions, and (2) segment recorded translation sessions into sequences of the
identified processes, thus revealing when and how often a translator executed each
of them, as illustrated in Fig. 8.1. This information will enable Translation Process
researchers to identify new activity patterns and regularities in massive amounts of
data.
8.1.2 Approach
In line with current TPR, we assume that when translating or post-editing, human
translators go through a sequence of distinct, latent, high-level Human Translation
Processes (HTPs), such as reading the source text, reading a draft translation,
revising the draft translation, etc. Translators jump back and forth between HTPs,
but execute only one at any given time. Which one that is cannot be observed directly.
However, each process manifests itself in a characteristic pattern of observable
behaviour: keystrokes, mouse activity, eye movement, etc. (Carl and Jakobsen 2009;
Carl 2010).
For modeling purposes, we assume that the probability of executing a particular
HTP next is fully determined by the current HTP, so that we can model HTPs
as states in a first-order Hidden Markov Model (HMM), and their characteristic
patterns of observable behaviour as their emissions.
Once a Hidden Markov Model (HMM) of the overall translation or post-editing
process has been trained (Sect. 8.4.2), we use the Viterbi algorithm (Viterbi 1967)
to segment and annotate raw translation activity logs with HTP labels.
Our claim is that the method proposed here makes it possible to infer meaningful
translation processes from unlabelled keylogging and eye tracking data automatically.
8.2 Foundations
8.2.1 MT Research and Post-editing
While post-editing of automatically produced draft translations is becoming more
and more wide-spread as a standard mode of operation in professional translation,
post-editing does not work equally well for everyone. Plitt and Masselot (2010), for
example, found that post-editing increased the translation throughput by 131 % for
their fastest, but by only 20 % for their slowest translator. Similar studies (Guerberof
2009; Green et al. 2013) also report considerable variance in the benefits of post-editing over translation from scratch (with a positive effect on average).
A number of factors have been identified that might contribute to this variance:
translation direction (Plitt and Masselot 2010), translators' professional working
experience (Plitt and Masselot 2010; Green et al. 2013), text types (Läubli et al.
2013), text difficulty (Green et al. 2013), and MT quality (Koehn and Germann
1
We acknowledge that this view is not shared by everyone. For example, Moritz Schaeffer
(personal communication) argues that post-editing should be considered a task/skill distinct from
translation per se. Not every effective translator is an effective post-editor, and vice versa.
2014), and others. But as both MT and industrial research are primarily interested
in whether and how much rather than why and how post-editing accelerates the
translation process with a certain MT system or in a particular working environment,
the impact of the aforementioned factors on the actual performance of post-editing
is hardly ever investigated in detail. As a result, surprisingly little is known about
why post-editing is faster than translation from scratch, and why certain translators
benefit more from it than others.
With regard to exploring the effects of post-editing in more detail, the aforementioned studies suffer from their focus on post-editing time alone. In a study
that investigated the cognitive effort involved in post-editing, Koponen et al. (2012)
recorded not only the time, but also the keystrokes of eight post-editors. They
conclude that, while post-editing time in relation to segment length can be a good
indicator of cognitive effort, recorded keystrokes are very useful as a way to
understand how translators work, and that studying the strategies of different post-editors can be potentially very useful for post-editing practice. Here, MT research
comes into contact with translation process research.
[Figure: temporal, cognitive and technical post-editing effort]
Also referred to as thinking-aloud protocols (Toury 1995) and talk-aloud protocols (Gerloff 1986).
Fig. 8.3 Translation progression graph of an excerpt from a recorded post-editing session
and PUs: although using a certain threshold can be motivated empirically (Carl and
Jakobsen 2009; Carl and Kay 2011), any fixed threshold does not take variance
between individual translators into account. The PU threshold to identify coherent
sequences of writing, for example, would need to be higher for slower than for faster
typists.
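To make the threshold issue concrete, the following minimal sketch segments a keystroke log into production units (PUs) using a fixed pause threshold; the 1,000 ms value and the timestamps are illustrative only and, as noted above, any such fixed value ignores differences between individual typists.

# Sketch of pause-based segmentation into production units (PUs): keystrokes
# separated by less than a fixed pause threshold are grouped into one unit.
# Threshold and timestamps are illustrative, not values used in this chapter.
def production_units(keystroke_times_ms, threshold_ms=1000):
    units, current = [], [keystroke_times_ms[0]]
    for prev, ts in zip(keystroke_times_ms, keystroke_times_ms[1:]):
        if ts - prev < threshold_ms:
            current.append(ts)          # same burst of coherent typing
        else:
            units.append(current)       # pause exceeds threshold: close the unit
            current = [ts]
    units.append(current)
    return units

times = [0, 200, 450, 600, 2500, 2700, 2900, 6000]
print([len(u) for u in production_units(times)])   # -> [4, 3, 1]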
Apart from analysing FUs and PUs quantitatively, many TPR studies are based on
qualitative analyses of visualised TAD. In particular, translation progression graphs
(Perrin 2003) are used to visualise text insertions and deletions, eye fixations, and
also FUs and PUs in recorded translation sessions over time. Aggregations and
visualisations enable TPR experts to identify and characterise human translation
processes (HTPs) in TAD, but the use of translation progression graphs in particular
limits TPR studies to small subsets of the available data. The graph shown in
Fig. 8.3, for example, visualises 36 s from a post-editing session with a total
duration of 45 min. Furthermore, one could argue that the visual analysis of
translation progression graphs by an expert involves a considerable amount of data
interpretation and is thus by no means objective. Nevertheless, TPR studies based on
analyses of FUs and PUs and/or progression graphs have yielded valuable findings
about the processes involved in human translation and post-editing (see for example
Hvelplund 2011; Dragsted and Carl 2013; Massey and Ehrensberger-Dow 2014;
Schaeffer and Carl 2014, to name but a few).
Martínez-Gómez et al. (2014) proposed the first unsupervised approach to
translator modelling, showing that automatically identified activity patterns in
the TAD of recorded translation sessions can be used to predict indicators of a
translators expertise.
In the remainder of this chapter, we will show that activity patterns inferred
through unsupervised learning can not only be used to make predictions, but also
CASMACAT is based on the web-based MateCat workbench, which is deployed and actively used in production at several translation and IT companies (Federico et al. 2014).
8.3.3.1 Annotators
We engaged 12 participants (six male, six female) in the annotation task. Participants
were aged between 22 and 38 years (mean: 27.8) and pursued a master's (7) or
Ph.D. (5) degree. All of the participants were familiar with at least the foundations
of translation and post-editing processes. Eleven annotators stated that their previous (8) and/or current (11) degree programme was related to translation process
research or translation studies, and six had experience working as professional
translators. Participants were reimbursed EUR 40.00 for classifying 825 segments
and completing two short surveys, which took them roughly 3.5 h in total.
8.3.3.3 Tagset
All annotators were asked to classify each snippet as one of the following six
HTPs:
Orientation: source text (Os)
The translator is reading without inserting or deleting text, mainly focussing on
the source text.
Orientation: target text (Ot)
The translator is reading without inserting or deleting text, mainly focussing on
the target text.
Orientation: source and target text (Ost)
The translator is reading without inserting or deleting text, focussing on both the
source and the target texts.
Revision: linear (Rl)
The translator is editing the target text. Every word is edited only once, in linear
order.
Revision: scattered (Rs)
The translator is editing the target text. Some source words are edited several
times.
Pause (P)
The translator is idle, e.g., because he or she is waiting for the interface to
respond.
This tagset was designed to characterise different phases of the post-editing
process and is described in detail in Chap. 14. We chose it mainly due to its
clarity and simplicity. Krings (2001) and others (cf. Sect. 8.2.2) provide more fine-grained taxonomies to characterise post-editing processes, but we considered them
to be less suitable for the classification task at hand since the differences between
class definitions are often very subtle. For example, Krings distinguishes between
reading an entire text or sentence (SOURCE/READ) and giving direct attention to
an element (usually a word; SOURCE/FOCUS) of the source text (Krings 2001,
p. 514f). Using the six classes defined by Oster, we were hoping to ensure that annotators would easily understand the classification task and be able to reproduce their own annotations, which was confirmed by a post-experimental survey and by (mostly) high intra-annotator agreement (cf. Sect. 8.3.3.4), respectively.
(8.1)
Table 8.1 Intra-annotator agreement (κ) in the CFT14-Gold dataset over three and six HTP classes

3 Cl.   6 Cl.
0.94    0.89
0.95    0.87
0.83    0.86
0.80    0.68
0.86    0.67
0.84    0.65
0.79    0.64
0.85    0.59
0.76    0.59
0.73    0.56
0.64    0.42
0.54    0.31
in screen recordings of post-editing sessions: orientation (O), revision (R), and pause
(P).
Table 8.1 lists the intra-annotator agreements for each participant based on 100
doubly-classified snippets. The scores indicate that (1) most annotators were rather
consistent in labelling the snippets using the full tagset, with A6 and A8 being
the exceptions, and (2) that, not surprisingly, nearly all annotators performed
better when only three basic classes were distinguished. Apparently the differences
between linear (Rl) and scattered (Rs) revision, and between the three subclasses of
orientation (Os, Ot, and Ost) are too subtle for human annotators to consistently tell
them apart.
This conjecture is supported by the inter-annotator agreement scores shown in
Table 8.2. Annotators showed high agreement in distinguishing between the three
basic classes O, R, and P. Among these, they agreed least on identifying pauses (P), but even here, nearly nine out of twelve annotators agreed on average (mean agreement with the gold standard: 0.73). In contrast, the fine-grained distinctions between the subclasses of orientation and revision were a lot more controversial. This holds for scattered revision (Rs; κ = 0.15), but also for linear revision (Rl; κ = 0.43) and, to a lesser extent, source text orientation (Os; κ = 0.49). The scores indicate that it is particularly difficult
to differentiate between scattered and linear revision. This was confirmed by the
annotators in a post-experimental survey.
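For readers who wish to reproduce agreement figures of the kind reported in Table 8.2, the following sketch computes Fleiss' κ (Fleiss 1971) from a snippet-by-class count matrix; the small matrix is invented toy data, not the CFT14-Gold annotations.

# Sketch of a Fleiss' kappa computation (Fleiss 1971), as used for the
# inter-annotator agreement figures in Table 8.2. The count matrix below is
# invented toy data: one row per snippet, one column per HTP class, each cell
# holding how many of the 12 annotators assigned that class.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts: (n_items, n_categories) matrix of per-item category counts."""
    n_items, _ = counts.shape
    n_raters = counts[0].sum()                        # assumes equal raters per item
    p_j = counts.sum(axis=0) / (n_items * n_raters)   # category proportions
    # per-item agreement: proportion of agreeing rater pairs
    P_i = np.sum(counts * (counts - 1), axis=1) / (n_raters * (n_raters - 1))
    P_bar = P_i.mean()                                # observed agreement
    P_e = np.sum(p_j ** 2)                            # chance agreement
    return (P_bar - P_e) / (1 - P_e)

toy = np.array([
    [10, 1, 1],   # snippet 1: 10 annotators said O, 1 said R, 1 said P
    [2, 9, 1],
    [0, 12, 0],
    [1, 2, 9],
])
print(round(fleiss_kappa(toy), 2))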
Table 8.2 Number of snippets, inter-annotator agreement (Fleiss κ), and mean agreement with gold standard label per class with three and six classes

Three classes
Class   #     Fleiss κ   Mean agreement with gold
O       202   0.65       0.87
R       452   0.76       0.93
P       46    0.51       0.73
All     700   0.68       0.90

Six classes
Class   #     Fleiss κ   Mean agreement with gold
Os      16    0.49       0.75
Ot      81    0.54       0.79
Ost     104   0.70       0.81
Rl      406   0.43       0.72
Rs      41    0.15       0.62
P       51    0.51       0.71
All     700   0.45       0.73
8.4.1 Observations
The HTPs' characteristic patterns can be described in terms of (1) the type, (2) the frequency, and (3) the combinations of observable actions that they trigger. For example, orientation phases typically comprise multiple eye fixations and few if any keystrokes (the latter, e.g., for navigation); revision involves fewer or shorter eye fixations and more keyboard activity. Moreover, the co-occurrence of actions also provides hints at what a particular observation was triggered by. Pressing the backspace key within a sequence of many alphanumeric keystrokes, for example, suggests that the translator is probably drafting. If, however, the backspace key is pressed once or twice after a mouse click, this suggests that the translator is currently revising.
In order to capture such patterns of co-occurrence, we slice the translation log evenly into short time windows and count for each event type the number of occurrences in each time interval. Each time window can thus be represented as a vector of per-event-type counts, i.e., a feature vector (cf. Fig. 8.4).
Of course we could ask them, but that would interrupt precisely those mental processes that we
want to eavesdrop on and force the translator to reflect on what might otherwise be a subconscious,
automatic process. This is one of the main arguments against think-aloud experimental protocols.
Fig. 8.4 The feature extraction process. Note that the keystroke event types (deletion, control,
navigation, and alphanumeric) have been folded into a single event type in this illustration. (a)
Define a subset of relevant action types. (b) Parametrise the recorded translation sessions into time
windows of equal length. (c) Count the number of occurrences per action type in each window.
The counts in each window form an observation, i.e., each column corresponds to a feature vector
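A minimal sketch of the windowing and counting step in Fig. 8.4, assuming the log has already been read into (timestamp, action type) pairs with timestamps in milliseconds; the action type names are illustrative, not the CASMACAT log vocabulary.

# Minimal sketch of the feature extraction in Fig. 8.4: slice a session into
# fixed-length time windows and count, per window, how often each action type
# occurs. Timestamps are in milliseconds; the event names are illustrative.
from collections import Counter

ACTIONS = ["keystroke", "mouse_click", "fixation_source", "fixation_target"]

def extract_features(events, window_ms=500):
    """events: iterable of (timestamp_ms, action_type); returns a list of count vectors."""
    events = sorted(events)
    if not events:
        return []
    start, end = events[0][0], events[-1][0]
    n_windows = int((end - start) // window_ms) + 1
    counters = [Counter() for _ in range(n_windows)]
    for ts, action in events:
        counters[int((ts - start) // window_ms)][action] += 1
    # one feature vector (a column in Fig. 8.4c) per window
    return [[c[a] for a in ACTIONS] for c in counters]

session = [(270000, "fixation_source"), (270120, "keystroke"),
           (270180, "keystroke"), (270650, "mouse_click")]
print(extract_features(session, window_ms=500))   # -> [[2, 0, 1, 0], [0, 1, 0, 0]]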
[Figure: HMM topology with hidden HTP states (H0, H1, H2) and start/end states (S, E)]
In fact, a mixture of Poisson distributions would have been the appropriate choice here, as the
action counts are not continuous but discrete data. The mixture model approach allows us to
better fit the asymmetrically distributed data with symmetric distributions such as the Gaussian distribution, because of the skewness of the actual data. An even better option would be to
use more general two-parameter models such as the Conway-Maxwell-Poisson distribution, which
allows a better fit to heavy or thin tails in the distribution (see Shmueli et al. 2005 for details on the
Conway-Maxwell-Poisson distribution).
Fig. 8.6 Segmenting translator activity data (TAD) into human translation processes (HTPs) based
on a GMM-HMM model learnt from unlabelled data
The transition probabilities and observation probability densities are then trained
with the EM algorithm (Baum 1972; Dempster et al. 1977).8 Good settings were
determined by grid search over a range of reasonable parameter settings (cf.
Sect. 8.5.1).
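The authors' own implementation is segcats (cf. footnote 8); the following sketch merely illustrates the modelling idea, Gaussian-mixture emissions in a first-order HMM trained with EM and decoded with Viterbi, using the separate hmmlearn package and synthetic count data, both of which are assumptions of this sketch. The parameter values echo the best settings reported in Sect. 8.5.1.3.

# Illustrative GMM-HMM sketch, not the authors' segcats code. Assumes the
# hmmlearn package; X stacks the per-window count vectors of all sessions,
# and lengths gives the number of windows per session.
import numpy as np
from hmmlearn.hmm import GMMHMM

rng = np.random.default_rng(0)
# Synthetic stand-in for real translator activity data: per-window counts of
# four action types, drawn from two different regimes ("busy" vs. "quiet").
busy = rng.poisson(lam=2.0, size=(2000, 4))
quiet = rng.poisson(lam=0.2, size=(2000, 4))
X = np.vstack([busy, quiet]).astype(float)
lengths = [1000, 1000, 1000, 1000]          # four sessions of 1000 windows each

# k = 10 hidden states (candidate HTPs) with m = 8 Gaussian mixture components
# per state, trained with EM; cf. the best settings found in Sect. 8.5.1.3.
model = GMMHMM(n_components=10, n_mix=8, covariance_type="diag",
               n_iter=50, random_state=0)
model.fit(X, lengths)

print("log-likelihood:", model.score(X, lengths))   # the LL criterion used in the grid search
states = model.predict(X, lengths)                  # Viterbi decoding: one state label per window
print(states[:20])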
8.5 Experiments
The automatic segmentation and labelling procedure described here was evaluated
against the performance of human experts in an annotation task whose objective
was to annotate excerpts of translation logs with conjectures about the underlying
HTP (Sect. 8.5.2). In order to do so, we first had to determine good settings for the
8
We have implemented the modelling approach described in this chapter in segcats, available at http://github.com/laeubli/segcats. The clustering and EM algorithms are based on
scikit-learn version 0.14.1 (see Pedregosa et al. 2011, and http://scikit-learn.org/stable/).
8.5.1.3 Findings
Our analysis of the impact of the e, w, k, and m parameters on the likelihood of the training data (LL) can be summarised9 as follows:
- Shorter window lengths w generally increase LL. The best models are learnt from recorded translation sessions that have been parametrised into w = 500 ms segments.
- There is no clear impact of the set of monitored actions e on LL. An interesting finding is that while using eye tracking data (E3 vs. E1 or E2) increases LL with short window lengths, it has the opposite effect with longer window lengths.
- In general, LL increases sharply with the number of HMM states k up to k = 4 or 5, then increases moderately up to k = 7, and remains relatively stable with k ≥ 7. The best models have nine or ten HMM states.
- Models with fewer than four GMM components m per HMM state perform considerably worse than models with four or more components. This finding is very consistent across models with different parameter values for e, w, and k.
- The best models in terms of LL have between seven and ten GMM components.
The model that scored best in terms of LL overall was trained on TAD segmented into time windows of length w = 500 ms, considering keystrokes, mouse clicks, and eye fixations (e = E3). It has k = 10 HMM states with m = 8 GMM components each, but similar results were achieved with k = 8 or 9 and m = 7–10.
We therefore aggregated the assignments and compared how often experts and
models assigned the O, R, and P labels in each of the seven excerpts in CFT14-Gold. In other words, we did not assess how well experts and models classified
what a post-editor was doing at a certain time, but rather how exactly they estimated
the total time that the post-editor spent on the orientation (O), revision (R), and
pausing (P) HTPs in the whole excerpt (5 min). We compared the number of 3 s
long segments per HTP in the gold standard to the labels assigned by individual
experts and models, measuring the root mean square error (RMSE) for each of the
HTPs (classes), i.e.
$$\mathrm{RMSE}_x = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{x}_t - x_t\right)^2}\qquad(8.2)$$

$$\mathrm{RMSE}_{\mathrm{sum}} = \mathrm{RMSE}_O + \mathrm{RMSE}_R + \mathrm{RMSE}_P\qquad(8.3)$$

$\mathrm{RMSE}_{\mathrm{w.avg.}}$ is the sum of all RMSE per class weighted by their relative class frequency $f$ in the gold standard:

$$\mathrm{RMSE}_{\mathrm{w.avg.}} = (f_O \cdot \mathrm{RMSE}_O) + (f_R \cdot \mathrm{RMSE}_R) + (f_P \cdot \mathrm{RMSE}_P)\qquad(8.4)$$
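The metric can be computed directly from the per-excerpt segment counts; the following sketch implements Eqs. (8.2)–(8.4) on invented toy counts, not the CFT14-Gold figures.

# Sketch of the evaluation metric in Eqs. (8.2)-(8.4): per-class RMSE over the
# seven excerpts, their sum, and the average weighted by gold class frequency.
# The counts below are invented toy data (100 segments of 3 s per 5-min excerpt).
import numpy as np

classes = ["O", "R", "P"]
gold = {"O": np.array([30, 25, 28, 31, 27, 26, 35]),
        "R": np.array([55, 60, 58, 52, 57, 61, 50]),
        "P": np.array([15, 15, 14, 17, 16, 13, 15])}
pred = {"O": np.array([28, 27, 30, 29, 25, 28, 33]),
        "R": np.array([57, 57, 55, 55, 58, 60, 52]),
        "P": np.array([15, 16, 15, 16, 17, 12, 15])}

rmse = {c: float(np.sqrt(np.mean((pred[c] - gold[c]) ** 2))) for c in classes}   # Eq. (8.2)
rmse_sum = sum(rmse.values())                                                    # Eq. (8.3)
freq = {c: gold[c].sum() / sum(g.sum() for g in gold.values()) for c in classes}
rmse_wavg = sum(freq[c] * rmse[c] for c in classes)                              # Eq. (8.4)
print(rmse, round(rmse_sum, 2), round(rmse_wavg, 2))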
8.5.2.2 Data
The experiment was based on the logs of the aforementioned seven post-editing
sessions in the CFT14 dataset (Sect. 8.3.2). The model was trained on the complete
raw TAD from these seven sessions. The test data consisted of seven excerpts (5 min
each) from the same seven post-editing sessions that were annotated manually, as
described in Sect. 8.3.3.
Fig. 8.7 Mapping of automatically learnt HTPs (H0–H9) to orientation (O), revision (R), and pause (P) in Session_G.csv from the CFT14-Gold dataset. The session was automatically labelled with H0–H9. The mapping of these states to O, R, and P was defined manually (cf. Fig. 8.8)
we manually mapped each of the model's ten HTPs10 to either the orientation (O),
revision (R), or pause (P) HTP from the TPR tagset described in Sect. 8.3.3.3. This
process is illustrated in Fig. 8.7.
The mapping was based on visual analysis of the probability densities for the
observable translation actions in each HTP. If the probability mass for alphanumeric
and/or deletion keystrokes in an automatically learnt HTP was centered clearly
above zero, we tagged it as an instance of revision (R). Otherwise, we tagged
it as orientation (O), unless the probability mass for all observable actions was
centered around zero, in which case we tagged it as pause (P). Figure 8.8 shows
the probability densities for all actions in the first HTP of the model, which we
mapped to the revision (R) HTP according to the aforementioned guidelines.
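The mapping itself was done by visually inspecting densities such as those in Fig. 8.8; the sketch below only illustrates the stated guideline, applied to per-state mean emission counts as a stand-in for the densities. The threshold and the toy means are assumptions made for illustration.

# Illustration of the mapping guideline described above, applied to per-state
# mean emission counts (a crude stand-in for the probability densities in
# Fig. 8.8 that were actually inspected visually). The threshold and the toy
# means are assumptions for illustration only.
TYPING = ["keyDownNormal", "keyDownDel"]
ALL = TYPING + ["keyDownNav", "mouseDown", "fixationSource", "fixationTarget"]

def map_state(mean_counts: dict, eps: float = 0.1) -> str:
    if any(mean_counts[a] > eps for a in TYPING):    # typing mass clearly above zero
        return "R"                                    # revision
    if all(mean_counts[a] <= eps for a in ALL):       # everything centred around zero
        return "P"                                    # pause
    return "O"                                        # otherwise: orientation

h0 = {"keyDownNormal": 2.1, "keyDownDel": 0.4, "keyDownNav": 0.2,
      "mouseDown": 0.1, "fixationSource": 0.6, "fixationTarget": 1.8}
print(map_state(h0))   # -> "R", as H0 was mapped to revision in the chapter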
With this mapping of automatically learnt HTPs to the HTPs used to describe
the post-editing process in TPR, we were able to compare the number of segments
classified as O, R, and P by the model to the number of segments classified as O,
10
Recall that each HMM state corresponds to an automatically learnt HTP in the models learnt
through the unsupervised sequence modelling approach proposed in Sect. 8.4.
Fig. 8.8 Probability densities for observable translation actions in the first HTP (H0) of the best-performing model trained on the CFT14 post-editing sessions
8.5.2.4 Results
Table 8.3 shows the RMSE values for all human annotators (A1–12) and the best-performing statistical model (M) against the gold standard, ordered by total RMSE (summed deviation for O, R, and P; lower is better). The best-scoring annotator's classifications (A3) deviate, on average, by 2.13 segments per class from the gold standard. As each segment corresponds to 3 s of a post-editing session, this value can be interpreted as follows: if asked how long a translator spends on orientation (O), revision (R), and pause (P) in a given excerpt of 5 min (i.e., 300 s), annotator A3's predictions will deviate by 6.39 s (2.13 segments × 3 s) per class from the gold standard on average; more precisely, his or her predictions for the time spent on orientation will deviate by 10.02 s (3.34 segments × 3 s), by 4.8 s for revision (1.6 segments × 3 s), and by 6.21 s (2.07 segments × 3 s) for pause.
In direct comparison with the 12 human annotators, the statistical model (M)
ranks 11th out of 13. It is remarkable that two annotators performed worse than the
statistical model (M) in this evaluation. Even when we concede that there may have
been external reasons for their poor performance (distraction, lack of time or lack
of commitment), we have to keep in mind that all annotators had prior experience in
TPR, and that the type of annotation they were asked to produce is closely related
to core skills in TPR: interpreting translation logs. Moreover, as they were shown
video replays of the translation process as monitored (cf. Sect. 8.3.3.2), they had
a lot more information at their disposal than the statistical models: annotators saw
the actual source and target texts, the directions and durations of successive eye
fixations, etc. The statistical model, in contrast, bases its classifications solely on
the number of keystrokes, mouse clicks, and eye fixations in isolated segments of
500 ms and the immediately preceding HTP (by virtue of the first-order Markov
Model), and is very limited in the scope of its model. Unlike humans, it has a very
Table 8.3 Root mean square error (RMSE) per class and annotator (A1–12) or model (M) in number of 3 s long segments

         P       RMSE total
                 Sum      W. Avg.
A3       2.07    7.01     2.13
A4       3.74    9.37     2.77
A9       4.00    9.44     2.56
A1       2.10    9.52     3.53
A7       4.74    12.55    3.23
A5       5.74    12.98    3.50
A12      1.36    15.57    6.71
A11      7.06    16.58    4.37
A2       6.14    16.96    6.34
A6       5.84    17.14    5.66
M        6.22    22.68    7.50
A8       4.33    25.08    9.72
A10      6.55    34.22    13.98
limited notion of context and cannot remember or aggregate over time what the
post-editors were doing.
The inter-annotator agreement (see Table 8.2) indicated that human expert
annotators by and large can agree on the classification of segments. The fact that
the model under evaluation is within the range of performance of human annotators
is strong evidence that the HTPs it automatically inferred from data are meaningful
within the context of TPR.
8.7 Availability
The tools developed during the course of this work have been released as free, open-source software under the GNU General Public License v3.0. They are available at
https://github.com/laeubli/segcats and https://github.com/laeubli/viscats. The manually annotated translation sessions used for evaluation (cf. Sect. 8.3.3) are available
at http://www.casmacat.eu/?n=Main.Downloads.
Acknowledgements This work was supported in part by the European Union Seventh Framework
Programme for Research, Technological Development and Demonstration (FP7/2007–2013) under
grant agreement no. 287576 (CASMACAT).
References
Alabau, V., Buck, C., Carl, M., Casacuberta, F., García-Martínez, M., Germann, U., et al. (2014). CASMACAT: A computer-assisted translation workbench. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden (pp. 25–28).
Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA (pp. 1027–1035).
Baum, L. E. (1972). An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Proceedings of the 3rd Symposium on Inequalities, Los Angeles, CA (pp. 1–8).
Carl, M. (2010). A computational framework for a cognitive model of human translation processes. In Proceedings of ASLIB Translating and the Computer (Vol. 32), London, UK.
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and writing research. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey (pp. 4108–4112).
Carl, M., & Jakobsen, A. L. (2009). Towards statistical modelling of translators' activity data. International Journal of Speech Technology, 12(4), 125–138.
Carl, M., & Kay, M. (2011). Gazing and typing activities during translation: A comparative study of translation units of professional and student translators. Meta, 56(4), 952–975.
Carl, M., García, M. M., & Mesa-Lao, B. (2014). CFT13: A resource for research into the post-editing process. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland (pp. 1757–1764).
Chinea-Rios, M., Sanchis-Trilles, G., Ortiz-Martínez, D., & Casacuberta, F. (2014). Online optimisation of log-linear weights in interactive machine translation. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland (pp. 3556–3559).
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38.
Dragsted, B., & Carl, M. (2013). Towards a classification of translation styles based on eye-tracking and keylogging data. Journal of Writing Research, 5(1), 133–158.
Elming, J., Winther Balling, L., & Carl, M. (2014). Investigating user behaviour in post-editing and translation using the CASMACAT workbench. In S. O'Brien, L. Winther Balling, M. Carl, M. Simard, & L. Specia (Eds.), Post-editing of machine translation (pp. 147–169). Newcastle upon Tyne: Cambridge Scholars Publishing.
Federico, M., Bertoldi, N., Cettolo, M., Negri, M., Turchi, M., Trombetti, M., et al. (2014). The MateCat tool. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland (pp. 129–132).
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378.
Gerloff, P. (1986). Second language learners' reports on the interpretive process: Talk-aloud protocols of translation. In J. House & S. Blum-Kulka (Eds.), Interlingual and intercultural communication: Discourse and cognition in translation and second language acquisition studies (pp. 243–262). Tübingen: Narr.
Green, S., Heer, J., & Manning, C. D. (2013). The efficacy of human post-editing for language translation. In Proceedings of the 2013 ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Paris, France.
Guerberof, A. (2009). Productivity and quality in the post-editing of outputs from translation memories and machine translation. International Journal of Localisation, 7(1), 11–21.
Hvelplund, K. (2011). Allocation of cognitive resources in translation: An eye-tracking and keylogging study. Ph.D. thesis, Copenhagen Business School, Copenhagen, Denmark.
Jakobsen, A. L. (1999). Logging target text production with Translog. Copenhagen Studies in Language, 24, 9–20.
Jakobsen, A. L. (2003). Effects of think aloud on translation speed, revision and segmentation. In F. Alves (Ed.), Triangulating translation. Benjamins translation library (Vol. 45, pp. 69–95). Amsterdam, Netherlands: John Benjamins.
Jakobsen, A. L. (2011). Tracking translators' keystrokes and eye movements with Translog. In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in translation studies. Benjamins translation library (Vol. 94, pp. 37–56). Amsterdam, Netherlands: John Benjamins.
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.
Koehn, P., & Germann, U. (2014). The impact of machine translation quality on human post-editing. In Proceedings of the Workshop on Humans and Computer-assisted Translation (HaCaT), Gothenburg, Sweden (pp. 38–46).
Koponen, M., Aziz, W., Ramos, L., & Specia, L. (2012). Post-editing time as a measure of cognitive effort. In Proceedings of the AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP), San Diego, CA (pp. 11–20).
Krings, H. P. (1995). Texte reparieren. Empirische Untersuchungen zum Prozeß der Nachredaktion von Maschinenübersetzungen. Habilitation thesis, University of Hildesheim, Hildesheim, Germany.
Krings, H. P. (2001). Repairing texts: Empirical investigations of machine translation post-editing processes. Kent, OH: Kent State University Press.
Krings, H. P. (2005). Wege ins Labyrinth – Fragestellungen und Methoden der Übersetzungsprozessforschung im Überblick. Meta, 50(2), 342–358.
Läubli, S. (2014). Statistical modelling of human translation processes. Master's thesis, University of Edinburgh, Edinburgh, UK.
Läubli, S., Fishel, M., Massey, G., Ehrensberger-Dow, M., & Volk, M. (2013). Assessing post-editing efficiency in a realistic translation environment. In Proceedings of the 2nd Workshop on Post-Editing Technology and Practice (WPTP), Nice, France (pp. 83–91).
Martínez-Gómez, P., Minocha, A., Huang, J., Carl, M., Bangalore, S., & Aizawa, A. (2014). Recognition of translator expertise using sequences of fixations and keystrokes. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), Safety Harbor, FL (pp. 229–302).
Massey, G., & Ehrensberger-Dow, M. (2014). Looking beyond the text: The usefulness of translation process data. In D. Knorr, C. Heine, & J. Engberg (Eds.), Methods in writing process research (pp. 81–89). Frankfurt am Main, Germany: Peter Lang.
O'Brien, S. (2006). Pauses as indicators of cognitive effort in post-editing machine translation output. Across Languages and Cultures, 7(1), 1–21.
Chapter 9
M. Schaeffer
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
Institute for Language, Cognition and Computation, University of Edinburgh, Edinburgh, UK
e-mail: moritzschaeffer@gmail.com
B. Dragsted · L.W. Balling · M. Carl
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
K.T. Hvelplund
Department of English, Germanic and Romance Studies, University of Copenhagen, Copenhagen,
Denmark
Springer International Publishing Switzerland 2016
M. Carl et al. (eds.), New Directions in Empirical Translation Process Research,
New Frontiers in Translation Studies, DOI 10.1007/978-3-319-20358-4_9
to have similar syntactic structures. These items are therefore more likely to prime
structurally. Source items which have few translation alternatives are more likely
to share a semantic representation and are hence more likely to prime semantically
than items with more translation alternatives. Findings support the literal translation
hypothesis.
Keywords Co-activation · Priming · Translation · Entropy · Eye movements
9.1 Introduction
It has been a subject of debate in translation process research (TPR) whether
translation is a sequential process or whether and to what extent comprehension
and production activities may occur in parallel (Carl and Dragsted 2012; Balling
et al. 2014). In the sequential, or vertical perspective, human translation is described
(Gile 1995) as a process in which the translator first reads a source-language
(SL) segment, then formulates a meaning hypothesis, i.e., assigns a meaning
to the translation segment by drawing on SL and general world knowledge, and
possibly external information sources, and then checks the meaning hypothesis for
plausibility. Having finished the processes involved in understanding the source text
(ST), the translator moves on to reformulating the meaning hypothesis in the target
language (TL), drawing again on general world knowledge and on knowledge of
the TL, and checks for fidelity and general acceptability, continuously revising the
target text (TT) until a satisfactory version has been arrived at. In the same vein,
according to the Interpretive Model (Lederer 1994) translation is a process in which
the translator understands the text, deverbalizes its language and re-expresses its
sense in the TL.
Common to these models is that they view ST reading as a phase distinct from
the reformulation phase and characterised largely by the same processes as reading
for monolingual comprehension. In contrast to this, the horizontal/parallel view
holds that TL reformulation commences during ST comprehension, and that the
process involved in reading for translation is different from reading for monolingual
comprehension (see e.g. Jakobsen and Jensen 2008; Schaeffer et al. forthcoming). In
line with this view, Carl and Dragsted (2012) propose that the ST is understood and
meaning hypotheses are generated only to the extent required to keep on producing
target text. Deep ST understanding is prompted by problems occurring in the TT.
If TT production is interrupted, for instance because the translator is not able to
retrieve an appropriate TL equivalent or is considering which translation to choose
out of several alternatives (see below), the missing information needs to be retrieved.
This may lead to increased eye movement activity and gaze time on a ST word or
passage with a view to verification or reinterpretation (ibid.: 143–144).
Schaeffer and Carl (2013: 185) propose a different kind of model in which '… both horizontal and vertical processes are always active at the same time'. Schaeffer and Carl (ibid.) argue that '… the horizontal process is an early process while the vertical processes depend on context which becomes available later, as processing advances in the chunk or text …'.
This study assumes that translators read the ST with TT production in mind;
hence, different processes are involved in reading for translation than in reading for monolingual comprehension. Previous studies which found evidence of
co-activation of the two linguistic systems during ST reading, i.e., studies which
found support for the hypothesis that translation is a parallel/horizontal process,
employed late eye movement or other late behavioural measures. This study tests
the hypothesis that target-language-specific aspects have an impact during very early
stages of ST processing. If target language specific aspects have an impact on early
eye movement measures, this would allow for a much stronger claim regarding the
horizontal/parallel view, because early eye movement measures are more indicative
of automatic processes than late measures, and any effect is more likely to allow for conclusions regarding the bilingual lexicon.
This study analyses a subset of the CRITT TPR-DB, described in Chap. 2, in
order to test whether target-language-related aspects have an effect on early and
late eye movement measures on the source text. The study only considers eye
movements on the ST, given the object of interest is whether or not the two linguistic
systems are co-activated during reading for translation: it is therefore not of interest
to us whether target-language-related aspects have an effect on eye movements
during TT reading.
on wordₙ, regardless of when these took place. In this sense, total reading time is a very late measure of word processing and includes post-lexical integration processes and gaze during translation revisions. The measure skipping rate or probability of a fixation describes the number of times, or the likelihood, that a wordₙ is not fixated at all. A wordₙ₊₁ to the right of the fixation on wordₙ can be pre-processed during the fixation on wordₙ and may be guessed (e.g. Ehrlich and Rayner 1981), and it may therefore not need to be fixated at all. Skipping rate is normally computed on the basis of a first run, i.e., on the basis of a forward movement through the text. The first run normally ends if either the end of the sentence is reached or a regression is made. The probability of a fixation reported here is different: it describes whether a word has been fixated at all, irrespective of whether this occurred in the first or any subsequent runs through the text. Thus, a probability of a fixation of zero reported here refers to a situation in which a word received no fixation at all during the whole of the session.
Whether an observed effect occurs during early or late eye movement measures
may be an indicator of whether it is cognitively determined or evidence of willed
behaviour. A very early effect may give an indication regarding the automatic
cognitive mechanism underlying the effect while late eye movement measures are
more likely to reflect rather conscious behaviour.
Popular measures of gaze activity in TPR have been average fixation duration,
total reading time on the word, segment or whole text and pupil size. One of the
few studies that have applied first fixation duration (FFDur) to examine aspects of
cognitive processing during translation is Rydning and Lachaud (2010). Comparing
FFDur of professional translators and bilingual non-translators, the authors found
that professional translators were able to recognise the meaning of polysemous
words outside of context more quickly than bilinguals (2010: 99) as indicated by
significantly shorter FFDur. This finding was further supported by the same effect
on TRT. In the introduction we argue that during translation, source- and target-related processes are tightly intertwined. The following section presents preliminary
and coarse evidence to support this view.
Fig. 9.1 Distribution of percentage of concurrent ST reading and TT typing of total production
time (per production unit)
Fig. 9.2 Distribution of the percentage of the production time translators concurrently read the
ST while typing the TT (only for those production units where some concurrent ST reading while
typing occurred)
be activated simultaneously. The ParalS feature in the PU files of the TPR-DB (see
Chap. 2) captures this kind of manifest concurrency. Figures 9.1 and 9.2 visualise
the distribution of the percentage of the production time translators read the ST
while typing the TT (see Sect. 9.3 for a brief outline of the data sets included in this
study).
Figures 9.1 and 9.2 illustrate that, while it is relatively rare for concurrent ST
reading and TT typing to occur for a complete production unit (Fig. 9.1), there are
many shorter stretches of time when translators read the source text while typing the
target text (Fig. 9.2). Approximately 40 % of the PUs in the data (9148 out of a total
of 23,294) have some concurrent typing while reading the ST. These findings mirror
earlier research which has found that the mean duration of instances of this kind of
manifest concurrency is around 429 ms, and considerably longer for ST processing
units (846 ms) and TT processing units (1141 ms) (Hvelplund 2011: 143).
The next section introduces the literal translation hypothesis in relation to the
two features of the CRITT TPR-DB which will be used to predict early and late eye
movement measures during translation.
Fig. 9.3 Distribution of absolute Cross values (values over 16 have been excluded for ease of
presentation and because there are very few of these)
source and the target. Figure 9.3 shows that a Cross value of 1 is very frequent: the distribution has a large peak at Cross 1. However, about 40 % of the words in the data have a Cross value above one (14,653 out of a total of 37,519). The vast
majority (over 90 %) of words have more than one alternative translation (Fig. 9.4).
The fact that most words have more than one translation alternative (see below)
highlights that the literal translation hypothesis describes an ideal or prototype that
often remains an interim representation. It is more likely that translators maintain
the same (relative) word order in the TT as in the ST (Cross value of 1). What
Fig. 9.3 suggests is that translators tend to maintain the (relative) source word order
in individual translation units. In the case of relative word order, the literal default
often goes beyond an interim representation and finds its way into the final TT.
The degree to which two items are literal, according to the definition above, may
predict how easy it is to process this item. If the literality of an item has an effect
on early eye movement measures, it is likely that this has to do with how the brain
represents language during translation, that horizontal processes occur early, and
that it is the co-activation of the two linguistic systems which makes reading for translation fundamentally different from monolingual reading. If, however, literality
only has an effect on late eye movement measures, it is more likely that translation
is a more sequential process. If there is an effect on both early and late measures,
this would lend support to the model proposed by Schaeffer and Carl (2013) who
argue that horizontal processes take place early and vertical monitor processes take
place late.
translation and of priming (see below), if items, which are likely to share a
representation or which are likely to be closely linked, result in facilitation. The
next sections will briefly introduce relevant priming studies.
associated with a lexical item are bundled. The Sense Model argues that the
observed translation asymmetry is due to the fact that L1 items have more associated
senses than L2 items. According to the Sense Model, L2 words share fewer senses
with L1 translation equivalents, because they have a smaller number of associated
senses, while L1 primes have a higher number of associated senses and therefore
the priming effect from L1 to L2 is also stronger. Finkbeiner et al.'s (2004) Sense
Model argues that the degree of semantic overlap between two words predicts how
strong the priming effect is. A large semantic overlap between two words results in
a strong priming effect and if the overlap is small or the L2 senses are not known to
the bilingual, the priming effect is weaker. The number of alternative translations for
a single ST word may also partly represent the (lack of) semantic overlap. There are
a number of single word studies, which have shown that words with more than one
possible translation are recognised and produced more slowly (e.g. Tokowicz and
Kroll 2007; Laxén and Lavaur 2010; Boada et al. 2012; Eddington and Tokowicz
2013).
In sum, the strength of the priming effect may depend on the degree to which
a linguistic item fulfils the literality criteria described above: the more literal
(according to the above definition) an item is, the stronger the priming effect. If
the syntactic structures in ST and TT are similar it is likely that the word order
is also similar and if the overlap in semantic representations for a lexical item
is similar, it is likely that a word will tend to be translated in the same way by
different translators. In other words, the degree to which two items share structural
and semantic representations may predict the strength of the priming effect. If Cross
and HTra have an effect on early eye movement measures, this would lend support
to models of the bilingual lexicon which posit non-selectivity. Such an early effect
would further lend support to the DCFM and the Sense Model, given that HTra is
a continuous variable which may describe the graded overlap in terms of semantics
between a source word and its translation.
TT equivalents. The authors of both these studies argue that this is evidence for
co-activation.
Balling et al. (2014) found an effect of congruence on total reading time of a
group of words, such that ST segments which required re-ordering in the TL were
read for longer, and interpreted these findings as evidence for co-activation.
While the studies previously discussed worked with various measures of total
reading time, the first study, to our knowledge, which tested the effect of the target
language on early eye movement measures is Schaeffer et al. (forthcoming). This
study manipulated the number of target words required to translate a single source
word. Two kinds of items were embedded in the same sentence frames: one-to-one
(the ST word was likely to be translated using just one TT word) and one-to-many
(the ST word was likely to be translated using more than one TT word). Participants
read these sentences in two conditions: reading for comprehension and reading for
translation. Schaeffer et al. found a 20 ms effect on average fixation durations:
during reading for translation, the average fixation duration across the whole
sentence was 20 ms longer than during reading for comprehension. This increase
in average fixation duration cannot be explained in terms of motor aspects of target
text production, because participants were asked to first read the source sentence and
were told to only start typing once they knew how they would translate the sentence.
Schaeffer et al. further found that participants made on average 16 fixations more
per sentence during reading for translation and the number of regressions also
doubled, as did total reading time. The significant increase in all relevant eye-movement measures suggests that reading for translation is fundamentally different
from reading for comprehension. For first fixation duration, the effect of the number
of required TT words was only found when reading for translation, not during
monolingual comprehension. However, the effect was relatively large (23 ms). This
study suggests that target language-specific aspects are activated already very early
during ST reading for translation.
In sum, most studies which found evidence of co-activation during translation
employed late eye movement measures or reaction times. The current study
therefore aims to investigate if and to what extent earlier eye movement measures
in reading for translation show evidence of co-activation. More specifically, the aim
is to study whether early eye movement measures are affected by the number of
alternative translations for a single source text word and by word order differences.
Drawing on the CRITT-TPR database (Carl 2012), we evaluate the hypothesis that
the number of TT alternatives (HTra) and Cross value have an effect on early eye
movement measures.
assumption that misspellings are approximately evenly distributed over the data set.
In addition, datapoints which were more than 2.5 standard deviations above or below a participant's mean for a particular dependent variable were excluded. This resulted
in the exclusion of less than 3 % for each dependent variable (apart from skipping
probability where less than 5 % were excluded).
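The per-participant trimming step just described can be sketched in a few lines of R (the language of the lme4/lmerTest analyses reported below); the function and column names used here, such as FFDur and Participant, are purely illustrative assumptions, not the authors' actual script or the TPR-DB column names.

# Minimal sketch of the 2.5 SD trimming described above (illustrative only).
trim_outliers <- function(d, dv, id = "Participant", cutoff = 2.5) {
  m <- ave(d[[dv]], d[[id]], FUN = mean)   # per-participant mean of the dependent variable
  s <- ave(d[[dv]], d[[id]], FUN = sd)     # per-participant standard deviation
  d[abs(d[[dv]] - m) <= cutoff * s, ]      # keep observations within 2.5 SD of that mean
}
# Example: clean <- trim_outliers(raw_data, dv = "FFDur")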
Table 9.1 Dependent variables, predictors and random factors for the LMEMs described in more detail below

Dependent variables: FFDur, FPDurS, RPDur, TrtS, ProbFix, Reg
Predictors: Prob1 + LenS + STseg + HTra + abs(Cross)
Random variables: (1|ParticipantUnique) + (1|Item) + (1|TextUnique) + (1|Study)
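Read literally, the specification in Table 9.1 corresponds to one lme4/lmerTest model per dependent variable. The following R call is a sketch of what such a model might look like for first fixation durations; the data frame name (tpr) is an assumption for illustration, and binary outcomes such as ProbFix and Reg would presumably be fitted with a logistic variant (glmer), which the chapter does not spell out.

library(lmerTest)  # extends lme4::lmer with significance tests (Kuznetsova et al. 2014)

# One model per dependent variable; shown here for first fixation duration (FFDur).
m_ffdur <- lmer(
  FFDur ~ Prob1 + LenS + STseg + HTra + abs(Cross) +
    (1 | ParticipantUnique) + (1 | Item) + (1 | TextUnique) + (1 | Study),
  data = tpr   # assumed name of the data frame holding the TPR-DB subset
)
summary(m_ffdur)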
Fig. 9.5 (ae) The effects of frequency (Prob1), word length (LenS), segment identifier (STseg),
word translation entropy (HTra), and absolute values of crossing word re-ordering (Cross) on first
fixation durations (FFDur)
Fig. 9.6 (ad) The effect of frequency (Prob1), word length (LenS), segment identifier (STseg),
and absolute Cross on (log) gaze durations
9.4.1 Results
9.4.2 First Fixation Durations
First fixation durations represent the first contact with a wordₙ, before the eyes either re-fixate wordₙ or move on to a word to the left or right. All the low-level aspects of word recognition, such as the integration of the visual features of letters, occur during the first fixation, in addition to the processing of morphological and phonological aspects, all of which result in lexical access. In addition to the processing of wordₙ, wordₙ₊₁ is pre-processed in terms of visual features such as word length.
The effect of frequency on FFDur was significant and in the expected direction.
The effect of LenS was significant, but in the opposite direction of what would
intuitively be expected; recall, however, that FFDur is the duration of a single
fixation, the first on the word, which does not automatically become longer for
longer words (e.g. Hyönä and Olson 1995). As indicated in Table 9.2 and Fig. 9.6b,
the effect of LenS on FPDurS was in the expected direction, suggesting that the
longer reading times for longer words are due to re-fixations. The significant effect
Fig. 9.7 (ad) The effect of frequency (Prob1), word length (LenS), scaled absolute Cross, and
scaled word translation entropy on the probability of a fixation
of STseg on first fixation durations suggests that translators become faster as they
progress in the translation, even for such an early measure.
Both Cross and HTra were positively significantly correlated with FFDur. This
suggests that target-language-specific aspects play a role at the earliest stages of
reading, i.e. SL and TL are co-activated from the very first visual encounter with
an ST word. In addition, words with fewer alternative translations and lower Cross
values require less effort to process than words with a higher number of alternative
translations and higher Cross. This may indicate that these ST words are more likely
to prime and facilitate their TT equivalents than words with a higher word translation
entropy and higher Cross value. The Cross effect was relatively large, suggesting
that re-ordering and structural priming play a large role during the early stages of
reading during translation. This seems to confirm the marginally significant effect
found in Chap. 10. Together, this further lends support to the literal translation
hypothesis, as defined above, in that the default rendering procedure during ST
reading in translation is to generate an interim representation in which ST word
order and TT word order are identical, where ST and TT items correspond one-to-one and in which each ST word has only one possible translated form. When this
is not possible, because of context, target norms or for any other reason, cognitive
effort increases.
Fig. 9.8 (ad) The effect of frequency (Prob1), word length (LenS), segment identifier (STseg),
and word translation entropy (HTra) on the probability of a regression
Ideally, linear mixed models should have normally distributed residuals; visual
examination of the residual distribution showed that this was not the case for
this analysis and the following analysis of gaze duration; instead, the residuals
showed a somewhat bimodal distribution. This suggests that, although this model
explains a number of aspects of the translation process, there may be important
variation that our predictors do not capture. This is not surprising given the large
number of variables that may affect translation. It may be noted that mean first
fixation durations, gaze durations and regression path durations are relatively short
compared to monolingual reading. It may also be noted that the effects of word
length and frequency are rather small in comparison to monolingual reading: the
effect of e.g. frequency on FFDur is typically in the region of 20–30 ms and on gaze duration normally around 50–60 ms while here, it is around 6 ms and 20 ms,
respectively. However, it is unlikely that this is task related. It is more likely that this
is because of the way fixations are calculated in the different studies.
Fig. 9.9 (ac) The effect of frequency (Prob1), word length (LenS), and segment identifier (Stseg)
on regression path durations.
Fig. 9.10 (ae) The effect of frequency (Prob1), word length (LenS), segment identifier (STseg),
word translation entropy (HTra), and absolute Cross on total reading times
Table 9.2 LMEM results for all predictors (Prob1, LenS, STseg, HTra, Cross) and dependent variables (FFDur, FPDurS, Reg, RPDur, ProbFix, TrtS): standard errors, t values and p values for each effect (p < .1, *p < .05, **p < .01, ***p < .001)
The model for probability of a fixation did not converge when all the random
effects were included. We therefore excluded one random effect that showed the
least variation in the other models, namely Text. STseg was also excluded, because
the model did not converge when this predictor was included. In addition, Cross and
HTra were scaled (the variables were scaled by subtracting the mean and dividing
by the standard deviation). As expected, higher frequency words were less likely
to be fixated than lower frequency words. Equally expected was the effect of word
length on the probability of a fixation (positively associated, such that longer words
were more likely to be fixated). Very surprising are the effects of Cross and HTra on
the probability of a fixation. Although these effects were modest in size and, in the case of Cross, only marginally significant, the results suggest
that translators anticipate target-language-specific aspects of upcoming words and
skip these if they are easy to process. These effects underline the fact that activation
of target-related aspects occurs very early.
the right of wordₙ. RPDur is a relatively late measure and indicative of integration
problems.
The effects of frequency and word length on regression path duration were in the
expected direction and significant. The effect of STseg on regression path duration
was more modest than frequency and word length, but all were highly significant.
Again, the negative slope suggests that integration is less costly towards the end of
the text, given that translators have a relatively good discourse model of both the ST
and the TT towards the end of the text, making it easier to integrate difficult words.
Visual inspection of the residuals showed a relatively normal distribution. The fact
that neither Cross nor HTra had an effect on regression path durations confirms the
findings from earlier measures: shared semantic and structural representations are
activated automatically and early, and serve as a basis for production and monitoring
during much later processes.
However, it is not likely that TT production, i.e. the actual typing, is responsible
for the observed effects on early eye movement measures: in the study by Schaeffer
et al. (forthcoming), participants were instructed to only start writing once they had
a translation in mind, and eye movements were only recorded during the reading
phase and not after the TT production was started. The studies by Macizo and
Bajo (2006) and Ruiz et al. (2008) also separated the reading phase from the (oral)
text production stage and also found target-language-specific effects, but only when
the reading purpose was translation, not when the reading purpose was repetition.
In other words, even when reading and writing are kept experimentally separated,
target-language-specific effects on ST reading are observable.
We found an effect of word translation entropy and syntactic source-target
language reordering on first fixation durations and the probability of a fixation.
This supports the integrated nature of the bilingual lexicon and cross-linguistic
priming: relative word order and semantic overlap between lexical items of two
different languages can be quantified and has an observable effect on eye movements
during translation. We observed an early and a late effect of word translation
entropy and word order, which further confirms what has long been suggested in
translation studies (e.g. Englund Dimitrova 2005; Krings 1986), i.e. that translation
is subliminal and automatic to a certain extent (see also Wu and Thierry 2012) and
partly conscious and willed behaviour. Schaeffer and Carl (2013: 173) argue that '… identification of shared aspects is automatic and there is no conscious control over how source and target are aligned cognitively …'. The evidence provided in the current chapter supports this view and further supports more generally the model proposed by Schaeffer and Carl (2013: 185), which posits that '… Early during source text reading, shared representations are activated which then serve as a basis for regeneration in the target language'. The early effect of Cross and HTra is evidence of horizontal, automatic processes, while the late effect of these target-language-related aspects on TrtS is evidence of vertical monitor processes.
Our results show that words which have been translated in the same way by
different translators are more likely to prime and facilitate processing, while words
which are translation ambiguous (Eddington and Tokowicz 2013), i.e. words with
more than one possible translation, are less likely to prime and more likely to inhibit
processing already at a very early stage (during first fixation duration), most likely because of the way the bilingual lexicon represents these items. Our results show that the degree to which two items share structural and semantic representations
predicts the strength of the priming effect. That Cross and HTra have an effect
on early eye movement measures lends support to models of the bilingual lexicon
which posit non-selectivity, such as the BIA+ model (Dijkstra and van Heuven
2002). This effect further lends support to the DCFM (De Groot 1992) and the
Sense Model (Finkbeiner et al. 2004), given that HTra is a continuous variable
which describes the graded overlap in terms of semantics between a source word
and its translation.
Most cross-linguistic structural priming studies have employed comprehension-to-production paradigms: typically, the prime sentence is read and the target
sentence is produced and the priming effect describes the influence of the read
sentence on the produced sentence. The study by Kidd et al. (2014) is the only
study, to the authors' knowledge, to report structural cross-linguistic priming during
comprehension. These authors found a priming effect from a comprehended English
sentence to how a German sentence was comprehended. Given that for the present
study, we only considered eye movements on the source text and given that we
interpret the effects of HTra and Cross as priming effects, they are situated at the
border between comprehension and production, especially in the case of the early
effects. The results show that the Cross value has an effect on first fixation durations
such that words with higher Cross values elicited significantly longer first fixation
durations. This finding is in line with Kidd et al. (2014), in that low Cross values
prime and facilitate processing, while words with a higher Cross value do not prime
and inhibit. In other words, what these findings suggest is that SL representations
prime TL-related processes during source text comprehension. Cross describes the
degree of overlap between source and target in terms of word order. When this
overlap is high, co-activated or shared structural representations facilitate source text
reading during translation, because, during this kind of reading, production-related
representations are already active at a very early stage. For example, Schoonbaert et al. (2007)
found that cross-linguistic syntactic priming can be boosted if the verb in the prime
and target sentence is a translation equivalent. During translation, most target words
are of course translation equivalents of source words and the relatively modest boost
observed in priming studies can be assumed to be much stronger during translation.
It is interesting that STseg has a relatively large effect on first fixation durations
and that STseg has a significant and often relatively large effect on all relevant eye
movement measures (apart from probability of a fixation). The consistency and size
of this effect suggest that creating a discourse model is of great importance during
translation, making faster processing possible.
Finally, it seems obvious, on the basis of the evidence presented here, that early
processes are horizontal and that the output from the early processes serves as a basis
for late, vertical processes. It is highly likely that production-related processes and
source language reading processes cannot be separated. This conclusion becomes
even stronger considering the naturalistic data used in the present investigation, in
comparison to some of the other studies mentioned. At the same time, those less naturalistic studies show the interesting fact that SL and TL processes remain intertwined even when the reading and production phases are separated experimentally.
According to these findings, Malmkjaer is right when she argues that the literal translation hypothesis is one of the very few phenomena which qualifies '… for the status of cognitively determined universals …' (2005: 17): it should be highlighted
that the data for the present study consisted of one source language (English) and
six rather distinct target languages (Danish, Spanish, Estonian, Chinese, Hindi,
and German). In other words, had we found these effects in one language only,
but not in others, it would be possible that the effects are specific to a particular
language combination, or a specific target language, rather than a phenomenon
which holds across language combinations. One other aspect of the current study
may lend further weight to Malmkjaer's claim: while the processes during first
fixation durations are of course not completely automatic, an individual has far
less willed control over the processes which are at play during the first 250 ms
of the processing of a word than is the case for total reading time. In other words,
it is likely that the role of primed representations highlights cognitively determined
constraints rather than willed behaviour.
While the present study includes a relatively broad sample of target languages
compared to the literature, it is of course limited considering the vast number
of different languages across the globe and the findings will require further
corroboration. It remains beyond dispute, however, that a multitude of concurrent processes are at play during (reading for) translation, which suggests that reading for translation is fundamentally different from monolingual reading.
References
Baayen, R. H. (2013). languageR: Data sets and functions with 'Analyzing linguistic data: A practical introduction to statistics'. Available at: http://cran.r-project.org/package=languageR.
Balling, L. W., & Carl, M. (2014). Production time across languages and tasks: A large-scale analysis using the CRITT translation process database. In J. W. Schwieter & A. Ferreira (Eds.), The development of translation competence: Theories and methodologies from psycholinguistics and cognitive science (pp. 239–268). Newcastle upon Tyne: Cambridge Scholars Publishing.
Balling, L. W., Hvelplund, K. T., & Sjørup, A. C. (2014). Evidence of parallel processing during translation. Meta, 59(2), 234–259.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. Available at http://cran.r-project.org/package=lme4.
Bernolet, S., Hartsuiker, R. J., & Pickering, M. J. (2007). Shared syntactic representations in bilinguals: Evidence for the role of word-order repetition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(5), 931–949.
Bernolet, S., Hartsuiker, R. J., & Pickering, M. J. (2013). From language-specific to shared syntactic representations: The influence of second language proficiency on syntactic sharing in bilinguals. Cognition, 127(3), 287–306.
Boada, R., Sánchez-Casas, R., Gavilán, J. M., García-Albea, J. E., & Tokowicz, N. (2012). Effect of multiple translations and cognate status on translation recognition performance of balanced bilinguals. Bilingualism: Language and Cognition, 16(1), 183–197.
Brysbaert, M., & New, B. (2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Carl, M. (2012). The CRITT TPR-DB 1.0: A database for empirical human translation process research. In AMTA 2012 workshop on post-editing technology and practice.
Carl, M., & Dragsted, B. (2012). Inside the monitor model: Processes of default and challenged translation production. Translation: Corpora, Computation, Cognition. Special issue on the crossroads between contrastive linguistics, translation studies and machine translation, 2(1), 127–145.
Carl, M., & Schaeffer, M. (forthcoming). Literal translation and processes of post-editing. In Translation in transition: Between cognition, computing and technology. Amsterdam: Benjamins.
Chen, B., Jia, Y., Wang, Z., Dunlap, S., & Shin, J.-A. (2013). Is word-order similarity necessary for cross-linguistic structural priming? Second Language Research, 29(4), 375–389. doi:10.1177/0267658313491962.
Krings, H. P. (1986). Was in den Köpfen von Übersetzern vorgeht: Eine empirische Untersuchung zur Struktur des Übersetzungsprozesses an fortgeschrittenen Französischlernern. Tübingen: Gunter Narr Verlag.
Kuznetsova, A., Christensen, R. H. B., & Brockhoff, P. B. (2014). lmerTest: Tests for random and fixed effects for linear mixed effect models (lmer objects of lme4 package). R package version 2.0-6. Available at http://www.cran.r-project.org/package=lmerTest/.
Laxén, J., & Lavaur, J.-M. (2010). The role of semantics in translation recognition: Effects of number of translations, dominance of translations and semantic relatedness of multiple translations. Bilingualism: Language and Cognition, 13(2), 157.
Lederer, M. (1994). La traduction aujourd'hui. Le modèle interprétatif. Paris: Hachette.
Loebell, H., & Bock, K. (2003). Structural priming across languages. Linguistics, 41(5), 791–824.
Macizo, P., & Bajo, M. (2006). Reading for understanding and reading for translation: Do they involve the same processes? Cognition, 99, 1–34.
Malmkjaer, K. (2005). Norms and nature in translation studies. Synaps, 16, 13–19.
McConkie, G. W., & Yang, S.-N. (2003). How cognition affects eye movements during reading. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind's eye: Cognitive and applied aspects of eye movement research (pp. 413–427). Oxford: Elsevier.
O'Brien, S. (2006). Eye-tracking and translation memory matches. Perspectives, 14(3), 185–205.
R Development Core Team. (2014). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available at: http://www.r-project.org/.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506. doi:10.1080/17470210902816461.
Ruiz, C., Paredes, P., Macizo, P., & Bajo, M. T. (2008). Activation of lexical and syntactic target language properties in translation. Acta Psychologica, 128, 490–500.
Rydning, A. F., & Lachaud, C. (2010). The reformulation challenge in translation. In G. M. Shreve & E. Angelone (Eds.), Translation and cognition (pp. 85–108).
Schaeffer, M. J., Paterson, K., McGowan, V. A., White, S. J., & Malmkjær, K. (forthcoming). Reading for translation. In A. L. Jakobsen & B. Mesa-Lao (Eds.), Translation in transition. Amsterdam: John Benjamins.
Schaeffer, M. J., & Carl, M. (2013). Shared representations and the translation process: A recursive model. Translation and Interpreting Studies, 8(2), 169–190.
Schoonbaert, S., Hartsuiker, R. J., & Pickering, M. J. (2007). The representation of lexical and syntactic information in bilinguals: Evidence from syntactic priming. Journal of Memory and Language, 56(2), 153–171.
Shin, J.-A., & Christianson, K. (2009). Syntactic processing in Korean-English bilingual production: Evidence from cross-linguistic structural priming. Cognition, 112(1), 175–180. doi:10.1016/j.cognition.2009.03.011.
Tokowicz, N., & Kroll, J. F. (2007). Number of meanings and concreteness: Consequences of ambiguity within and across languages. Language and Cognitive Processes, 22(5), 727–779.
Wu, Y. J., & Thierry, G. (2012). Unconscious translation during incidental foreign language processing. NeuroImage, 59(4), 3468–3473.
Chapter 10
Abstract The present work investigates the relationship between syntactic variation and priming in translation. It is based on the claim that languages share a
common cognitive network of neural activity. When the source and target languages
are solicited in a translation context, this shared network can lead to facilitation
effects, so-called priming effects. We suggest that priming is a default setting in
translation, a special case of language use where source and target languages are
constantly co-activated. Such priming effects are not restricted to lexical elements,
but do also occur on the syntactic level. We tested these hypotheses with translation
data from the TPR database, more specifically for three language pairs (English-German, English-Danish, and English-Spanish). Our results show that response
S. Bangalore
Interactions Corporation, New Providence, NJ, USA
B. Behrens
Department of Literature, Area studies and European Languages, University of Oslo, Oslo,
Norway
M. Carl
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
M. Ghankot
Indian Space Research Organisation, Hassan, Andhra Pradesh, India
A. Heilmann (✉)
English Linguistics Department, RWTH Aachen, Aachen, Germany
e-mail: arndt.heilmann@rwth-aachen.de
J. Nitzke
Department for Language, Culture and Translation Studies in Germersheim (FTSK), University
of Mainz, Mainz, Germany
M. Schaeffer
Center for Research and Innovation in Translation and Translation Technology, Department of
International Business Communication, Copenhagen Business School, Frederiksberg, Denmark
Institute for Language, Cognition and Computation, University of Edinburgh, Edinburgh, UK
A. Sturm
University of Geneva, 36, rue Prévost-Martin, 1205 Geneva
© Springer International Publishing Switzerland 2016
M. Carl et al. (eds.), New Directions in Empirical Translation Process Research,
New Frontiers in Translation Studies, DOI 10.1007/978-3-319-20358-4_10
times are shorter when syntactic structures are shared. The model explains this
through strongly co-activated network activity, which triggers a priming effect.
Keywords Cognitive effort • Priming • Eye tracking • Translog • Literal translation • Co-activation • Keystroke logging
10.1 Introduction
A range of single word studies have investigated the effect of translation ambiguity
on behaviour during translation recognition and translation production. Tokowicz
and Kroll (2007) noted that when their English-Spanish bilingual participants
translated single word stimuli with more than one possible translation, their response
times were slower in comparison to words with only a single translation. Tokowicz
and Kroll attributed this effect to active competition between translation alternatives.
A selection between all possible alternatives is cognitively effortful as all items have
to be compared with regards to their appropriateness, and once the appropriate item
has been singled out, others have to be suppressed.
Up to now, research on the effect of translation alternatives on behaviour during translation has been dominated by single word studies (e.g. Laxén and Lavaur 2010; Boada
et al. 2012; Eddington and Tokowicz 2013). To the best of our knowledge, the
first study to investigate behavioural measures for translations of whole texts is
the one by Dragsted (2012). In her study, she found increased total reading time
(on source text words), number of fixations (on source text words) and pauses in
the production for words with high lexical variance. It therefore seems likely that
processing of source text (ST) lexical items involves the activation of target language
(TL) competitors. However, a raw count of the number of competitors does not
directly reflect their influence on behavioural times, since some options may be
more dominant than others, i.e. receive more neural activation. In the following, we
will explain how such influence can be accounted for.
As a transfer process including the reproduction of an initial source message in
another context, every translation is a selection of a final target formulation out
of many possible target formulations (Neubert and Shreve 1992). However, the
details of this selection process and the factors influencing it are largely unknown.
Whenever an ST is translated by n translators producing translations TT₁ … TTₙ, each single translation TTᵢ is selected out of many possible target texts. Each selection of the actual elements of TTᵢ is determined by the characteristics of the target language, its morphology, syntax, pragmatics and stylistics, the translation brief and target audience etc., but also by the individual translator, her background and experience. Each final target text TTᵢ is thus a selection from possible options in the
target language which were available to one particular translator at one particular
point in time. It is highly unlikely that any two translators will produce exactly the
same translation of the same source text. In cases where every translator produces
a different translation, one would assume the selection process to be cognitively
Table 10.1 The effect of probability distributions p(s → tᵢ) over translation choices TT1–TT6 on entropy H

TT1     TT2     TT3     TT4     TT5     TT6     H
1.00                                            0.00
0.50    0.50                                    1.00
0.25    0.25    0.25    0.25                    2.00
0.50    0.16    0.17    0.17                    1.79
0.18    0.18    0.16    0.16    0.16    0.16    2.58
0.30    0.14    0.14    0.14    0.14    0.14    2.51
increases. The concept of entropy is denoted by the symbol H and represents the average amount of non-redundant information provided by each item entering a system. Entropy H is computed based on the probability p of an item entering the system and its information. The probabilities p(s → tᵢ) of an ST item s and its possible translations t₁ … tₙ are computed as the ratio of the number of alignments s → tᵢ counted in the TTs over the total number of observed TT segments, as in Eq. (10.1). The information of a probability p is defined as I(p) = −log₂(p), and entropy H is the expectation of that information, as defined in Eq. (10.2):

p(s → tᵢ) = counts(s → tᵢ) / #translations                                (10.1)

H = Σᵢ₌₁ⁿ pᵢ I(pᵢ) = −Σᵢ₌₁ⁿ pᵢ log₂(pᵢ)                                   (10.2)
Table 10.1 describes the effect of probability distributions on entropy (H): if all six translators choose the same translation realization for a given word, the probability of this translation is at its maximum (6/6 = 1) and entropy is at its minimum (0), but as soon as translators opt for different target realizations, entropy increases: if one option has a probability of 0.30 and five other options each have a probability of 0.14, then entropy is relatively high (2.51). If there are four different options, but all four options have the same probability (0.25), entropy is higher than when one of the four options has a higher probability (0.50) than the other three (0.16, 0.17, 0.17). For example, the entropy value 2.51 is calculated in the following way:

2.51 = −(0.30 × log₂ 0.30) − 5 × (0.14 × log₂ 0.14)                       (10.3)
Instead of counting all possible translation alternatives for a given source item,
entropy captures the weight of each of these alternatives and may hence be a better
reflection of the cognitive environment of translators working on a given text. In
other words, it captures the distribution of probabilities for each translation option,
so that more likely choices and less likely options are weighted accordingly. The
following section examines possible factors which might have an influence on
entropy.
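As a compact illustration of Eqs. (10.1) and (10.2), the following R sketch computes H directly from a probability distribution; the function names and the example calls, which reproduce the H values of Table 10.1, are ours and not part of the TPR-DB tooling.

# Entropy H of a probability distribution over translation choices (Eq. 10.2).
entropy <- function(p) -sum(p * log2(p))

# Probabilities can be derived from alignment counts as in Eq. (10.1):
p_from_counts <- function(counts) counts / sum(counts)

entropy(c(1.00))                       # 0.00: all translators agree
entropy(c(0.50, 0.50))                 # 1.00
entropy(rep(0.25, 4))                  # 2.00
entropy(c(0.50, 0.17, 0.17, 0.16))     # ~1.79
entropy(c(0.30, rep(0.14, 5)))         # ~2.51, cf. Eq. (10.3)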
Whereas the translation of 'worry' into German requires three words ('sich Sorgen machen'), 'laugh' can be translated by a single word ('lachen'). Schaeffer et al. found
that the first fixation duration was 23 ms longer when more than one target word
was needed for the translation. Again, this effect occurred only in the reading for
translation condition. This study further supports the idea that translation occurs
online and that target items are activated early during source text reading.
Wu and Thierry (2012) lend further support to the automatic co-activation
of the two languages which they observed even though the experimental design
discouraged it. In their ERP study, participants were asked to press a button
in response to the presentation of circles or squares. Participants were told that
sometimes words would appear on the screen, but were instructed to ignore these.
15 % of these words were interlingual homophones, i.e. their Chinese translation
would sound similar to either of the words circle or square. Wu and Thierry
found an N200 effect for these homophones, suggesting that participants had to
inhibit their spontaneous reaction of pressing the button any time the English
word activated the Chinese words for either square or circle. Thus, co-activation
could be detected in an environment where it was explicitly discouraged and even
irrelevant to the task. We therefore assume that both the ST and TT language are
simultaneously activated during the entire translation process. That means that the
translator becomes engaged in exploring and selecting potential target text elements
as soon as she starts reading the source sentence. As both languages may be
activated to the same degree, it is likely that they influence one another during this
selection process. One form of this mutual influence is priming (see Sect. 10.2.3
below). In addition, given that in the studies by Bajo and colleagues, the effect was
more pronounced for bilinguals than for translators, it is possible that translators are
better able to control co-activation, due to their training and constant exposure to
both languages simultaneously.
The question remains however, whether it is not more beneficial for translators
to retain a specific source text construction if this is possible in the target structure.
Such a strategy would be cognitively less demanding than the search for an
alternative formulation. This question is addressed in the following section.
definition, a translation is literal when the following three literality criteria are
fulfilled:
1. Word order is identical in the ST and TT.
2. ST and TT items correspond one-to-one.
3. Each ST word has only one possible translated form in a given context.
Literality criterion 3 is of particular interest as it refers to translation entropy.
Expanding this criterion to syntactic features, we stipulate that the translations
are structurally literal if an ST sentence is translated into the target language
with a single syntactic structure by all translators in a given sample. Syntactic
entropy measures the uncertainty that different translators will produce the same
TT structure for a ST sentence. Syntactic entropy is an indicator for the literality
of translations on a syntactic level, and we introduce syntactic literality to the three
literality criteria above:
4. All translations of a given source sentence are translated into the target language
with the same syntactic structure.
Thus a syntactically literal translation would be one with a syntactic entropy of 0.
Using entropy measures, literality can be studied using a quantitative approach.
In line with Ivir's (1981) notion of formal correspondence, literal translations have been associated with less cognitive effort than non-literal translations. Ivir (1981, 58)
describes the translation process as follows:
The translator begins his search for translation equivalence from formal correspondence,
and it is only when the identical-meaning formal correspondent is either not available or
not able to ensure equivalence that he resorts to formal correspondents with not-quite-identical meanings or to structural and semantic shifts which destroy formal correspondence
altogether. But even in the latter case he makes use of formal correspondence.
observed in studies involving one language for semantic representations, but more
relevant for the present purpose is that this has also been observed for structural
representations in tasks involving one language (cf. Pickering and Ferreira 2008).
In addition, there is some evidence that structural priming has also been observed in
studies involving two languages, i.e., in cross-linguistic structural priming studies
(e.g. Hartsuiker et al. 2004). These studies suggest that semantic and structural
representations are shared between languages when these are similar in the two
languages (e.g. Duñabeitia et al. 2010; Bernolet et al. 2013). It is likely, though, that the mechanism underlying cross-linguistic structural priming requires a similar construction, i.e. congruent word order in both languages (Hartsuiker et al. 1999;
Hartsuiker and Westenberg 2000; Bernolet et al. 2007; Loebell and Bock 2003;
Kidd et al. 2014). If the word order of the source text can be transferred to the
translation, this can result in lower total reading times as has been shown by Jensen
et al. due to a possible automatic transfer of L1 syntax to all types of L2 processing
(Jensen et al. 2009, 333). However, there is also evidence that syntactic structures
can be primed across languages if the word order in both languages is different.
Desmet and Declercq (2006) used a sentence completion task that showed syntactic priming effects for relative clause attachment from Dutch to English, even though word-order restrictions, such as the verb-final position in Dutch, make the word order different from that of English.
Shin and Christianson (2009) investigated priming effects of functionally equivalent dative-constructions in Korean and English with the help of a sentence
recall task. The English target sentence was presented via audio and was either a
double object or prepositional object construction. These sentences were followed
by a Korean prime either with a prepositional dative construction, post-positional
dative construction or double object construction. In their analysis, they found
evidence for an argument-order independent priming effect of post-positional dative
constructions, primed by functional correspondences, as this construction is the
functional equivalent of the canonical English prepositional dative. Similarly, Chen
et al. (2013) observed priming effects of English passive structures on Chinese
passive structures and vice versa, when participants were asked to describe a picture
after being exposed to a passive or active priming sentence in the other language.
Priming occurred despite different word orders. It is therefore possible that formal
correspondences between languages are a strong but not a necessary factor for cross-linguistic syntactic priming.
Cross-linguistic semantic priming has been associated with a facilitation effect
and structural cross-linguistic priming can thus be argued to also facilitate translation. Schoonbaert et al. (2007) found that cross-linguistic syntactic priming can be
boosted if the verb is a translation equivalent in prime and target sentence. During
translation, most words are of course translation equivalents and the relatively
modest boost observed in priming studies can be assumed to be much stronger
during translation.
Due to the nature of priming as a general psychological effect, it is to be expected that translators are affected by a structure in a source text to a similar degree. Translators that are thus primed by a syntactic structure are likely to produce
translations with the same syntactic structure in the target language. For the measure
of syntactic variation, the logical consequence would be that lower entropy measures
are related to priming since a single translation choice with a high translation
probability can lower entropy drastically. Syntactic priming effects may depend
on several characteristics of the input, for example, a cognate verb with the same
argument frame.
between two linguistic systems. In this sense, the copying data serves as a control
condition: if the syntactic entropy effects we observe in the translation condition
are also found in the copying task, it is likely that they represent monolingual
source-language-related processes. If, however, syntactic entropy has no effect on
behavioural measures during copying, it is likely that these effects are driven by task
and target-language-related processes.
10.4.2 Material
The translation data were extracted from the CRITT-TPR database (see Chap. 2): SG12 for German, KTHJ08 for Danish, and BML12 for Spanish. The datasets contain translations of the same six English source texts, with the exception that the Danish study contains only the first three source texts. The datasets contain eye tracking data from a Tobii T120 eyetracker and keylogging information recorded with Translog (Jakobsen and Schou 1999); the resulting data were processed with Translog II (Carl 2012) before analysis.
Table 10.2 Properties of the target texts of the translation and the copying condition, respectively, into the four target languages: Session (number of target texts), Fdur (in hours), Kdur (in hours), Tlen (number of target tokens)

Study     Session   TL   Task   Texts
TDA14     48        en   C      16
KTHJ08    69        da   T      13
SG12      47        de   T      16
BML12     63        es   T      16
Total     227       4    2      6
Table 10.2 contains a detailed overview of the produced target texts: it indicates the task, i.e. text copying (C) or from-scratch translation (T), the participants (Part) involved, the number of sessions (i.e. target texts produced), as well as the duration and the total number of target language tokens for each translation mode. Translation (and copying) duration is measured in two different metrics:

Fdur: total production time for all segments, excluding pauses >200 s.
Kdur: total duration of coherent keyboard activity, excluding keystroke pauses >5 s (in the following, we will refer to Kdur as coherent typing activity for ease of comprehension).
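Both metrics can be read as pause-thresholded sums over inter-keystroke intervals. The sketch below shows one way such durations could be derived from a vector of keystroke timestamps; the data layout and function name are assumptions made for illustration, since the actual Fdur and Kdur features are computed by the Translog-II/TPR-DB pipeline.

# Production duration from keystroke timestamps (in ms), ignoring
# inter-keystroke pauses above a threshold (illustrative sketch only).
duration_excluding_pauses <- function(times_ms, max_pause_ms) {
  gaps <- diff(sort(times_ms))            # intervals between successive keystrokes
  sum(gaps[gaps <= max_pause_ms])         # sum only the intervals below the pause threshold
}
# Fdur-like measure: ignore pauses longer than 200 s
# fdur <- duration_excluding_pauses(keystroke_times, 200 * 1000)
# Kdur-like measure (coherent typing): ignore pauses longer than 5 s
# kdur <- duration_excluding_pauses(keystroke_times, 5 * 1000)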
The BML12 study, for instance, contains 63 from-scratch translations which
were produced by 32 translators (participants). Each participant had to edit, post-edit and translate two texts in each mode, and texts were distributed in a randomized
order. As shown in Table 10.2, the translated texts together amount to 32,931 target
text words which were produced in the 227 translation sessions. Gaps in keystroke activity of more than 200 s (over 3 min) are excluded, under the assumption that translation activities are interrupted in such instances. However, no such pauses were observed in these studies (Fdur is a standard measure in the database, and other datasets do have pauses over 200 s).
Table 10.2 also contains information concerning the monolingual copying
condition (TDA14), which will be used to contrast the results from the data acquired in the translation condition. A monolingual task that does not involve code-switching of any kind should not reflect entropy measures. Note that during the copying task, 95 % of the text production time was spent on coherent typing (Kdur).
10.5 Analysis
10.5.1 Annotation
A detailed description of the annotation used in this study, together with a discussion
of possible alternative annotations, is available in Chap. 12. In this section, we briefly summarize the main features.
The ST and the TT were parsed according to clause type, voice and type of
argument structure. Clause type was annotated as either an independent or a dependent clause. Simple sentences as well as main clauses were tagged as independent (I), while subclauses were annotated as dependent (D). Voice was either annotated
as passive (P) or active (A). The third dimension captured verb-argument structures.
When the verb of the clause was subcategorized for a direct object or a complement,
it was referred to as transitive (T). When it subcategorized for a prepositional object
or no object it was labelled intransitive (I). Other argument structures considered
were ditransitive structures (D), but also clauses with empty subjects or extraposed subjects (e.g. Es comprensible que … [It is understandable that …]). These cases were tagged with (M) as in impersonal. A clause characterized as Transitive, Active
and Independent thus receives the tag TAI. Segments with multiple clauses and thus
multiple tags are merged to longer tags such as TAI_TAD representing a transitive
active main clause with a transitive and active subclause. The probabilities of the
different translations were computed on the basis of number of occurrences for each
tag.
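As an illustration of how such occurrence counts translate into segment-level values, the following R sketch computes the tag probabilities and the resulting Shannon entropy for two toy segments; the data frame and column names are assumptions made for the example, not the TPR-DB layout.

```r
# Sketch: per-segment tag probabilities and syntactic entropy from annotated translations.
ann <- data.frame(segment = c(1, 1, 1, 1, 1, 2, 2, 2),
                  tag     = c("TAI", "TAI", "TAI_TAD", "DAI", "TAI", "TAI", "TAI", "TAI"))

syntactic_entropy <- function(tags) {
  p <- table(tags) / length(tags)  # relative frequency of each syntactic tag
  -sum(p * log2(p))                # Shannon entropy in bits
}

tapply(ann$tag, ann$segment, syntactic_entropy)
# Segment 1 (three competing structures) receives a higher entropy than segment 2,
# where all translators chose the same structure (entropy = 0).
```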
To assess the first and the second research questions, syntactic structures in the
annotated translation data have been classified into two categories: PRIME and
DIFFERENT. We consider as PRIME every TT segment that preserves the structure
of the corresponding ST segment. The category DIFFERENT contains all segments
which show a structural change in the TT segment as compared to its corresponding
ST segment.
In addition to the original annotation, two new tags were assigned in a category
which describes the relationship between the syntactic structures in corresponding
source and target text segments. The tag PRIMED was attributed whenever ST and
TT structure were identical. The tag DIFFERENT was used whenever different
structures were used in the TT as compared to the corresponding ST segment.
The complete dataset was split up into language-specific datasets. To identify
cases of priming, the target text segments were annotated with the same annotation
scheme as the source text segments. Source and target structures were compared
and categorized as a prime if they were the same, or as different when their
structures did not match (see Table 10.3). Title segments were excluded from the
analysis due to their unusual grammatical properties. This removed 10 % of the data,
so that 1156 observations remained for analysis.
To answer the research question on the relationship between entropy, priming
effects and cognitive effort (RQ2), behavioural translation data from the three
language pairs (English-Danish, English-German, English-Spanish) were annotated
for their syntactic structure and later jointly assessed in mixed linear models. A
monolingual copying task served as a baseline. The baseline measures, in contrast to
the translation condition, should not be affected by syntactic entropy, since syntactic
entropy is driven by the TL and not the SL. This control condition will confirm
that syntactic entropy actually measures variation in translation and that it is not
due to processes related to monolingual ST comprehension. Further, controls were
integrated into the analyses of syntactic entropy by means of multivariate statistics
controlling for potential confounding factors.

Table 10.3
Target    Count  Comparison  Priming probability
DAI       4      PRIMED      0.5
DAI_TAD   2      DIFFERENT
The priming probability psyn is computed by dividing the number of translations whose syntactic structures
are the same as in the source text segment by the total number of translations of this
segment (see Eq. 10.4):

psyn(STsyn = TTsyn) = primed / #translations     (10.4)
A psyn of one means that all translators chose the same structure and a psyn of zero
that no translator chose the same source text structure. psyn enters the model as an
interaction effect that is modelled as the product of syntactic entropy and psyn .
The interaction effect of syntactic entropy and priming probability should be
negatively correlated with measures of total reading time for example, because
higher degrees of priming would facilitate processing and weaken the effect of
variation.
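A minimal R sketch of Eq. 10.4 and of the interaction predictor described above; the toy data frame and its column name are illustrative assumptions.

```r
# Sketch: priming probability (Eq. 10.4) and the entropy x priming interaction term.
seg <- data.frame(comparison = c("PRIMED", "PRIMED", "PRIMED",
                                 "DIFFERENT", "DIFFERENT", "DIFFERENT"))

p_syn       <- mean(seg$comparison == "PRIMED")  # primed / #translations
entropy     <- 1.37                              # syntactic entropy of the same segment (toy value)
interaction <- entropy * p_syn                   # modelled as the product of the two predictors
c(p_syn = p_syn, interaction = interaction)
```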
The average word length has been computed for the source text segments and the translated target text segments
by dividing the total number of characters per segment by the number of words per
segment.
10.5.3.5 Inefficiency
Some segments may have been more prone to typographical errors and may have
undergone major restructuring efforts. Therefore, inefficiency was introduced as a
control variable. It is calculated as the number of characters produced during a
translation divided by the number of characters in the final translation (see also
Chap. 2).
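Both control variables are simple ratios, as the following R sketch illustrates on toy inputs; the function names and inputs are assumptions for the example, not the TPR-DB feature-extraction code.

```r
# Sketch: average word length and typing inefficiency as described above.
avg_word_length <- function(segment_text) {
  words <- strsplit(segment_text, "\\s+")[[1]]
  nchar(paste(words, collapse = "")) / length(words)  # characters per segment / words per segment
}

inefficiency <- function(chars_typed, chars_final) {
  chars_typed / chars_final   # > 1 when material was deleted and retyped
}

avg_word_length("the killer was finally caught")    # 25 characters over 5 words = 5
inefficiency(chars_typed = 180, chars_final = 120)  # 1.5: 50 % more keystroke output than final text
```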
Fig. 10.1 Distribution of primed and non-primed structures in relation to entropy per language pair
To test this hypothesis, CrossS (see Chap. 2), i.e. the relative distortion from source text words
to target text words, was modelled as a predictor of priming probability with the help
of a univariate linear regression. The model was significant with F(1, 1154) = 40.83,
p < 0.001, R² = 0.03334.
Figure 10.2 shows that structuring effort correlates negatively with priming
probability. However, primed structures also occur in cases where the average CrossS
value exceeds the value of 1, which is the literal translation default. Structural
priming effects that occur despite incongruence also corroborate studies by Chen et al.
(2013), Desmet and Declercq (2006) and Shin and Christianson (2009), who provide
evidence suggesting that word order similarity is not necessary for priming effects to
occur. However, the results clearly indicate higher chances of priming for segments
with little to no restructuring effort. The lower entropy values for primed structures
may indicate that increased restructuring effort is ultimately a source of deviation
from the syntactic representation of the source, since priming is inhibited.
Table 10.4 LMM for the effect of syntactic entropy on total reading time (source)

Variable             Estimate  Standard error  t-value  Significance level
Syntactic entropy    0.16      0.06            2.59     *
Priming probability  0.07      0.07            0.93
CrossS               0.07      0.03            2.19     *
Number of clauses    0.07      0.03            1.99     *
Expertise            0.31      0.18            1.73
Average word length  0.17      0.07            2.50     *
Interaction effect   0.04      0.05            0.82
Syntactic Entropy turned out to be positively associated with total reading time
of the source text (Fig. 10.3). Similarly, the restructuring effort (CrossS), clause
complexity (number of clauses) and average word length of the segment displayed
positive and significant slopes. The effect for expertise was marginally significant
such that professional translators read the source faster than non-professional
translators.
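The following R sketch shows, with simulated data, how a mixed linear model of this kind can be specified with lme4/lmerTest (the packages cited in this chapter); the random-effect structure and variable names are illustrative assumptions rather than the authors' exact model specification.

```r
# Sketch: a mixed linear model with syntactic entropy, priming probability,
# their interaction, and the control variables discussed in this section.
library(lme4)
library(lmerTest)

set.seed(1)  # toy simulation so that the call below is runnable
d <- data.frame(
  participant = factor(rep(1:10, each = 20)),
  segment     = factor(rep(1:20, times = 10)),
  entropy     = runif(200, 0, 3),
  p_syn       = runif(200, 0, 1),
  crossS      = runif(200, 0.5, 2),
  n_clauses   = sample(1:4, 200, replace = TRUE),
  expertise   = factor(sample(c("prof", "student"), 200, replace = TRUE)),
  word_len    = runif(200, 3, 8)
)
d$trt_source <- 2 + 0.2 * d$entropy + rnorm(200, 0, 0.5)  # simulated total reading time

m <- lmer(trt_source ~ entropy * p_syn + crossS + n_clauses + expertise + word_len +
            (1 | participant) + (1 | segment), data = d)
summary(m)  # lmerTest adds approximate p-values to the fixed-effect table
```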
The control variables displayed no unexpected behaviour, so the model
seems to measure these dimensions quite well. The fact that no significant
interaction effect could be observed for Syntactic Entropy and priming probability,
nor for priming probability alone, contradicts the hypothesis that the effect of entropy
on behavioural measures is further modulated by priming effects.
Fig. 10.3 The effect of syntactic entropy on total reading time (source)
Table 10.5 LMM for the effect of syntactic entropy on first fixation duration (source)

Variable             Estimate  Standard error  t-value  Significance level
Syntactic entropy    0.08      0.05            1.77
Priming probability  0.02      0.06            0.37
Interaction effect   0.04      0.04            0.96

Table 10.6 LMM for the effect of syntactic entropy on coherent typing activity (target)

Variable             Estimate  Standard error  t-value  Significance level
Syntactic entropy    0.13      0.04            3.22     **
Priming probability  0.05      0.04            1.36
Typing inefficiency  0.52      0.02            27.56    ***
CrossS               0.01      0.02            0.22
Number of clauses    0.00      0.02            0.28
Average word length  0.41      0.03            14.36    ***
Expertise            0.29      0.15            1.96
Interaction effect   0.03      0.05            0.49
Fig. 10.4 The effect of syntactic entropy on coherent typing activity (target)
Table 10.7 LMM for the effect of syntactic entropy on total reading time (target)

Variable             Estimate  Standard error  t-value  Significance level
Syntactic entropy    0.11      0.06            1.83
Priming probability  0.00      0.07            0.05
CrossS               0.01      0.03            0.17
Expertise            0.02      0.14            0.12
Typing inefficiency  0.24      0.03            8.73     ***
Number of clauses    0.07      0.03            2.61     **
Average word length  0.23      0.04            5.84     ***
Interaction effect   0.03      0.05            0.49
Language  Variable             Estimate  Standard error  t-value  Significance level
Spanish   Syntactic entropy    0.02      0.09            0.23
Spanish   Number of clauses    0.03      0.04            0.91
Spanish   Average word length  0.34      0.12            2.93     **
German    Syntactic entropy    0.07      0.10            0.76
German    Number of clauses    0.04      0.04            1.04
German    Average word length  0.32      0.12            2.64
Danish    Syntactic entropy    0.07      0.10            0.63
Danish    Number of clauses    0.03      0.05            0.70
Danish    Average word length  0.44      0.19            2.37

Language  Variable           Estimate  Standard error  t-value
Spanish   Syntactic entropy  0.02      0.05            0.46
German    Syntactic entropy  0.05      0.05            0.98
Danish    Syntactic entropy  0.01      0.06            0.25
Language  Variable             Estimate  Standard error  t-value  Significance level
Spanish   Syntactic entropy    0.03      0.05            0.61
Spanish   Number of clauses    0.01      0.02            0.69
Spanish   Typing inefficiency  7.08      0.44            16.17    ***
Spanish   Average word length  0.57      0.07            8.10     ***
German    Syntactic entropy    0.05      0.06            0.78
German    Number of clauses    0.01      0.02            0.77
German    Typing inefficiency  7.08      0.44            16.16    ***
German    Average word length  0.56      0.07            7.67     ***
Danish    Syntactic entropy    0.08      0.06            1.37
Danish    Number of clauses    0.01      0.03            0.32
Danish    Typing inefficiency  7.91      0.67            11.73    ***
Danish    Average word length  0.47      0.10            4.78     ***
Language  Variable             Estimate  Standard error  t-value  Significance level
Spanish   Syntactic entropy    0.13      0.09            1.49
Spanish   Number of clauses    0.00      0.03            0.14
Spanish   Average word length  0.35      0.12            2.84     **
Spanish   Typing inefficiency  2.94      0.73            4.04     ***
German    Syntactic entropy    0.05      0.11            0.43
German    Number of clauses    0.01      0.03            0.34
German    Average word length  0.37      0.13            2.88     **
German    Typing inefficiency  2.91      0.73            4.01     ***
Danish    Syntactic entropy    0.05      0.14            0.34
Danish    Number of clauses    0.02      0.06            0.36
Danish    Average word length  0.15      0.24            0.63
Danish    Typing inefficiency  2.28      1.25            1.82
It was surprising that the interaction between priming probability and syntactic
entropy was not significant. A deeper level of analysis might lead to different results
when, for example, levels of embedding and a finer analysis of the clause type
are assessed (see Chap. 12). But since the argument structure is captured in the
first dimension of the annotation scheme and subject variation is accounted for to
some degree by voice in the second dimension, priming effects due to functional
correspondences are probably reliable.
Furthermore, priming effects were hypothesized to be strongly modulated by
word order and less so by mere functional correspondence. The results for the linear
regression with CrossS confirmed the hypothesis that congruent word order is a
strong but not a necessary condition for syntactic priming, since even higher priming
probabilities were possible when the literal translation threshold of a CrossS value
of 1 was exceeded.
While the hypothesis that priming effects are a major factor for decreased
syntactic variation could be confirmed, no significant facilitation effect could be
observed for the interaction effect of syntactic entropy and priming probability. This
may indicate that priming did not have the expected degree of influence on syntactic
variation. Other processes that regulate variation may have been underestimated.
For example, when a primed structure is incompatible with target norms, translators
may choose a different structure in order to produce a target sentence that is
compatible with target norms. If many translators choose the same structure, the
resulting entropy value would be lowered in a similar fashion to a priming effect,
but it may not display a facilitation effect. This could be why the interaction
effect of priming probability and syntactic entropy was not behaving as expected.
Another possible explanation for this observation is that options that are primes
are monitored carefully to avoid such target language norm violations. This would
in turn lead to longer reading times. Decreased translation performance for non-norm-conforming
structures has been noted by Vandepitte and Hartsuiker (2011).
In their study, Dutch translators displayed difficulties when translating English SVO
structures containing inanimate subjects into Dutch while adhering to this structure.
Inanimate subjects tend not to take subject position in Dutch. A monitoring effect
may thus cancel the effect of priming in text production.
10.11 Conclusion
The results presented here corroborate the view of shared linguistic representational
structures. This chapter shows that the scope of shared linguistic representational
structures is not restricted to lexical items but extends to syntax, since syntactic
co-activation of multiple possible structures is reflected in longer behavioural measures,
similar to words with multiple translation alternatives. The results presented here
expand and lend further support to the literal translation hypothesis.
Acknowledgements This work was supported by the EU's 7th Framework Program (FP7/2007-2013)
under grant agreement 287576 (CASMACAT).
References
Baayen, R. H. (2013). languageR: Data sets and functions with Analyzing linguistic data: A
practical introduction to statistics. Available at: http://cran.r-project.org/package=languageR
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models
using Eigen and S4. Available at http://cran.r-project.org/package=lme4
Bernolet, S., Hartsuiker, R. J., & Pickering, M. J. (2007). Shared syntactic representations in
bilinguals: Evidence for the role of word-order repetition. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 33(5), 931949.
Bernolet, S., Hartsuiker, R. J., & Pickering, M. J. (2013). From language-specific to shared
syntactic representations: The influence of second language proficiency on syntactic sharing
in bilinguals. Cognition, 127(3), 287306.
Boada, R., Sánchez-Casas, R., Gavilán, J. M., García-Albea, J. E., & Tokowicz, N. (2012). Effect
of multiple translations and cognate status on translation recognition performance of balanced
bilinguals. Bilingualism: Language and Cognition, 16(1), 183–197.
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and
writing research. In LREC.
Carl, M., & Schaeffer, M. (forthcoming). Literal translation and processes of post-editing.
In Translation in transition: Between cognition, computing and technology. Amsterdam:
Benjamins.
Chen, B., Jia, Y., Wang, Z., Dunlap, S., & Shin, J.-A. (2013). Is word-order similarity necessary
for cross-linguistic structural priming? Second Language Research, 29(4), 375389.
Chesterman, A. (2011). Reflections on the literal translation hypothesis. In C. Alvstad, A. Hild,
& E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in
translation studies (pp. 2335). Amsterdam: John Benjamins.
Desmet, T., & Declercq, M. (2006). Cross-linguistic priming of syntactic hierarchical configuration
information. Journal of Memory and Language, 54(4), 610632.
Dragsted, B. (2012). Indicators of difficulty in translation correlating product and process data.
Across Languages and Cultures, 13(1), 8198.
Duñabeitia, J. A., Perea, M., & Carreiras, M. (2010). Masked translation priming effects with
highly proficient simultaneous bilinguals. Experimental Psychology, 57(2), 98–107.
Eddington, C. M., & Tokowicz, N. (2013). Examining English–German translation ambiguity
using primed translation recognition. Bilingualism: Language and Cognition, 16(2), 442–457.
Ehrlich, S. F., & Rayner, K. (1981). Contextual effects on word perception and eye movements
during reading. Journal of Verbal Learning and Verbal behaviour, 20(6), 641655.
Grosjean, F. (1997). The bilingual individual. Interpreting International Journal of Research and
Practice in Interpreting, 2, 163187.
Hartsuiker, R. J., Kolk, H. H. J., & Huiskamp, P. (1999). Priming word order in sentence
production. The Quarterly Journal Of Experimental Psychology, 52A(1), 129147.
Hartsuiker, R. J., Pickering, M. J., & Veltkamp, E. (2004). Is syntax separate or shared between
languages? Cross-linguistic syntactic priming in Spanish-English bilinguals. Psychological
Science, 15(6), 409414.
Hartsuiker, R. J., & Westenberg, C. (2000). Word order priming in written and spoken sentence
production. Cognition, 75(2), 2739.
Ivir, V. (1981). Formal correspondence vs. translation equivalence revisited. Poetics Today, 2(4),
5159.
Jakobsen, A. L., & Schou, L. (1999). Translog documentation. In G. Hansen (Ed.), Probing the
process in translation methods and results (pp. 136). Copenhagen: Samfundslitteratur.
Jensen, K. T. H., Sjørup, A. C., & Balling, L. W. (2009). Effects of L1 syntax on L2 translation.
In F. Alves, S. Göpferich, & I. M. Mees (Eds.), Methodology, technology and innovation in
translation process research: A tribute to Arnt Lykke Jakobsen (pp. 319–336). Copenhagen:
Samfundslitteratur.
Kidd, E., Tennant, E., & Nitschke, S. (2014). Shared abstract representation of linguistic structure in bilingual sentence comprehension. Psychonomic Bulletin & Review.
doi:10.3758/s13423-014-0775-2.
Kuznetsova, A., Christensen, R. H. B., & Brockhoff, P. B. (2014). lmerTest: Tests for random and
fixed effects for linear mixed effect models (lmer objects of lme4 package). R package version
2.0-6. Available at http://cran.r-project.org/package=lmerTest/
Laxén, J., & Lavaur, J.-M. (2010). The role of semantics in translation recognition: Effects
of number of translations, dominance of translations and semantic relatedness of multiple
translations. Bilingualism: Language and Cognition, 13(2), 157.
Loebell, H., & Bock, K. (2003). Structural priming across languages. Linguistics, 41(5), 791824.
Macizo, P., & Bajo, M. T. (2006). Reading for repetition and reading for translation: Do they
involve the same processes? Cognition, 99(1), 134.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity
for processing information. Psychological Review, 63, 8197.
Neubert, A., & Shreve, G. (1992). Translation as text. Kent, OH: Kent State University Press.
Pickering, M. J., & Ferreira, V. S. (2008). Structural priming: A critical review. Psychological
Bulletin, 134(3), 427459.
Pollatsek, A., Reichle, E. D., Juhasz, B. J., Machacek, D., & Rayner, K. (2008). Immediate and
delayed effects of word frequency and word length on eye movements in reading: A reversed
delayed effect of word length. Journal of Experimental Psychology: Human Perception and
Performance, 34(3), 726750.
Prior, A., Wintner, S., Macwhinney, B., & Lavie, A. (2011). Translation ambiguity in and out of
context. Applied Psycholinguistics, 32(01), 93111.
Pym, A. (2003). Redefining translation competence in an electronic age. In defence of a minimalist
approach. Meta: Translators' Journal, 48(4), 481–497.
R Development Core Team. (2014). R: A language and environment for statistical computing.
Vienna: R Foundation for Statistical Computing. Available at: http://www.r-project.org/.
Ruiz, C., Paredes, N., Macizo, P., & Bajo, M. T. (2008). Activation of lexical and syntactic target
language properties in translation. Acta Psychologica, 128(3), 490500.
Schaeffer, M. J., Paterson, K., McGowan, V. A., White, S. J., & Malmkjær, K. (forthcoming).
Reading for translation.
Schoonbaert, S., Hartsuiker, R. J., & Pickering, M. J. (2007). The representation of lexical and
syntactic information in bilinguals: Evidence from syntactic priming. Journal of Memory and
Language, 56(2), 153171.
Shannon, C. E. (1951). Prediction and entropy of printed English. The Bell System Technical
Journal, 30(1), 5064.
Shin, J.-A., & Christianson, K. (2009). Syntactic processing in Korean-English bilingual production: Evidence from cross-linguistic structural priming. Cognition, 112(1), 175180.
Tokowicz, N., & Kroll, J. F. (2007). Number of meanings and concreteness: Consequences of
ambiguity within and across languages. Language and Cognitive Processes, 22(5), 727779.
Toury, G. (1995). Descriptive translation studies and beyond (Vol. 75). Amsterdam: John
Benjamins. Benjamins translation library v4.
Vandepitte, S., & Hartsuiker, R. J. (2011). Metonymic language use as a student translation
problem: Towards a controlled psycholinguistic investigation. In C. Alvstad, A. Hild, &
E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in
translation studies (pp. 6792). Amsterdam: John Benjamins.
Wu, Y. J., & Thierry, G. (2012). Unconscious translation during incidental foreign language
processing. NeuroImage, 59(4), 34683473.
Chapter 11
11.1 Introduction
Research using Translog-II (Carl 2012) in conjunction with eye-tracking data in
translation studies (Carl and Jakobsen 2009; Jakobsen 2011; Hvelplund 2011;
Carl and Dragsted 2012; Sjørup 2013; Balling and Carl 2014; Mesa-Lao 2014;
among others) has focused on tasks involving Western European languages and
consequently alphabetic scripts. However, studies focusing on languages which
use logographic scripts are still incipient. This chapter reports on a study of from-scratch
translation and post-editing tasks carried out from Portuguese into Chinese
by Chinese translators of Portuguese (L2). Drawing mostly on the methodology
used by Sjørup (2013), we carried out a study to examine gaze and keylogging data
from six participants while translating and six other participants while post-editing
a 79-word news report. These data are available in the CRITT Translation Process
Research Database as MS13 (translation session 16, post-editing session 18).
Building on Halliday and Hasan (1976) and Hasan (1984), referents pertaining
to the main cohesive chain of the source text (labelled chain A) were defined as our
focus of enquiry and contrasted with items in a secondary cohesive chain traceable
in the same text (labelled chain B). Our assumption was that tracking participants
(referents) in chain A would be critical for the 12 translators to build a coherent
interpretation of the source text (ST) and would require them to retrieve the identity
of what was being talked about by referring to another expression either in the co-text
or the context of the situation and culture. A higher number of eye fixations and
more keyboard activity were thus expected during reading and production of
chain A. A secondary chain, in contrast, would have a lesser contribution to the ST
and TT (target text) coherence and would thus demand less attention, as well as
fewer keyboard and eye activities.
Grammatical cohesion (more precisely, reference) and lexical cohesion are recruited in participant
tracking. By reference is meant a relationship in meaning construed through the
use of a personal reference item (personal pronoun or possessive determiner) that
enters into a semantic relationship with an item mentioned either before in the text
(anaphora) or afterwards (cataphora). Occasionally, reference is made to entities
that cannot be retrieved from the text and need to be established situationally; this is
referred to as exophoric reference. When two items share the identity of a referent,
this is termed co-referentiality. Items sharing identity can also be linked through
lexical cohesion, be that repetition, synonymy or hyponymy.
Cohesive ties, i.e. semantically bonded items, are particularly important when
they form so-called cohesive chains, responsible for strong integration of cohesive
ties and a more coherent text. A cohesive chain built on participant tracking may
be realized through co-reference or lexical cohesion categories that are valid for
language in general but that ultimately need to be interpreted in a particular text.
Thus, this type of cohesive chain is crucial to text organization and comprehension.
Conversely, secondary chains are not essential to participant tracking and are built
upon lexical relations that are not text-specific, but general to the lexicon of the
language.
When text is processed in translation tasks, Hatim and Mason (1990) argue,
translators rely both on contextual and co-textual cues in order to identify cohesive
items deemed relevant to a coherent construction of the TT. These cues can be
sought in the immediate co-text or demand integration of items that are more distant
in the text.
Cohesion has not been extensively examined in translation process research.
Denver (2009) investigated adversative-concessive logical-semantic relations in
translations from Spanish into Danish. The author found different right and
wrong choices among translators and students, but no trace of mental activity in
processing relations realized through conjunctions in Spanish, i.e. no verbalization
or keystroke, pause or revision signalling that the relations constituted translation
problems for the participants.
Angelone (2010) studied uncertainty management and metacognitive problem
solving of a professional translator, two students and a bilingual. He classified the
textual level at which the participants' metacognitive activity was employed into
lexis, term, collocation, phrasal, syntax, sentential, macro level, and unclassified.
The macro level category refers, according to the author, to beyond-sentence
considerations, such as cohesion, coherence, and genre. Only a small part of the
elicited verbalizations fell into the sentential and macro level categories.
Both authors relied on think-aloud protocol data; Denver also used keylogging
data, and Angelone also used screen recordings. To the best of our knowledge, no
other translation process research using eye-tracking has addressed translators' or
post-editors' processing of cohesion in STs and TTs.
Staub and Rayner (2007) claim that many eye-tracking studies have focused on
syntactic parsing, but few have looked into how discourse processing (including
cohesion) affects eye movements in reading. Staub and Rayner (2007, 335) argue
that recognizing individual words and analyzing grammatical structures of each
sentence does not suffice to understand a text; the reader must also maintain a
representation of the entities and events that have been mentioned, and relate the
information that is currently being processed to this stored representation.
Basically, eye-tracking studies focusing on cohesion have so far shown an
increase in fixation times due to: long distance between an anaphor and its
antecedent (O'Brien et al. 1997); the antecedent being a low-frequency word (van
Gompel and Majid 2004); and reading a target word and drawing conclusions that
have not been explicitly stated in the text (O'Brien et al. 1988).
11.3 Methodology
The results described in this chapter are part of a larger empirical-experimental
project carried out by the AuTema-PostEd Group, which aims at tapping into
translation and post-editing processes as a source of insight into the role of
translators' understanding in task problem solving. In this chapter, we report the
results regarding the Portuguese(L2)-Chinese(L1) translation and post-editing of a
text about the China Gold Research Institute.
11.3.2 Participants
Originally, 23 professional translators performed two translation tasks (L1 into
L2, and L2 into L1) and two post-editing tasks (one in their L1 and another
one in their L2) using machine-translated (MT) input provided by the software
Table 11.1 Mean fixation duration (in ms) and gaze time on screen (in %) per participant in the
translation and post-editing tasks

Translation
Subject  Mean fixation duration (ms)  Gaze time on screen (%)
P03      317                          78.07
P09      366                          91.98
P11      434                          93.41
P15      439                          82.85
P19      417                          82.28
P23      365                          86.81
Mean     389.67                       85.90

Post-editing
Subject  Mean fixation duration (ms)  Gaze time on screen (%)
P04      421                          89.24
P06      299                          82.98
P07      222                          63.81
P12      309                          82.67
P18      301                          49.59
P21      465                          83.69
Mean     336.17                       75.33

Six participants' Text 1 data were excluded because of technical issues related to tokenization or
alignment problems (see Sect. 11.3.6).
All participants provided informed consent. They were Chinese nationals and had
an undergraduate degree in Portuguese Studies or a Master's degree in Chinese-Portuguese
Translation Studies. All had been granted a fellowship to live in a
Portuguese-speaking country, and all of them used glasses or contact lenses. None
of them had experience in post-editing. Each was paid MOP 90.00 to take part in
the experiments.
The participants were instructed to sit approximately 55 cm away from the eye-tracker
monitor. They were told that they could move freely, but should keep their
eyes on the monitor as much as possible.
Table 11.2 Main chain and cohesive devices of ST (selected tokens in italics)

Type of chain: A
Tokens in ST [gloss] – Co-reference established through:
  o volume de produção atingiu 403 toneladas [the production volume increased reaching 403 tons] – not applicable (first item)
  Isso [This] – demonstrative pronoun
  O aumento [The increase]
  essa quantidade [such amount]
  O gigante asiático [The Asian giant]
  no país [in the country]
  o torna [turns it into]
  na China [in China]

The relative position and length of the selected tokens (words and noun groups) in
chain A and chain B were accounted for in our statistical analysis (see Sect. 11.4.2).
chain A and chain B were accounted for in our statistical analysis (see Sect. 11.4.2).
11.3.5 Task
After a brief warm-up session, which consisted of a copy test before the experiment,
each participant was asked to perform four tasks, randomly assigned to participants:
two translations (one into their L1 and another one into their L2), and two post-editing
tasks (one in their L1 and another one in their L2) using MT output.
Table 11.3 provides the tasks performed by each participant. The analyses in this
chapter refer to T1 and P1, highlighted in Table 11.3.
As their brief, the participants were informed that they should render texts aimed at
a target audience analogous to that of the ST. They were told to feel free to produce
the human TT or post-edit the MT text without any time constraint, but they could
not use any kind of translation aids. As they had little to no experience in post-editing,
we provided them with the guidelines reported in Mesa-Lao (2014, 225); see
also chapters 11.3.5, 11.7, 11.8, and 13 in this volume.
After each task, the participants were requested to provide a retrospective
protocol, whereby they could explain whatever they felt like concerning their
Table 11.3 Task distribution across participants

Participant  From scratch  Post-editing
03           T1 T3         P4 P2
04           T2 T4         P3 P1
06           T3 T2         P1 P4
07           T2 T3         P4 P1
09           T3 T1         P2 P4
11           T1 T3         P2 P4
12           T2 T4         P1 P3
14           T3 T2         P4 P1
15           T1 T4         P3 P2
17           T3 T1         P2 P4
18           T4 T2         P1 P3
23           T1 T3         P2 P4
Available at https://sites.google.com/site/centretranslationinnovation/translog-ii.
Basically, there are two categories of Chinese input method, i.e. phonetic readings or root shapes.
Most of these input methods can be selected directly from the control panel of MS Windows.
Available at http://pinyin.sogou.com/.
Fig. 11.2 Snapshot of a post-editing session showing Sogou's dialog box. Note: Circle indicates
fixation
Figure 11.2 shows a snapshot of a post-editing session. In the bottom half of the
window, Sogou's dialog box pops up below the line where the participant wants
to introduce new characters. A zoom-in shows that while the participant types in
pinyin a series of alphabetic letters, a number of options are shown, out of which one
may be the desired corresponding character(s). To select the desired characters and
insert them in Translog-II, the participant presses the space bar or the corresponding
number key.
As shown in Fig. 11.2, the Chinese input system is prone to word gaze error, since
the place where the pinyin is typed is not the same place where the Chinese character
is inserted. Assuming that Sogou's dialog box pops up right below the space
where the character is supposed to be inserted, we manually attribute the fixation
Pinyin is the official phonetic system for transcribing the Putonghua (Mandarin) pronunciations
of Chinese characters into the Latin alphabet.
The StudyAnalysis scripts include a function to refixate gaze mapping to word. However,
several translation drifts remain and need to be corrected manually.
Fig. 11.3 Procedures to prepare Translog-II data for Study Analysis: (a) Conventional process
for Roman-alphabet languages; (b) Adapted process for the Chinese language
There is no immediate connection between the keystrokes and the characters that appear in the
text.
TTid
50
51
TToken
SToken
_fato_,_que_essa
quantidade
Ins
2
2
Del
0
0
Edit
TToken
SToken
_fato_,_que_essa
quantidade
Time
153911
153913
Dur
1
1
Pause
4353
1
Edit
51
TToken
,
SToken
,
Edit1
[
]that
amount
_fato_,_que_
essa
It is a fact, that
this
quantidade
quantitity
Time1
149153
Dur1
405
Pause1
10780
153911
4353
153913
Edit
the actual keystrokes that the participants pressed to produce the characters logged
in Translog-II. Then, we manually correct the TT tables.
We also observed that the edited units are coherently aligned to the respective
STid and TTid in most of the cases. However, because Translog-II aligns only the
initial ST and the final TT, when the characters of the MT text are deleted and/or
immediately edited (such as Edit 1 [
], which represents the deletion of [that
amount] in TTid49 , Table 11.6), the system does not identify which words they were
originally part of (a part of Edit 1 in TTid49 should belong to TT50
[that], and
another part [amount] should belong to TT51 ). To account for this, we check all
Edit1 and Edit2 actual operations to identify the actual ST and TT tokens (STokens
and TTokens).
In sum, a substantial part of our production analysis was built on manually
processed data. To ensure quality, all manually extracted data were double-checked.
These spreadsheets are available in TPR-DB.
Four fixed effects were used in the analysis of both eye and keyboard movements
as proxies for processing of chains A and B in both ST and TT:
Token Length: The length in characters of the ST and TT tokens of chains A
and B was expected to affect the dependent variable, as longer words generally
receive longer fixations than shorter words (e.g. Rayner 1998; Hyönä et al. 2003;
Staub and Rayner 2007);
Token Position: The position of the ST and TT tokens of chains A and B in the
text was expected to have an effect on the participants' gaze behaviour. It may be
due to fatigue (e.g. Rayner 1998; Balling 2008, 2013) and/or to a priming effect
(Rayner 1998: 390; Staub and Rayner 2007: 331), which, based on Halliday and
Hasan (1976), may imply that the beginning of a text deserves more attention
because it will determine the understanding of the remainder of the text and it
will have items that will serve as referents for items further in the text (see also
Chap. 9);
Token Unigram Frequency: Readers are expected to fixate longer on low-frequency
words than on high-frequency words (e.g. Rayner et al. 2005; Rayner
1998). The Corpus of Portuguese and the Corpus of the Peking University
Center for Chinese Linguistics were used to measure frequency;
Token Trigram Probability: High predictability of word association was expected
to have an impact on processing effort (Frisson et al. 2005). The variable was
computed following McDonald and Shillcock (2003, 650) and considering the
selected token and the two preceding tokens as they occur in the text (Balling
2013); a computational sketch of the two frequency-based predictors is given after this list.
Two variables were added to the analysis of TT Token production, namely:
Token Character Count: sum of insertions and deletions of TT Token (see
Sect. 11.3.6.3); and
Correctness of Token in the Chain: TT Tokens were assessed as right or wrong
against the ST Tokens; right TT Tokens were assumed to be instances in which
patterns of lexical cohesion in texts are maintained, subject to the constraints of
particular text norms in particular languages (Hatim and Mason 1990, 200).
Two other fixed effects were included to directly answer our research questions:
Task: to investigate the effect of translating or post-editing on ST and TT
comprehension and on TT production; and
Type of Chain: to investigate the effect of chain A and chain B on ST and TT
comprehension and on TT production.
All continuous variables were naturally logarithmically transformed.
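The following R sketch illustrates, on a toy corpus, how the unigram-frequency and trigram-probability predictors described above can be estimated from corpus counts; the smoothing and the toy data are assumptions for the example, whereas the actual values were taken from the Corpus of Portuguese and the CCL corpus.

```r
# Sketch: corpus-based frequency predictors for a token in context.
tokens <- c("o", "volume", "de", "producao", "atingiu", "403", "toneladas",
            "o", "aumento", "de", "producao")   # toy "corpus"

unigram_freq <- function(w, corpus) sum(corpus == w) / length(corpus)

# Probability of a token given its two preceding tokens, from raw counts
# (add-one smoothing as a simple placeholder so unseen contexts do not yield 0).
trigram_prob <- function(w2, w1, w, corpus) {
  n   <- length(corpus)
  tri <- sum(corpus[1:(n - 2)] == w2 & corpus[2:(n - 1)] == w1 & corpus[3:n] == w)
  bi  <- sum(corpus[1:(n - 1)] == w2 & corpus[2:n] == w1)
  (tri + 1) / (bi + 1)
}

log(unigram_freq("de", tokens))                       # log-transformed, as in the analysis
log(trigram_prob("volume", "de", "producao", tokens))
```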
Corpus of Portuguese: available at http://www2.lael.pucsp.br/corpora/bp/.
Corpus of the Peking University Center for Chinese Linguistics: available at http://ccl.pku.edu.cn:8080/ccl_corpus/index.jsp.
[Table: fixed effects Token length, Token position and Token trigram probability for the ST measures TrtS, FixS and FPDurS]
[Figure: TrtS, FixS and FPDurS plotted against token Length and token Position]
As the type of chain had no significant impact on the eye movements on the ST and the text
was relatively short, the results seem to indicate a priming effect: the initial items
in the chains receive more attention from participants because the beginning of the
text is crucial for their orientation in order to understand the entire text (Halliday
and Hasan 1976).
[Figure: TrtS, FixS and FPDurS plotted against token Trigram probability]
[Table: fixed effects Token Length, Token unigram frequency and Type of chain (A) for the TT measures TrtT, FixT and FPDurT]
[Figure: FixT and TrtT plotted against token Length and Unigram frequency; production Dur plotted against CharCount and Type of chain (A vs B)]
As for question 1, our results show a significant effect of the type of cohesive
chain on eye movements on the TT, which lets us infer that it is cognitively more
demanding to produce a chain built on participant tracking when it comes to the TT;
no significant effect, however, was observed for the metrics related to the ST.
We expected that the results would also be significant for the ST, especially because
we assumed that keeping track of participants in the main chain of a text would
be challenging both for understanding the ST and for producing the TT. However, we
should be aware that both translation and post-editing involve transiting gaze from
ST to TT, and therefore the reading of the TT may be closely connected to the
understanding of the ST. Future studies should also include transitions from ST to TT and
vice versa as a measure of effort and investigate whether ST and TT comprehension
should be addressed as one single event. Measures of global processing should also
be developed and tried out to account for "relationships between pieces of text
information that span relatively long distances in a text" (Hyönä et al. 2003, 314),
especially across sentences.
Furthermore, in a larger scale study involving the four tasks for which we
collected data, we intend to check if the order in which the task was carried out
had an effect on the participants' processing. As we collected all data from each
participant on the same day, there might have been an effect of fatigue on the results
(e.g., the total time of the last sessions seems to be shorter than that of the first
sessions).
As for question 2, the results pointed to no significant impact of the type of task
on the measures that we assessed in this chapter. We have two potential non-mutually
exclusive explanations for this. One reason might be experience: none of
the participants had PE experience. Another factor might have been that our design
is between subjects, so differences in the results may have to do with differences
between the groups. The other potential explanation is that the sample size is small.
Considering previous studies that do show significant differences between post-editing
and translation (e.g. Balling and Carl 2014; Mesa-Lao 2014), it is possible
that either the impact of the type of task is on the transitions across both ST and TT
areas or that the type of task has an impact on the global processing of the entire text,
rather than only on particular cohesive chains. A third possibility concerns MT
quality: as reported in the protocols, participants found the MT text ambiguous at
some points, and exophoric reference to what the MT tokens refer to was also lacking.
To address some of the aforementioned limitations, in future work we intend to
analyse more than one task and use a within-subject design, which will allow us
to have data for participants that both translated and post-edited. We also intend to
compare the tasks considering the entire text and a larger volume of data (including
four texts). Following Alves et al. (2014), we also intend to perform a more fine-grained
analysis by qualitatively examining the renditions and their processing as
shown, for instance, in scan paths.
Besides answering the research questions, the alternatives we had to come up
with in order to cope with limitations to process Chinese language data are also a
contribution of the present chapter. We hope that the procedures we reported herein
contribute to facilitating further studies involving the Chinese language and that
our results awaken the interest of new scholars to approach language pairs other
than those involving only alphabetic scripts. Although the tendency to use the same
language pairs and scripts may have methodological advantages (e.g., one language,
English, is kept as a standard for comparison's sake, and blank spaces undoubtedly
delimit words), not only does it overlook the insights that other language pairs and
scripts may add to understanding (non-)language and (non-)script specific cognitive
aspects of post-editing and translation, but it also prevents future generalizations
based on a comprehensive body of research encompassing multiple languages,
language pairs, and scripts.
Acknowledgment The results reported in this chapter are part of a project sponsored by
University of Macau Research Grant AuTema-PostEd MYRG058 (Y1-L1)-FSH12-ALL, and
carried out with the kind cooperation of the Centre for Research and Innovation in Translation
and Translation Technology (CRITT), at Copenhagen Business School, Denmark, the Laboratory
for Experimentation in Translation (LETRA), at Federal University of Minas Gerais, Brazil (grants
CNPq 307964/2011-6, and FAPEMIG SHA/PPM-00170-14), and the Translation Lab at Federal
University of Uberlândia, Brazil (grant CNPq 461054/2014-0). The authors are very grateful to the
editors for their valuable comments.
Table 11.10 Fixed effects in the analysis of ST and TT tokens' total reading time, and total
production time, with estimated effect size, standard error, t- and p-values

Variable                             Estimate   Std. error  t-value  p-value
Total reading time on ST Token
  Intercept                          7.750364   0.310935    24.926   <2e-16
  Log token length                   0.359864   0.092903    3.874    0.000164
  Log token position                 0.223413   0.062151    3.595    0.000450
  Log trigram probability            22.56563   3.176764    7.103    5.78e-11
  Task (translation)                 0.269650   0.301090    0.896    0.387272
  Type of chain (A)                  0.008212   0.149548    0.055    0.956287
Number of fixations on ST Token
  Intercept                          1.83671    0.21923     8.378    1.34e-11
  Log token length                   0.36931    0.07098     5.203    6.90e-07
  Log token position                 0.16169    0.04754     3.401    0.000879
  Log trigram probability            12.67565   2.43980     5.195    7.18e-07
  Task (translation)                 0.30240    0.19646     1.539    0.148426
  Type of chain (A)                  0.01305    0.11462     0.114    0.909544
First pass reading time on ST Token
  Intercept                          5.61694    0.09665     58.117   <2e-16
  Log trigram probability            1.39461    0.50621     2.755    0.00664
  Task (translation)                 0.18261    0.13136     1.390    0.18839
Total reading time on TT Token
  Intercept                          7.2591     0.2175      33.371   <2e-16
  Log frequency (poly. 1)            2.6958     1.1700      2.304    0.0229
  Log frequency (poly. 2)            4.7131     1.1569      4.074    8.18e-05
  Task (translation)                 0.2772     0.2543      1.090    0.2980
  Type of chain (A)                  0.4325     0.2063      2.097    0.0381
Number of fixations on TT Token
  Intercept                          1.2405     0.2820      4.399    0.000176
  Log length                         0.3549     0.1034      3.431    0.000806
  Type of chain (A)                  0.3347     0.2449      1.367    0.219303
TT Token total production time
  Intercept                          4.56944    0.19611     23.300   <2e-16
  Log character count                1.29942    0.07501     17.323   <2e-16
  Correctness of token (wrong)       0.51087    0.13583     3.761    0.000265
  Type of chain (A)                  0.20504    0.09570     2.142    0.034301
Table 11.11 Random effects in the analysis of ST and TT comprehension and production time

Variable                              Random factor  Intercept/level  Standard deviation
Total reading time on ST Token        ST Token       Intercept        0
                                      Participant    Intercept        0.5097
                                      Residual                        0.6155
Number of fixations on ST Token       ST Token       Intercept        0
                                      Participant    Intercept        0.3244
                                      Residual                        0.4712
First pass reading time on ST Token   ST Token       Intercept        1.565e-08
                                      Participant    Intercept        1.848e-01
                                      Residual                        5.045e-01
Total reading time on TT Token        TT Token       Intercept        0
                                      Participant    Intercept        0.2783
                                      Residual                        1.1424
Number of fixations on TT Token       TT Token       Intercept        0.2393
                                      Participant    Intercept        0
                                      Residual                        0.8528
TT Token total production time        TT Token       Intercept        0
                                      Participant    Intercept        0.2401
                                      Residual                        0.4782
References
Alves, F., Pagano, A. S., & da Silva, I. A. L. (2014). Effortful text production in translation.
Translation and Interpreting Studies, 9(1), 2551.
Angelone, E. (2010). Uncertainty, uncertainty management, and metacognitive problem solving
in the translation task. In G. M. Shreve & E. Angelone (Eds.), Translation and cognition (pp.
1740). Amsterdam: John Benjamins.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R.
Cambridge: Cambridge University Press.
Balling, L. W. (2008). A brief introduction to regression designs and mixed-effects modelling by
a recent convert. In S. Göpferich, A. L. Jakobsen, & I. Mees (Eds.), Looking at eyes: Eye-tracking
studies of reading and translation processing (Copenhagen studies in language, Vol.
36, pp. 175–192). Frederiksberg: Samfundslitteratur.
Balling, L. W. (2013). Reading authentic texts: What counts as cognate? Bilingualism: Language
and Cognition, 16(3), 637653.
Balling, L., & Carl, M. (2014). Production time across language and tasks: A large-scale analysis
using the CRITT translation process database. In J. Schwieter & A. Ferreira (Eds.), The
development of translation competence: Theories and methodologies from psycholinguistics
and cognitive science (pp. 239268). Cambridge: Cambridge Scholar Publishing.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using
Eigen and S4. R package version 3.1.2. Available at http://CRAN.R-project.org/package=lme4
Bell, R. T. (1991). Translation and translating: Theory and practice. London: Longman.
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and
writing research. In Proceedings of the eighth international conference on language resource
and evaluation (pp. 41084112). Istanbul: European Language Resources Association.
Carl, M. (2013). Feature representation in the translation process research DB. In R. Bonk, V.
Alabau, M. Carl, & P. Koehn (Eds.), D5.3: Beta release of Casmacat workbench. Available at
http://www.casmacat.eu/uploads/Deliverables/d5.3.pdf
Carl, M., & Dragsted, B. (2012). Inside the monitor model: Process of default and challenged
translation production. Translation: Corpora, Computation, Cognition, 2(1), 127145. Special
issue on the Crossroads between Contrastive Linguistics, Translation Studies and Machine
Translation.
Carl, M., & Jakobsen, A. L. (2009). Towards statistical modelling of translators activity data.
International Journal of Speech Technology, 12(4), 125138.
Denver, L. (2009). Unique items in translation. In S. Gpferich, A. Jakobsen, & I. Mees (Eds.),
Behind the mind: Methods, models and results in translation process research (pp. 125148).
Copenhagen: Samfundslitteratur.
Frisson, S., Rayner, K., & Pickering, M. (2005). Effects of contextual predictability and transitional
probability on eye movements during reading. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 31(5), 862877.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. New York: Longman.
Hasan, R. (1984). Coherence and cohesive harmony. In J. Flood (Ed.), Understanding reading
comprehension: Cognition, language and the structure of prose (pp. 181219). Newark, DE:
International Reading Association.
Hatim, B., & Mason, I. (1990). Discourse and the translator. New York: Longman.
Hvelplund, K. T. (2011). Allocation of cognitive resources in translation: An eye-tracking and
key-logging study. Published PhD thesis, Copenhagen Business School, Copenhagen.
Hyönä, J., Lorch, R. F., Jr., & Rinck, M. (2003). Eye movement measures to study global text
processing. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind's eye: Cognitive and applied
aspects of eye movement research (pp. 313–334). Amsterdam: North-Holland.
Jakobsen, A. L. (2011). Tracking translators' keystrokes and eye movements with Translog. In C.
Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative
approaches in translation studies (pp. 37–55). Amsterdam: John Benjamins.
Leong, K. S., Wong, F. D., Tang, C. W., & Dong, M. (2006). CSAT: A Chinese segmentation and
tagging module based on the interpolated probabilistic model. In Z. H. Yuan & M. W. Yao
(Eds.), Computational methods in engineering and science (pp. 10921098). Sanya: Tsinghua
University Press/Springer.
Li, X. J. (Ed.). (2010). [Modern Standard Chinese Dictionary]. Beijing: Foreign Language
Teaching and Research Press.
Li, X. S., Grimes, S., & Strassel, S. (2009). Linguistic Data Consortium: Guidelines for Chinese-English
word alignment, version 4.0. Philadelphia, PA: Linguistic Data Consortium. Available
via https://catalog.ldc.upenn.edu/docs/LDC2012T16/GALE_Chinese_alignment_guidelines_
v4.0.pdf
McDonald, S. A., & Shillcock, R. (2003). Eye movements reveal the on-line computation of lexical
probabilities during reading. Psychological Science, 14(6), 648652.
Mesa-Lao, B. (2014). Gaze behavior on source texts: An exploratory study comparing translation
and post-editing. In S. O'Brien, L. W. Balling, M. Carl, M. Simard, & L. Specia (Eds.),
Post-editing of machine translation (pp. 219–245). Newcastle upon Tyne: Cambridge Scholar
Publishing.
Mossop, B. (2003). An Alternative to Deverbalization. http://www.yorku.ca/brmossop/
Deverbalization.htm
O'Brien, E. J., Raney, G. E., Albrecht, J. E., & Rayner, K. (1997). Processes involved in the
resolution of explicit anaphors. Discourse Processes, 23, 1–24.
O'Brien, E. J., Shank, D. M., Myers, J. L., & Rayner, K. (1988). Elaborative inferences during
reading: Do they occur on-line? Journal of Experimental Psychology: Learning, Memory, and
Cognition, 14, 410–420.
Rayner, K. (1998). Eye movements in reading and information processing. Psychological Bulletin,
124(3), 372422.
Rayner, K., Li, X., Juhasz, B. Z., & Yan, G. (2005). The effect of word predictability on the eye
movements of Chinese readers. Psychonomic Bulletin & Review, 12(6), 10891093.
Sjørup, A. C. (2013). Cognitive effort in metaphor translation: An eye-tracking and key-logging
study. Published thesis, Copenhagen Business School, Copenhagen.
Staub, A., & Rayner, K. (2007). Eye movements and on-line comprehension processes. In G.
Gaskell (Ed.), The Oxford handbook of psycholinguistics (pp. 327342). Oxford: Oxford
University Press.
Tian, L., Wong, F., & Chao, S. (2011). Word alignment using GIZA++ on Windows. Machine
Translation Summit, 13, 369–372.
van Gompel, R. P. G., & Majid, A. (2004). Antecedent frequency effects during the processing of
pronouns. Cognition, 90, 255264.
Wong, D. F., & Chao, L. S. (2010). PCT: Portuguese-Chinese machine translation systems. Journal
of Translation Studies, 13(12), 181196.
Zang, C. L., Liversedge, S. P., Bai, X. J., & Yan, G. (2011). Eye movements during Chinese
reading. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of
eye movements (pp. 961978). Oxford: Oxford University Press.
Zeng, X. D., Wong, D. F., Chao, S., & Trancoso, I. (2013). Graph-based semi-supervised model
for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of the 51st
annual meeting of the association for computational linguistics (ACL 2013) (pp. 770779).
Sofia, Bulgaria: Association for Computational Linguistics.
Chapter 12
Abstract The present chapter compares and evaluates the merits of three recent
studies dealing with the cognitive processes of structuring information in translations. The studies differ in taking a syntactic, a functional and a conceptual
approach respectively. Correlation between structuring operations in translation
and cognitive effort is found to be higher when a conceptual relevance-theoretic
approach is taken, yet the results are somewhat inconclusive due to weaknesses in
the operationalization of the relevance theoretic concept of procedural information.
The syntactic parsing approach would also be improved by a more fine grained
analysis. Functional categories as well as reallocation measures are found to be
relevant for a more precise understanding of the effort related to structuring
operations in translation.
Keywords Translation effort · Target text structuring · Re-distribution · Syntactic vs conceptual approach · Information structure
12.1 Introduction
Beyond choosing adequate lexical items for a target text, translators have to decide
on a proper structure in their translation. Sometimes the structuring involves a
pure mapping of the source text syntax into the target sentence string, with slight
modifications on account of regular syntactic differences in the relevant language
pair, but in most cases (Thunes 1998, 2011)1 the translators have to or choose
to restructure the information given in the source text. It may be assumed that
B. Behrens ()
Department of Literature, Area Studies and European Languages, University of Oslo, Oslo,
Norway
e-mail: bergljot.behrens@ilos.uio.no
these restructuring operations are lexically motivated, in that the chosen target
word or phrase comes with a different syntactic frame, they may be information-structurally
motivated in order for the target phrase to get the right focus, or it
may be that the translator performs an unpacking of a source phrase only to repack the information in a more implicit or a more explicit form, possibly involving
a complete redistribution of the information in the source. One assumption in
cognitive translation studies is that the more alternatives the translator entertains
before selecting her target expression, the more demanding the translation. Campbell (2000) hypothesized that multi-translation data, i.e. translations of the same
source text by a number of translators, can be used to draw inferences about
the cognitive processes during translation. His Choice Network Analysis (CNA)
postulates that the more options and the more complex choices a translator has
to consider, the more effortful is the translation of a particular item. Various
measures of translation effort have been proposed to test this hypothesis, and
different approaches have been suggested to isolate the relevant kinds of unit a
translator considers. Among them, three papers (Dragsted (2012) and Carl and
Schaeffer (forthcoming), see also Chap. 9) focus on the lexicon and the effect of
target text variation on translator behavior. These studies demonstrate a significant
correlation between reading times and the number of target lexical options available
for a particular source word, indicating that translators entertain target alternatives
already during reading the source text. Similarly, studies are beginning to appear
that report on the cognitive effort of structuring translation segments (Chap. 10;
Alves and Gonalves 2013). The present paper takes up questions pertaining to the
operationalization of structuring mechanisms and their relevance to the measure
of cognitive load in translation. This involves two issues: What are the relevant
(re-)structuring mechanisms in translation and how do we relate them to translation
behavior?
The paper is structured into four parts. After presenting the types of measure used
in translation process research on cognitive load (Sect. 12.2) and the assumptions
forming the background for the studies to be discussed here (Sect. 12.2.1), the paper
assesses three different analyses of structuring operations in translation (Sect. 12.3).
Section 12.3.1 takes up in detail the merits and problems with a study in which
shallow syntactic annotations form the basis of analysis, Sect. 12.3.2 discusses
an alternative annotation system which makes use of a more complex syntactic
annotation including functional categories. Section 12.3.3 assesses the approach
by which the relevance theoretic notions of procedural and conceptual encodings
are operationalized to investigate the cognitive load of structuring information in
translation. In the final remarks in Sect. 12.4, an information structural approach to
contrastive translation studies is suggested as a way ahead to get at the structuring
mechanisms that involve cognitive translation load.
267
This does not mean, of course, that the translator does not go back to reading the source text while
editing. The TPR-DB shows that often ST reading and TT writing occur concurrently (see Chap.
9).
different word. This is interpreted as an indication that the subjects actually consider
target lexical alternatives in the mind while selecting a final target word, and that the
more alternatives are considered, the more effortful is the selection process. Carl and
Schaeffer go one step further by weighting such alternatives and apply weighting
measures on a larger set of data. The relative weighting of alternatives across a large
set of translation data collected on the same set of texts across several languages
is quantified and measured in what they call translation entropy: a measure of the
effect of an item's relative likelihood to occur (Shannon 1951). If the likelihood
of a choice is small, i.e., there are many different translations to choose from,
then the entropy is high. The cognitive effort of selecting a translation is deemed
high when there are many equally likely alternatives to choose from. On the same
account, translation should be facilitated when there are only one or two options,
i.e., when the entropy is low. Their hypothesis is confirmed: The correlation between
entropy values and reading times was high. High (weighted) variation in the target
texts correlates with high source text and target text total reading times, measured
in means across participants, text and language combinations per character. The
conclusion drawn from the study is that translators activate and entertain several
translation options (consciously or sub-consciously) while reading.
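In formula form (a schematic rendering in my own notation, not notation used by the authors), the translation entropy of a source item s over the observed target alternatives t_1, …, t_n is

H(s) = − Σ_i p(t_i | s) · log2 p(t_i | s),   with p(t_i | s) = count(s → t_i) / Σ_j count(s → t_j),

so that H(s) is zero when all translators chose the same target item and grows as the observed alternatives become more numerous and more evenly distributed.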
The interesting correlations found on the lexical level in the above-mentioned
studies have triggered questions relating to whether this correlation would carry
over into structural choice in translation.
This does not imply that syntactic priming cannot also affect lexical choice (see Chap. 10 for the study on syntactic priming).
The studies from which the data was taken are SG12 for German, KTHJ08 for Danish, and BML12 for Spanish; for a description of these studies, see Chap. 2.
(It was only attention from other hospital staff that got stopped him and the
murders)
(impersonal active independent: MAI, transitive active dependent: TAD = MAI-TAD)
d. ES: Solo el hecho de que el personal reparara en ello pudo pararle los pies y
detener los asesinatos. (TAD-DAI-TAI)
(Only the fact that the personnel noticed him could stop his feet and end the
murders)
Valency (transitive(T), intransitive(I), ditransitive(D), impersonal(M)), Voice
(active(A), passive(P)) and Clause Type (dependent(D), independent(I)) mark a
triplet of syntactic features for each clause. The example shows that some translations retain the structure of the source segment, while others are more expansive,
including a combination of clauses.
The annotation system allows us to see the variation in syntactic constellations
for each language, per translator. In Spanish, for example, the source segment
in (1) yields several structures, alternating between TAD-DAI-TAD (as in (1)),
a simple active ditransitive (DAI) and an MAD-DAI combination. In Danish, the
same segment shows over 10 different options, from a simple TAI structure or a TPI
structure to embedded structures of four clauses of various kinds (MAI-TAD-IAD-TPD
or MAI-TAD-TPD-IAD). The syntactic entropy value is computed on the basis
of each syntactic form's likelihood to occur, and then correlated with the translator's
reading time on the source segment and the target segment.
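As an illustration of how such an entropy value can be obtained, the sketch below counts the clause-annotation strings produced for one segment and applies the standard Shannon formula; the example data are invented for illustration and are not the actual TPR-DB annotations:

    from collections import Counter
    from math import log2

    def entropy(observations):
        """Shannon entropy (in bits) of a list of categorical outcomes."""
        counts = Counter(observations)
        n = len(observations)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    # Hypothetical clause-structure annotations of one source segment,
    # one string per translator (toy values, not the study data):
    constellations = ["TAI", "TAI", "TAI", "TPI",
                      "MAI-TAD-IAD-TPD", "MAI-TAD-TPD-IAD"]
    print(round(entropy(constellations), 2))  # 1.79 bits for this toy distribution

The same function, applied to the target words aligned to a single source word, yields the lexical translation entropy discussed above.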
The merit of the annotation system is that it captures clause-level syntactic
features that are applicable across all the languages in the data set, which ensures
comparability, and makes it possible to study syntactic variability in a much larger
corpus than we generally find in the translation process literature. This has not
been done before. Its weakness is that it may be too general to capture the structural
alternatives that correlate with the more demanding tasks, whether language-specific
or across target languages. For a better understanding of the choices available to a
translator at a given point in a text, one would need a more fine-grained framework,
although also one general enough to allow for comparison across the languages
under study.
Structural choice involves a variety of operations that one would expect the
translator to entertain and find difficult to decide on, such as category changes on the
phrase level and the morpho-syntactic level as well as other syntactic restructurings
and redistributions of information. Such choices may be driven by cross-linguistic
differences at various levels, and may even be triggered by information structural
and/or functional cues in the source texts that the translator makes use of to infer
meanings that are only implicitly expressed in the source.
In the following, some examples are examined in more detail in order to evaluate
which factors the system can capture and which it will overlook.
Phrase-level encodings may cause effortful restructuring operations that are not
captured by our annotation and thus not reflected in the analysis. Translators
introduce a variety of changes, such as shifts in grammatical functions. One example
is the translation of a source text complex compound and its Danish translation:
(2) a. ST: To make matters worse, escalating prices are racing ahead of salary
increases, especially those of nurses and …, who have suffered from
the government's insistence that those in the public sector have to receive
below-inflation salary increases. (TAD)
b. DA: at de offentlig ansattes lønstigninger skal ligge under inflasjonsraten.
(IAD)
(that the public-sector employees' salary increases shall lie under the
inflation rate)
The source text has a syntactic structure of four clauses: the main clause follows
a context connecting sub-clause, the apposition following the main clause is not
registered in our system since it is not a clause, the subsequent relative clause picks
up the referents of the apposition as subject, and the final clause of the sentence
functions as a complement to a nominalization in the prepositional adjunct: MAD,
IAI, IAD, TAD.
The object of the last clause, a complex compound, is unpacked and redistributed
into other syntactic functions in the Danish translation: the head of the syntactic
object salary increases is made the head of the syntactic subject in the translation,
while the modifier is partly recategorized into a verb, and partly encoded in a
prepositional phrase. Such unpackings and re-allocations of information are thought
to be cognitively demanding. The difference between the source and target structure
in this clause is annotated as a change from a transitive to an intransitive structure
in our system, which hardly reflects the many restructuring operations that have
taken place, also syntactically. Although the changes are indirectly reflected in the
annotation from a TAD structure to an IAD structure, and thus count as a variant in
the entropy computation, the analysis obscures the many translation operations the
translator has coped with.
Another type of change not reflected in the syntactic analysis is metaphorization
as a re-categorization procedure.
Consider the Spanish translation in the following segment:
(3) a. ST: His withdrawal comes in the wake of fighting flaring up again in Darfur
and is set to embarrass China, which has sought to halt the negative fallout
from having close ties to the Sudanese government.
b. ES: Su retirada ha coincidido con una nueva intensificación armada en Darfur
y sin duda significará para China una mella pública. China a su vez ha
realizado un intento de no cortar los estrechos lazos que le unen al gobierno
del Sudán.
(His withdrawal has coincided with a new military intensification in Darfur
and no doubt will signify for China a public dent. China, in turn, has effected
an intent not to reduce the close ties that unite it to the Sudanese government)
The example is a case of irregular, complex re-categorization found in the
Spanish dataset P05_T3 (segment 3). The clausally postmodified nominalization
fighting flaring up again in Darfur is translated as a noun phrase una nueva
GazeT/TokS per segment and participant:

Segment  P05   P08   P07   P02
1        6766  8942  2553  2847
2        2243  3489  1258  2053
3        4136  1848  1044  1477
4        1780  1719  410   2206
5        976   1076  1132  1328
P09 has misunderstood the segment, so her solution is irrelevant for my purpose here.
Table 12.2 Edits on procedural and conceptual encodings in Alves and Gonçalves (2013)

Type of edits in A&G (2013):
Typos (t)
Completions (c)
Lexical (l)
Morphosyntactic (m)
Complex phrasal (p)
SUM l + p (CE)
SUM m + p (PE)
of a word to be typed (c) to lexical edits (l), morphosyntactic edits (m) and complex
phrasal structures (p). The edits are then related to whether the unit is a procedural
or a lexical encoding or both.
Annotation of procedural and conceptual encodings is certainly not clear-cut.
The function of procedural expressions is to activate procedures whose main
function is to help the hearer understand an utterance by finding the intended combination of context, explicit content and cognitive effects. In the traditional account
of Relevance Theory, procedural encodings do not contribute to the truth conditions
of an utterance, but trigger the derivation of implicatures relating to the meaning
meant to be conveyed by the speaker. Classical examples of linguistic categories
encoding procedural information are discourse connectives and conjunctions, and
we may add focus particles and other function words that are conceived of as
presupposition triggers in classical semantics. Conceptual encodings, on the other
hand, are lexical words such as nouns, verbs, adjectives and adverbs, used to convey
concepts that are extendable to propositions, which denote truth conditions. The
distinction is still a matter of debate. The parallelism between the truth-conditional
vs the non-truth-conditional distinction and the conceptual/procedural distinction is
given up on a number of accounts, and there is furthermore an indication that lexical
categories also carry procedural information (Wilson 2011). Analyzing translation
units according to the distinction is therefore still a challenge. Alves and Gonçalves
are well aware of the problem. They solve it by annotating TUs with complex
phrasal structure edits (p) as an overlap category, belonging to both conceptual
encodings (CE) and procedural encodings (PE). On this measure they find that,
overall, editing procedures are significantly higher on PEs than on CEs. The overall
means in their study are repeated in Table 12.2 for an overview:
12.3.4 Conclusion
Syntactic operations as well as procedural encoding operations are likely involved in
the cognitive task of structuring information in translation. According to the results
of the studies reported on in this paper, procedural encoding seems to be a stronger
indicator of higher processing effort than shallow syntactic annotation can bring out.
The few examples that have been provided in the present paper demonstrate that
structuring operations go beyond syntax; they include redistributions of content
within phrases and clauses which are not captured by the syntactic measures alone,
and which are not clearly defined as procedural encodings in the literature. As a
final note, I would add information structural constraints to procedural information,
since they clearly inform the hearer about how to update the message with context.
If basic information structural markers can be annotated, they should be included
among the procedural encodings.
An analogous translation, in Doherty's view, is one which retains high similarity of form at
every level. Grammatically acceptable analogous translations are seen as the starting point for
the translator's search for an optimal translation (Doherty 2002: 166).
References
Allott, N. (2013). Relevance theory. In A. Capone, F. Lo Piparo, & M. Carapezza (Eds.), Perspectives on pragmatics and philosophy. Berlin: Springer. 12 pp.
Alves, F., & Gonçalves, J. L. (2013). Investigating the conceptual-procedural distinction in the translation process. Target, 25(1), 107–124.
Alves, F., Pagano, A., Neumann, S., Steiner, E., & Hansen-Schirra, S. (2010). Translation units and grammatical shifts. In G. Shreve & E. Angelone (Eds.), Translation and cognition. Amsterdam: Benjamins.
Behrens, B. (2014). Nominalization: A case study of linguistic text conventions in comparable and parallel texts: English and Norwegian. In G. Ebeling, K. Hauge, & D. Santos (Eds.), Corpus-based studies in contrastive linguistics. Oslo Studies in Language, 6(1), 143–160.
Blakemore, D. (1987). Semantic constraints on relevance. Oxford: Blackwell.
Campbell, S. (2000). Choice network analysis in translation research. In M. Olohan (Ed.), Intercultural faultlines (pp. 29–42). Manchester: St. Jerome.
Carl, M., & Schaeffer, M. (forthcoming). Literal translation and processes of post-editing. In Translation in transition: Between cognition, computing and technology. Amsterdam: Benjamins.
Doherty, M. (2002). Language processing in discourse: A key to felicitous translation. London: Routledge.
Dragsted, B. (2012). Indicators of difficulty in translation: Correlating product and process data. Across Languages and Cultures, 13(1), 81–98.
Hansen-Schirra, S., Neumann, S., & Steiner, E. (Eds.) (2012). Cross-linguistic corpora for the study of translations: Insights from the language pair English-German. Berlin: Walter de Gruyter.
Hartsuiker, R. J., Pickering, M. J., & Veltkamp, E. (2004). Is syntax separate or shared between languages? Cross-linguistic syntactic priming in Spanish-English bilinguals. Psychological Science, 15(6), 409–414.
Jakobsen, A. L. (2011). Tracking translators' keystrokes and eye movements with Translog. In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in translation studies. Amsterdam: Benjamins.
Jakobsen, A. L., & Jensen, K. T. H. (2008). Eye movement behavior across four different types of reading task. In S. Göpferich, I. M. Mees, & A. Lykke Jakobsen (Eds.), Looking at eyes: Eye-tracking studies of reading and translation processing (Vol. 36, pp. 103–124). Copenhagen: Samfundslitteratur (special issue of Copenhagen Studies in Language).
Jensen, K. T. H., Sjørup, A. C., & Balling, L. W. (2009). Effects of L1 syntax on L2 translation. In F. Alves, S. Göpferich, & I. M. Mees (Eds.), Methodology, technology and innovation in translation process research: A tribute to Arnt Lykke Jakobsen (pp. 319–336). Copenhagen: Samfundslitteratur.
Ruiz, C., Paredes, N., Macizo, P., & Bajo, M. T. (2008). Activation of lexical and syntactic target language properties in translation. Acta Psychologica, 128(3), 490–500.
Shannon, C. E. (1951). Prediction and entropy of printed English. The Bell System Technical Journal, 30(1), 50–64.
Thunes, M. (1998). Classifying translational correspondences. In S. Johansson & S. Oksefjell (Eds.), Corpora and cross-linguistic research: Theory, method, and case studies (pp. 25–51). Amsterdam: Rodopi.
Thunes, M. (2011). Complexity in translation. PhD thesis (forthcoming), University of Bergen, Norway.
Wilson, D. (2011). The conceptual-procedural distinction: Past, present and future. In V. Escandell-Vidal, M. Leonetti, & A. Ahern (Eds.), Procedural meaning: Problems and perspectives (pp. 3–29). Bingley: Emerald Group.
Wilson, D., & Sperber, D. (1993). Linguistic form and relevance. Lingua, 90(1), 1–25.
Chapter 13
Dagmara Płońska
Abstract The present paper is concerned with the question of the literality of translations. The theoretical part presents the results of some think-aloud protocol (TAP)
research on literal translation regarded as a translator's basic procedure. It also
deals with the problem of the operationalization of literality in translation, enumerating
Carl and Schaeffer's (n.d.) criteria for an ideal literal translation and presenting
Kielar's (2013) definition of literal translation. The empirical part describes the
results of a study concerning French-Polish translations of a newspaper article,
involving 60 participants and using Translog as a primary logging tool. The main
aim of the study was to investigate the degree to which translators' construction
of a full mental representation of the source text prior to translation and their
translation experience affect the literality of the produced translations. An analysis
of the relationship between literality operationalized according to Kielar's
definition and one of the definitional criteria for literality proposed by Carl and
Schaeffer, namely translation entropy, is an additional element.

Keywords Literality • Literal translation • Translation procedure • Translation experience • Text representation • Translation entropy • Translog
D. Płońska (✉)
University of Social Sciences and Humanities, Warsaw, Poland
e-mail: dplonska@swps.edu.pl
indique sa fonction dans la phrase [in French the order of items is fixed since
the disappearance of the nominal inflexion, while in Polish the order of items is
freer because the form of the noun indicates its function in the sentence]. Given
these differences, in the present study I decided not to apply the identical word
order as one of the criteria of literality of translation. However, I wanted to verify
if the notion of literality operationalized according to Kielar's definition correlates
with the notion of entropy proposed by Carl and Schaeffer. The value of entropy
indicates how many different translations a given source text word has. If a word
has only one possible translation, it has an entropy value of 0. I admit the possibility
of a source text word having more than one literal equivalent in the target language.
This intuition is based on my previous experience. At the same time I suppose that
the number of literal equivalents of a given word is limited. Therefore, it seems to
me that there should be a strong relationship between the literality of the translation
of a given word as defined by Kielar (2013) and translation entropy as defined by
Carl and Schaeffer (n.d.). The entropy values should be significantly smaller for
words translated literally according to Kielar's definition.
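A small worked instance may make the expected relationship concrete (the numbers are my own illustration, not data from the study): if all participants chose the same target word, its probability is 1 and the entropy is −1 · log2(1) = 0 bits; if four alternatives were each chosen by a quarter of the participants, the entropy rises to −4 · (1/4) · log2(1/4) = 2 bits. Words translated literally in Kielar's sense should therefore cluster toward the low end of this scale.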
The study was aimed at ascertaining whether professional translators, non-professional translators and translation students differ in terms of the literality of the
translations they produce. Firstly, I expected that the tendency to translate literally
would decrease with experience, i.e. professionals would produce less literal
translations than students and students less literal ones than non-professionals.
Secondly, I investigated the impact of constructing an initial mental representation
of the source text on the literality of produced translations. I hypothesized that
participants who did form a mental representation of the text prior to translating
would translate less text literally than those who did not. Thirdly, I analyzed the
relationship between literality as defined by Kielar (2013) and entropy as defined
by Carl and Schaeffer (n.d.). As I already stated above, my assumption was that
the entropy values would be significantly smaller for the words translated literally
according to Kielar's definition.
13.3 Method
13.3.1 Participants
The study involved 19 professional French to Polish translators aged 28 to 61, 20
students of applied linguistics with French language aged 22 to 34 and 20 persons
with advanced-level French language skills and without a background in translation
aged 25 to 54. Further in the text I refer to these groups by the terms "professionals",
"students" and "non-professionals". The professionals' work experience ranged
from 5 to 38 years at the time of the study. Among the non-professionals, 9 persons
had a DALF certificate (Diplôme approfondi de langue française) confirming their
advanced knowledge of French. The other 10 persons were teachers of French in
upper secondary schools (Polish: liceum) in Warsaw, and one person was a teacher
of French in a primary school (Polish: szkoła podstawowa) at the time of their
participation in this research.
Original text: Vous devez personnellement venir chercher votre retraite en Bulgarie.
[You have to personally come and fetch your pension in Bulgaria.]

Imaginary literal translation 1: Pani musi osobiście przyjechać po swoją emeryturę do Bułgarii.
Imaginary literal translation 2: Musi pani osobiście przyjechać do Bułgarii po swoją emeryturę.
Imaginary literal translation 3: Po swoją emeryturę musi pani przyjechać osobiście do Bułgarii.

Participant 1 translation: Musi (lit.) pani (lit.) osobiście (lit.) zjawić się (non-lit.) [appear] w Bułgarii (non-lit.) [in Bulgaria] po odbiór (non-lit.) [to collect] swojej (non-lit.) [your] emerytury (non-lit.) [pension].
Participant 2 translation: Swoją (lit.) emeryturę (lit.) musi (lit.) pani (lit.) osobiście (lit.) odbierać (non-lit.) [collect] w Bułgarii (non-lit.) [in Bulgaria].
ensure that the target text words identified as such were indeed literal equivalents
of the source text words. However, originally, this method was used to check the
semantic content of translation segments of at least two words. When I tried to apply
the method to separate words, it turned out to be unavailing, because all the words
appearing in the French-Polish dictionary as the equivalents of a given French word
could be translated back by means of the same French word using the Polish-French
dictionary.
13.3.4 Results
The total number of source text words translated literally was used as a measure
of the literality of translation. This variable was examined with a 3 × 2 (Experience
[non-professionals, students, professionals] × Initial text representation [yes, no])
analysis of variance (ANOVA). The results showed a statistically significant effect
of the main variable experience, F(2, 53) = 6.32, p < 0.01, η² = 0.19 (see Fig. 13.1).
The students translated significantly less text literally (M = 162.55; SD = 27.46)
than non-professionals (M = 193.55; SD = 33.40).
Furthermore, the effect of the interaction between the variables Experience and
Initial text representation was statistically significant, F(2, 53) = 5.78, p < 0.01,
η² = 0.18 (see Fig. 13.2). Simple effects analysis showed that students translated
Fig. 13.1 Average total number of words translated literally depending on the experience
Fig. 13.2 Average total number of words translated literally depending on the experience and
depending on whether an initial representation of the source text had been formed
significantly less text literally than both professionals and non-professionals but
only when the participants did not form a mental representation of the text prior
to commencing work. Creating an initial representation of the text significantly
reduced the amount of text translated literally among professionals and significantly
increased this amount among students.
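For readers who wish to run this kind of analysis on their own data, a minimal sketch of the 3 × 2 ANOVA in Python is given below; the file and column names are hypothetical placeholders, not the variables of this study:

    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    # One row per participant: total number of source text words translated
    # literally, experience group, and whether an initial text representation
    # was formed before translating.
    df = pd.read_csv("literality_per_participant.csv")  # hypothetical input file

    # 3 x 2 between-subjects ANOVA: Experience x Initial text representation
    model = ols("literal_words ~ C(experience) * C(initial_repr)", data=df).fit()
    print(anova_lm(model, typ=2))  # F and p values for both main effects and the interaction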
I used the total number of source text words translated literally as a measure of
literality of the whole translation. However, to investigate the relationship between
the literality of translation of a given word and the entropy of translation alternatives
I used a nominal variable literality concerning separate words. As I already stated
above, for every translation, all of the source text words were labeled as translated
literally or non-literally. The entropy values were also calculated for every source
text word. In order to verify whether the nominal variable literality concerning
separate words can be a good predictor of the entropy of translation
alternatives, a one-way ANOVA was performed with literality as a factor. The effect
of this variable was statistically significant, F(1, 22772) = 7470.62, p < 0.001,
η² = 0.25. The entropy values were smaller in the case of source text words labeled
as translated literally (see Fig. 13.3).
Another illustration of this relationship is provided by Fig. 13.4 presenting the
number of occurrences of literal and non-literal translation depending on the entropy
values.
Fig. 13.3 Entropy of translation alternatives depending on whether the word was translated
literally
13.3.5 Discussion
The results of the study are unanticipated. As it turns out, the students translated
less text literally than professional translators. Moreover, there are no significant
differences in the amount of text translated literally between professionals and
non-professionals. The results also show that an initial mental representation of
the source text has a substantial impact on the subsequent translation process in
terms of the frequency of words translated literally. According to expectation,
having an initial representation of the text made the differences between groups
less substantial. However, the influence of this variable is different for the three
groups. The students who did form an initial mental representation of the source text
translated literally more text than those who did not. In contrast, the professionals
who did construct a representation of the source text before taking up the task made
less frequent use of literal translation than those who did not.
The results concerning the entropy of translation alternatives conform to my
preliminary expectations. For the words labeled as translated literally according
to Kielar's definition, the entropy values are significantly lower. This means that the
number of translations proposed by participants is significantly smaller in the case
of words translated literally. This finding confirms the assumption that a word of
one language has a limited number of literal equivalents in another language. It
also shows that, to a certain extent, my operationalization of literal translation is
Fig. 13.4 Number of occurrences of literal and non-literal translation depending on the entropy of
translation alternatives
compatible with Carl and Schaeffer's conception described in Chaps. 2 and 9. The
entropy, one of the criteria used by Carl and Schaeffer to measure the literality of
translations, is a continuous variable, whereas the literality operationalized according to
Kielar's definition is a nominal one. Nevertheless, considering the entropy criterion
alone, the passages labeled as translated literally were significantly closer to an ideal
literal translation as defined by Carl and Schaeffer than the passages marked as
translated non-literally.
In my view, the students' reluctance to translate literally might have been the result
of translator training. Students learn at a very early stage that the use of literal
translation often results in translation errors and can be regarded as a sign of
incompetence. This is the probable reason why they perceive this procedure as
their last resort and try to avoid it by all means. As their experience grows, they
learn to recognize the situations permitting the safe use of literal translation. In
contrast, experienced translators can consciously use literal translation, allowing
them to provide translations that are both acceptable to target norms and adequate
to the source. This is all the more understandable given that the use of this procedure
requires less time and effort (cf. Schaeffer and Carl 2014). As the differences in
the use of literal translation between professional and non-professional translators
weren't statistically significant, it would be worthwhile to assess the quality of the
translations and to examine whether the use of this procedure was associated with
specific translation errors in any of these groups.
The study confirms the importance of forming a mental representation of text
before commencing translation. In light of the results, I believe that having a mental
representation of the source text before taking up the task allows the translator to be
freer in their choice of available translation procedures, including literal translation.
This would help in explaining why, contrary to my preliminary expectation, the
students who did form an initial representation of the text translated literally more
text than those who did not.
The effect of experimental condition could also be interpreted in terms of a
possible pre-translation during the initial reading and a probable priming effect (see
Chaps. 9 and 10, this volume). Schaeffer et al. argue that reading for translation is
substantially different from monolingual reading. However, when the participants
were presented the original text for the first time, they were told the objective of the
reading was to be able to answer the questions about the text form and content. The
participants knew they were going to translate a text during the study. They might
have supposed the text they were reading to be the one they would translate later.
Nevertheless, they werent explicitly told so. Certainly, some of the words of the
original text were translated during writing the summary. On the other hand, the text
was too long to be memorized, so at this stage the participants were writing a new
text based on their recollection of the original text content rather than translating.
As far as the priming effect is concerned, the participants were presented not
only with the original text but also with the sentence recognition test containing different kinds
of samples, including paraphrases, correct conclusions and incorrect conclusions.
They also wrote a Polish summary of the original text. As noted by Schaeffer et al.
(see Chap. 9, this volume), in translation priming studies priming from L1 to
L2 was observed more often than priming from L2 to L1. Thus, it is legitimate
to suppose that the Polish words and syntactic constructions the participants used in
their own summaries of the original text were more accessible to them during the
subsequent translation. It might be interesting in the future to examine
the summaries written by the three groups of participants and to see how close they
are both to the original text and to the translated text. An analysis of the time
data from Translog could also shed light on the role of pre-translation and of
a possible priming effect in the subsequent translation process.
A more complete picture of the translation process may be gained by researching
the ways the three groups of participants apply more complex translation strategies.
As a part of the current project, I also plan to take a closer look at the participants'
errors by analyzing the entire process of making corrections.
Acknowledgment I would like to express my gratitude to the former director of the Center
for Research and Innovation in Translation and Translation Technology at Copenhagen Business
School, Arnt Lykke Jakobsen and to the current director of the center, Michael Carl, for their
assistance and guidance in my work.
The research was supported by the Polish National Science Centre (NCN); grant awarded by decision
no. DEC-2013/09/N/HS6/02863.
References
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and writing research. In N. Calzolari (Ed.), Proceedings of the eighth international conference on language resources and evaluation (LREC 2012) (pp. 26), May 23rd–25th, 2012, Istanbul. European Language Resources Association.
Carl, M., & Schaeffer, M. (n.d.). Literal translation and processes of post-editing. In Translation in transition: Between cognition, computing and technology (edited volume, under review with Benjamins).
Chesterman, A. (2011). Reflections on the literal translation hypothesis. In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in translation studies (pp. 23–35). Amsterdam: John Benjamins Publishing.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Orlando: Academic.
Dobrzyński, J., Frosztęga, B., & Kaczuba, I. (Eds.). (1996). Wielki słownik francusko-polski [The Great French-Polish Dictionary]. Warsaw: Wiedza Powszechna.
Frosztęga, B. (Ed.). (1995–2008). Wielki słownik polsko-francuski [The Great Polish-French Dictionary]. Warsaw: Wiedza Powszechna.
Germann, U. (2008). Yawat: Yet another word alignment tool. In Proceedings of the ACL-08: HLT demo session (Companion Volume) (pp. 20–23). Columbus, OH: Association for Computational Linguistics.
Gniadek, S. (1979). Grammaire contrastive franco-polonaise [Contrastive grammar of French and Polish]. Warsaw: PWN.
Ivir, V. (1997). Formal/contrastive correspondence and translation equivalence. Studia Romanica et Anglica Zagrabiensia, 42, 167–180.
Jääskeläinen, R. (1996). Hard work will bear beautiful fruit. A comparison of two think-aloud protocol studies. Meta, 41(1), 60–74.
Kielar, B. Z. (2013). Zarys translatoryki [The outline of translatology]. Warsaw: Wydawnictwo Naukowe IKL@.
Königs, F. G., & Kauffmann, R. (1996). Processus mentaux étudiés chez des sujets allemands apprenant le français lorsqu'ils sont en train de traduire [Investigation of mental processes of French-learning German subjects involved in translating]. Meta, 41(1), 7–25.
Lörscher, W. (2005). The translation process: Methods and problems of its investigation. Meta, 50(2), 597–608.
Mandelblit, N. (1996). The cognitive view of metaphor and its implications for translation theory. In M. Thelen & B. Lewandowska-Tomaszczyk (Eds.), Translation and meaning, part 3. Proceedings of the Maastricht session of the 1995 Maastricht–Łódź Duo Colloquium on Translation and Meaning, held in Maastricht, The Netherlands, April 19–22, 1995 (pp. 483–495). Maastricht: Rijkshogeschool Maastricht, School of Translation and Interpreting.
Newmark, P. (1988). A textbook of translation. New York: Prentice Hall.
Płońska, D. (2014). Strategies of translation. Psychology of Language and Communication, 18(1), 67–74.
Schaeffer, M., & Carl, M. (2014). Measuring the cognitive effort of literal translation processes. In U. Germann, M. Carl, P. Koehn, G. Sanchis-Trilles, F. Casacuberta, R. Hill, & S. O'Brien (Eds.), Proceedings of the workshop on humans and computer-assisted translation (HaCaT) (pp. 29–37). Stroudsburg, PA: Association for Computational Linguistics.
Tirkkonen-Condit, S. (2005). The monitor model revisited: Evidence from process research. Meta, 50(2), 405–414.
Chapter 14
Abstract The current chapter introduces an annotation schema of TPR data that
categorises post-editing behaviour into five different classes and compares general-language and domain-specific English-to-German translation and post-editing with
respect to production times, key-logging (text production activity and text elimination activity) and eye-tracking data (total reading times on source text and on target
text). The results support the hypothesis that post-editing is faster than translation
from scratch for both domain-specific and non-domain-specific text types. When
key-logging and eye-tracking data are taken into consideration, domain-specific
texts require more effort when translating from scratch, but less effort when the
machine translation output is post-edited. It is hypothesized that the introduced
annotation schema could provide more details about translation processes and
better insights into the differences between different domains.

Keywords Translation process research • LSP • Key-logging • Eye-tracking • Post-editing • Annotation schema
14.1 Introduction
The global demand for translated texts is constantly rising (De Palma 2009).
Companies are increasingly using machine translation and editing the translations
produced by the computer in order to improve translators' efficiency (cf. O'Brien
2011). In view of this development, translation process researchers have started to
investigate the advantages and disadvantages of post-editing and the differences
between the translation process and the post-editing process (cf. Čulo et al. 2014;
Carl et al. 2014; Winther Balling and Carl 2014). In this chapter we will present new
findings and methods to analyse post-editing in order to help answer the
question of how translation and post-editing differ.
This chapter presents an annotation schema for the Translog data that has already
been applied to some of the CFT13 study data and that categorises post-editing
behaviour into five different classes. The main classes are the orientation and
revision phases, which in turn are separated into two or three subclasses specified
by indices. This annotation schema could also be adapted to translation behaviour
and thus used to compare the two activities. Finally, we will introduce a possible
extension of the database in regard to text type and domain-specific translation/post-editing. We believe that focusing on domain-specific texts is very interesting since
they make up the biggest component of the translation market (Hommerich and
Reiß 2011). We will thus use first findings to compare translation and post-editing behaviour in three domains: journalistic and sociology texts (classified as
general-language data, already available in the database in study SG12), extracts
from a refrigerator manual (domain-specific language) and extracts from patient
information leaflets (domain-specific language). Then we will show the advantages
and disadvantages of translation and post-editing in these fields.
concentrate on orientation (O) and revision (R). O and R have been specified by
indices (orientation in the source text, in the target text, or in source and target text: Os, Ot, Ost;
linear or scattered revision: Rl, Rs), which will be further described below.
Label  Features
Ost    The participant spends time reading both the source and the target text (Fig. 14.1, left).
Os     More than 80 % of the fixations are on the source text.
Ot     More than 80 % of the fixations are on the target text (Fig. 14.1, right).

Fig. 14.1 (left) Ost: orientation in source and target text. The participant looked at both ST (green dots) and TT (blue dots); (right) Ot: orientation in target text. The participant looked more frequently at TT (green dots)
Label  Features
Rl     Every word or phrase is processed only once. In Fig. 14.2 (left), for example, revisions 1–5 are on different parts of the text.
Rs     The participant works on a part of the text, moves on, but jumps back later to readjust the parts (s)he already worked on. In Fig. 14.2 (right), for example, revision 1 is on the same part of the text as revision 4.

Fig. 14.2 (left) Rl: linear revision. The participant edited every word only once; revisions 1–5 are on different words. (right) Rs: scattered revision. The participant edited one word twice; revisions 1 and 4 are on the same word
made by the participant. The black numbers represent the order in which the
revisions were performed.
Fig. 14.3 (left) Orientation splitting up a long revision phase; (right) Sequence of several phases
Type 1: Reading ST
Type 2: Reading TT
Type 3: Typing
Type 4: Typing while reading ST
Type 5: Typing while reading TT
Type 6: Typing while reading ST and TT
Type 7: No recorded activity
We used these annotations as a basis for our own annotations. The granularity of the
original TPR-DB activity units (CUs) is, however, quite different from the one we
suggest. Figure 14.5 reproduces the segments of Fig. 14.2 (right) with the original
TPR-DB CU segmentation. The graph represents a duration of approximately 60
s, which is a single segment in our annotation (Fig. 14.2), but amounts to 44
segments in the TPR-DB annotation. Although we forfeit information about user
activity when using the annotation schema presented in this paper, it is possible
to distinguish immediately, for example, between different revision patterns. As
Fig. 14.5 shows, scattered and linear revision behaviour cannot easily be distinguished
in the CU representation, which however is possible with our annotation schema.
In contrast to the TPR-DB CUs, which are computed automatically and
which represent activity patterns at a very fine granularity level, the classification
into the labels described above was created by visual analysis of the logging data.
We believe that the coarse granularity level of our annotation
schema allows users to distinguish between different types of post-editors
and translators more easily. It might be easier to discover a certain type
of participant at first glance in our annotations, whereas the TPR-DB units need a
more extensive analysis before patterns become visible to the researcher.
Participant  Ost           Os          Ot            Rl            Rs            Sum
P01_P11      28 (29.79 %)  0 (0 %)     19 (10.64 %)  43 (45.74 %)  4 (4.26 %)    94
P01_PI21     13 (26 %)     1 (2 %)     7 (14 %)      27 (54 %)     2 (4 %)       50
P01_PIA31    16 (22.86 %)  1 (1.43 %)  6 (8.57 %)    43 (61.43 %)  4 (5.71 %)    70
P03_P31      2a (3.57 %)   –           0             49 (87.5 %)   5 (8.93 %)    56 (101)b
P03_PI11     27 (42.19 %)  4 (6.25 %)  –             30 (46.88 %)  3 (4.69 %)    64
P04_P31      33 (33.67 %)  2 (2.04 %)  3 (3.06 %)    46 (46.94 %)  14 (14.29 %)  98
P04_PI11     45 (38.14 %)  2 (1.69 %)  6 (5.08 %)    51 (43.22 %)  14 (11.86 %)  118
P04_PIA21    32 (39.51 %)  1 (1.23 %)  5 (6.17 %)    34 (41.97 %)  9 (11.11 %)   81
P05_P11      33 (32.04 %)  3 (2.91 %)  8 (7.77 %)    49 (47.57 %)  10 (9.71 %)   103
P05_PI21     30 (38.96 %)  1 (1.30 %)  5 (6.49 %)    30 (38.96 %)  11 (14.29 %)  77
P05_PIA31    43 (33.33 %)  1 (0.76 %)  20 (15.50 %)  54 (41.86 %)  11 (8.53 %)   129
Sum          286           16          79            456           87

a Some of the eye-tracking data were probably lost for this dataset after the first segments. We therefore decided not to use indices when annotating orientation phases; they were assigned the label O. A total of 44 phases in this dataset were labelled O.
b The number in brackets includes the label O.
Fig. 14.5 Rs segment reproduction of Fig. 14.2 (right) with original TPR-DB CUs
However, our annotation schema is not yet based on quantifiable measures. A next step would
thus involve setting thresholds in order to generate an algorithm that automatically extracts such
phases from the raw data. Läubli and Germann (see Chap. 8, this volume) are the
first to attempt to use the above-described annotation. In future work, the annotation
schema should be applied to all post-editing and translation data in the database so
that the behaviour can be compared between different participants, languages, tools
and/or text types.
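A minimal sketch of such a threshold-based extraction is given below; the data structures and the use of the 80 % criterion as a hard rule are assumptions made for illustration, not an implemented part of the schema:

    def classify_orientation(fixation_targets, threshold=0.80):
        """Assign Os/Ot/Ost to an orientation phase.

        fixation_targets: one entry per fixation in the phase, either "ST" or "TT".
        """
        if not fixation_targets:
            return "O"  # no usable gaze data (cf. the plain label O used above)
        st_share = fixation_targets.count("ST") / len(fixation_targets)
        tt_share = 1 - st_share
        if st_share > threshold:
            return "Os"   # more than 80 % of the fixations on the source text
        if tt_share > threshold:
            return "Ot"   # more than 80 % of the fixations on the target text
        return "Ost"      # reading in both source and target text

    def classify_revision(edited_positions):
        """Assign Rl/Rs to a revision phase.

        edited_positions: target-token indices in the order in which they were edited.
        """
        revisited = len(set(edited_positions)) < len(edited_positions)
        return "Rs" if revisited else "Rl"  # scattered if any position is edited more than once

    # Example: a phase with 9 of 10 fixations on the target text is labelled Ot.
    print(classify_orientation(["TT"] * 9 + ["ST"]))  # -> Ot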
In the SG12 study, each of the participants had to translate two texts. Therefore,
every text was translated and post-edited eight times by student translators. In the
set-up of the study OCT13, each participant had to translate one text from scratch,
so that each technical text was translated four times and each medical text three
times. Two texts had to be post-edited by each participant: one according to the
guidelines of full post-editing, one according to the guidelines of light post-editing.
The guidelines of the light post-editing task are similar to the guidelines of the
bilingual post-editing task in study SG12. Therefore, the processes and products of
the light post-editing task in OCT13 are comparable to the post-edited texts from
study SG12. In all, we have a corpus of 24 translated and post-edited texts from
study SG12, and 21 translated and post-edited texts from OCT13.
We conducted different undirected (two-tailed) tests for significance. When the data were distributed normally,
a t-test was conducted; when the data were not distributed normally, a Mann-Whitney U test was
conducted.
Due to technical problems the data of P10 for translation are not considered.
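A sketch of this test-selection step is given below, using SciPy; the 0.05 normality cut-off and the Welch variant of the t-test are assumptions for illustration, although the fractional degrees of freedom reported below are consistent with a Welch-type test:

    from scipy import stats

    def compare_groups(a, b, alpha=0.05):
        """Two-sided comparison of two samples, choosing the test by normality."""
        normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
        if normal:
            return stats.ttest_ind(a, b, equal_var=False)        # Welch t-test
        return stats.mannwhitneyu(a, b, alternative="two-sided")  # Mann-Whitney U test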
The production times are much higher for the domain-specific texts (M = 1243.4,
SD = 485.4) than for the general-language texts (M = 991.1, SD = 311.9) in the translation task. The contrary applies to the post-editing tasks, though the difference is
not as large (domain-specific: M = 720.8, SD = 237.3; general-language: M = 779.6,
SD = 227.5). However, the differences between the domains are not statistically
significant: W = 295, p = 0.117 for translation from scratch and t(41.7) = 0.84,
p = 0.4027 for post-editing.
Without separating the production times by domain, participants need significantly longer for translation than for post-editing: W = 1498.5, p < 0.0001. The
same applies when the studies are separated. For study SG12, the difference is
significant as well (t(40.2) = 2.65, p = 0.012), but not as strong as for study OCT13
(W = 357, p < 0.0001).
In conclusion, translation from scratch takes up more time than post-editing,
irrespective of the domain the participants worked in. The post-editing task seems
to be more time-saving for technical and medical texts than for general-language
texts; however, the differences are not significant.
Table 14.4 Mean values and standard deviation of text production (TP) and text elimination (TE) according to study (SG12 and OCT13) and task, translation (T) and post-editing (P)

Study  Task  Mean TP  SD TP  Mean TE  SD TE
SG12   T     1095.2   194.1  100.4    56.1
SG12   P     306.3    146.2  128.5    82.5
OCT13  T     1297.4   158.0  181.9    89.1
OCT13  P     199.9    72.6   111.9    65.4
as these parameters are only available for study SG12 and not for study OCT13, we
had to use the Translog-II output.
The descriptive statistics in Table 14.4 reveal that less text was produced and
erased in study SG12 than in study OCT13 for the translation task. For the post-editing task, it is the other way around: participants produced and erased more text in
study SG12 than in study OCT13.
In translation, the difference between studies SG12 and OCT13 is significant both for TP
(t(41.5) = 3.80, p = 0.0005) and for TE (W = 383.5, p = 0.0009).
For post-editing, the difference between the studies is only significant for TP
(W = 131, p = 0.0062), but not for TE (W = 230, p = 0.6246).
To summarise the results, the differences in TP and TE between the studies for both
tasks suggest that the MT output was more useful for domain-specific texts than for
general-language texts, although the difference for TE is not significant. Another
interpretation would be that the participants in the second study used the MT output
more efficiently.
Table 14.5 Mean values and standard deviation of total reading time on the source text (TrtS) and the target text (TrtT) in seconds according to study (SG12 and OCT13) and task, translation (T) and post-editing (P)

Study     Task  Mean TrtS  SD TrtS  Mean TrtT  SD TrtT
SG12      T     360.29     116.62   507.67     229.77
SG12      P     206.97     81.31    474.15     136.64
OCT13     T     241.60     100.05   368.46     194.17
OCT13     P     149.69     44.34    377.88     144.70
Combined  T     304.90     123.47   442.70     222.84
Combined  P     180.24     72.00    499.22     147.09
In both studies and tasks, the Trt on the target text was higher than on the source
text. Additionally, the TrtT is similar for both tasks in each study. Although the TrtS
is always lower than the TrtT, the gap is bigger in post-editing than in translation.
Finally, Trt is shorter for both ST and TT and for both tasks in study OCT13.
Initially, both studies were analysed separately. In study SG12 the difference
between translation from scratch and post-editing is significant in terms of gaze on
the ST (t(41.1) = 5.28, p < 0.0001), while there is no significant difference for gaze
on the TT (W = 301, p = 0.7984). The same applies to study OCT13: again, the
difference between translation from scratch and post-editing is significant for gaze
on the ST (W = 351, p = 0.0008), while there is no significant difference for gaze on
the TT (W = 210, p = 0.8034). In sum, there is no significant difference between
the two tasks in TrtT, but the difference is significant for TrtS, and according to the
descriptive values gaze on the ST is significantly higher for translation.
Combining the two studies and thereby increasing the number of texts (n = 45
per task) leads to clearer results: the difference between the tasks in TrtS is highly
significant (W = 1636, p < 0.0001), while there is no significant difference between
the two tasks for TrtT (W = 1011, p = 0.9936).
Next, total fixation duration was compared between the domains for the two
tasks. For the translation-from-scratch data, the TrtS on the general-language texts
was significantly longer than on the domain-specific texts (t(43.0) = 3.67, p = 0.0007);
it has to be kept in mind that the texts were of about the same length. The same
applies to TrtT, but the effect is not as strong (W = 151, p = 0.0212). For post-editing,
gaze on the general-language STs is again significantly higher than on the
domain-specific STs (W = 142, p = 0.0117). Similar results can be found for TrtT in
post-editing (W = 146, p = 0.0154).
All in all, gaze on the ST decreases in post-editing, while it stays about the same
for the TT in both tasks. Further, less gaze was spent on both texts for the domain-specific
texts. It has to be kept in mind that the figures for total fixation duration
on the source text and the target text were taken from different sources (tables in
the CRITT TPR database vs. Tobii Studio). This should not result in any differences,
but it cannot be completely ruled out.
14.4 Conclusion
The present study aimed at providing a more in-depth investigation into the
differences between translation and post-editing by analysing how texts from
different domains are processed. The results showed that participants need more
time for translation than for post-editing. However, no significant difference was
found between the domains, but the mean values indicate a tendency toward
translation taking longer for domain-specific texts, with translators working faster
in post-editing for those domains. A similar result was observed in the key-logging
data: while more key-logging activity was recorded in the domain-specific texts
for the translation task, fewer keystrokes were necessary to post-edit these texts.
The gaze behaviour on the TT is about the same for both tasks, while gaze on the
ST is significantly lower in the post-editing task than in translation. This applies
to both studies. In conclusion, the gaze behaviour changes between the two tasks,
and the TT receives more visual attention in the post-editing task than in the
translation task. While the ST is the main information source in the translation
task, where an entire translation has to be produced, in the post-editing task the most
important parts are the MT output and the TT; in post-editing, the ST is only used
for reference. Further, less gaze was spent on the domain-specific texts than on the
general-language texts, with regard to both ST and TT.
The annotation schema presented in Sect. 14.2 of this chapter could reveal
differences between post-editing and translation of domain-specific texts which we
did not discover with the analysis presented in this paper. Our data failed to reach
the significance level but still showed a tendency towards a difference between the
domains. It could therefore be worth analysing the recordings for the phases of our
annotation schema. We believe that especially the phases Rl and Rs could give more
insight into the differences between the domains. The revision behaviour was
not covered by our measurements of keystrokes and eye-tracking data. As a future
step, it might thus be worth annotating the data of study SG12 and study OCT13
and analysing the different phases.
We also believe that, although the TPR-DB already includes domain-specific
(medical) texts for the CASMACAT project, it would be sensible to expand
with additional domains. The analysis of the SG12 and OCT13 studies showed
differences in translators' behaviour in non-domain-specific vs. domain-specific
texts. These differences can be expected to be present for other domains as well.
Further, domain-specific texts are far more relevant in professional translation than
the translation of newspaper articles or similar general-language texts. An online
survey (Hommerich and Reiß 2011) conducted for the BDÜ, one of the leading
German professional associations for interpreters and translators, reported that
49 % of the members that participated in the study (1,570 in total) specialised in
the field Industry and Technology (general), 45 % in Law and Administration,
References
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical translation process research. In Proceedings of the eighth international conference on language resources and evaluation. Istanbul, Turkey.
Carl, M., Gutermuth, S., & Hansen-Schirra, S. (2014). Post-editing machine translation: A usability test for professional translation settings. In Psycholinguistic and cognitive inquiries in translation and interpretation studies. Newcastle Upon Tyne: Cambridge Scholars Publishing.
Čulo, O., Gutermuth, S., Hansen-Schirra, S., & Nitzke, J. (2014). The influence of post-editing on translation strategies. In Post-editing of machine translation: Processes and applications. Newcastle Upon Tyne: Cambridge Scholars Publishing.
De Palma, D. (2009). The business case for machine translation. Common Sense Advisory. Accessed March 30, 2015. http://www.mt-archive.info/MTS-2009-DePalma-ppt.pdf
Dragsted, B., & Carl, M. (2013). Towards a classification of translation styles based on eye-tracking and key-logging data. Journal of Writing Research, 5(1), 133–158.
Hommerich, C., & Reiß, N. (2011). Ergebnisse der BDÜ-Mitgliederbefragung.
O'Brien, S. (2011). Towards predicting post-editing productivity. Machine Translation, 25, 197–215.
Winther Balling, L., & Carl, M. (2014). Production time across languages and tasks: A large-scale analysis using the CRITT translation process database. In Psycholinguistic and cognitive inquiries in translation and interpretation studies. Newcastle Upon Tyne: Cambridge Scholars Publishing.
Index
A
Acceptability, 123, 124, 127128, 131, 189,
219, 291. See also Adequacy
Active learning (AL), 7, 49, 5774
Activity
pattern, 157, 161, 298
units (CU), 10, 19, 33, 3839, 50, 52, 53,
298
Adequacy, 6, 61, 123, 124, 127131, 140, 189,
265, 289
AIC value, 119, 120, 122, 125, 128
Alignment
segmentation, 21, 31
units (AU), 15, 19, 2225, 31, 50
Anaphora, 241
Annotation
schema, 8, 10, 293309
system, 10, 266, 269, 270, 272274
Areas of interest (AOI), 86, 89, 307
Automatic processes, 16, 155179. See also
Priming
Automatic tagging, 155179
B
Backward score, 62
Batch learning, 63. See also Online learning
BD, 306
Behavioral measures, 269
Best suffix, 61
BIAC model, 192, 206
Biconcordancer, 8, 135151. See also
Concordancer
Bigram, 27
Bilingual
continuum, 187, 215
lexicon, 9, 185, 192193, 206
mode, 215
post-editing, 45, 300, 302
C
CASMACAT
field trial, 45, 96, 104, 137, 140, 141, 150,
163
pre-field trial, 46
workbench, 7, 14, 15, 17, 45, 58, 6568,
70, 74, 80, 85, 86, 89, 92, 95, 104,
108, 112, 115, 137, 138, 140, 141,
147150, 162, 163
CAT. See Computer assisted translation (CAT)
Cataphora, 241
CAT workbench, 5774, 79, 80, 108. See also
CASMACAT; Trados; Wordbee
Chinese
input method, 246, 249
input system, 247
Chisegmentor, 248
Clustering, 9, 49, 104, 162, 170, 171
Co-activation, 9, 185, 187, 191194, 199, 207,
213, 215216, 219, 224, 234236,
267
Cognitive
effort, 8, 78, 8284, 89, 92, 93, 156, 159,
160, 186, 189, 212, 213, 217, 219,
222, 225, 250, 267, 268, 273, 274
process, 46, 8, 82, 138, 147, 148, 159,
160, 186, 187, 193, 195, 266
309
310
Coherent
construction, 241
keyboard activity, 35, 39, 52, 100, 221, 223
reading, 19, 36
sequences, 19, 35, 36, 161, 275
text, 17, 35, 221, 240, 241, 251
typing, 19, 35, 36, 275
Cohesive
chain, 9, 240, 241, 245, 251, 257259
relations, 239261
Collocation, 41, 143, 241. See also
Biconcordancer
Compound noun, 36, 273
Comprehension strategies, 280
Computer-aided translation, 5, 6, 98, 112
Computer assisted translation (CAT),
5774, 79, 80, 93, 109. See also
Post-editing
Conceptual. See also Procedural encoding
analysis, 58, 80, 83
encoding (CE), 266, 275276
model, 139, 192
representation, 275
Concordancer, 116119, 126, 127, 129, 130,
137, 149. See also Biconcordancer
Concurrent
activity, 38
processing, 191, 208
reading, 25, 39, 185, 187189
typing, 34, 187189
Copying, 6, 10, 15, 48, 49, 67, 145, 219221,
223, 232235, 245
Co-referentiality, 241
Cross-linguistic
differences, 270
priming, 185, 192, 206, 207, 218
Cross value, 23, 26, 27, 51, 189191, 194, 199,
207
D
Dictionary, 9, 41, 116, 117, 119, 121, 126131,
136, 147, 149, 248, 283, 286
Distributed conceptual features model
(DCFM), 192, 193, 206
Domain-specific texts, 11, 294, 300, 303306
Drafting, 1921, 23, 25, 50, 52, 58, 79, 157,
158, 167, 231, 277, 294, 296
Drop1 test, 121
Dynamic programming, 61
Index
E
Early eye movement measure, 9, 185, 191,
193195, 205, 206. See also
Fixation
Edit distance, 62. See also Translation edit rate
(TER)
Editing, 49, 96, 132, 276
Effect plot, 120124, 126, 128, 129
Effort
cognitive, 8, 78, 8284, 89, 92, 93, 156,
159, 160, 186, 189, 212, 213, 217,
219, 222, 225, 250, 267, 268, 273,
274
technical, 8, 78, 8284, 86, 8889, 92, 93,
109, 141, 142, 159
temporal, 78, 79, 82, 83, 8688, 9193, 156
typing, 7, 69, 71, 73, 74, 97, 107, 109, 412
EMEA corpus, 8, 66, 74, 84, 85, 97, 106
Emission probability, 170
Encyclopedia, 48, 116, 119, 121, 126, 127,
129131, 149
Error analysis, 124
Europarl corpus, 65, 67
Expectation-maximisation (EM), 6365, 171
Expertise, 141, 161, 224, 225, 228, 230, 232
External
information tools, 115, 140
resources (EX), 7, 8, 19, 4144, 48, 5053,
111132, 144146, 150
Eye
movement, 9, 86, 99, 147, 156, 157,
160, 162, 167, 184187, 189, 191,
193196, 205207, 224, 225, 241,
251, 253, 256, 258
tracking, vii, 46, 17, 29, 36, 48, 80, 84, 86,
99, 102, 112, 114116, 139, 140,
147, 157, 162, 163, 173, 186, 215,
220, 227, 240–242, 244, 267, 300,
301, 304–305
Eyelink 2000, 115
Eye-mind assumption, 160
F
First fixation duration (FFDur), 9, 29, 30, 51,
186, 187, 194, 196–201, 203–207,
213, 216, 223, 225, 226, 229–230,
232, 233
First pass duration (FPDur), 23, 24, 51, 196,
201–202, 256
G
Gaussian mixture model (GMM), 170–173
Gaze
data, 6, 115, 220, 244, 248, 303
information, 23, 27, 29, 30, 32, 33
General-language texts, 294, 301–306
Gold standard, 163–167, 174, 176, 177
H
Hidden Markov model (HMM) based
alignment model, 63–64
Horizontal translation, 40, 41
Human
annotation, 9, 166, 173, 177, 178
translation process (HTP), 6, 9–11,
155–179
Human-computer interaction (HCI), 6, 136,
138, 139, 149
Human-information interaction (HII), 136,
139, 145
I
Idiomatic expressions, 114, 283
Incremental EM algorithm, 63, 65
Inefficiency, 26, 51, 225, 230–232, 234, 235
Information
behavior (IB), 139, 140, 149
entropy, 213
needs, 140, 148, 151, 184
relevance, 143, 145, 146
retrieval (IR), 9, 139, 140, 145, 146, 148,
149
structure, 268
K
Key-logging, 4, 5, 17, 19, 157, 186, 195, 220,
240, 267, 300, 303–304, 306
Keystroke
activities, 51, 106, 141, 221
data (KD), 19, 33, 41, 51, 65, 156
logging, 42, 112, 113, 115, 116, 130, 160,
283
K-means clustering, 162, 170
L
Language
Danish, 45, 48, 207, 220, 226, 232–235,
241, 269, 271, 273
Dutch, 112, 117, 124, 130, 218, 220, 236
English, 17, 26, 27, 47, 48, 60, 62, 67, 82,
84–86, 98, 112, 113, 116, 130, 143,
207, 215, 216, 218, 230, 236, 248,
250, 269, 272, 274, 275, 277, 280,
286, 301
French, 220, 280–286
model, 59, 63
Polish, 281–285, 290
Portuguese, 10, 48, 85, 240, 242–244, 248,
252, 275
technologies, 136, 141
Language-pair
English-Danish, 222–223
English-German, 222–223
English-Portuguese, 92, 93
English-Spanish, 24, 26, 66, 106, 137, 140,
212, 222–223
French-Polish, 279–290
Portuguese-Chinese, 242–244
Latin square design, 115
Law of interference, 217
Learning effect, 47, 92, 97, 104, 106, 148
Lexaligner, 248
Lexical
access, 198
cohesion, 240–241, 252
LexTALE test, 116
Linear mixed effects
analysis, 119, 121, 122, 125–128
modelling (LMEMs), 125, 185, 196, 200,
223
regression model (LMER), 253
Linear regression, 104, 223–225
Literality, 9, 11, 22, 23, 33, 50, 51, 106, 185,
189, 191, 193, 216, 217, 279–290
Literal translation
hypothesis, 185, 189–191, 199, 216–217,
236
non-literal translation, 217, 284, 287–289
Log-linear model, 59, 61, 63, 66
Logographic language, 246
Longitudinal study, 8, 47, 97–104, 108, 148
M
Machine translation (MT)
output, 9, 10, 49, 78–81, 85, 87, 90, 92,
102, 107, 112, 113, 126, 127, 156,
158, 245, 294, 306, 308
post-editing, 7, 26, 49, 58, 78–82, 90, 141,
158–159, 242, 301
systems, 58, 59, 78–80, 85, 155
Masked self-paced reading, 215
Medical texts, 8, 97, 106, 140, 301, 303, 306
Metaphor, 49, 271, 272, 280, 283
Micro units, 18, 19, 21, 23–25, 51, 52, 104,
275
Mixture modelling, 170
Monolingual mode, 215
Mouse events, 78, 86, 88, 89, 303
MT. See Machine translation (MT)
Multilingual experiment, 17, 47
Multi-word unit, 31. See also Compound
noun; Terminology
N
N-gram model, 59, 63, 104
Non-selective lexical access, 193, 206
O
Observable translation action, 158, 162, 175,
176
Online learning (OL), 7, 8, 17, 45, 49, 57–74,
97, 106, 109, 140–142, 144, 163
Operationalization, 10, 266, 281–282, 288, 289
Orientation
Os, 164–166, 295
Ost, 164–166, 295
Ot, 164–166, 295
P
Package leaflet, 140
Parallel
corpus, 84–85, 137
processing, 185
reading, 14, 18, 19
Participant
clustering, 104
tracking, 241, 250–251, 256–258
Part-of-speech tag, 36
Pausing, 8–10, 18–20, 25, 35, 36, 39, 52, 83,
85, 100, 160, 165–167, 174–178,
212, 221, 223, 241, 249–251, 267,
275
Personal reference, 241
Phrase-based model, 59, 63
Portuguese-Chinese translator (PCT), 243
Post-editing (PE)
CASMACAT, 6–8, 15, 21, 40, 45, 49, 65,
67, 96, 108, 145
CAT, 58, 63, 79, 109
effort, 7, 68, 77–93, 159, 160
guidelines, 67, 85, 99, 245, 304
interactive post-editing, 8, 17, 47, 87, 89,
91, 99, 147
interactive translation prediction (ITP), 49,
60, 78, 86, 96, 140, 163
process, 9, 63, 79–82, 92, 93, 99, 115, 141,
142, 145–147, 159, 160, 163–165,
175, 242, 293, 300
Post-editor, 6–8, 18, 70, 78, 80, 81, 84, 85,
92, 93, 96–110, 112, 114, 141–143,
145–148, 150, 158–160, 162, 163,
174, 178, 241, 294, 296, 298, 300,
302, 307
Priming
cross-linguistic priming, 185, 192, 206,
207, 218
probability, 219, 223–225, 227–230, 232,
236
semantic priming, 192, 218
structural priming, 192, 199, 206, 218
word-order dependence of priming, 192,
218, 226, 227, 236
Probabilistic finite-state machines, 62
Probability
distribution, 29, 31, 65, 178, 191, 213, 214,
225
of a fixation, 187, 196, 199, 201–207
of regression, 186, 196, 200, 204, 227, 236
Problem-solving strategy, 112, 113, 115, 132
Procedural encoding
conceptual encoding, 266, 276
Process data, 5, 7, 10, 14–16, 40–41, 104, 141,
300, 307. See also Fixation; Product
data; Production
Processing
phases, 8, 184, 204, 294, 299
units, 16, 18, 19, 22, 24, 33–39, 248, 275,
296
Product data, 15. See also Alignment unit
(AU); Process data; Segment;
Source tokens (ST); Target tokens
(TT)
Production
pauses, 18, 19, 25, 52, 100, 212, 221, 223,
251, 275
time, 8, 15, 45, 52, 87, 100, 102, 107, 118,
142, 188, 195, 221, 249, 251, 257,
258, 260, 261, 300–303
units (PU), 19, 21, 25, 33–39, 41, 50–52,
160, 161, 188, 189, 275
Productivity, 4, 14, 58, 79, 96, 112, 141,
155–156, 184, 212, 239–261, 267,
300
Professional translator, 6, 11, 45, 48, 65, 82,
84, 98, 101, 105, 108, 109, 112, 113,
137, 140, 146, 155–156, 159, 164,
187, 215, 220, 224, 228, 230, 241,
242, 280–282, 289, 290, 301
Profile, 86, 98, 104, 107, 138, 178
Q
Quality score, 64, 124
R
R (data analysis software), 86, 119, 196, 223
Random
effect, 119, 121, 122, 125–129, 204, 225,
232, 251, 253, 261
slope, 119, 121, 126, 127, 129
Reaction time, 192–194, 215
Readability metrics, 114
Refrigerator manual, 294, 301
Relevance theory, 10, 266, 275, 276
Residual plot, 123, 125, 128, 129
Restructuring
effort, 10, 225, 227, 228
operations, 265–266, 270–274
Retrospective comment
interview, 245–246
Revision
Rl, 164–167, 295–299, 301, 306
Rs, 164–167, 295–300, 306
Revision time (TimeR), 20, 21, 25, 50, 52
S
Screen recording, 8, 96, 108, 113, 139, 142,
145–147, 150, 166, 241
Search
engine, 116, 130, 146, 149
query, 42, 113, 119, 121, 131, 146
strategy, 126, 131, 139
Segment
alignment, 21, 31, 274
information, 19–23, 50, 119, 146, 272
summary table (SS), 21, 50, 119
Semantic
bonds, 240, 241
priming, 192, 218
Sense model, 192–193, 206
Shared representation, 192, 193, 201, 205–207,
218, 236
Situational model, 281
Skipping probability, 186, 196, 213
Source tokens (ST)
reading time (TrtS), 20, 21, 23, 24, 29, 30,
51, 186, 193, 196, 201, 203, 205,
206, 251, 253–255, 304, 305
Specialized texts, 85, 97, 140
Statistical machine translation (SMT), 7,
58–67, 70, 74, 140–144
Statistical modelling, 61, 63, 155–179, 194,
251, 253
Structural priming, 192, 193, 199, 206, 218,
227, 235. See also Priming
Sufficient statistics, 63
Syntactic
annotation, 9, 235, 266, 269, 270, 273, 276
entropy, 9, 44, 217, 219, 220, 223, 224,
226–236, 269–273
triplets, 44, 270
variation, 219, 224, 235, 236, 270
T
TAD. See Translator activity data (TAD)
TAP. See Think aloud protocol (TAP)
Target segment, 23, 33, 34, 40, 50, 143, 146,
234, 267, 270, 294
Target tokens (TT)
reading time (TrtT), 20, 21, 23, 24, 29, 30,
51, 83, 186, 235, 251, 256, 257, 260,
304, 305
Task type, 20, 23, 50, 115, 122, 123, 127, 131
TCI. See Translator-computer interaction
(TCI)
Technical
effort, 8, 78, 82–84, 86, 88–89, 91–93, 109,
141, 159
text, 301, 303
Temporal effort, 78, 79, 82, 83, 86–88, 91–93,
156, 159
TER. See Translation edit rate (TER)
Terminology, 106, 136, 146, 149, 158, 307
Text
base, 281, 290
comprehension, 186, 207, 215, 239–261,
267, 280, 281
representation, 11, 48, 195, 251, 281–283,
286–288, 290
Texture, 240
Think aloud protocol (TAP), v, 4, 81, 82, 112,
113, 115, 160, 241, 281
TII. See Translator-information interaction
(TII)
Tobii Studio, 86, 87, 242, 248, 249, 301, 304,
305
Tobii TX300, 301
Tool prototype, 135–151
Total
production time, 188, 221, 251, 260, 302
reading time (TRT), 21, 23, 24, 29, 83,
186–187, 193, 194, 196, 201, 202,
205, 208, 212, 213, 218, 223, 224,
226–229, 231–235, 251, 253–257,
260, 261, 267–269, 304, 305
translation time, 8, 122
TPG. See Translation progression graph (TPG)
TPR-DB. See Translation process research
database (TPR-DB)
Trados, 98
Translation
alternatives, 10, 51, 65, 78, 184, 185,
190–194, 199, 212–214, 236, 266,
269, 274, 277, 287–289
asymmetry, 192, 193
duration, 10, 21, 24, 99, 194, 221
entropy, 213, 217, 268, 281, 282
environments, 6, 137, 138, 146–150, 162,
248
errors, 81, 289, 290
experience, 92, 96, 108, 125, 279, 289
industry, 92, 151, 156
model, 59, 62, 66, 67
probability, 29, 31, 65, 143, 175, 176, 199,
213, 214, 225, 290
problem, 59, 114, 131, 143, 241, 279–290
procedure, 96, 279–281, 290
process, 4–7, 9–11, 13–53, 59, 60, 63,
68, 70, 83, 84, 96, 104, 112, 113,
115, 137, 139, 149, 155–179, 184,
189, 191, 195, 200, 213, 215–217,
240, 241, 266, 267, 269, 270, 275,
280, 283, 284, 288, 290, 293, 294,
300
productivity, 14, 15, 185, 212, 244
segment (SG), 51, 184, 266, 286
session (SS), 6–8, 14, 15, 156–158,
160–163, 168, 169, 171, 173, 179,
221, 240, 243
strategies, 245, 281, 290
style, 6, 96, 178
technology, 5, 6, 138, 139, 148, 151, 259,
292
typing, 38, 39
universals, 4, 189, 207
workbench, 8, 57–74, 79, 85, 86, 112, 149,
162
Translation aids, 6, 245, 282. See also
Post-editing
Translation edit rate (TER)
TER scores, 78, 79, 86, 90–93
Translation of medical text, 106, 301
Translation of news text, 48, 106
Translation process research database
(TPR-DB), 5, 6, 13–53, 58, 65, 68,
78, 85, 97, 112, 113, 116, 130,
138, 185, 186, 188, 189, 191, 213,
243, 246, 250, 267, 269, 275, 284,
298–300, 306
compilation, 15–18, 42
studies
BML12
CFT13, 45, 46, 97, 209, 294, 295
KTHJ08, 46, 48, 195, 221, 226, 269
OCT13, 301–307
SG12, 47, 48, 195, 220, 221, 226, 269,
296, 301–306
tables, 14, 18
Translation progression graph (TPG), 39, 40,
161, 248, 295
Translation quality assessment, 114, 123
Translator activity data (TAD), 156, 160–163,
171, 173, 174, 176, 178, 222, 223,
270, 271, 274. See also User
activity data (UAD)
Translator-computer interaction (TCI), 6, 136,
138, 139, 150
Translator-information interaction (TII), 6–8,
135–151
Translators' behaviour, 4, 15, 48, 49, 80, 149,
167–171, 300–307
Translog
Translog-II, 5–7, 14, 15, 17, 41–44, 48, 53,
156, 160, 162, 230, 240, 242, 243,
246–250, 301, 303
TT. See Target tokens (TT)
Typing
effort, 7, 68, 69, 71, 73, 74, 106, 108
inefficiency, 23, 26, 230–232, 234, 235
speed, 10, 70, 97, 106–109
Typist, 97, 108, 109, 161
U
Uncertainty management, 112, 241
Unigram, 252, 256
Unsupervised
learning, 161–162
sequence modelling, 170–171, 175
Usability, 7, 138–141, 147, 148, 150, 151
User
experience, 58, 139
feedback, 58, 60, 63, 69, 107–109
interaction, 64, 139
interface, 10, 140, 143, 148, 151, 294
User activity data (UAD), 7, 11, 15, 18, 21, 41,
65
V
Vertical
process, 184, 191, 204, 206, 207
translation, 40–41
Visualization, 5, 7, 11, 14, 15, 34, 40–41, 45,
49, 120, 122, 123, 156, 160, 161,
175, 178, 188, 198–200, 204, 205,
298, 302, 306, 307
Viterbi algorithm, 157, 171
W
Web-based
information, 136, 137
translation, 148–149
Word
alignment, 16, 31, 45, 64, 116, 248
frequency, 185, 194–202, 204, 205, 215,
242, 252, 288
graph, 62
length, 196–202, 204, 205, 224–225, 228,
230–235, 245, 283
order, 9, 27, 189, 191–194, 199, 206, 207,
217, 218, 225–227, 236, 277, 281,
286
translation entropy, 9, 23, 27, 29–33, 51,
106, 183208, 213
translation perplexity, 29–33
Wordbee, 98
Y
YAWAT, 16–17, 116, 284