EMNLP 2015
TENTH WORKSHOP ON
STATISTICAL MACHINE TRANSLATION
17-18 September 2015
Lisbon, Portugal
This workshop builds on nine previous workshops on statistical machine translation.
IMPORTANT DATES
Release of training data for translation task | Early January 2015
Release of training data for automatic post-editing task | January 31, 2015
Release of MT system for tuning task | February 9, 2015
Release of training data for quality estimation task | February 15, 2015
Registration for complimentary manual evaluation (tuning task) | February 22, 2015
Submission deadline for tuning task | April 20, 2015
Test set distributed for translation task | April 20, 2015
Submission deadline for translation task | April 27, 2015
Test set distributed for automatic post-editing task | April 27, 2015
System outputs distributed for metrics task | May 4, 2015
Test sets distributed for quality estimation task | May 4, 2015
Submission deadline for automatic post-editing task | May 15, 2015
Submission deadline for metrics task | May 25, 2015
Submission deadline for quality estimation task | June 2, 2015
Start of manual evaluation period | May 4, 2015
End of manual evaluation | June 8, 2015
Paper submission deadline | June 28, 2015
Notification of acceptance | July 21, 2015
Camera-ready deadline | August 11, 2015
OVERVIEW
This year's workshop will feature five shared tasks:
- a translation task,
- a (pilot) automatic post-editing task (NEW),
- a tuning task (optimize a given MT system; NEW, a follow-up to the WMT11 tunable metrics task),
- a quality estimation task (assess MT quality without access to any reference),
- a metrics task (assess MT quality given a reference translation).
In addition to the shared tasks, the workshop will also feature scientific papers on topics related to MT.
Topics of interest include, but are not limited to:
- word-based, phrase-based, syntax-based, semantics-based SMT
- using comparable corpora for SMT
- incorporating linguistic information into SMT
- decoding
- system combination
- error analysis
- manual and automatic methods for evaluating MT
- scaling MT to very large data sets
We encourage authors to evaluate their approaches to the above topics
using the common data sets created for the shared tasks.
TRANSLATION TASK
The first shared task will examine translation between the
following language pairs:
- English-German and German-English
- English-French and French-English
- English-Finnish and Finnish-English NEW
- English-Czech and Czech-English
- English-Russian and Russian-English
The text for all the test sets will be drawn from news articles, except
for the French-English set (NEW), which will be drawn
from user-generated comments on news articles.
Participants may submit translations for any or all of the language
directions. In addition to the common test sets the workshop organizers
will provide optional training resources, including a newly expanded
release of the Europarl corpora and out-of-domain corpora.
All participants who submit entries will have their translations
evaluated. We will evaluate translation performance by human judgment. To
facilitate the human evaluation, we will require participants in the
shared tasks to manually judge some of the submitted translations. For each team,
this will amount to ranking 300 sets of 5 translations per language pair submitted.
We also provide baseline machine translation systems, with performance
comparable to the best systems from last year's shared task.
AUTOMATIC POST-EDITING TASK
This shared task will examine automatic methods for correcting errors produced by machine translation (MT) systems. Automatic Post-editing (APE) aims at improving MT output in black box scenarios, in which the MT system is used "as is" and cannot be modified.
From the application point of view, APE components would make it possible to:
- Cope with systematic errors of an MT system whose decoding process is not accessible
- Provide professional translators with improved MT output quality to reduce (human) post-editing effort
In this first edition of the task, the evaluation will focus on one language pair (English-Spanish), measuring systems' capability to reduce the distance (HTER) that separates an automatic translation from its human-revised version approved for publication. Training and test data are provided by Unbabel.
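HTER here is the edit rate between the raw MT output and its human post-edited version, so lower is better and 0 means the output needed no correction. Purely as an illustration of the idea (the official evaluation uses the standard TER implementation, which also handles block shifts of word sequences), here is a minimal sketch that approximates the score with a word-level edit distance normalised by the length of the post-edit; the example sentences are made up.

```python
# Approximate HTER: word-level edit distance between an MT output and its
# human post-edit, normalised by the length of the post-edit.
# NOTE: illustration only -- real TER/HTER also allows phrase shifts.

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(hyp)][len(ref)]

def approximate_hter(mt_output, post_edit):
    hyp, ref = mt_output.split(), post_edit.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

print(approximate_hter("the house green is small", "the green house is small"))
```

An APE system is then judged by whether its corrected outputs yield a lower average edit rate against the human post-edits than the original MT outputs do.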
QUALITY ESTIMATION TASK
Quality estimation systems aim at producing an estimate on the quality of a given translation at system run-time, without access to a reference translation. This topic is particularly relevant from a user perspective. Among other applications, it can (i) help decide whether a given translation is good enough for publishing as is; (ii) filter out sentences that are not good enough for post-editing; (iii) select the best translation among options from multiple MT and/or translation memory systems; (iv) inform readers of the target language of whether or not they can rely on a translation; and (v) spot parts (words or phrases) of a translation that are potentially incorrect.
Research on this topic has shown promising results in the last few years. Building on the last three years' experience, the quality estimation track of the WMT15 shared tasks will focus on English, Spanish and German and will provide new training and test sets, along with evaluation metrics and baseline systems, for variants of the task at three different levels of prediction: word, sentence, and document.
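As a rough illustration of the sentence-level variant only (this is not the official baseline), the sketch below predicts a quality score, such as HTER, from a few simple surface features of the source/translation pair using scikit-learn; the feature set and toy data are hypothetical.

```python
# Minimal sentence-level QE sketch (illustration only, not the shared-task
# baseline): predict a quality label, e.g. HTER, from simple surface features.
from sklearn.svm import SVR

def features(source, target):
    src, tgt = source.split(), target.split()
    return [
        len(src),                                     # source length
        len(tgt),                                     # target length
        len(tgt) / max(len(src), 1),                  # length ratio
        sum(len(w) for w in tgt) / max(len(tgt), 1),  # avg. target word length
    ]

# toy training data: (source, MT output, observed HTER)
train = [
    ("the house is small", "das Haus ist klein", 0.00),
    ("he said that it would rain", "er sagte dass regnen", 0.45),
]
X = [features(src, mt) for src, mt, _ in train]
y = [hter for _, _, hter in train]

model = SVR(kernel="rbf").fit(X, y)
print(model.predict([features("the green house", "das grüne Haus")]))
```

The word- and document-level variants follow the same pattern, but predict a label per token (e.g. OK/BAD) or a score per document rather than per sentence.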
METRICS TASK
The metrics task (also called evaluation task) will assess automatic evaluation metrics' ability to:
- Rank systems on their overall performance on the test set
- Rank systems on a sentence by sentence level
Participants in the shared evaluation task will use their automatic evaluation metrics to score the output from the translation task and the tuning task. In addition to MT outputs from those two tasks, the participants will be provided with reference translations. We will measure the correlation of automatic evaluation metrics with the human judgments.
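As a reference point, system-level agreement with humans is commonly summarised with Pearson's correlation coefficient, and segment-level agreement with Kendall's tau. The sketch below computes both with SciPy over hypothetical metric and human scores for five systems.

```python
# Meta-evaluation sketch: correlation between a metric's scores and human
# scores for a set of systems (all numbers are made up).
from scipy.stats import pearsonr, kendalltau

metric_scores = [0.231, 0.252, 0.219, 0.274, 0.244]   # e.g. an automatic metric
human_scores = [-0.12, 0.05, -0.20, 0.31, 0.02]       # e.g. human ranking scores

r, _ = pearsonr(metric_scores, human_scores)
tau, _ = kendalltau(metric_scores, human_scores)
print("Pearson r = %.3f, Kendall tau = %.3f" % (r, tau))
```

The better a metric's scores track the human judgments, the higher these correlations will be.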
TUNING TASK
The tuning task is a follow-up to the WMT11 invitation-only tunable metrics task. The task will assess your team's ability to optimize the parameters of a given hierarchical MT system (Moses).
Participants in the tuning task will be given complete Moses models for English-to-Czech and Czech-to-English translation and the standard development sets from the translation task. The participants are expected to submit the moses.ini
for one or both of the translation directions. We will use this configuration and a fixed revision of Moses to translate the official WMT15 test set. The outputs of the various configurations of the system will be scored using the standard manual evaluation procedure.
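Concretely, what is tuned are the feature weights stored in the submitted moses.ini. Real submissions would normally use an established tuner (e.g. MERT, PRO, or kb-MIRA) with the provided models; purely to illustrate the underlying idea, the sketch below reranks a hypothetical pre-decoded n-best list under random candidate weight vectors and keeps the vector that gives the best average sentence-level quality.

```python
# Toy weight tuning by random search over an n-best list (illustration only;
# an actual submission would tune the real Moses models with MERT/PRO/kb-MIRA).
import random

# hypothetical n-best list: per source sentence, (feature_vector, quality)
# pairs, where quality could be a sentence-level BLEU approximation
nbest = [
    [([-4.1, -2.0, 3.0], 0.42), ([-3.8, -2.5, 2.0], 0.35)],
    [([-6.0, -1.0, 4.0], 0.55), ([-5.5, -1.8, 5.0], 0.61)],
]

def corpus_quality(weights):
    """Average quality of the 1-best candidate under the given weights."""
    total = 0.0
    for candidates in nbest:
        best = max(candidates,
                   key=lambda c: sum(w * f for w, f in zip(weights, c[0])))
        total += best[1]
    return total / len(nbest)

random.seed(0)
best_weights, best_score = None, -1.0
for _ in range(1000):
    weights = [random.uniform(-1.0, 1.0) for _ in range(3)]
    score = corpus_quality(weights)
    if score > best_score:
        best_weights, best_score = weights, score

print(best_weights, best_score)
```

The weights found in this way would then be written back into the weight section of the moses.ini that is submitted.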
PAPER SUBMISSION INFORMATION
Submissions will consist of regular full papers of 6-10 pages, plus
additional pages for references, formatted following the
EMNLP 2015
guidelines. In addition, shared task participants will be invited to
submit short papers (4-6 pages) describing their systems or their
evaluation metrics. Both submission and review processes will be handled
electronically.
Note that regular papers must be anonymized, while system descriptions
do not need to be.
We encourage individuals who are submitting research papers to evaluate
their approaches using the training resources provided by this workshop
and past workshops, so that their experiments can be repeated by others
using these publicly available corpora.
POSTER FORMAT
For details on posters, please check with the local organisers.
ANNOUNCEMENTS
Subscribe to the announcement list for WMT by entering your e-mail address below. This list will be used to announce when the test sets are released, to indicate any corrections to the training sets, and to amend the deadlines as needed.
You can read past announcements on the Google Groups page for WMT. These also
include an archive of announcements from earlier workshops.
INVITED TALK
Jacob Devlin (Microsoft Research)
A Practical Guide to Real-Time Neural Translation
ORGANIZERS
Ondřej Bojar (Charles University in Prague)
Rajen Chatterjee (FBK)
Christian Federmann (MSR)
Barry Haddow (University of Edinburgh)
Chris Hokamp (Dublin City University)
Matthias Huck (University of Edinburgh)
Varvara Logacheva (University of Sheffield)
Pavel Pecina (Charles University in Prague)
Philipp Koehn (University of Edinburgh / Johns Hopkins University)
Christof Monz (University of Amsterdam)
Matteo Negri (FBK)
Matt Post (Johns Hopkins University)
Carolina Scarton (University of Sheffield)
Lucia Specia (University of Sheffield)
Marco Turchi (FBK)
PROGRAM COMMITTEE
- Alexandre Allauzen (Universite Paris-Sud / LIMSI-CNRS)
- Tim Anderson (Air Force Research Laboratory)
- Eleftherios Avramidis (German Research Center for Artificial Intelligence (DFKI))
- Amittai Axelrod (University of Maryland)
- Loic Barrault (LIUM, University of Le Mans)
- Fernando Batista (INESC-ID, ISCTE-IUL)
- Daniel Beck (University of Sheffield)
- Jose Miguel Benedi (Universitat Politecnica de Valencia)
- Nicola Bertoldi (FBK)
- Arianna Bisazza (University of Amsterdam)
- Graeme Blackwood (IBM Research)
- Fabienne Braune (University of Stuttgart)
- Chris Brockett (Microsoft Research)
- Christian Buck (University of Edinburgh)
- Hailong Cao (Harbin Institute of Technology)
- Michael Carl (Copenhagen Business School)
- Marine Carpuat (University of Maryland)
- Francisco Casacuberta (Universitat Politecnica de Valencia)
- Daniel Cer (Google)
- Mauro Cettolo (FBK)
- Rajen Chatterjee (Fondazione Bruno Kessler)
- Boxing Chen (NRC)
- Colin Cherry (NRC)
- David Chiang (University of Notre Dame)
- Kyunghyun Cho (New York University)
- Vishal Chowdhary (Microsoft)
- Steve DeNeefe (SDL Language Weaver)
- Michael Denkowski (Carnegie Mellon University)
- Jacob Devlin (Microsoft Research)
- Markus Dreyer (SDL Language Weaver)
- Kevin Duh (Nara Institute of Science and Technology)
- Nadir Durrani (QCRI)
- Marc Dymetman (Xerox Research Centre Europe)
- Marcello Federico (FBK)
- Minwei Feng (IBM Watson Group)
- Yang Feng (Baidu)
- Andrew Finch (NICT)
- Jose A. R. Fonollosa (Universitat Politecnica de Catalunya)
- Mikel Forcada (Universitat d'Alacant)
- George Foster (NRC)
- Alexander Fraser (Ludwig-Maximilians-Universität München)
- Markus Freitag (RWTH Aachen University)
- Ekaterina Garmash (University of Amsterdam)
- Ulrich Germann (University of Edinburgh)
- Kevin Gimpel (Toyota Technological Institute at Chicago)
- Jesus Gonzalez-Rubio (Universitat Politecnica de Valencia)
- Francisco Guzman (Qatar Computing Research Institute)
- Nizar Habash (New York University Abu Dhabi)
- Jan Hajic (Charles University in Prague)
- Greg Hanneman (Carnegie Mellon University)
- Eva Hasler (University of Cambridge)
- Yifan He (New York University)
- Kenneth Heafield (University of Edinburgh)
- John Henderson (MITRE)
- Teresa Herrmann (Karlsruhe Institute of Technology)
- Felix Hieber (Amazon Research)
- Stephane Huet (Universite d'Avignon)
- Young-Sook Hwang (SKPlanet)
- Gonzalo Iglesias (University of Cambridge)
- Abe Ittycheriah (IBM)
- Laura Jehl (Heidelberg University)
- Maxim Khalilov (BMMT)
- Roland Kuhn (National Research Council of Canada)
- Shankar Kumar (Google)
- David Langlois (LORIA, Universite de Lorraine)
- Gennadi Lembersky (NICE Systems)
- Lemao Liu (NICT)
- Qun Liu (Dublin City University)
- Zhanyi Liu (Baidu)
- Wolfgang Macherey (Google)
- Saab Mansour (RWTH Aachen University)
- Yuval Marton (Microsoft)
- Arne Mauser (Google, Inc)
- Wolfgang Menzel (Hamburg University)
- Abhijit Mishra (Indian Institute of Technology Bombay)
- Dragos Munteanu (SDL Language Technologies)
- Maria Nadejde (University of Edinburgh)
- Preslav Nakov (Qatar Computing Research Institute, HBKU)
- Graham Neubig (Nara Institute of Science and Technology)
- Jan Niehues (Karlsruhe Institute of Technology)
- Kemal Oflazer (Carnegie Mellon University - Qatar)
- Daniel Ortiz-Martínez (Technical University of Valencia)
- Santanu Pal (Saarland University)
- Stephan Peitz (RWTH Aachen University)
- Sergio Penkale (Lingo24)
- Daniele Pighin (Google Inc)
- Maja Popovic (Humboldt University of Berlin)
- Stefan Riezler (Heidelberg University)
- Johann Roturier (Symantec)
- Raphael Rubino (Prompsit Language Engineering)
- Alexander M. Rush (MIT)
- Hassan Sawaf (eBay Inc.)
- Jean Senellart (SYSTRAN)
- Rico Sennrich (University of Edinburgh)
- Wade Shen (MIT)
- Patrick Simianer (Heidelberg University)
- Linfeng Song (University of Rochester)
- Sara Stymne (Uppsala University)
- Katsuhito Sudoh (NTT Communication Science Laboratories / Kyoto University)
- Felipe Sanchez-Martinez (Universitat d'Alacant)
- Jörg Tiedemann (Uppsala University)
- Christoph Tillmann (IBM Research)
- Antonio Toral (Dublin City University)
- Yulia Tsvetkov (Carnegie Mellon University)
- Marco Turchi (Fondazione Bruno Kessler)
- Ferhan Ture (BBN Technologies)
- Masao Utiyama (NICT)
- Ashish Vaswani (University of Southern California Information Sciences Institute)
- David Vilar (Nuance)
- Martin Volk (University of Zurich)
- Aurelien Waite (University of Cambridge)
- Taro Watanabe (NICT)
- Marion Weller (Universität Stuttgart)
- Philip Williams (University of Edinburgh)
- Shuly Wintner (University of Haifa)
- Hua Wu (Baidu)
- Joern Wuebker (RWTH Aachen University)
- Peng Xu (Google Inc.)
- Wenduan Xu (Cambridge University)
- Francois Yvon (LIMSI/CNRS)
- Feifei Zhai (The City University of New York)
- Joy Ying Zhang (Carnegie Mellon University)
- Tiejun Zhao (Harbin Institute of Technology)
- Yinggong Zhao (State Key Laboratory for Novel Software Technology at Nanjing University)
CONTACT
For general questions, comments, etc. please send email
to bhaddow@inf.ed.ac.uk.
For task-specific questions, please contact the relevant organisers.
ACKNOWLEDGEMENTS
WMT15 receives support from the European Union under the projects MosesCore
(grant number 288487), Cracker and QT21.
We thank Yandex for their donation of data for the Russian-English task.