IWSLT 2014

Evaluation sets for the MT track

The IWSLT 2014 Evaluation Campaign includes the MT track on TED Talks. This edition has five official language pairs:

from English to French

from English to German

from German to English

from English to Italian

from Italian to English

Optional tasks are proposed with English paired in both directions with twelve other languages:

English to and from Arabic, Spanish, Farsi, Hebrew, Dutch, Polish, Portuguese-Brazil, Romanian, Russian, Slovenian, Turkish and Chinese
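
For participants who script their runs, the translation directions listed above can be enumerated as in the following minimal sketch. The names OFFICIAL_PAIRS, OPTIONAL_LANGUAGES and optional_pairs are hypothetical and chosen only for illustration; they are not part of any official IWSLT tooling, and the language labels are copied from the lists above rather than taken from the archive's file naming.

```python
# Illustrative only: these names are assumptions, not official IWSLT identifiers.

# Official directions, as listed above, stored as (source, target).
OFFICIAL_PAIRS = [
    ("English", "French"),
    ("English", "German"),
    ("German", "English"),
    ("English", "Italian"),
    ("Italian", "English"),
]

# The twelve optional languages, each paired with English in both directions.
OPTIONAL_LANGUAGES = [
    "Arabic", "Spanish", "Farsi", "Hebrew", "Dutch", "Polish",
    "Portuguese-Brazil", "Romanian", "Russian", "Slovenian",
    "Turkish", "Chinese",
]


def optional_pairs():
    """Return every optional (source, target) direction."""
    pairs = []
    for lang in OPTIONAL_LANGUAGES:
        pairs.append(("English", lang))
        pairs.append((lang, "English"))
    return pairs


if __name__ == "__main__":
    # Print all directions, official ones first.
    for src, tgt in OFFICIAL_PAIRS + optional_pairs():
        print(f"{src} -> {tgt}")
```

Under these assumptions the sketch yields five official directions and twenty-four optional ones (twelve languages, two directions each).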

Linguistic preprocessors for normalizing and cleaning Arabic, Farsi and Hebrew texts are available here; they were used for the development and scoring of baseline systems and will be employed for the official evaluation of submissions.

Submitted runs on optional pairs will be evaluated as well, in the hope of stimulating the MT community to evaluate systems on common benchmarks and to share achievements on challenging translation tasks.

The archive with test sets is available at this link.

If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.