The IWSLT 2017 Evaluation Campaign includes an optional TED Talks MT task. This edition covers eight language pairs:

  English to and from Arabic, French, Japanese, and Chinese.

For each language pair, the corresponding entry of the table below links to an archive containing the training and development sets and a README file. Numbers in the table are millions of units (untokenized words) on the target side of the parallel training data.

If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.





source \ target    ar      en      fr      ja      zh
ar                 -       4.02    -       -       -
en                 3.32    -       4.21    0.72    0.57
fr                 -       4.04    -       -       -
ja                 -       3.88    -       -       -
zh                 -       4.02    -       -       -
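
For scripted use, the table can be written as a small lookup keyed by language pair. The following is only a sketch: the names TARGET_WORDS_M and target_words are hypothetical, and it assumes that rows are source languages and columns are target languages, which matches the note that the numbers refer to the target side.

```python
# Target-side training data sizes in millions of untokenized words,
# keyed by (source, target). Values are copied from the table; the
# row = source / column = target reading is an assumption.
TARGET_WORDS_M = {
    ("ar", "en"): 4.02,
    ("en", "ar"): 3.32,
    ("en", "fr"): 4.21,
    ("en", "ja"): 0.72,
    ("en", "zh"): 0.57,
    ("fr", "en"): 4.04,
    ("ja", "en"): 3.88,
    ("zh", "en"): 4.02,
}


def target_words(source: str, target: str) -> int:
    """Return the approximate target-side word count for a language pair."""
    return round(TARGET_WORDS_M[(source, target)] * 1_000_000)


if __name__ == "__main__":
    # List all eight pairs with their target-side sizes.
    for (src, tgt), millions in sorted(TARGET_WORDS_M.items()):
        print(f"{src}-{tgt}: {millions:.2f}M target-side words")
```

Because the counts are untokenized words, the Japanese and Chinese target sides appear much smaller than the others; the figures are not directly comparable across languages.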