IWSLT 2016

Training, development and evaluation sets for {ar,cs,de} - {en,es}

The IWSLT 2016 Evaluation Campaign does not include any task on the Arabic-German/Czech or English-Spanish pairs. As an exception, training, development and evaluation sets for these pairs, in both directions, are provided here. They were built from the latest available XML files (April 2016) for the languages involved.

The archive with training, development and evaluation sets is available at this link.

Bilingual sets for the non-English pairs were built automatically with the pivot-based procedure, using English as the pivot language, described in:

M. Cettolo. 2016. An Arabic-Hebrew parallel corpus of TED talks. In Proc. of the AMTA Workshop on Semitic Machine Translation (SeMaT), Austin, TX, USA.
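
As a rough illustration of what pivoting through English means, the sketch below joins two English-aligned corpora on their shared English side. It is not the procedure from the cited paper, whose details are not reproduced here. The function name `pivot_pairs`, the `(talk_id, english, foreign)` tuple layout, and the exact-match rule on the English sentence are all assumptions made for this example.

```python
from collections import defaultdict


def pivot_pairs(src_en, tgt_en):
    """Join two English-aligned corpora on their shared English side.

    Hypothetical helper: each input is an iterable of
    (talk_id, english_sentence, foreign_sentence) tuples.
    Returns a list of (talk_id, src_sentence, tgt_sentence) tuples.
    """
    # Index the target-side corpus by talk and (whitespace-trimmed) English text.
    index = defaultdict(list)
    for talk_id, en, tgt in tgt_en:
        index[(talk_id, en.strip())].append(tgt)

    # For each source-side segment, look up target segments that share
    # exactly the same English sentence within the same talk.
    pairs = []
    for talk_id, en, src in src_en:
        for tgt in index.get((talk_id, en.strip()), []):
            pairs.append((talk_id, src, tgt))
    return pairs


# Toy data; "AR_1" and "AR_2" are placeholders standing in for Arabic text.
ar_en = [("talk1", "Hello .", "AR_1"), ("talk1", "Thank you .", "AR_2")]
de_en = [("talk1", "Hello .", "Hallo ."), ("talk1", "Goodbye .", "Auf Wiedersehen .")]

print(pivot_pairs(ar_en, de_en))
```

Only the segment whose English side appears in both corpora survives the join, so a pivoted set is typically smaller than either English-aligned set it is derived from.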

If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.