IWSLT 2018

Training and development sets for the MT track 

The IWSLT 2018 Evaluation Campaign includes the Low Resource MT track on TED Talks. In this edition, the language pair is Basque to English.

Training and development sets for Basque-English (as well as additional talks for Basque-French, Basque-Spanish, Spanish-French, Spanish-English, and French-English) are linked from the entry in the table below: clicking it downloads an archive containing the sets and a README file. Numbers in the table are given in millions of units (untokenized words) on the target side of the parallel Basque-English training data.

Basque_to_English training and development sets
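The unit used for the table figures (untokenized words on the target side, in millions) can be approximated with a simple whitespace split. The following is a minimal sketch and not the organizers' counting script: it assumes that an "untokenized word" is any whitespace-delimited string, and the function names and sample sentences are hypothetical, chosen only for illustration.

```python
def count_untokenized_words(lines):
    """Count whitespace-delimited words across an iterable of text lines.

    Assumption: an "untokenized word" is any maximal run of non-whitespace
    characters, so punctuation attached to a word is not counted separately.
    """
    return sum(len(line.split()) for line in lines)


def in_millions(count, digits=2):
    """Express a raw word count in millions, rounded to `digits` decimals."""
    return round(count / 1_000_000, digits)


# Hypothetical English (target-side) sentences, not taken from the corpus.
sample_target = [
    "Thank you very much.",
    "This is a short example sentence.",
]

total_words = count_untokenized_words(sample_target)
print(total_words, "words =", in_millions(total_words, digits=6), "million")
```

To apply this to real data, the same function can be passed an open file handle for the English side of the training set, since iterating over a file yields its lines.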

If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.