The IWSLT 2018 Evaluation Campaign includes the Low Resource MT track on TED Talks. In this edition, the translation direction is Basque to English.
Training and development sets for Basque-English (as well as additional talks for Basque-French, Basque-Spanish, Spanish-French, Spanish-English, and French-English) are linked from the entry in the table below: clicking it downloads an archive containing the sets and a README file. Numbers in the table are given in millions of units (untokenized words) on the target side of the parallel Basque-English training data.
Basque_to_English training and development sets
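To make the table's unit concrete, the following is a minimal sketch of how a count of "untokenized words" in millions could be reproduced. It assumes that an untokenized word is simply a whitespace-separated unit, which is an interpretation and not something the README is known to specify. The function names and the sample sentences are hypothetical and are not taken from the corpus.

```python
def count_untokenized_words(lines):
    """Count whitespace-separated units over an iterable of text lines.

    Assumption: "untokenized words" means units split on whitespace only,
    with punctuation left attached to the neighbouring word.
    """
    return sum(len(line.split()) for line in lines)


def in_millions(n_words):
    """Express a raw word count in millions, the unit used in the table."""
    return n_words / 1_000_000


# Hypothetical target-side (English) sentences, for illustration only.
sample = ["Thank you very much.", "It's a pleasure to be here."]
total = count_untokenized_words(sample)
print(total, in_millions(total))
```

In practice the same function can be fed an open file handle for the English side of the training set, since iterating over a file yields its lines.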
If you use this corpus in your work, please cite the paper:
M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.