WIT3

Web Inventory of Transcribed and Translated Talks

2018-01: Training and development sets for the MT track

The IWSLT 2018 Evaluation Campaign includes the Low Resource MT track on TED Talks. In this edition, the language pair is:

  from Basque to English.

Training and development sets for Basque-English (as well as additional talks for Basque-French, Basque-Spanish, Spanish-French, Spanish-English, and French-English) are linked from the entry in the table below: clicking it downloads an archive that contains the sets and a README file. The number in the table is the size, in millions of units (untokenized words), of the target (English) side of the parallel Basque-English training data.
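As a rough illustration of how such a figure could be computed, the sketch below counts whitespace-separated (untokenized) words in a list of target-side lines and expresses the total in millions. The function name and the sample sentences are hypothetical. The layout of the files inside the archive is described in its README and is not assumed here.

```python
def count_words_in_millions(lines):
    """Return the number of whitespace-separated words, in millions.

    Words are counted without any tokenization: punctuation attached
    to a word is part of that word.
    """
    total = sum(len(line.split()) for line in lines)
    return total / 1_000_000


# Hypothetical target-side sentences, for illustration only.
sample = ["Thank you very much.", "It's great to be here."]
print(count_words_in_millions(sample))
```

To apply it to real data, the lines would be read from the target-side file named in the README and passed to the function in place of the sample list.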

If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.


        en
eu      0.66