WIT3

Web Inventory of Transcribed and Translated Talks

Training and development sets for the MT track

The IWSLT 2017 Evaluation Campaign does not include any task on the English-Indonesian pair. As an exception, training, development and evaluation sets are provided here for this pair in both directions, built from the latest available XML files (April 2017) of the two languages.

For each direction, the training, development and evaluation sets are linked from the corresponding entry of the table below: clicking an entry downloads an archive containing the sets and a README file. Numbers in the table are in millions of units (untokenized words) on the target side of the parallel training data. If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.


        en      id
en      -       1.56
id      1.80    -
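
The figures above are given in millions of untokenized words. As a rough illustration of how such a figure could be obtained, the sketch below counts whitespace-separated units in a list of sentences and converts the total to millions. It assumes that "untokenized words" means whitespace-separated strings, which the page does not state explicitly. The function names and the sample sentences are hypothetical, and no assumption is made about the file format inside the downloaded archives (see the README for that).

```python
def count_untokenized_words(lines):
    """Count whitespace-separated units over an iterable of text lines.

    Assumption: an "untokenized word" is any maximal run of
    non-whitespace characters; punctuation stays attached to words.
    """
    return sum(len(line.split()) for line in lines)


def in_millions(count):
    """Express a raw unit count in millions, rounded to two decimals."""
    return round(count / 1_000_000, 2)


# Hypothetical target-side sentences, for illustration only.
sample_target_side = [
    "Terima kasih banyak.",
    "Ini adalah contoh kalimat.",
]

total_units = count_untokenized_words(sample_target_side)
print(total_units)
```

On real data, the lines would come from the target-side text of the training set, and `in_millions` would be applied to the resulting total to get a figure comparable to the table entries.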