WIT3

Web Inventory of Transcribed and Translated Talks

Evaluation sets for the MT track

The IWSLT 2015 Evaluation Campaign includes the MT track on TED Talks. In this edition, there are ten official language pairs:

  from/to English to/from French, German, Chinese, Thai*, Vietnamese
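The count of ten follows from pairing English with each of the five listed languages in both directions. A minimal sketch of that enumeration is given below; the two-letter codes are those used in the table on this page, while the function name `language_pairs` and the (source, target) tuple layout are illustrative assumptions, not part of the campaign's tooling.

```python
ENGLISH = "en"
# Official partner languages listed above, using the codes from the table.
OFFICIAL = ["fr", "de", "zh", "th", "vi"]


def language_pairs(partners, pivot=ENGLISH):
    """Return (source, target) pairs in both directions for each partner."""
    pairs = []
    for lang in partners:
        pairs.append((pivot, lang))  # from English
        pairs.append((lang, pivot))  # to English
    return pairs


official_pairs = language_pairs(OFFICIAL)
print(len(official_pairs))
```

Calling `language_pairs(["cs"])` in the same way would give the two additional directions for the guest language.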

Additionally, this year the Czech language is proposed as a special guest; submitted runs

 from/to English to/from Czech

will be evaluated as well, in the hope of stimulating the MT community to evaluate systems on common benchmarks and to share achievements on challenging translation tasks.

For each language pair, the test sets are linked from the corresponding entry in the table below; clicking an entry downloads an archive containing the sets and a README file.

If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.

(*) Thai texts are segmented at the word level according to the guidelines defined at InterBEST 2009; the document can be found here.


      cs         de         en         fr         th         vi         zh
cs    -          -          test sets  -          -          -          -
de    -          -          test sets  -          -          -          -
en    test sets  test sets  -          test sets  test sets  test sets  test sets
fr    -          -          test sets  -          -          -          -
th    -          -          test sets  -          -          -          -
vi    -          -          test sets  -          -          -          -
zh    -          -          test sets  -          -          -          -