WIT3

Web Inventory of Transcribed and Translated Talks

2013-01 Training and development sets for the MT track

The IWSLT 2013 Evaluation Campaign includes the MT track on TED Talks. This edition has three official language pairs:

  from English to French
  from English to German
  from German to English

Optional tasks pair English, in both directions, with twelve other languages:

  English to and from Arabic, Spanish, Farsi, Italian, Dutch, Polish, Brazilian Portuguese, Romanian, Russian, Slovenian, Turkish and Chinese
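
For scripting over the tasks, the pairs listed above can be enumerated as in the following minimal sketch. The language codes are taken from the table on this page; the names OFFICIAL_PAIRS, OPTIONAL_LANGS and optional_pairs are illustrative and not part of any WIT3 tool.

```python
# Official pairs as (source, target), as listed on this page.
OFFICIAL_PAIRS = [("en", "fr"), ("en", "de"), ("de", "en")]

# Codes of the twelve optional languages, as used in the table on this page.
OPTIONAL_LANGS = ["ar", "es", "fa", "it", "nl", "pl",
                  "pt-br", "ro", "ru", "sl", "tr", "zh"]


def optional_pairs(langs):
    """Pair English with each language in both directions."""
    pairs = []
    for lang in langs:
        pairs.append(("en", lang))
        pairs.append((lang, "en"))
    return pairs


ALL_PAIRS = OFFICIAL_PAIRS + optional_pairs(OPTIONAL_LANGS)
print(len(ALL_PAIRS))  # 3 official + 24 optional = 27
```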

Runs submitted on additional pairs will be evaluated as well, in the hope of encouraging the MT community to evaluate systems on common benchmarks and to share results on challenging translation tasks.

For each language pair, the training and development sets are linked from the corresponding entry of the table below: clicking an entry downloads an archive that contains the sets and a README file. Numbers in the table are in millions of units (untokenized words) on the target side of the parallel training data.
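
As a rough illustration of how such a figure could be obtained, the sketch below counts whitespace-separated strings and reports the total in millions. It assumes that "untokenized words" means whitespace-delimited units and that the text is available as one segment per line; neither detail is specified on this page, and the function name count_units_in_millions is hypothetical.

```python
from typing import Iterable


def count_units_in_millions(lines: Iterable[str]) -> float:
    """Count whitespace-separated units over all lines, in millions.

    Assumption: a "unit" is any maximal run of non-whitespace characters,
    so punctuation stays attached to the neighbouring word.
    """
    total = sum(len(line.split()) for line in lines)
    return total / 1_000_000


# Two made-up target-side segments, for illustration only.
sample = ["Thank you so much.", "It's a real pleasure to be here."]
print(f"{count_units_in_millions(sample):.6f}")  # 11 units -> 0.000011
```

Under this assumption, text written without spaces between words yields very few units per line.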

Reference performance figures for baseline systems are available here.

If you use this corpus in your work, please cite the paper:

M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.


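Rows are source languages and columns are target languages; "-" marks a pair with no entry.

       ar    de    en    es    fa    fr    it    nl    pl    pt-br ro    ru    sl    tr    zh
ar     -     -     2.49  -     -     -     -     -     -     -     -     -     -     -     -
de     -     -     2.33  -     -     -     -     -     -     -     -     -     -     -     -
en     2.07  2.20  -     2.49  1.53  2.72  2.44  2.25  1.83  2.45  2.48  1.80  0.20  1.59  0.35
es     -     -     2.59  -     -     -     -     -     -     -     -     -     -     -     -
fa     -     -     1.29  -     -     -     -     -     -     -     -     -     -     -     -
it     -     -     2.61  -     -     -     -     -     -     -     -     -     -     -     -
nl     -     -     2.39  -     -     -     -     -     -     -     -     -     -     -     -
pl     -     -     2.43  -     -     -     -     -     -     -     -     -     -     -     -
pt-br  -     -     2.55  -     -     -     -     -     -     -     -     -     -     -     -
ro     -     -     2.60  -     -     -     -     -     -     -     -     -     -     -     -
ru     -     -     2.17  -     -     -     -     -     -     -     -     -     -     -     -
sl     -     -     0.24  -     -     -     -     -     -     -     -     -     -     -     -
tr     -     -     2.22  -     -     -     -     -     -     -     -     -     -     -     -
zh     -     -     2.53  -     -     -     -     -     -     -     -     -     -     -     -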