In addition, training, development and evaluation sets are provided for ten language pairs:
from German, Dutch, Polish, Brazilian Portuguese, Romanian, Russian, Slovak, Slovenian, Turkish and Chinese into English.
Submitted runs on additional pairs will be evaluated as well, in the hope of stimulating the MT community to evaluate systems on common benchmarks and to share achievements on challenging translation tasks.
For each language pair, the training and development sets are linked from the corresponding entry of the table below: clicking an entry downloads an archive containing the sets and a README file. Numbers in the table are given in millions of units (untokenized words) on the target side of the parallel training data.
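As a rough illustration of how such a figure could be derived, the sketch below counts whitespace-separated units in the target side of a corpus and expresses the total in millions. The function names and the sample sentences are hypothetical, and splitting on whitespace is only an assumption about what "untokenized words" means here; the README in each archive remains the authoritative description of the data.

```python
def count_untokenized_words(lines):
    """Count whitespace-separated units across an iterable of text lines."""
    return sum(len(line.split()) for line in lines)


def in_millions(count, digits=2):
    """Express a raw count in millions, rounded to `digits` decimals."""
    return round(count / 1_000_000, digits)


# Hypothetical English target side of a tiny parallel corpus
# (assumption: one sentence per line, plain text).
target_side = [
    "Thank you so much, Chris.",
    "It's a great honor to be here.",
]

total = count_untokenized_words(target_side)
print(total, in_millions(total, digits=6))
```

Punctuation stays attached to the neighbouring word in this count, which is the practical difference from counting tokens after tokenization.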
If you use this corpus in your work, please cite the paper: