Linguistic preprocessors for Arabic, Farsi and Hebrew

Linguistic preprocessors for Arabic, Farsi and Hebrew have been developed by experts to clean and/or normalize the TED texts.

The experts are:

  • Preslav Nakov, UC Berkeley [Arabic]

  • Amin Farajian, FBK [Farsi]

  • Shachar Mirkin, Xerox Research Centre Europe [Hebrew]

These preprocessors were used to develop and score the baseline systems, and they will also be used in the official evaluation of submissions.

You can download them from here.