Linguistic preprocessors for Arabic, Farsi and Hebrew
Linguistic preprocessors for Arabic, Farsi and Hebrew have been developed by experts to clean and/or normalize TED texts.
The experts are:
Preslav Nakov, UC Berkeley [Arabic]
Amin Farajian, FBK [Farsi]
Shachar Mirkin, Xerox Research Centre Europe [Hebrew]
These preprocessors were used to develop and score the baseline systems and will be used in the official evaluation of submissions.
You can download them here.