Talk in the Oberseminar (advanced seminar) on 7 November 2018
On Wednesday, 7 November 2018, Professor Dr. Tomaž Erjavec (Jožef Stefan
Institute, Ljubljana, Slovenia) will give the talk “MULTEXT-East:
morphosyntactic resources for Central and Eastern European languages”
as part of the Oberseminar (advanced seminar) in Linguistische Informatik
(Computational Linguistics).
Time:
16:15 – 17:45
Location:
Seminar room 4.000, Bismarckstr. 6,
91054 Erlangen
Abstract:
The talk will present the MULTEXT-East language resources, a multilingual
dataset for language engineering research, focused on the morphosyntactic
level of linguistic description. The MULTEXT-East dataset includes the
morphosyntactic specifications, morphosyntactic lexica, and a parallel
corpus, the novel “1984” by George Orwell, which is sentence-aligned and
contains hand-validated morphosyntactic descriptions and lemmas.
The resources are uniformly encoded in XML, using the Text Encoding
Initiative Guidelines, TEI P5, and cover 16 languages, mainly from Central
and Eastern Europe: Bulgarian, Croatian, Czech, English, Estonian,
Hungarian, Macedonian, Persian, Polish, Resian, Romanian, Russian,
Serbian, Slovak, Slovene, and Ukrainian. This dataset, unique in terms
of the languages covered and the wealth of encoding, is extensively
documented and freely available for research purposes. The talk gives an
overview of the MULTEXT-East resources by type and language and offers some
conclusions and directions for further work.
Everyone interested is cordially invited.