blog

In LeAK 2 we experimented with a joint learning (multitask) approach, i.e., a single NER tagger comprising multiple prediction heads (spans, entity classes and risks). We used the same corpus as in LeAK, combined both domains (landlord-tenant and transport) during training, and achieved...

Category: blog

The transparency and availability of such documents are important not only for legal science, but also for the general public, the whole legal-tech area, and the digitalisation of government institutions. However, according to estimates (see Coupette and Fleckner 2018, Sohn 2018) a very small a...

Category: blog

A discussion on Twitter during the recent Corpus Linguistics 2021 conference led Andrew Hardie and me to consider an optimisation for simple queries in CQPweb and BNCweb (using CEQL notation). This TAB optimisation provides up to 10x faster execution of CEQL queries for fixed phrases or part-of-spee...

Category: blog