Software & Data
Python packages
- SoMaJo – A tokenizer and sentence splitter for German and English web and social media texts.
- SoMeWeTa – A part-of-speech tagger with support for domain adaptation and external resources.
- pandas-association-measures – Statistical association measures for co-occurrence dataframes in pandas.
- cwb-ccc – A wrapper around the IMS Open Corpus Workbench (CWB) for extracting concordances and collocates.
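To give a sense of what a statistical association measure computes, here is a minimal sketch of pointwise mutual information (PMI) in plain Python. It does not use the API of pandas-association-measures; the function names, the parameter names (`f`, `f1`, `f2`, `n`) and the counts are assumptions chosen for illustration only.

```python
import math


def expected_frequency(f1: int, f2: int, n: int) -> float:
    """Expected co-occurrence frequency if both items were independent."""
    return f1 * f2 / n


def pointwise_mutual_information(f: int, f1: int, f2: int, n: int) -> float:
    """Log2 ratio of observed to expected co-occurrence frequency.

    f  -- observed co-occurrence frequency of the two items
    f1 -- total frequency of the first item
    f2 -- total frequency of the second item
    n  -- corpus size (hypothetical unit: tokens)
    """
    if f == 0:
        # log2(0) is undefined; treat a pair that never co-occurs as -inf.
        return float("-inf")
    return math.log2(f / expected_frequency(f1, f2, n))


# Invented counts, for illustration only.
print(pointwise_mutual_information(f=10, f1=100, f2=50, n=10_000))
```

A positive score means the pair co-occurs more often than expected under independence, a score of zero means it occurs exactly as often as expected, and a negative score means it occurs less often.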
Data
- GeRedE – A corpus of German Reddit exchanges.
- EmpiriST 2.0 – A manually annotated corpus consisting of German web pages and German computer-mediated communication (CMC).