Methodological foundations of corpus research and digital humanities
Corpus research in linguistics, the digital humanities, and the social sciences relies on a wide range of statistical techniques and visualizations. A central goal of our research is to develop sound methodological foundations for corpus linguistics that address key problems, so that quantitative analyses are both reliable and meaningful.
Research activities
- Quantitative methodology for literary stylometry (e-Humanities-Zentrum KALLIMACHOS)
Project funding
- KALLIMACHOS Centre for Digital Humanities: corpus-linguistic approaches and statistical methodology (phase 1), linguistic complexity in literary stylometry (phase 2) (10/2014 – 09/2019)
- Efficient simulation experiments for large-scale parameter optimisation of machine learning approaches in natural language processing (10/2016 – 09/2017)
Key publications
- Evert, Stefan; Proisl, Thomas; Jannidis, Fotis; Reger, Isabella; Pielström, Steffen; Schöch, Christof; Vitt, Thorsten (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities 32(suppl_2), ii4–ii16.
- Evert, Stefan and Neumann, Stella (2017). The impact of translation direction on characteristics of translated texts. A multivariate analysis for English and German. In G. De Sutter, M.-A. Lefer, and I. Delaere (eds.), Empirical Translation Studies. New Theoretical and Methodological Traditions (TiLSM 300), pages 47–80. Mouton de Gruyter, Berlin. ☞ online supplement
- Evert, Stefan; Wankerl, Sebastian; Nöth, Elmar (2017). Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch. In Proceedings of the Corpus Linguistics 2017 Conference, Birmingham, UK.
- Evert, Stefan and Arppe, Antti (2015). Some theoretical and experimental observations on naïve discriminative learning. In Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics (QITL-6), Tübingen, Germany.
- Baroni, Marco and Evert, Stefan (2007). Words and echoes: Assessing and mitigating the non-randomness problem in word frequency distribution modeling. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), pages 904–911, Prague, Czech Republic.
- Evert, Stefan (2006). How random is a corpus? The library metaphor. Zeitschrift für Anglistik und Amerikanistik 54(2), 177–190.
2017
- Evert, S., & Neumann, S. (2017). The impact of translation direction on characteristics of translated texts. A multivariate analysis for English and German. In De Sutter G, Lefer M, Delaere I (Eds.), Empirical Translation Studies. New Theoretical and Methodological Traditions (pp. 47–80). Berlin: Mouton de Gruyter.
- Evert, S., Wankerl, S., & Nöth, E. (2017). Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch. Paper presentation, Birmingham, GB.
2015
- Evert, S., & Arppe, A. (2015). Some theoretical and experimental observations on naïve discriminative learning. In Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics (QITL-6). Tübingen, Germany.
- Evert, S., Proisl, T., Jannidis, F., Pielström, S., Schöch, C., & Vitt, T. (2015). Towards a better understanding of Burrows's Delta in literary authorship attribution. In Proceedings of the Fourth Workshop on Computational Linguistics for Literature (pp. 79–88). Denver, CO.
2014
- Diwersy, S., Evert, S., & Neumann, S. (2014). A weakly supervised multivariate approach to the study of language variation. In Szmrecsanyi B, Wälchli B (Eds.), Aggregating Dialectology, Typology, and Register Analysis. Linguistic Variation in Text and Speech (pp. 174–204). Berlin, Boston: De Gruyter.
2007
- Baroni, M., & Evert, S. (2007). Words and Echoes: Assessing and Mitigating the Non-Randomness Problem in Word Frequency Distribution Modeling. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (pp. 904–911). Prague, Czech Republic.
2006
- Evert, S. (2006). How Random is a Corpus? The Library Metaphor. Zeitschrift für Anglistik und Amerikanistik, 54(2), 177–190.
Events
- Open-source course on Statistical Inference – A Gentle Introduction for (Computational) Linguists (LinC 2018, Birmingham 2016, MaLT 2015, Zürich 2010, EMA 2008, DGfS/CL 2007, …)
- Tutorial / course on Type-Token Distributions & Zipf’s Law (LREC 2018, Birmingham 2018, ESSLLI 2006)