Corpus

For both qualitative and quantitative purposes, the project collects Danish and German social media data from two of the largest platforms in this domain, Facebook and Twitter.

Within the project, our corpus fulfills multiple functions:

First, it is a source of hate speech examples for qualitative analysis, interviews and experimental subtasks.
Second, it can be used to identify slurs and linguistic patterns typical of hate speech.
Third, it should allow statistical evaluation and comparison with background data.

To support these tasks and make efficient use of the corpus, the raw text data had to be filtered, linguistically processed, and turned into a searchable database with a user interface.
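The three processing steps named above (filtering, linguistic processing, searchable storage) can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names `tokenize`, `build_index` and `search`, the table layout, and the choice of SQLite are all assumptions made for illustration. Filtering is reduced here to whitespace normalization and removal of empty or duplicate posts, and linguistic processing to simple lowercased tokenization, where a real system would use proper annotation tools.

```python
import re
import sqlite3


def tokenize(text):
    # Stand-in for linguistic processing: lowercased word tokens.
    # \w is Unicode-aware in Python 3, so Danish and German letters are kept.
    return re.findall(r"\w+", text.lower())


def build_index(posts):
    """Filter posts and store them, token by token, in an in-memory database.

    posts: iterable of (source, lang, text) tuples (hypothetical input format).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, source TEXT, lang TEXT, text TEXT)"
    )
    conn.execute("CREATE TABLE token (post_id INTEGER, position INTEGER, form TEXT)")
    conn.execute("CREATE INDEX token_form ON token (form)")

    seen = set()
    for source, lang, text in posts:
        # Stand-in for filtering: normalize whitespace, skip empty and duplicate posts.
        text = " ".join(text.split())
        if not text or text in seen:
            continue
        seen.add(text)
        cur = conn.execute(
            "INSERT INTO post (source, lang, text) VALUES (?, ?, ?)",
            (source, lang, text),
        )
        post_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO token VALUES (?, ?, ?)",
            [(post_id, i, tok) for i, tok in enumerate(tokenize(text))],
        )
    conn.commit()
    return conn


def search(conn, word, lang=None):
    # Return (id, text) of every post containing the word, optionally per language.
    query = (
        "SELECT DISTINCT post.id, post.text FROM token "
        "JOIN post ON post.id = token.post_id WHERE token.form = ?"
    )
    params = [word.lower()]
    if lang is not None:
        query += " AND post.lang = ?"
        params.append(lang)
    query += " ORDER BY post.id"
    return conn.execute(query, params).fetchall()
```

With a token table of this kind, a word list of candidate slurs could be looked up per language, and the matching posts counted against the total number of posts to compare with background data.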