SemRelData

SemRelData (Semantic Relation Dataset) is a dataset of contextual annotations of classical semantic relations between nominals. It covers three genres (encyclopedic, literary, and news texts) and three languages (English, German, and Russian). The dataset is distributed under a CC-BY license.

The texts were extracted from three sources, one per genre: encyclopedic articles from Wikipedia, news articles from Wikinews, and out-of-copyright literary texts.

The resulting dataset consists of 13 news articles, 20 encyclopedic articles, and snippets from 9 literary texts, all available in parallel in the three languages. It contains approximately 60,000 tokens, 15,000 noun compounds, 3,400 annotated relations, and 9,400 transitive relations.

It is comprehensively described in:

Darina Benikova (2015): SemRelData: Multilingual Contextual Annotation and Analysis of Semantic Relations between Nominals. MA Thesis, in collaboration with Sabine Bartsch (FB2, TUDA) (pdf)

It is available for download: SemRelData.