GermEval 2019 Shared Task on Hierarchical Classification of German Blurbs
First Call for Participation

This is the call for participation in the shared task on hierarchical
classification of blurbs at GermEval 2019. We invite everyone
interested to participate. The shared task can be found at:
https://competitions.codalab.org/competitions/21226

Introduction

Hierarchical multi-label classification (HMC) of blurbs is the task of
assigning multiple labels to a short descriptive text, where each
label is part of an underlying hierarchy of categories. The increasing
amount of available digital documents and the need for more and
finer-grained categories call for new, more robust and sophisticated
text classification methods. Large datasets often incorporate a
hierarchy, which can be used to categorize documents on different
levels of specificity. Traditional multi-class text classification
approaches are thoroughly researched; however, they fail to generalize
adequately as the amount of available data grows and more specific
hierarchies become necessary.

With this task we aim to foster research within this context. The
task focuses on classifying German books into their respective
hierarchically structured writing genres using short advertisement
texts (blurbs) and further meta information such as author, number of
pages, release date, etc.


Tasks

This shared task consists of two subtasks, described below. You can
participate in one of them or in both.

Subtask A: The task is to classify German books into one or multiple
most general writing genres. Therefore, it can be considered a
multi-label classification task. In total, there are 8 classes that
can be assigned to a book: Literatur & Unterhaltung, Ratgeber,
Kinderbuch & Jugendbuch, Sachbuch, Ganzheitliches Bewusstsein, Glaube
& Ethik, Künste, Architektur & Garten.
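
As an illustration of the multi-label setting in Subtask A, the label
set of a book can be written as a multi-hot vector over the eight
genres. This is only a sketch: the function name encode_labels and the
example book are assumptions made here, not part of the task data or
of any official tooling.

```python
# The eight top-level genres listed in the call for Subtask A.
GENRES = [
    "Literatur & Unterhaltung",
    "Ratgeber",
    "Kinderbuch & Jugendbuch",
    "Sachbuch",
    "Ganzheitliches Bewusstsein",
    "Glaube & Ethik",
    "Künste",
    "Architektur & Garten",
]


def encode_labels(assigned):
    """Return a multi-hot vector with a 1 at the index of every assigned genre."""
    unknown = set(assigned) - set(GENRES)
    if unknown:
        raise ValueError(f"unknown genres: {sorted(unknown)}")
    return [1 if genre in assigned else 0 for genre in GENRES]


# Hypothetical book that carries two top-level genres at once.
example = encode_labels({"Sachbuch", "Ratgeber"})
print(example)  # [0, 1, 0, 1, 0, 0, 0, 0]
```

A multi-class setup would allow exactly one 1 per vector; the
multi-label setup allows any number of them.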

Subtask B: The second task targets hierarchical multi-label
classification into multiple writing genres. In addition to the very
general writing genres, additional genres of different specificity can
be assigned to a book. In total, there are 343 different classes that
are hierarchically structured on up to 4 levels.
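
One way to picture the hierarchy in Subtask B is a child-to-parent map
over the labels. The sketch below is hypothetical: only "Sachbuch" is
taken from the call, the lower-level names are placeholders rather
than real task categories, and the idea that a label set should also
contain the ancestors of its labels is an assumption that the call
itself does not state.

```python
# Hypothetical child -> parent map. Only "Sachbuch" is named in the call;
# the lower-level labels are placeholders, not actual task categories.
PARENT = {
    "placeholder-level-2": "Sachbuch",
    "placeholder-level-3": "placeholder-level-2",
    "placeholder-level-4": "placeholder-level-3",
}


def with_ancestors(labels):
    """Expand a label set so it also contains every ancestor of each label."""
    closed = set()
    for label in labels:
        # Walk upwards until a top-level genre or an already seen label is reached.
        while label is not None and label not in closed:
            closed.add(label)
            label = PARENT.get(label)
    return closed


def depth(label):
    """Level of a label in the hierarchy; top-level genres are level 1."""
    level = 1
    while label in PARENT:
        label = PARENT[label]
        level += 1
    return level


print(sorted(with_ancestors({"placeholder-level-3"})))
print(depth("placeholder-level-4"))  # 4, the deepest level mentioned in the call
```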


Data

The dataset for this task consists of 20,784 examples in total. Sample
data is already available for getting familiar with the data structure
of this task. Training data will be released in February and can be
downloaded after registering for the shared task. The evaluation of
the task will take place in July 2019. More information can be found
on the GermEval 2019 website:
https://competitions.codalab.org/competitions/21226


Important Dates
* Jan 2019: Release of trial data
* Feb 01, 2019: Release of training data (train + validation)
* Jun 01, 2019: Release of test data
* Jul 15, 2019: Final submission of test results
* Jul 30, 2019: Submission of description paper
* Aug 2019: Workshop in Nürnberg/Erlangen, Germany, at the Conference on
Natural Language Processing KONVENS 2019 (https://dgfs.de/de/cl/konvens.html)


Organizers
The task is organized by Rami Aly, Steffen Remus and Chris Biemann from
Language Technology, Department of Informatics, Universität Hamburg.
https://www.inf.uni-hamburg.de/en/inst/ab/lt/home.html


GermEval
GermEval is a series of shared task evaluation campaigns that focus on
Natural Language Processing for the German language. GermEval has
been held four times since 2014, co-located with the KONVENS/GSCL
conferences.