Daria Maximova (National Research University Higher School of
Economics, Moscow, RU)
Frank Fischer (DARIAH-EU and National Research University Higher
School of Economics, Moscow, RU)
The <stage> tag is a core element for the encoding of drama.
The TEI guidelines suggest nine values for its type attribute,
which is widely used in large corpora such as the French Théâtre
Classique, the Folger Shakespeare Library or the Swedish Dramawebben. This paper introduces an approach to automatically assign
stage-direction types to the TEI-P5-encoded Russian Drama Corpus, RusDraCor
(https://dracor.org/). The corpus currently features 144 plays ranging from the mid-18th to
the mid-20th century, which together contain 32,753 stage directions with 144,525 tokens.
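To illustrate how such counts can be obtained from TEI-encoded plays, the following is a minimal sketch using only the Python standard library. The XML fragment is a made-up miniature example, not a play from RusDraCor, and the function names `stage_type_counts` and `stage_token_count` are hypothetical. Tokens are counted by splitting on whitespace, which is an assumption and need not match the tokenization behind the figures above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI P5 documents.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical miniature TEI fragment, invented for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="scene">
      <stage type="setting">A room in a country house.</stage>
      <sp who="#anna">
        <speaker>Anna</speaker>
        <stage type="delivery">quietly</stage>
        <p>Who is there?</p>
      </sp>
      <stage type="entrance">Enter Ivan.</stage>
      <stage>He sits down.</stage>
    </div>
  </body></text>
</TEI>"""


def stage_type_counts(tei_xml: str) -> Counter:
    """Count <stage> elements per value of their type attribute."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for stage in root.iter(f"{TEI_NS}stage"):
        # Stage directions without a type attribute are grouped as "untyped".
        counts[stage.get("type", "untyped")] += 1
    return counts


def stage_token_count(tei_xml: str) -> int:
    """Count whitespace-separated tokens inside all <stage> elements."""
    root = ET.fromstring(tei_xml)
    return sum(
        len("".join(stage.itertext()).split())
        for stage in root.iter(f"{TEI_NS}stage")
    )


if __name__ == "__main__":
    print(stage_type_counts(SAMPLE))
    print(stage_token_count(SAMPLE))
```

In this sketch, untyped stage directions such as the last one in the fragment are exactly the cases an automatic classifier would have to label.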
We selected 18 plays comprising 6,569 stage directions to represent the breadth of the corpus. For the manual annotation we established a clear set of rules to identify the stage-direction types proposed by the TEI guidelines (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-stage.html).
Following the annotation of our subcorpus, we developed a tool for the classification of the remaining plays without human intervention. For the conversion of stage directions into feature vectors, we used morphological and semantic data. Our tool in its current state is able to classify different types with an F1 score of approximately 0.75, which means, roughly speaking, that about 3 out of 4 stage directions of any given type are assigned correctly.
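For readers less familiar with the metric, the sketch below shows how a per-type F1 score can be computed from gold and predicted labels. It is not the evaluation code of the tool described here; the function name `per_type_f1` and the label lists are hypothetical, and the labels are simply a few of the type values suggested by the TEI guidelines.

```python
from collections import Counter


def per_type_f1(gold, predicted):
    """Return {label: (precision, recall, f1)} for each label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correctly assigned type
        else:
            fp[p] += 1   # predicted type was wrong
            fn[g] += 1   # gold type was missed
    scores = {}
    for label in set(gold) | set(predicted):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        total = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Made-up labels for illustration only.
gold = ["entrance", "entrance", "entrance", "entrance",
        "exit", "exit", "business", "business"]
predicted = ["entrance", "entrance", "entrance", "exit",
             "exit", "exit", "business", "entrance"]

if __name__ == "__main__":
    for label, (p, r, f1) in sorted(per_type_f1(gold, predicted).items()):
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In the made-up data, the type "entrance" has precision and recall of 0.75 each, so its F1 score is also 0.75. The reading "3 out of 4" holds exactly only when precision and recall coincide like this; otherwise F1 lies between the two values.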
Our work will inform a dedicated analysis of stage directions which, building on preliminary studies by Sperantov (1998) and Detken (2009), will draw on larger corpora and allow for a description of the evolution of stage directions over 200 years.