TEI 2019

What is text, really? TEI and beyond


Distant Spectators: Mining TEI-encoded periodicals of the Enlightenment

Alexandra Fuchs, Bernhard Geiger, Elisabeth Hobisch, Philipp Koncar, Sanja Saric, Martina Scholger

Keywords: quantitative methods, distant reading, journals, Enlightenment
Slides: http://doi.org/10.5281/zenodo.3451439
Permalink: https://gams.uni-graz.at/o:tei2019.186

The poster presents the idea behind, and the first steps of, the recently started project Distant Spectators: Distant reading for periodicals of the Enlightenment. The project applies distant reading and text-mining methods (topic modeling, meme diffusion, stylometry, sentiment analysis, network analysis) to the Spectators press, a journalistic genre of the 18th-century Enlightenment, and combines these methods with the expertise already gained from close reading. By using quantitative methods to investigate authorship attribution, editorial networks, the distribution of topics, the transfer of micro-narratives, and related questions, the project aims to provide insight into the formation of trans-European ideas, literary techniques, and cultural practices.

The project builds on an existing and ongoing digital edition project, The Spectators in the International Context, which has been running since 2008 (https://gams.uni-graz.at/spectators). It currently comprises approximately 4000 individual texts in six languages (French, Italian, Spanish, English, German, Portuguese) with more than 9 million tokens. The discourses are encoded in TEI; the encoding represents the text structure and the narrative forms (e.g. readers' letters, fables, dreams) and is used to build registers of names, works, and places.

The TEI encoding forms the basis for the computational analysis. Applying quantitative methods on top of an elaborate TEI model makes it possible to build collections flexibly, drawing on the affiliation of single texts with specific journals, time periods, individual keywords, etc., as encoded in the TEI header. Furthermore, specific textual structures and narrative features can be extracted and analyzed in relation to the entire corpus. A particular challenge is compiling a representative corpus for quantitative analysis, because a) the corpus is multilingual, b) the single discourses are brief, and c) the period of publication is short.
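As a rough illustration of building collections from header metadata and extracting narrative forms, the following minimal sketch filters TEI documents by language, year, and keyword, and then pulls out the text of one narrative form. The element paths (`sourceDesc/bibl`, `langUsage/language`, `keywords/term`, `div/@type`), the sample documents, and the function names are assumptions made for this example and are not taken from the project's actual TEI model.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Two tiny, invented TEI documents; structure and values are hypothetical.
SAMPLE_DOCS = {
    "doc1.xml": """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><sourceDesc>
      <bibl><title level="j">Journal A</title><date when="1722"/></bibl>
    </sourceDesc></fileDesc>
    <profileDesc>
      <langUsage><language ident="fr"/></langUsage>
      <textClass><keywords><term>morality</term></keywords></textClass>
    </profileDesc>
  </teiHeader>
  <text><body>
    <div type="letter"><p>Dear Spectator.</p></div>
    <div type="fable"><p>A fox and a crow.</p></div>
  </body></text>
</TEI>""",
    "doc2.xml": """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><sourceDesc>
      <bibl><title level="j">Journal B</title><date when="1765"/></bibl>
    </sourceDesc></fileDesc>
    <profileDesc>
      <langUsage><language ident="es"/></langUsage>
      <textClass><keywords><term>education</term></keywords></textClass>
    </profileDesc>
  </teiHeader>
  <text><body>
    <div type="dream"><p>I fell asleep.</p></div>
  </body></text>
</TEI>""",
}


def header_metadata(xml_text):
    """Read journal, year, language, and keywords from the (assumed) header layout."""
    root = ET.fromstring(xml_text)
    bibl = root.find(".//tei:sourceDesc/tei:bibl", TEI_NS)
    return {
        "journal": bibl.findtext("tei:title[@level='j']", default="", namespaces=TEI_NS),
        "year": int(bibl.find("tei:date", TEI_NS).get("when")[:4]),
        "language": root.find(".//tei:langUsage/tei:language", TEI_NS).get("ident"),
        "keywords": [t.text for t in root.findall(".//tei:keywords/tei:term", TEI_NS)],
    }


def build_collection(docs, language=None, year_range=None, keyword=None):
    """Return the sorted names of documents matching every filter that is given."""
    selected = []
    for name, xml_text in docs.items():
        meta = header_metadata(xml_text)
        if language is not None and meta["language"] != language:
            continue
        if year_range is not None and not (year_range[0] <= meta["year"] <= year_range[1]):
            continue
        if keyword is not None and keyword not in meta["keywords"]:
            continue
        selected.append(name)
    return sorted(selected)


def extract_forms(xml_text, form):
    """Return the plain text of every <div> whose type attribute equals `form`."""
    root = ET.fromstring(xml_text)
    divs = root.findall(f".//tei:div[@type='{form}']", TEI_NS)
    return ["".join(div.itertext()).strip() for div in divs]


if __name__ == "__main__":
    print(build_collection(SAMPLE_DOCS, language="fr"))
    print(extract_forms(SAMPLE_DOCS["doc1.xml"], "fable"))
```

Keeping the selection criteria as optional parameters means the same function can produce a sub-corpus per journal, per period, or per keyword, which could then be passed to whichever quantitative method is being tested.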

The main objective is to investigate which quantitative methods prove useful for the analysis of this multilingual 18th-century corpus, and how they can best be applied.