TEI 2019

What is text, really? TEI and beyond



Interface design and user involvement – Wiennerisches Diarium Digital

Dario Kampkaspar

Keywords: Digital Edition, User involvement
Permalink: https://gams.uni-graz.at/o:tei2019.179

The Wien[n]erisches Diarium Digital – Digitarium is an ongoing project that aims to make one of the oldest newspapers in the world available as high-quality full text. The Wiennerisches Diarium, now Wiener Zeitung, first appeared on 8th August 1703, initially with two issues per week of usually 8–12 pages each, but regularly reaching over 40 pages by the second half of the 18th century. From October 1813, there were daily issues (including Sundays).

The project uses the Transkribus software and Handwritten Text Recognition (HTR) models trained specifically on the newspaper’s issues to obtain a reasonably high-quality full text from automated processing, with an average character error rate (CER) below 1.5%. During the first two years, the project implemented this automated workflow and improved the HTR models by making available 420 issues from 1703–1799 (five per year where images were already available). Ultimately, the team wants to include all issues from 1703 to the 1940s in an extensive corpus with a versatile frontend that can cater to different research needs.
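For readers unfamiliar with the metric: CER is commonly defined as the edit (Levenshtein) distance between the recognised text and a ground-truth transcription, divided by the length of the ground truth. The following is a minimal sketch of that common definition, not the project's or Transkribus's own evaluation code; the function names and the sample strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the ground-truth reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a 21-character line.
ground_truth = "Wiennerisches Diarium"
recognised = "Wiennerifches Diarium"
cer = character_error_rate(ground_truth, recognised)
print(f"CER: {cer:.2%}")
```

Under this definition, a CER below 1.5% corresponds to fewer than 15 character-level corrections per 1,000 characters of ground truth on average.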

Although a recent grant application (the outline of which had been presented in Tokyo) was not successful, the project team remains intent on developing an interface together with researchers and the interested public alike. Several questions arising from the serial nature of the source, the amount of text involved and the linguistic changes over more than 200 years have to be addressed and combined with a user-centred design approach so that the texts can be presented, read, searched and otherwise reused easily.

This one-day workshop aims to involve the TEI community in this development. The first part will introduce participants to the project, its workflow and the current web interface. A short survey will collect their initial reactions to the interface.

The second part will focus on research questions that can be answered using periodical texts and on how both the framework in which they are presented and the encoding of the texts and their metadata can support a wide variety of research disciplines. Participants will be asked to try to answer a research question from a field of their choosing and to record what steps they take, what functionality they would like to see in the web frontend to help them in their research, and whether (and if so, how) they would like to contribute to improving the quality of the source material.

Both parts will be connected by the presentation of (and comparison to) the results of a two-day conference and several “annotate-a-thons” held from 2017–2019.

The results of the workshop will be discussed in an article for the JTEI and will also form an important basis for the further development of the framework used to present the Diarium’s texts (which is of course available as open source: wdbplus). The texts will be made available separately later this year.

Participants should have a laptop with a working internet connection so they can use the project’s website. The room needs a means of projection. Those who want to use their own data and/or install the presentation tool themselves need to have eXist installed and should, if possible, already have wdbplus running.

While the Diarium texts are in German, knowledge of German is not necessary. Participants who want to suggest a periodical in a different language for use in the project are very welcome to do so; such suggestions will be included for use during the workshop.