TEI 2019

What is text, really? TEI and beyond


Inscriptions, Hieroglyphs, Linguistics… and Beyond! The Corpus of Classic Mayan as an Ontological Information Resource

Franziska Diehr, Sven Gronemeyer, Uwe Sikora, Christian Prager, Maximilian Brodhun, Elisabeth Wagner, Katja Diederichs, Nikolai Grube

Keywords: non-alphabetic writing system, ontological model, stand-off markup
Slides: https://www.doi.org/10.5281/zenodo.3456882
Permalink: https://gams.uni-graz.at/o:tei2019.139

Answering the question “What is text, really?” may be impossible, because ‘text’ is a highly complex resource that fulfils numerous purposes and is manifested in diverse document types with unique characteristics. To study ‘text’ using digital methods, some kind of representation is required. The project ‘Text Database and Dictionary of Classic Mayan’ 1 compiles the hieroglyphic texts written by the ancient Maya in a machine-readable corpus. We do so with an approach that fits the idea of “TEI and beyond”: ‘text’ is represented as separate information resources, each described by an ontological model that captures the specific semantics and complexities of the material. Using different formats (RDF, XML) and standards (CIDOC-CRM, TEI-P5), the inscriptions are encoded in a multi-level corpus 2 comprising (1) a schema conforming to TEI-all that defines values and rules for encoding the texts’ topological and structural features, (2) a ‘Sign Catalogue’ for the classification of Maya hieroglyphs (Diehr et al.), and (3) the tool ‘ALMAH’ for linguistic analyses (Grube et al., 5-7).

Maya writing is not yet fully deciphered: not all signs are known, competing reading hypotheses persist, and a Unicode character set is still missing. 3 To represent the script, we use stand-off markup to enable an interlinked structure between distributed sources: the TEI encoding serves as the central data source, embedding other information (Fig. 1). Maya glyphs are grouped in blocks, each usually containing more than one glyph in varying arrangements. 4

The attributes @rend and @corresp represent this structure by describing the position of each glyph relative to its neighboring glyph. 5 The project will encode approximately 10,000 texts. 6 To support the workflow, we developed a parser that generates the corresponding TEI/XML structure from a project-specific sign-number transliteration (Grube et al., 2-3).
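The idea of generating such a structure from a numeric transliteration can be illustrated with a minimal Python sketch. It is not the project's parser: the input syntax (sign numbers joined by `.` and `:`), the @rend values `right_of` and `beneath`, the wrapping `seg` element, the ID scheme, and the base URI `https://example.org/signcatalogue/` are all assumptions made for this illustration. Only the use of tei:g with @ref, @rend, and @corresp follows the description above.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical base URI standing in for the Sign Catalogue.
SIGN_BASE = "https://example.org/signcatalogue/"

# Hypothetical operators and @rend values; the project's real
# transliteration syntax and attribute values are not given in the text.
ARRANGEMENT = {".": "right_of", ":": "beneath"}


def block_to_tei(block_id, transliteration):
    """Turn e.g. '1.561:23' into one block element with <g> children."""
    tokens = re.split(r"([.:])", transliteration)
    signs, operators = tokens[0::2], tokens[1::2]
    if not all(sign.isdigit() for sign in signs):
        raise ValueError(f"malformed transliteration: {transliteration!r}")

    # Wrapping element is an assumption; the source only names tei:g.
    block = ET.Element(
        f"{{{TEI_NS}}}seg",
        {"type": "glyph-block", f"{{{XML_NS}}}id": block_id},
    )
    previous_id = None
    for index, sign in enumerate(signs):
        glyph_id = f"{block_id}G{index + 1}"
        attrs = {f"{{{XML_NS}}}id": glyph_id, "ref": SIGN_BASE + sign}
        if index > 0:
            # Position is stated relative to the preceding glyph.
            attrs["rend"] = ARRANGEMENT[operators[index - 1]]
            attrs["corresp"] = "#" + previous_id
        ET.SubElement(block, f"{{{TEI_NS}}}g", attrs)
        previous_id = glyph_id
    return block


if __name__ == "__main__":
    ET.register_namespace("tei", TEI_NS)
    print(ET.tostring(block_to_tei("B1", "1.561:23"), encoding="unicode"))
```

In this sketch every glyph after the first carries a pointer to its predecessor, so the arrangement within a block can be reconstructed from the attributes alone.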

In our approach, ‘text’ is understood as a multi-level information resource in the form of an ontological corpus that offers different views of and access points to the material and provides a holistic environment for studying Classic Mayan.

Figure 1: Within tei:g, the value of the attribute @ref refers to the URI of the graph recorded in the Sign Catalogue. Its ontological structure links the graph to its linguistic expression, to which a transliteration value is assigned.
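The chain of links described in Figure 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: two plain dictionaries stand in for the RDF data of the Sign Catalogue, and all URIs, the wrapping `ab` element, and the value `VALUE-1` are invented placeholders.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical stand-ins for the Sign Catalogue: graph -> sign
# (linguistic expression) -> transliteration value. Placeholder data only.
GRAPH_TO_SIGN = {
    "https://example.org/signcatalogue/graph/G1":
        "https://example.org/signcatalogue/sign/S1",
}
SIGN_TO_TRANSLITERATION = {
    "https://example.org/signcatalogue/sign/S1": "VALUE-1",
}

# Hypothetical TEI fragment; only tei:g with @ref is taken from the text.
SAMPLE_TEI = """
<ab xmlns="http://www.tei-c.org/ns/1.0">
  <g ref="https://example.org/signcatalogue/graph/G1"/>
  <g ref="https://example.org/signcatalogue/graph/G2"/>
</ab>
"""


def transliterate(tei_xml):
    """Return (graph URI, transliteration value or None) for each tei:g."""
    root = ET.fromstring(tei_xml)
    result = []
    for glyph in root.iterfind(".//tei:g", TEI):
        graph_uri = glyph.get("ref")
        # A graph without a linked sign or value yields None, mirroring
        # signs for which no reading has been recorded.
        sign_uri = GRAPH_TO_SIGN.get(graph_uri)
        result.append((graph_uri, SIGN_TO_TRANSLITERATION.get(sign_uri)))
    return result


if __name__ == "__main__":
    for graph_uri, value in transliterate(SAMPLE_TEI):
        print(graph_uri, "->", value)
```

Because the text layer stores only URIs, a reading in the catalogue can change without the TEI encoding having to be touched, which is the point of the stand-off design.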

References

Franziska Diehr, Sven Gronemeyer, Elisabeth Wagner, Christian Prager, Katja Diederichs, Uwe Sikora, Maximilian Brodhun, and Nikolai Grube. Modelling vagueness: A criteria-based system for the qualitative assessment of reading proposals for the deciphering of Classic Mayan hieroglyphs. In Michael Piotrowski, editor, Proceedings of the Workshop on Computational Methods in the Humanities 2018, volume 2314 of Workshop Proceedings, pages 33–44, Lausanne, Switzerland, 2019. CEUR.

Nikolai Grube, Christian Prager, Katja Diederichs, Sven Gronemeyer, Antje Grothe, Céline Tamignaux, Elisabeth Wagner, Maximilian Brodhun, and Franziska Diehr. Textdatenbank und Wörterbuch des Klassischen Maya Annual Report for 2017. Textdatenbank und Wörterbuch des Klassischen Maya, (Project Report 5), 2018.

Carlos Pallan Gayol and Deborah Anderson. Achieving Machine-Readable Mayan Text via Unicode: Blending “Old World” Script-encoding with Novel Digital Approaches. In Élika Ortega, Glen Worthey, Isabel Galina, and Ernesto Priani, editors, Digital Humanities 2018 - Puentes-Bridges: Book of Abstracts, pages 256–261, Mexico City.

Irene Rossi and Annamaria De Santis. Crossing Experiences in Digital Epigraphy: From Practice to Discipline. De Gruyter, Berlin, 2019.


1 For more information on the project, see: http://mayadictionary.de/

2 The corpus further comprises (4) an ontology-based RDF schema for historical and scholarly information and for the physical features of text carriers, and (5) the ‘Maya Image Archive’ for photographic and archival material, for which we use the DARIAH service ‘ConedaKOR’: https://classicmayan.kor.de.dariah.eu/.

3 There are efforts in this direction (Pallan Gayol, Anderson), but in their current form they do not meet the classification requirements of the Maya script. These challenges are also present in other ancient, non-alphabetic writing systems (Rossi, De Santis). The interdisciplinary working group EnCoWS (Encoding Complex Writing Systems) was set up in 2015 for the purpose of harmonising encodings.

4 Depending on the sign shape, the space available within the block, and aesthetics, individual signs may merge, overlap, be infixed, or be rotated.

5 By using @corresp to refer to the neighboring glyph, we mimic a numerical transliteration (similar to the ‘Leiden Conventions’), but in a more precise way: with the support of the TEI semantics and the XML syntax, an unambiguous description of the glyph arrangement is provided.

6 The data will successively be made accessible under a CC BY 4.0 license on our project portal (https://www.classicmayan.org/), which is currently at the conception stage. Furthermore, the corpus data will be published in the TextGrid Repository (https://textgridrep.org/), where external users can also access them via OAI-PMH. The RDF data of the Sign Catalogue will be retrievable via a SPARQL endpoint at the portal and also in the TextGrid Repository.