TEI 2019

What is text, really? TEI and beyond



Towards larger corpora of Indic texts: For now, minimize metatext

Himal Trikha

Keywords: Jainism, Sanskrit, Philology
Permalink: https://gams.uni-graz.at/o:tei2019.188

Himal Trikha (himal.trikha@oeaw.ac.at)

The Digital Corpus of Vidyānandin’s works (DCVW) is an ongoing collection of digital text resources for the works of a 10th-century Sanskrit author. The resources are assembled and maintained in the context of my Indological research specialization, i.e., the history of an Indian philosophical tradition. A web interface (http://www.dipal.org/dcvw) allows users to access the resources and, to some extent, to modify them.

The digital resources are XML files that are processed by a bundle of technologies in order to pursue specific research interests: searching for text strings, identifying dialogic or intertextual elements, finding differences between attestations, and so on. In this context, the quality of the results depends on the quality of the resource files, which is assessed by three basic criteria: (a) how far text and metatext have been separated, (b) the quality of the captured text and (c) the compliance of the metatext with an established terminology. For the last of these I use TEI markup on essentially two levels: (1) markup for a precise identification of the attestation of the text and its specific editorial features and (2) markup to enrich the text from the perspective of my own research interests.
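
Criterion (a) and the string search can be illustrated with a minimal sketch. It is not the DCVW processing pipeline: the sample document, the choice of `note` as the only metatext element and the function names `plain_text`, `body_text` and `find_hits` are assumptions made for illustration. Page and line breaks (`pb`, `lb`) stand in for first-level markup that identifies the attestation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample document; NOT taken from the DCVW resource files.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample work</title></titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Hypothetical printed edition</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><pb n="1"/>pramāṇam <lb n="2"/>avisaṃvādi <note>editor's remark</note>jñānam</p>
    </body>
  </text>
</TEI>"""

# Assumption: only <note> carries metatext whose content must be dropped.
METATEXT_TAGS = {f"{{{TEI_NS}}}note"}


def plain_text(elem):
    """Return the text under elem, skipping the content of metatext elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in METATEXT_TAGS:
            parts.append(plain_text(child))
        # The tail follows the child's closing tag and belongs to the text.
        parts.append(child.tail or "")
    return "".join(parts)


def body_text(xml_string):
    """Extract the whitespace-normalized text of the TEI <body>."""
    root = ET.fromstring(xml_string)
    body = root.find(f".//{{{TEI_NS}}}body")
    return " ".join(plain_text(body).split())


def find_hits(xml_string, needle):
    """Return the start offsets of needle in the extracted text."""
    text = body_text(xml_string)
    hits, start = [], text.find(needle)
    while start != -1:
        hits.append(start)
        start = text.find(needle, start + 1)
    return hits


if __name__ == "__main__":
    print(body_text(SAMPLE))
    print(find_hits(SAMPLE, "avisa"))
```

The less metatext is interleaved with the text, the shorter the list of elements such a routine has to know about, which is one practical reading of the title's advice.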

The presentation will provide examples of the applied markup. I will argue that the use of tag sets within the first category is an indispensable prerequisite for long-term efforts to build ever larger corpora of Indic texts. The tag sets within the second category, on the other hand, seem to be of no relevance for this goal. The energy invested in the refinement of technically demanding tag sets is an asset for scholars who are so inclined. In the current state of Digital Indology, however, it is still necessary to develop standards for the discipline as a whole before we can start to agree on the basic ones.