TEI XML and Delta Format Interchangeability. Cole, Nicholas; De Roure, David; Willcox, Pip. Zentrum für Informationsmodellierung - Austrian Centre for Digital Humanities, Karl-Franzens-Universität Graz, Austria. GAMS - Geisteswissenschaftliches Asset Management System. Creative Commons BY-NC 4.0. Graz, 2019. o:tei2019.145



Keywords: JavaScript, XML, collaborative editing, annotation, processing

TEI XML and Delta Format Interchangeability

Authors

Nicholas Cole, University of Oxford; David De Roure, University of Oxford; Pip Willcox, The National Archives

Abstract

This paper will interrogate the close link between TEI and XML, and examine whether, for certain applications, the information encoded in TEI could be processed more easily if it were expressed in alternative formats. In particular, it will examine the rise of various so-called ‘Delta’ formats in the JavaScript ecosystem. These formats are particularly popular with projects developing multi-user, collaborative editors, and they express the structure and features of a text by separating the text itself from the list of attributes associated with each block, line, or character. This separation offers several advantages for processing, and allows such documents to take advantage of an emerging software ecosystem.
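As a rough illustration of what "separating the text from its attributes" can look like, the sketch below stores a document as a list of text runs, each with an optional attribute map. It is loosely modelled on the insert-with-attributes shape used by Quill's Delta format; the abstract does not name a specific format, so that choice, the type names, and the attribute names (`hi`, `block`) are all assumptions made for the example.

```typescript
// Hypothetical types for illustration only; they are not taken from the paper.
type Attributes = Record<string, string | boolean>;

interface InsertOp {
  insert: string;          // a run of text
  attributes?: Attributes; // features that apply to the whole run
}

// A one-line document: one highlighted word and a paragraph marker on the newline.
const doc: InsertOp[] = [
  { insert: "We the " },
  { insert: "People", attributes: { hi: "bold" } },
  { insert: " of the United States" },
  { insert: "\n", attributes: { block: "p" } },
];

// The plain text can be recovered without inspecting any attributes.
function plainText(ops: InsertOp[]): string {
  return ops.map((op) => op.insert).join("");
}

// The attributes can be listed separately as character ranges over that text.
interface AttributeRange {
  start: number;
  end: number; // exclusive
  attributes: Attributes;
}

function attributeRanges(ops: InsertOp[]): AttributeRange[] {
  const ranges: AttributeRange[] = [];
  let offset = 0;
  for (const op of ops) {
    const end = offset + op.insert.length;
    if (op.attributes) {
      ranges.push({ start: offset, end: end, attributes: op.attributes });
    }
    offset = end;
  }
  return ranges;
}
```

In this sketch, `plainText(doc)` yields the bare string, while `attributeRanges(doc)` yields the annotations as offsets into it, so either can be processed without the other.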

As this paper will show, for certain applications, such as the representation of negotiated texts, a delta format makes it possible to represent and manipulate annotated rich-text data in ways that are not possible with traditional TEI formats, because algorithms for manipulating delta formats exist that have no comparable (implemented) analogues for XML documents.
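The abstract does not say which algorithms it has in mind. One common family in collaborative editors describes a change as a left-to-right sequence of retain, insert, and delete operations, and the sketch below applies such a change to a per-character document model. Everything here (the `Char` model, `applyChange`, the attribute names) is a hypothetical illustration under that assumption, not the authors' method.

```typescript
// Hypothetical per-character model, assumed for illustration.
type Attributes = Record<string, string>;

interface Char {
  ch: string;
  attributes: Attributes;
}

// A change walks the document from left to right.
type ChangeOp =
  | { retain: number; attributes?: Attributes } // keep n characters, optionally adding attributes
  | { insert: string; attributes?: Attributes } // add new text at the current position
  | { delete: number };                         // skip (remove) n characters

function fromText(text: string, attributes: Attributes = {}): Char[] {
  return text.split("").map((ch) => ({ ch: ch, attributes: { ...attributes } }));
}

function toText(doc: Char[]): string {
  return doc.map((c) => c.ch).join("");
}

// Returns a new document; the input document is not modified.
function applyChange(doc: Char[], change: ChangeOp[]): Char[] {
  const out: Char[] = [];
  let pos = 0;
  for (const op of change) {
    if ("retain" in op) {
      for (const c of doc.slice(pos, pos + op.retain)) {
        out.push({ ch: c.ch, attributes: { ...c.attributes, ...op.attributes } });
      }
      pos += op.retain;
    } else if ("insert" in op) {
      out.push(...fromText(op.insert, op.attributes));
    } else {
      pos += op.delete;
    }
  }
  // Characters the change did not mention are kept as they were.
  out.push(...doc.slice(pos));
  return out;
}

// An amendment to a negotiated text: mark "We " as italic and replace "People" with "States".
const original = fromText("We the People");
const amended = applyChange(original, [
  { retain: 3, attributes: { hi: "italic" } },
  { retain: 4 },
  { delete: 6 },
  { insert: "States", attributes: { change: "amendment-1" } },
]);
```

Because the change is itself plain data, it can be stored, transmitted, or inspected independently of the document it applies to, which is the property that makes this style attractive for tracking successive amendments.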

This paper will lay out the current state of competing ‘delta’ formats and the advantages they offer in separating text from the attributes that describe it. It will consider the problems that some of these popular but poorly defined formats would encounter in trying to express a TEI document losslessly, and whether a translation layer between TEI XML and such a format could be achieved. It will also ask whether the academic text-processing ecosystem would benefit from an alternative format that draws on such ideas, particularly as a transient format (with import from and export to XML) for editing and data processing within particular applications. Finally, given the poorly defined state of some of these useful but evolving formats, it will examine whether efforts by the DH community to standardize versions for academic purposes, and to link them to the current TEI standard, would bear fruit.
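To make the translation question concrete, the following sketch flattens a small TEI-like element tree into text runs, each carrying the full path of its enclosing elements. The tree is a hand-built, hypothetical stand-in for parsed XML, and `flatten`, `Run`, and `PathStep` are names invented for this example; it covers only one direction (TEI to runs) and is not a proposed standard.

```typescript
// Hypothetical in-memory tree standing in for a parsed TEI fragment.
interface TeiElement {
  name: string;
  attrs: Record<string, string>;
  children: TeiNode[];
}
type TeiNode = string | TeiElement;

// One step in the chain of elements that enclose a run of text.
interface PathStep {
  name: string;
  attrs: Record<string, string>;
}

// A run of text plus every enclosing element, outermost first.
interface Run {
  insert: string;
  path: PathStep[];
}

function flatten(node: TeiNode, path: PathStep[] = []): Run[] {
  if (typeof node === "string") {
    return [{ insert: node, path: path }];
  }
  const here = path.concat([{ name: node.name, attrs: node.attrs }]);
  const runs: Run[] = [];
  for (const child of node.children) {
    runs.push(...flatten(child, here));
  }
  return runs;
}

// <p>We the <hi rend="bold">People</hi> of the <del>United</del> States</p>
const fragment: TeiElement = {
  name: "p",
  attrs: {},
  children: [
    "We the ",
    { name: "hi", attrs: { rend: "bold" }, children: ["People"] },
    " of the ",
    { name: "del", attrs: {}, children: ["United"] },
    " States",
  ],
};

const runs = flatten(fragment);
```

Even this small sketch exposes the kind of loss the paper is concerned with: an empty element such as `<lb/>` has no text child and so produces no run, and two adjacent runs with identical paths cannot be told apart from one element that was split. A lossless round trip would need extra conventions for both cases.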