
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>TEI as a Graph</title>
        <author>
          <name>
            <surname>Kuczera</surname>
            <forename>Andreas</forename>
          </name>
        </author>
      </titleStmt>
      <editionStmt>
        <edition>
          <date>2019-04-29T22:18:03.513250709</date>
        </edition>
      </editionStmt>
      <publicationStmt>
        <publisher>
          <orgName corresp="https://informationsmodellierung.uni-graz.at" ref="http://d-nb.info/gnd/1137284463">Zentrum für
                   Informationsmodellierung - Austrian Centre for Digital Humanities,
                   Karl-Franzens-Universität Graz</orgName>
          <country>Austria</country>
        </publisher>
        <authority>
          <orgName corresp="https://informationsmodellierung.uni-graz.at" ref="http://d-nb.info/gnd/1137284463">Zentrum für
                   Informationsmodellierung - Austrian Centre for Digital Humanities,
                   Karl-Franzens-Universität Graz</orgName>
          <country>Austria</country>
        </authority>
        <distributor>
          <orgName ref="https://gams.uni-graz.at">GAMS - Geisteswissenschaftliches
                   Asset Management System</orgName>
        </distributor>
        <availability>
          <licence target="https://creativecommons.org/licenses/by-nc/4.0">Creative Commons
                   BY-NC 4.0</licence>
        </availability>
        <date when="2019">2019</date>
        <pubPlace>Graz</pubPlace>
        <idno type="PID">o:tei2019.126</idno>
      </publicationStmt>
      <sourceDesc>
        <p>Written by OpenOffice</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p>
        <ref target="info:fedora/context:tei2019.demonstrations" type="context">Demonstrations</ref>
        <ref target="info:fedora/context:tei2019" type="context">tei2019</ref>
      </p>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="en">en</language>
      </langUsage>
      <textClass>
        <keywords>
          <term>TEI</term>
          <term>Text as a Graph</term>
        </keywords>
      </textClass>
    </profileDesc>
    <revisionDesc>
      <listChange>
        <change>
          <date>2019-07-31T22:27:28.450810655</date>
        </change>
      </listChange>
    </revisionDesc>
  </teiHeader>
  <text>
    <body>
      <p>TEI as a Graph</p>
      <p>Andreas Kuczera, Academy of Science and Literature, Mainz</p>
      <p>TEI is not a format, though many people think it is. It is a de facto standard that
                specifies Guidelines for document interchange. The Guidelines are currently based on
                XML, but XML is only one possible technical way of expressing the phenomena they describe.
                A graph, unlike a single XML tree, can hold multi-hierarchical annotation layers. Graph models are
                also easy to read and understand, so DH people and “normal” researchers share a common
                level of discussion. A graph can be expressed as RDF, so the step from a graph to
                linked open data is easy to make. </p>

      <p>In this paper a small XML example in the DTA base format is imported into the
                graph database neo4j and then converted to the Standoff Property JSON format.<note n="1" place="foot" xml:id="ftn1"> The Standoff Property Format is explained in
                    detail in Iian Neill, Andreas Kuczera: The Codex – an Atlas of Relations. In:
                    Die Modellierung des Zweifels – Schlüsselideen und -konzepte zur graphbasierten
                    Modellierung von Unsicherheiten. Hg. von Andreas Kuczera / Thorsten Wübbena /
                    Thomas Kollatz. Wolfenbüttel 2019. (= Zeitschrift für digitale
                    Geisteswissenschaften / Sonderbände, 4) text/html Format. DOI:
                    10.17175/sb004_008.</note> The same toolchain works for every TEI-XML file.<note n="2" place="foot" xml:id="ftn2"> The example is the XML export of folio 11 of
                    Gotthilf Friedrich Patzig's notes on Humboldt's Kosmos lecture, accessible
                    in the Deutsches Textarchiv (German Text Archive) (<ptr target="http://www.deutschestextarchiv.de/book/show/30962"></ptr>).</note> The
                exported Standoff Property JSON data can then be imported into the
                Standoff Property Editor SPEEDy, which can manage multi-hierarchical annotations.
                Standoff formats are well known, but they have some limitations. Usually the
                base text (datum) must not be changed once annotation has started, because
                the character indexes of the annotations would no longer match. In our system annotated
                documents can still be edited, because the
                indexes are recalculated when the document is saved. </p>
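      <p>The index problem can be illustrated with a minimal Python sketch. It is not the SPEEDy
                implementation: the class StandoffProperty, the helpers insert_text and covered, the
                sample sentence and the use of inclusive end offsets are all assumptions made for
                illustration. The sketch shifts the offsets at the moment of editing, whereas our
                system recalculates them on save.</p>
      <p><code lang="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class StandoffProperty:
    """One annotation: a type plus inclusive start/end character offsets."""
    type: str
    start: int
    end: int


def insert_text(text, properties, position, new_text):
    """Insert new_text at position and shift the offsets of affected annotations.

    Hypothetical helper: offsets are inclusive, and text inserted inside an
    annotated span extends that span.
    """
    shift = len(new_text)
    updated = []
    for p in properties:
        start = p.start + shift if p.start >= position else p.start
        end = p.end + shift if p.end >= position else p.end
        updated.append(StandoffProperty(p.type, start, end))
    return text[:position] + new_text + text[position:], updated


def covered(text, prop):
    """Return the stretch of text that an annotation points at."""
    return text[prop.start:prop.end + 1]


# Invented sample sentence with one annotation on the name.
text = "Kosmos lecture by Humboldt"
props = [StandoffProperty("persName", 18, 25)]

# Editing the base text: the old offsets go stale, the shifted ones do not.
new_text, new_props = insert_text(text, props, 0, "The ")
print(covered(new_text, props[0]))      # stale offsets
print(covered(new_text, new_props[0]))  # recalculated offsets
```
]]></code></p>
      <p>With the unchanged offsets the annotation no longer covers the name, while the shifted
                offsets still do.</p>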
      <div rend="P18" type="div1">
        <head>Convert DTA-XML with neo4j to Standoff Property JSON </head>
        <p>In a first step we import a small XML example into a neo4j (https://neo4j.com)
                    instance using apoc.xml.import
                    (https://github.com/neo4j-contrib/neo4j-apoc-procedures-function). </p>

        <p>The example is one folio from the DTA (https://www.deutschestextarchiv.de). The
                    XML test file is available at https://seafile.rlp.net/f/6282a26504cc4f079ab9/?dl=1,
                    and the same folio can be viewed in the DTA at <ref target="http://www.deutschestextarchiv.de/patzig_msgermfol841842_1828/11">http://www.deutschestextarchiv.de/patzig_msgermfol841842_1828/11</ref>.</p>
      </div>
      <div rend="P19" type="div1">
        <head>Import into neo4j </head>
        <p>The import into neo4j runs with: </p>
        <p>// Import xml-example from DTA to neo4j<lb></lb>call
                    apoc.xml.import(&apos;https://seafile.rlp.net/f/6282a26504cc4f079ab9/?dl=1&apos;,
                    {connectCharacters: true, charactersForTag:{lb:&apos; &apos;}, filterLeadingWhitespace:
                    true}) yield node<lb></lb>return node;</p>
        <p>
          <figure>
            <graphic mimeType="image/png" url="info:fedora/o:tei2019.126" xml:id="IMAGE.1"></graphic>
            <desc>Figure 1: TEI-XML example in neo4j (Kuczera).</desc>
          </figure>
        </p>
        <p>Figure 1 shows a snippet of the example in the graph database. The XML file is
                    imported as an XML tree with the root element at the top level. Elements become
                    nodes of type XmlTag and text becomes nodes of type XmlCharacter. The hierarchy
                    of the XML is expressed by IS_CHILD_OF, FIRST_CHILD_OF, LAST_CHILD_OF etc. edges
                    between these nodes. The document order of the XML file is expressed by NEXT
                    edges, which makes re-exporting XML possible. In addition, all text nodes are
                    chained by NE edges, which connect the text directly without any elements in
                    between. Whitespace becomes a text node of its own. The example shows that
                    importing a DTA base format XML file keeps all information from the XML version,
                    so that re-exporting to XML is possible. </p>
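        <p>The graph structure described above can be imitated in a few lines of Python. This is
                    only a sketch under stated assumptions, not the APOC implementation: the function
                    xml_to_graph and the sample string are invented for illustration, only the
                    IS_CHILD_OF, NEXT and NE edges are modelled, and whitespace handling is
                    ignored.</p>
        <p><code lang="python"><![CDATA[
```python
import xml.etree.ElementTree as ET


def xml_to_graph(xml_string):
    """Turn an XML string into (nodes, edges) lists in document order."""
    nodes = []  # each node: {"id", "label", "value"}
    edges = []  # each edge: (source_id, relationship_type, target_id)

    def add_node(label, value, parent_id):
        node_id = len(nodes)
        nodes.append({"id": node_id, "label": label, "value": value})
        if parent_id is not None:
            edges.append((node_id, "IS_CHILD_OF", parent_id))
        return node_id

    def walk(element, parent_id):
        tag_id = add_node("XmlTag", element.tag, parent_id)
        if element.text:
            add_node("XmlCharacter", element.text, tag_id)
        for child in element:
            walk(child, tag_id)
            if child.tail:
                # Text following a child element belongs to the parent.
                add_node("XmlCharacter", child.tail, tag_id)

    walk(ET.fromstring(xml_string), None)

    # NEXT links every node to its successor in document order.
    for current, following in zip(nodes, nodes[1:]):
        edges.append((current["id"], "NEXT", following["id"]))

    # NE links the text nodes only, skipping the elements between them.
    text_ids = [n["id"] for n in nodes if n["label"] == "XmlCharacter"]
    for current, following in zip(text_ids, text_ids[1:]):
        edges.append((current, "NE", following))

    return nodes, edges


# Invented sample, not taken from the DTA file.
sample = "<p>Der <hi>Kosmos</hi> ist <del>gross</del> weit</p>"
nodes, edges = xml_to_graph(sample)

# Following the text nodes alone yields the text without any markup.
plain_text = "".join(n["value"] for n in nodes if n["label"] == "XmlCharacter")
print(plain_text)
```
]]></code></p>
        <p>Because the nodes are stored in document order, walking the NEXT chain visits elements
                    and text in the order of the original file, while the NE chain gives the running
                    text on its own.</p>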
      </div>
      <div rend="P18" type="div1">
        <head>Export from neo4j to Standoff Property JSON</head>
        <p>
          <figure>
            <graphic mimeType="image/png" url="info:fedora/o:tei2019.126" xml:id="IMAGE.2"></graphic>
            <desc>Figure 2: TEI as a graph in the Standoff Property Editor SPEEDy
                            (Kuczera).</desc>
          </figure>
        </p>
        <p>The next step is to export the data with some <ref target="https://wiki.tei-c.org/index.php?title=Cypher">Cypher</ref> to the
                    Standoff Property JSON format. The result can be copied directly out of the
                    neo4j browser window and imported into the Standoff Property
                    Editor SPEEDy, which can be found on <ref target="https://github.com/argimenes/standoff-properties-editor">GitHub
                        (https://github.com/argimenes/standoff-properties-editor)</ref>. </p>
        <p>The README of the SPEEDy GitHub repository contains a <ref target="https://argimenes.github.io/standoff-properties-editor/">link
                        (https://argimenes.github.io/standoff-properties-editor/)</ref> to the
                    test instance hosted on GitHub Pages. We have prepared the example in SPEEDy:
                    select “TEI-XML → SPEEDY IV” in the file section and load the data. Below
                    the editor window you can press the UNBIND button and inspect the exported JSON
                    in the window underneath.</p>
        <p>Figure 2 shows the result of the conversion without any manual post-processing.
                    The plain text is what remains of the XML file once all elements are removed, so
                    it is not easy to read. But if you select a part of the text, the corresponding
                    annotations are shown below the editor window, so the semantics are not lost. </p>
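        <p>How markup turns into plain text plus standoff annotations can be sketched in Python.
                    This is not the Cypher export query used here. The function xml_to_standoff and
                    the sample string are invented for illustration, and the field names text,
                    properties, type, startIndex, endIndex and attributes are assumptions that may
                    not match the SPEEDy schema exactly.</p>
        <p><code lang="python"><![CDATA[
```python
import json
import xml.etree.ElementTree as ET


def xml_to_standoff(xml_string):
    """Strip all elements and record each one as an offset-based annotation."""
    chunks = []       # pieces of plain text in document order
    properties = []   # one entry per element
    length = 0        # number of characters emitted so far

    def emit(text):
        nonlocal length
        if text:
            chunks.append(text)
            length += len(text)

    def walk(element):
        start = length
        emit(element.text)
        for child in element:
            walk(child)
            emit(child.tail)
        # Inclusive end offset; an empty element ends up with end < start.
        properties.append({
            "type": element.tag,
            "startIndex": start,
            "endIndex": length - 1,
            "attributes": dict(element.attrib),
        })

    walk(ET.fromstring(xml_string))
    # Outer annotations first, then by position in the text.
    properties.sort(key=lambda p: (p["startIndex"], -p["endIndex"]))
    return {"text": "".join(chunks), "properties": properties}


# Invented sample without namespaces, not taken from the DTA file.
sample = '<p>Der <del>grosse</del><add place="above">weite</add> Kosmos</p>'
document = xml_to_standoff(sample)
print(json.dumps(document, ensure_ascii=False, indent=2))
```
]]></code></p>
        <p>In this sample the plain text reads “Der grosseweite Kosmos”: the deleted and the added
                    reading run together, which is why the unprocessed text is hard to read, while
                    the del and add annotations still record which characters belong to which
                    reading.</p>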

        <p>Further steps include algorithms that move deleted text into an annotation on the
                    corresponding added text, which yields a readable text that can then be
                    annotated further. Another task is developing an export function to XML. An
                    alternative approach would be to refactor the XML inside the graph database in
                    order to get clean standoff data out of it. From my point of view, TEI as a
                    graph can be the next technical step for TEI, offering better support for and
                    linking to Linked Open Data projects and overcoming the one-dimensional
                    restriction of XML.</p>
        <p>I want to thank
                    Stefan Armbruster from neo4j for the export Cypher query and the implementation
                    of the XML import functions in APOC
                    (https://github.com/neo4j-contrib/neo4j-apoc-procedures-function) and Iian Neill
                    for his work on <ref target="https://github.com/argimenes/standoff-properties-editor">SPEEDy
                        (https://argimenes.github.io/standoff-properties-editor/)</ref>. </p>
      </div>
    </body>
  </text>
</TEI>
