
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>TEITOK – TEI based annotated corpora</title>
        <author>
          <name>
            <forename>Maarten</forename>
            <surname>Janssen</surname>
          </name>
        </author>
      </titleStmt>
      <publicationStmt>
        <publisher>
          <orgName corresp="https://informationsmodellierung.uni-graz.at" ref="http://d-nb.info/gnd/1137284463">Zentrum für Informationsmodellierung -
						Austrian Centre for Digital Humanities, Karl-Franzens-Universität
						Graz</orgName>
          <country>Austria</country>
        </publisher>
        <authority>
          <orgName corresp="https://informationsmodellierung.uni-graz.at" ref="http://d-nb.info/gnd/1137284463">Zentrum für Informationsmodellierung -
						Austrian Centre for Digital Humanities, Karl-Franzens-Universität
						Graz</orgName>
          <country>Austria</country>
        </authority>
        <distributor>
          <orgName ref="https://gams.uni-graz.at">GAMS - Geisteswissenschaftliches Asset
						Management System</orgName>
        </distributor>
        <availability>
          <licence target="https://creativecommons.org/licenses/by-nc/4.0">Creative
						Commons BY-NC 4.0</licence>
        </availability>
        <date when="2019">2019</date>
        <pubPlace>Graz</pubPlace>
        <idno type="PID">o:tei2019.142</idno>
      </publicationStmt>
      <sourceDesc>
        <p>born digital</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p>
        <ref target="info:fedora/context:tei2019.demonstrations" type="context">Demonstrations</ref>
        <ref target="info:fedora/context:tei2019" type="context">tei2019</ref>
      </p>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="en">en</language>
      </langUsage>
      <textClass>
        <keywords xml:lang="en">
          <term>Corpus search</term>
          <term>TEI visualization</term>
          <term>Linguistic annotation</term>
          <term>Spoken corpora</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <head>TEITOK – TEI based annotated corpora</head>
      <p rend="paper_abstract">
        <hi rend="paper_label">
          <seg rend="italic bold">Maarten
						Janssen</seg>
        </hi>
      </p>
      <p>TEITOK is a web-based tool for building, annotating, and distributing corpora, in
				which corpus files are stored in TEI/XML. It combines the needs of those who want to
				do detailed philological markup with the requirements of a searchable, annotated
				linguistic corpus, and is being used for a growing number of corpora around the
				world, primarily historical, spoken, and learner corpora. </p>
      <p>With regard to textual mark-up, it allows TEI documents to be visualised directly
				in a browser, using CSS and JavaScript to render the different TEI elements in a
				customisable way. It can display facsimile images alongside the text, and has
				additional display options for specific types of TEI documents, such as a
				line-by-line visualisation for aligned facsimile transcriptions, and a view
				including a waveform display for time-aligned audio transcriptions. </p>
      <p>With regard to linguistic annotation, it allows TEI documents to be tokenised
				inline, after which each token can be annotated with information such as POS, lemma,
				or dependency relations. The tokenised corpus can then be automatically exported as
				a linguistic corpus for the Corpus Workbench, making it possible to search the
				corpus using its expressive query language. Unlike most corpus search interfaces,
				TEITOK displays TEI/XML fragments in the search results, which therefore retain the
				full textual mark-up of the source document. </p>
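      <p>As a rough illustration of inline tokenisation, a tokenised and annotated sentence
				might look like the fragment below. This is only a sketch: the element name
				<gi>tok</gi> and the attribute names <att>id</att>, <att>lemma</att>,
				<att>pos</att>, <att>head</att>, and <att>deprel</att> are assumptions made for
				this example and should be checked against the TEITOK documentation. The point
				it tries to show is that token-level annotation can sit inside existing textual
				mark-up such as <gi>hi</gi>, so both remain available in the same file. </p>
      <egXML xmlns="http://www.tei-c.org/ns/Examples">
        <!-- Hypothetical fragment: element and attribute names are illustrative assumptions -->
        <s id="s-1">
          <!-- each token carries its own lemma, POS tag, and dependency relation -->
          <tok id="w-1" lemma="the" pos="DET" head="w-2" deprel="det">The</tok>
          <!-- philological mark-up (here: italics) wraps the token and is left intact -->
          <hi rend="italic"><tok id="w-2" lemma="scribe" pos="NOUN" head="w-3" deprel="nsubj">scribe</tok></hi>
          <!-- the root of the dependency tree has no head in this sketch -->
          <tok id="w-3" lemma="write" pos="VERB" deprel="root">writes</tok>
          <tok id="w-4" lemma="." pos="PUNCT" head="w-3" deprel="punct">.</tok>
        </s>
      </egXML>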
      <p>For tokenised corpora, it also allows multiple orthographic realisations to be
				stored for each token, such as a semi-palaeographic transcription and a regularised
				orthography, which can in turn be used in the document view to display different
				editions of the same document. The textual metadata can be used in a number of
				ways, for instance to display all the documents in the corpus on a world map. The
				combination of metadata and token-based annotation allows for detailed corpus
				research on richly annotated documents. </p>
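      <p>The idea of several orthographic realisations per token can be sketched as follows.
				In this hypothetical fragment the text content of each token holds the
				semi-palaeographic transcription, while an attribute holds the regularised
				spelling. The element name <gi>tok</gi> and the attribute name <att>nform</att>
				are assumptions made for illustration. A document view could then show either
				the transcribed forms or the regularised forms from the same source file. </p>
      <egXML xmlns="http://www.tei-c.org/ns/Examples">
        <!-- Hypothetical fragment: names are illustrative assumptions -->
        <s id="s-2">
          <!-- text content = transcribed spelling; nform = regularised spelling -->
          <tok id="w-5" nform="unto">vnto</tok>
          <tok id="w-6" nform="virtue">vertue</tok>
        </s>
      </egXML>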
    </body>
  </text>
</TEI>
