
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main">Character Counting</title>
        <author xml:id="sb">
          <name>
            <forename>Syd</forename>
            <surname>Bauman</surname>
          </name>
          <affiliation>Northeastern University Digital Scholarship Group</affiliation>
          <email>s.bauman@northeastern.edu</email>
        </author>
      </titleStmt>
      <publicationStmt>
        <publisher>
          <orgName corresp="https://informationsmodellierung.uni-graz.at" ref="http://d-nb.info/gnd/1137284463">Zentrum für
                   Informationsmodellierung - Austrian Centre for Digital Humanities,
                   Karl-Franzens-Universität Graz</orgName>
          <country>Austria</country>
        </publisher>
        <authority>
          <orgName corresp="https://informationsmodellierung.uni-graz.at" ref="http://d-nb.info/gnd/1137284463">Zentrum für
                   Informationsmodellierung - Austrian Centre for Digital Humanities,
                   Karl-Franzens-Universität Graz</orgName>
          <country>Austria</country>
        </authority>
        <distributor>
          <orgName ref="https://gams.uni-graz.at">GAMS - Geisteswissenschaftliches
                   Asset Management System</orgName>
        </distributor>
        <availability>
          <licence target="https://creativecommons.org/licenses/by-nc/4.0">Creative Commons
                   BY-NC 4.0</licence>
        </availability>
        <date when="2019">2019</date>
        <pubPlace>Graz</pubPlace>
        <idno type="PID">o:tei2019.143</idno>
      </publicationStmt>
      <sourceDesc>
        <p>born digital</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p>
        <ref target="info:fedora/context:tei2019.demonstrations" type="context">Demonstrations</ref>
        <ref target="info:fedora/context:tei2019" type="context">tei2019</ref>
      </p>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="en">en</language>
      </langUsage>
      <textClass>
        <keywords>
          <term>character counting</term>
        </keywords>
      </textClass>
    </profileDesc>
    <revisionDesc>
      <change when="2019-08-05" who="#sb">Created from
            character_counting.txt</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <body>
      <p>This is a demonstration of a character counting system.
      Character counting is not particularly new, is not all that
      interesting, and is not particularly difficult. And even the
      added fact that the output is a useful table of the characters
      found in an input file, sortable by frequency, character,
      Unicode code point, or (perhaps uselessly) by Unicode name is
      not particularly remarkable.</p>
      <p>But add to that feature set the fact that the system works by
      running an XSLT program that writes an XSLT program, and it
      starts to get interesting. Furthermore, although the input file
      can be any XML document, the system will semi-intelligently
      handle several different kinds of input, currently including
      TEI, WWP, XHTML, and yaps. (That list may change before the
      conference; for example, I am likely to add DocBook or JATS.)</p>
      <p>In all cases, parameters let the user include or exclude
      attribute values and choose whether whitespace is normalized,
      ignored, or left as is. If the system knows the input language,
      further parameters may be specified to control whether or not
      metadata is included and perhaps other details (like choosing
      <gi>corr</gi> over <gi>sic</gi>). Lastly, the system performs a
      lookup into the Unicode database to get the correct Unicode name
      of each character.</p>
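      <p>The options described above can be illustrated with a minimal
      sketch. It is written in Python rather than XSLT and is not the
      system being demonstrated: the function names
      <code>count_characters</code> and <code>character_table</code>, the
      parameter names, and the whitespace keywords are all hypothetical,
      and Unicode names come from Python's built-in
      <code>unicodedata</code> module instead of the lookup the system
      itself performs.</p>
      <eg>
```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import Counter


def count_characters(xml_text, include_attributes=False, whitespace="asis"):
    """Count the characters found in an XML document.

    Hypothetical parameters (assumptions for this sketch only):
      include_attributes -- also count characters in attribute values
      whitespace         -- "asis", "normalize", or "ignore"
    """
    root = ET.fromstring(xml_text)
    chunks = []
    for element in root.iter():
        # Text directly inside the element, then text following its end tag.
        chunks.append(element.text or "")
        chunks.append(element.tail or "")
        if include_attributes:
            chunks.extend(element.attrib.values())
    text = "".join(chunks)
    if whitespace == "normalize":
        # Simplification: whitespace is collapsed over the concatenated
        # text, not node by node.
        text = " ".join(text.split())
    elif whitespace == "ignore":
        text = "".join(text.split())
    return Counter(text)


def character_table(counts):
    """Return (character, code point, Unicode name, count) rows,
    most frequent first, ties broken by code point."""
    rows = [
        (ch, ord(ch), unicodedata.name(ch, "(no name)"), n)
        for ch, n in counts.items()
    ]
    return sorted(rows, key=lambda row: (-row[3], row[1]))
```
      </eg>
      <p>Because each row carries the character, its code point, its
      Unicode name, and its count, the same list could be re-sorted on
      any of those four columns by changing the sort key.</p>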
    </body>
  </text>
</TEI>
