Validating @selector: a regular expression adventure
Syd Bauman
Northeastern University Digital Scholarship Group
s.bauman@northeastern.edu

Zentrum für Informationsmodellierung - Austrian Centre for Digital Humanities,
Karl-Franzens-Universität Graz, Austria
GAMS - Geisteswissenschaftliches Asset Management System
Creative Commons BY-NC 4.0
2019, Graz
o:tei2019.127
born digital
Papers, tei2019
en
TEI, CSS3, selector, regular expression (regexp), rendition
Created from validating_selector.txt and the Markup UK paper

Starting with P3 in 1994 (i.e., over two years before CSS1
was released), the
Guidelines supported a
mechanism to indicate a default rendition, a way of saying
“all persName elements were in italics in the
original”. You would put the name of an element on the
gi attribute of a tagUsage element in order
to indicate which elements had a particular default
rendition.
Starting in 2015-10 with P5 2.9.0, TEI introduced a new
method for the same purpose (and then phased out the original
method). In this new method you specify which elements a default
rendition applies to using the Cascading Style Sheets (CSS)
selection mechanism — you put a CSS selector on the
selector attribute of a rendition
element. But the TEI only defines selector as
teidata.text (which boils down to the RELAX NG
string datatype).
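
To make the mechanism concrete, here is a minimal sketch in Python that reads a small TEI fragment and lists each selector together with the rendition it applies. The fragment, the constant names, and the function default_renditions are invented for illustration and do not come from the Guidelines or from the paper. Only the rendition element and its selector and scheme attributes are taken from TEI.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: two default renditions chosen by CSS selector,
# plus one rendition that has no selector and is only pointed at by ID.
SAMPLE = """\
<tagsDecl xmlns="http://www.tei-c.org/ns/1.0">
  <rendition scheme="css" selector="persName">font-style: italic;</rendition>
  <rendition scheme="css" selector="div > head">font-weight: bold;</rendition>
  <rendition xml:id="sc" scheme="css">font-variant: small-caps;</rendition>
</tagsDecl>
"""


def default_renditions(xml_text):
    """Return (selector, style) pairs for every rendition with a selector."""
    root = ET.fromstring(xml_text)
    pairs = []
    for rend in root.iter(f"{{{TEI_NS}}}rendition"):
        selector = rend.get("selector")
        if selector is not None:
            pairs.append((selector, (rend.text or "").strip()))
    return pairs


for selector, style in default_renditions(SAMPLE):
    print(f"{selector!r} -> {style}")
```

Because the attribute is declared as plain text, nothing at this level stops a value such as "div >" from being used as a selector; that is the gap syntactic validation is meant to close.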
This struck me as insufficient; formal syntactic validation
is in order. Thus I set about writing a regular expression to
validate CSS3 selectors. This presentation is about both the
process of creating said regular expression, and the result,
which is a regular expression just over 18,300 characters long
which I believe correctly matches valid CSS3 selectors and
correctly fails to match other strings.
Topics to be addressed include the following.
- How do you write such a long expression? The answer is you
  don’t: you write a program to write the expression. I wrote
  such a program in Perl, but plan to re-write it in XSLT before
  the presentation.
- There are some aspects of the CSS3 specification that aren’t
  entirely clear, at least not to me.
- According to several sources, CSS3 is not regular, and thus it
  cannot be parsed with a regexp. So how was I able to do this?
  I think there are three contributing factors. I was (1) not
  dealing with all of CSS3, only with selectors; (2) not trying
  to parse the selectors into their component segments, but
  rather only trying to return yes or no; and (3) unaware it was
  impossible until after I’d done it.
- The program will generate output in either RELAX NG or XSLT.
- The output includes a test suite of thousands of CSS3
  selectors.
- Because of limitations in RELAX NG’s use of regular
  expressions, the regular expression produced respects case in
  some places where it should be ignored.
- I did not write the portion of the regular expression that
  tests a BCP 47 language tag, but rather downloaded someone
  else’s.
- The regular expression runs very quickly in RELAX NG using
  jing, and very slowly in XSLT using Saxon.
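
The idea of writing a program that writes the expression can be sketched as follows. This is not the Perl program described above. It is a small Python illustration that covers only a tiny, simplified subset of CSS3 selectors: type, universal, class, and ID selectors, the four combinators, and comma-separated groups, with no escapes, attribute selectors, pseudo-classes, or namespaces. All names in it are hypothetical. It shows how short named pieces are composed into one long pattern, and how that pattern is used only to answer yes or no, never to split a selector into its components.

```python
import re

# Building blocks, heavily simplified relative to the CSS3 grammar.
IDENT = r"-?[A-Za-z_][A-Za-z0-9_-]*"
TYPE = rf"(?:{IDENT}|\*)"          # element name or universal selector
CLASS = rf"\.{IDENT}"              # .className
ID = rf"#{IDENT}"                  # #identifier (simplified)
SUFFIX = rf"(?:{CLASS}|{ID})"

# A sequence of simple selectors: a type with optional suffixes,
# or one or more suffixes with no type.
SIMPLE_SEQ = rf"(?:{TYPE}{SUFFIX}*|{SUFFIX}+)"

# Child, adjacent sibling, general sibling, or descendant (whitespace).
COMBINATOR = r"(?:\s*[>+~]\s*|\s+)"

SELECTOR = rf"{SIMPLE_SEQ}(?:{COMBINATOR}{SIMPLE_SEQ})*"
GROUP = rf"{SELECTOR}(?:\s*,\s*{SELECTOR})*"

# Only non-capturing groups are used: the goal is acceptance, not parsing.
PATTERN = re.compile(GROUP)


def is_valid_selector(text):
    """Return True if the whole string matches the simplified grammar."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    print("generated pattern length:", len(GROUP))
    for candidate in ["persName", "div > head", "p.note, q#x", "div >", "a,,b"]:
        print(repr(candidate), is_valid_selector(candidate))
```

Even this toy grammar expands to a pattern far longer than any of its named parts, which suggests why a full CSS3 selector expression is easier to generate than to type by hand.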