The .tig format

Files of type .tig (syntactic annotation) contain a chronological representation of the syntactic annotation in an XML text format. The structure of this format is defined by stext.dtd on the annotation DVD. The .tig files are located in /data/annot/xml/tig on the annotation DVD and can be viewed with COREX. The format is based on the Tiger format that is used in combination with the TigerSearch software. See stext.dtd on the annotation DVD for information about compatibility.

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<subcorpus name="fn123456">
<s id="fn123456.1">
<graph root="fn123456.1.506">
<terminals>
 <t id="fn123456.1.1" word="in" pos="VZ1" morph="T701"/>
 <t id="fn123456.1.2" word="de" pos="LID" morph="T602"/>
 <t id="fn123456.1.3" word="Amsterdam" pos="SPEC" morph="T005"/>
 <t id="fn123456.1.4" word="Arena" pos="SPEC" morph="T005"/>
 <t id="fn123456.1.5" word="is" pos="WW1" morph="T301"/>
 <t id="fn123456.1.6" word="Sensation" pos="N5" morph="T110"/>
 <t id="fn123456.1.7" word="de" pos="LID" morph="T602"/>
 <t id="fn123456.1.8" word="grootste" pos="ADJ3" morph="T208"/>
 <t id="fn123456.1.9" word="houseparty" pos="N1" morph="T101"/>
 <t id="fn123456.1.10" word="ter" pos="VZ3" morph="T703"/>
 <t id="fn123456.1.11" word="wereld" pos="N1" morph="T101"/>
 <t id="fn123456.1.12" word="gehouden" pos="WW7" morph="T320"/>
 <t id="fn123456.1.13" word="." pos="LET" morph="T007"/>
</terminals>
<nonterminals>
 <nt id="fn123456.1.500" cat="MWU">
  <edge label="MWP" idref="fn123456.1.3"/>
  <edge label="MWP" idref="fn123456.1.4"/>
 </nt>
 <nt id="fn123456.1.501" cat="PP">
  <edge label="HD" idref="fn123456.1.10"/>
  <edge label="OBJ1" idref="fn123456.1.11"/>
 </nt>
 <nt id="fn123456.1.502" cat="NP">
  <edge label="DET" idref="fn123456.1.2"/>
  <edge label="HD" idref="fn123456.1.500"/>
 </nt>
 <nt id="fn123456.1.503" cat="NP">
  <edge label="DET" idref="fn123456.1.7"/>
  <edge label="MOD" idref="fn123456.1.8"/>
  <edge label="HD" idref="fn123456.1.9"/>
  <edge label="MOD" idref="fn123456.1.501"/>
 </nt>
 <nt id="fn123456.1.504" cat="PP">
  <edge label="HD" idref="fn123456.1.1"/>
  <edge label="OBJ1" idref="fn123456.1.502"/>
 </nt>
 <nt id="fn123456.1.505" cat="NP">
  <edge label="HD" idref="fn123456.1.6"/>
  <edge label="APPOS" idref="fn123456.1.503"/>
 </nt>
 <nt id="fn123456.1.506" cat="SMAIN">
  <edge label="HD" idref="fn123456.1.5"/>
  <edge label="VC" idref="fn123456.1.12"/>
  <edge label="MOD" idref="fn123456.1.504"/>
  <edge label="SU" idref="fn123456.1.505"/>
 </nt>
</nonterminals>
</graph>
</s>
<s id="fn123456.2">
<graph root="fn123456.2.506">
<terminals>
 <t id="fn123456.2.1" word="zo&apos;n" pos="VNW21" morph="U528c"/>
 <t id="fn123456.2.2" word="veertigduizend" pos="TW1" morph="T401"/>
 <t id="fn123456.2.3" word="bezoekers" pos="N3" morph="T107"/>
 <t id="fn123456.2.4" word="gingen" pos="WW2" morph="T305"/>
 <t id="fn123456.2.5" word="uit" pos="VZ1" morph="T701"/>
 <t id="fn123456.2.6" word="hun" pos="VNW11" morph="U509o"/>
 <t id="fn123456.2.7" word="dak" pos="N1" morph="T102"/>
 <t id="fn123456.2.8" word="tijdens" pos="VZ1" morph="T701"/>
 <t id="fn123456.2.9" word="het" pos="LID" morph="T601"/>
 <t id="fn123456.2.10" word="dansfeest" pos="N1" morph="T102"/>
 <t id="fn123456.2.11" word="." pos="LET" morph="T007"/>
</terminals>
<nonterminals>
 <nt id="fn123456.2.500" cat="DETP">
  <edge label="MOD" idref="fn123456.2.1"/>
  <edge label="HD" idref="fn123456.2.2"/>
 </nt>
 <nt id="fn123456.2.501" cat="NP">
  <edge label="DET" idref="fn123456.2.6"/>
  <edge label="HD" idref="fn123456.2.7"/>
 </nt>
 <nt id="fn123456.2.502" cat="NP">
  <edge label="DET" idref="fn123456.2.9"/>
  <edge label="HD" idref="fn123456.2.10"/>
 </nt>
 <nt id="fn123456.2.503" cat="NP">
  <edge label="HD" idref="fn123456.2.3"/>
  <edge label="DET" idref="fn123456.2.500"/>
 </nt>
 <nt id="fn123456.2.504" cat="PP">
  <edge label="HD" idref="fn123456.2.5"/>
  <edge label="OBJ1" idref="fn123456.2.501"/>
 </nt>
 <nt id="fn123456.2.505" cat="PP">
  <edge label="HD" idref="fn123456.2.8"/>
  <edge label="OBJ1" idref="fn123456.2.502"/>
 </nt>
 <nt id="fn123456.2.506" cat="SMAIN">
  <edge label="HD" idref="fn123456.2.4"/>
  <edge label="SU" idref="fn123456.2.503"/>
  <edge label="SVP" idref="fn123456.2.504"/>
  <edge label="MOD" idref="fn123456.2.505"/>
 </nt>
</nonterminals>
</graph>
</s>
</subcorpus>

<subcorpus> sample with a syntactic annotation
<s> sentence with a syntactic annotation
<graph> graph representation of the syntactic annotation
<terminals> list of terminal nodes, end nodes <t>.
<nonterminals> list of non-terminal nodes <nt>.
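<edge> edge from a non-terminal node to a daughter node; its label gives the syntactic function
<secedge> secondary edge; its label gives the syntactic function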
<nt> non-terminal node
<t> terminal node
root ID of the mother node of sentence <s>.
id  unique node identifier of the form <sample number>.<sentence rank number>.<node number>, where <node number> is assigned to terminal as well as non-terminal nodes
word  word form as it occurs in the orthographic transcription (cf. data in the .ort files)
pos part-of-speech tag of the terminal node. This POS tag is a simplified/derived version of the POS tag in morph (see below). See corpus.header (XML) or negra.header (text), both on the annotation DVD, for an overview of the tagset used.
morph part-of-speech tag corresponding to the POS tag from the attribute pos. See corpus.header (XML) or negra.header (text), both on the annotation DVD, for a mapping between the abbreviated label notation and the full POS tags (cf. data in the .plk files)
cat node label, the syntactic category of a non-terminal node.
label syntactic function. See corpus.header (XML) or negra.header (text), both on the annotation DVD, for an explanation of the labels used.
idref reference to the id of the daughter node
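
The elements and attributes listed above can be read with any XML parser. The following is a minimal sketch in Python using only the standard library; it is not part of the corpus tools. The helper names (split_id, read_graph, words_under) are hypothetical, the sample is a shortened fragment in the layout of the example above, and the sketch assumes that terminal node numbers follow the word order, as they do in that example.

```python
import xml.etree.ElementTree as ET

# Shortened fragment in the layout of the example above (one PP only).
SAMPLE = """<subcorpus name="fn123456">
<s id="fn123456.2">
<graph root="fn123456.2.505">
<terminals>
 <t id="fn123456.2.8" word="tijdens" pos="VZ1" morph="T701"/>
 <t id="fn123456.2.9" word="het" pos="LID" morph="T601"/>
 <t id="fn123456.2.10" word="dansfeest" pos="N1" morph="T102"/>
</terminals>
<nonterminals>
 <nt id="fn123456.2.502" cat="NP">
  <edge label="DET" idref="fn123456.2.9"/>
  <edge label="HD" idref="fn123456.2.10"/>
 </nt>
 <nt id="fn123456.2.505" cat="PP">
  <edge label="HD" idref="fn123456.2.8"/>
  <edge label="OBJ1" idref="fn123456.2.502"/>
 </nt>
</nonterminals>
</graph>
</s>
</subcorpus>"""


def split_id(node_id):
    """Split '<sample>.<sentence>.<node>' into its three parts."""
    sample, sentence, node = node_id.rsplit(".", 2)
    return sample, int(sentence), int(node)


def read_graph(graph):
    """Return (terminals, nonterminals) dictionaries keyed by node id."""
    terminals = {t.get("id"): t.attrib for t in graph.iter("t")}
    nonterminals = {
        nt.get("id"): {
            "cat": nt.get("cat"),
            "edges": [(e.get("label"), e.get("idref")) for e in nt.findall("edge")],
        }
        for nt in graph.iter("nt")
    }
    return terminals, nonterminals


def words_under(node_id, terminals, nonterminals):
    """Collect (node number, word) pairs dominated by a node, sorted by number."""
    if node_id in terminals:
        return [(split_id(node_id)[2], terminals[node_id]["word"])]
    found = []
    for _label, child in nonterminals[node_id]["edges"]:
        found.extend(words_under(child, terminals, nonterminals))
    return sorted(found)


graph = ET.fromstring(SAMPLE).find("s/graph")
terminals, nonterminals = read_graph(graph)
root = graph.get("root")
phrase = " ".join(word for _n, word in words_under(root, terminals, nonterminals))
print(nonterminals[root]["cat"], phrase)
```

The root attribute gives the entry point, each edge is followed through its idref, and the recursion stops at terminal nodes. Secondary edges (<secedge>) are ignored in this sketch.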

All characters from the ISO-8859-1 character set that fall outside the 7-bit range have been replaced by the corresponding character entity references for ISO 8859-1 characters. The subset of special characters used can be found in stext.dtd on the annotation DVD. The file entities.htm gives an overview of the various standards for this character (sub)set.
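
When a .tig fragment is parsed without stext.dtd, an XML parser will normally reject named entity references it does not know. One possible workaround, sketched below in Python, is to rewrite the named references as numeric ones before parsing. The function name to_numeric_refs and the sample line are hypothetical, and the sketch assumes that the entity names used in the corpus match the standard ISO 8859-1 (HTML) entity names.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser already knows; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


def to_numeric_refs(text):
    """Rewrite named entities (e.g. &eacute;) as numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep predefined or unknown entities as-is
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


# Hypothetical terminal node, not taken from the corpus.
line = '<t id="fn000000.1.1" word="caf&eacute;" pos="N1" morph="T102"/>'
node = ET.fromstring(to_numeric_refs(line))
print(node.get("word"))
```

Parsing with a validating parser that loads stext.dtd makes this step unnecessary, since the entities are then resolved from the DTD itself.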