The .awd format

Files of type .awd (to be found in /data/annot/text/awd on the annotation DVD) contain an automatically generated word segmentation in which the words of the orthographic transcription have been linked to the audio signal. The files also contain an automatically generated phoneme segmentation in which the individual phonemes have been linked to the audio signal. The files are in ShortTextGrid format and can be produced, changed or viewed by means of the PRAAT software. For a description of the ShortTextGrid format, see the description of the .ort format.

For each speaker three tiers are envisaged. The first tier bears the speaker code as its tier name and is identical to the tier with the same name in the .ort file. The second tier has the same name with the suffix _FON (e.g. N98765 and N98765_FON respectively) and comprises an automatically generated phonetic transcription. The time markers on these two tiers are identical. Finally, there is a third tier with the same name and the suffix _SEG (N98765_SEG). This tier represents the underlying phoneme segmentations that correspond to the words on the other two tiers.
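The tier naming convention can be illustrated with a minimal Python sketch. It assumes the tier names have already been read from an .awd file by some other means; the function name group_tiers_by_speaker and the keys WORD, FON and SEG are hypothetical and are not part of the format.

```python
from collections import defaultdict

# Suffixes named in the description of the .awd tiers.
SUFFIXES = ("_FON", "_SEG")


def group_tiers_by_speaker(tier_names):
    """Map each speaker code to its word, _FON and _SEG tier names."""
    groups = defaultdict(dict)
    for name in tier_names:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                speaker = name[: -len(suffix)]
                groups[speaker][suffix.lstrip("_")] = name
                break
        else:
            # No suffix: the tier is named after the speaker code itself
            # and carries the orthographic words.
            groups[name]["WORD"] = name
    return dict(groups)


# Tier names taken from the example in the text.
tiers = ["N98765", "N98765_FON", "N98765_SEG"]
grouped = group_tiers_by_speaker(tiers)
print(grouped)
```

A speaker whose group lacks one of the three entries would indicate a file that does not follow the layout described here.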

An interval in the tier with the orthographic transcription contains exactly one word (with or without underscores), a single underscore ("_"), a pause (empty interval), or a text (multiple words) exactly as it occurs in the same interval in the orthographic transcription (.ort file). In the latter case the tiers with the phonetic transcription and the phoneme segmentation are occupied by the automatically generated phonetic transcription without segmentation information. Moreover, in intervals of this type an exclamation mark ("!") precedes the text in all three tiers, indicating that the segmentation (which is absent) is unreliable. An exclamation mark ("!") can also occur if a segmentation is present but was found to be unreliable (by some standard).
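The four kinds of interval content and the "!" marker can be told apart with a few string checks. The following Python sketch is an illustration only: the function name classify_word_label and the category names are hypothetical, the sample words are invented, and it assumes that the "!" is the first character of the label and that the words of a multi-word text are separated by whitespace.

```python
def classify_word_label(label):
    """Return (kind, unreliable, text) for one label of the orthographic tier."""
    text = label.strip()
    # A leading "!" flags an absent or unreliable segmentation.
    unreliable = text.startswith("!")
    if unreliable:
        text = text[1:].strip()
    if text == "":
        kind = "pause"          # empty interval
    elif text == "_":
        kind = "underscore"     # single underscore
    elif len(text.split()) == 1:
        kind = "word"           # one word, possibly containing underscores
    else:
        kind = "text"           # multiple words copied from the .ort file
    return kind, unreliable, text


# Invented sample labels, one per case.
for sample in ["", "_", "dat_is", "!ja dat klopt"]:
    print(repr(sample), classify_word_label(sample))
```

Note that an unreliable flag on a single word corresponds to the case where a segmentation is present but was judged unreliable, whereas on a multi-word text it means that no segmentation is available.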

In the tier with the phonetic transcription the following phenomena may occur:

In the tier with the phoneme segmentation, empty intervals or intervals with a single phoneme symbol only occur in cases where the "_" segment from the orthographic and phonetic transcription is labelled with the shared phoneme (a plosive). Similarly, a shared phoneme that is not a plosive is represented in a single interval, with the boundaries of the orthographic and phonetic tiers falling in the middle of that interval.

For an overview of the phonetic symbols that have been used, see the description of the .fon format. Analogous to the .wrd format, the .awd file does not contain a BACKGROUND and/or COMMENT tier.