The .ort format

Files of type .ort contain the orthographic transcriptions in text format and can be created, edited or viewed with the PRAAT software. In PRAAT, .ort files are produced by selecting the option 'Write to short text file...' in the Write menu. For exchange purposes the ShortTextGrid format is preferred to the TextGrid format because its notation is more compact, so the files are smaller. The structure of a ShortTextGrid is as follows:


Please note: in the description below, non-literal text is enclosed in curly brackets: {...}. The line numbers are given for reference only and are not part of the format.

The first three lines are always the same.

 1. File type = "ooTextFile short"
 2. "TextGrid"
 3. {empty line}

Lines 4 and 5 give the time span covered by the file. Times are expressed in seconds, with three decimals.

 4. {start time}
 5. {end time}

Lines 6 and 7 describe the number of tiers that occur in the file.

 6. <exists>
 7. {number of tiers}

Lines 8 up to and including 12 contain information about the first tier.

 8. "IntervalTier"
 9. "{speaker name}"
10. {start time}
11. {end time}
12. {number of intervals in tier}

Lines 13 up to and including 15 describe the very first interval.

13. {start time}
14. {end time}
15. "{orthographic transcript}"

All further intervals of the first tier then follow in chronological order, each with the same three-line structure as lines 13 up to and including 15. Each subsequent tier follows the last interval of the preceding tier and has the same structure as the first tier from line 8 onwards.
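The line-by-line structure described above can be read with a short script. The following is a minimal sketch in Python, not part of the format definition. It assumes that every interval text occupies exactly one line and that a double quote inside a text is written doubled. The names parse_ort, Tier and Interval, the speaker name "N00001" and the sample content are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # orthographic transcript


@dataclass
class Tier:
    speaker: str
    start: float
    end: float
    intervals: List[Interval]


def unquote(line: str) -> str:
    """Remove the enclosing double quotes from a quoted line."""
    line = line.strip()
    if len(line) < 2 or line[0] != '"' or line[-1] != '"':
        raise ValueError(f"expected a quoted string, got: {line!r}")
    # Assumption: a quote inside the text is written doubled ("").
    return line[1:-1].replace('""', '"')


def parse_ort(text: str) -> List[Tier]:
    lines = text.splitlines()
    if lines[0].strip() != 'File type = "ooTextFile short"':
        raise ValueError("line 1: not a short text file")
    if lines[1].strip() != '"TextGrid"':
        raise ValueError("line 2: not a TextGrid")
    # Line 3 is empty, lines 4-5 give the overall time span,
    # line 6 is <exists> and line 7 the number of tiers.
    n_tiers = int(lines[6])
    pos = 7  # zero-based index of line 8, the first tier header
    tiers = []
    for _ in range(n_tiers):
        # lines[pos] holds the tier type, e.g. "IntervalTier"
        speaker = unquote(lines[pos + 1])
        start, end = float(lines[pos + 2]), float(lines[pos + 3])
        n_intervals = int(lines[pos + 4])
        pos += 5
        intervals = []
        for _ in range(n_intervals):
            intervals.append(Interval(float(lines[pos]),
                                      float(lines[pos + 1]),
                                      unquote(lines[pos + 2])))
            pos += 3
        tiers.append(Tier(speaker, start, end, intervals))
    return tiers


# Hypothetical sample file with one tier and two intervals.
SAMPLE = """File type = "ooTextFile short"
"TextGrid"

0.000
2.500
<exists>
1
"IntervalTier"
"N00001"
0.000
2.500
2
0.000
1.200
"hallo."
1.200
2.500
"ggg xxx"
"""

for tier in parse_ort(SAMPLE):
    for iv in tier.intervals:
        print(tier.speaker, iv.start, iv.end, iv.text)
```

The sketch walks the file with a single position counter because the format has no delimiters other than line breaks: the counts on lines 7 and 12 are the only indication of where a tier or interval ends.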


The duration of an interval can vary from less than 1 second to about 10 seconds. A time marker can coincide with a sentence boundary, but this is not necessarily the case.

In the orthographic transcription a word may be marked by means of one of the following asterisk-letter codes:

*v foreign (= non-Dutch) word
*d dialect
*a incomplete word
*u slip of the tongue or onomatopoeia
*z word with dialectal pronunciation
*x word difficult to hear

In addition, there are three special codes:

ggg a non-speech sound produced by the speaker
xxx one or more incomprehensible words or partial words
Xxx an incomprehensible word that is clearly a title or proper name

All these codes can represent a word, part of a word or a sequence of words. Where applicable, the code may be separated from a word part by means of a hyphen ("-"). For example, "xxx-enzeventig" or "achten-xxx-tig".
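As an illustration of how these codes might be detected automatically, here is a minimal sketch in Python. It rests on two assumptions that the description above does not settle: that an asterisk-letter code is appended directly to the word (as in "computer*v", a made-up example), and that sentence punctuation may be attached to the end of a token. The function name codes_in_token is hypothetical.

```python
import re
from typing import List

ASTERISK_CODES = {
    "v": "foreign (non-Dutch) word",
    "d": "dialect",
    "a": "incomplete word",
    "u": "slip of the tongue or onomatopoeia",
    "z": "word with dialectal pronunciation",
    "x": "word difficult to hear",
}
SPECIAL_CODES = {"ggg", "xxx", "Xxx"}

# Assumption: the asterisk-letter code is the last thing in the token.
ASTERISK_RE = re.compile(r"\*([vdauzx])$")


def codes_in_token(token: str) -> List[str]:
    """Return the codes found in one whitespace-separated token."""
    # Assumption: trailing sentence punctuation may be attached to the token.
    token = token.rstrip(".?")
    found = []
    match = ASTERISK_RE.search(token)
    if match:
        found.append("*" + match.group(1))
        token = token[:match.start()]
    # Special codes may be joined to word parts by hyphens.
    found.extend(part for part in token.split("-") if part in SPECIAL_CODES)
    return found


for tok in ["xxx-enzeventig", "achten-xxx-tig", "computer*v", "ggg", "huis"]:
    print(tok, codes_in_token(tok))
```

The special codes are compared case-sensitively, because "xxx" and "Xxx" have different meanings.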

The punctuation used is restricted to the following set:

"." the full stop marks the end of a sentence
"..."  the ellipsis sign marks the end of an incomplete sentence
"?" the question mark indicates the end of an interrogative sentence

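Because a time marker need not coincide with a sentence boundary, sentences can only be recovered from the punctuation. The following Python sketch joins the texts of consecutive intervals and cuts them at the three marks listed above. It assumes that a punctuation mark is followed by white space or by the end of the text. The function name split_sentences and the example utterances are hypothetical.

```python
import re
from typing import List, Tuple

# "..." is tried before "." so that an ellipsis is not read as three stops.
SENTENCE_END = re.compile(r"(\.\.\.|\.|\?)(?=\s|$)")


def split_sentences(interval_texts: List[str]) -> List[Tuple[str, str]]:
    """Return (sentence, closing mark) pairs from consecutive interval texts."""
    joined = " ".join(t.strip() for t in interval_texts if t.strip())
    sentences = []
    start = 0
    for match in SENTENCE_END.finditer(joined):
        sentences.append((joined[start:match.start()].strip(), match.group(1)))
        start = match.end()
    rest = joined[start:].strip()
    if rest:
        # Text after the last mark: the sentence continues beyond these intervals.
        sentences.append((rest, ""))
    return sentences


print(split_sentences(["ik ga naar", "huis. kom je ook?", "ja maar..."]))
```

The closing mark is returned separately so that complete, incomplete and interrogative sentences can be told apart.
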
All diacritics that occur in the orthographic transcription have been encoded according to the ISO 8859-1 standard. The file entities.htm gives an overview of the special characters from this set that were used (ISO column). PRAAT can represent the ISO codes correctly under UNIX (and variants) and Windows.
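When .ort files are read by software other than PRAAT, the encoding has to be stated explicitly, otherwise characters with diacritics may be decoded incorrectly. A minimal Python sketch, in which the function names are hypothetical:

```python
from pathlib import Path


def decode_ort_bytes(data: bytes) -> str:
    """Decode the raw bytes of an .ort file as ISO 8859-1 (Latin-1)."""
    return data.decode("iso-8859-1")


def read_ort_text(path) -> str:
    """Read an .ort file from disk using the ISO 8859-1 encoding."""
    return Path(path).read_text(encoding="iso-8859-1")


# Byte 0xE9 is "e with acute accent" in ISO 8859-1.
print(decode_ort_bytes(b"caf\xe9"))
```

In ISO 8859-1 every byte value maps to a character, so decoding never fails. A file in a different encoding would therefore be read without an error but with wrong characters.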