The .wav format

The audio files are stored in the RIFF-WAVE format. The audio encoding varies with the nature of the recording. The standard audio encoding in the Spoken Dutch Corpus has a sample frequency of 16 kHz and a resolution of 16-bit PCM. A different encoding has been used for the recordings of telephone dialogues. These recordings were made in one of two ways: on Mini Disk (MD) via a local interface, or via a switchboard. The MD recordings have a sample frequency of 8 kHz and a 16-bit PCM resolution. The switchboard recordings have a sample frequency of 8 kHz and an 8-bit A-law resolution. In this case, the speakers have been recorded on separate channels. The two channels have been combined into one stereo signal, so that the transcribers can hear both sides of the conversation. We found that this leads to better transcriptions.
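Because the encoding differs between recordings, it can be useful to check the header of a file before processing it. The following is a minimal sketch in Python, not part of the corpus tools. It reads the 'fmt ' chunk of a RIFF-WAVE file by hand, so that A-law files can be inspected as well as PCM files. The function name read_wav_format and the keys of the returned dictionary are our own choices for illustration; the format tags (1 for PCM, 6 for A-law, 7 for mu-law) are those of the WAVE format itself.

```python
import struct

# Format tags as defined for the WAVE 'fmt ' chunk.
FORMAT_NAMES = {1: "PCM", 6: "A-law", 7: "mu-law"}


def read_wav_format(stream):
    """Return the encoding fields of the 'fmt ' chunk of a RIFF-WAVE stream.

    `stream` is a binary file object positioned at the start of the file.
    (Hypothetical helper, written for illustration only.)
    """
    start = stream.read(12)
    if len(start) < 12:
        raise ValueError("file too short to be RIFF-WAVE")
    riff, _size, wave_id = struct.unpack("<4sI4s", start)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF-WAVE file")

    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            fmt = stream.read(chunk_size)
            if len(fmt) < 16:
                raise ValueError("'fmt ' chunk too short")
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", fmt[:16]
            )
            return {
                "encoding": FORMAT_NAMES.get(tag, "unknown (%d)" % tag),
                "channels": channels,
                "sample_rate_hz": rate,
                "bits_per_sample": bits,
            }
        # Skip other chunks; chunks are padded to an even number of bytes.
        stream.seek(chunk_size + (chunk_size & 1), 1)
```

A file would be inspected with, for example, `with open(path, "rb") as f: print(read_wav_format(f))`, where `path` stands for any audio file in the corpus. Going by the description above, a switchboard recording should report A-law, 2 channels, 8000 Hz and 8 bits per sample.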

The procedure for combining the two channels has been improved over time, and a number of audio files were re-processed after they had already been transcribed. As a consequence, the orthographic transcription of these files is correct in terms of content, but the segmentation may not be exactly in place: it may be off by 115 ms on average.
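To give an impression of the size of such an offset in terms of audio samples, the conversion can be sketched as follows. The helper ms_to_frames is hypothetical and only illustrates the arithmetic. The 8 kHz rate is taken from the description of the telephone recordings above, and 115 ms is an average, so the offset in any individual file may differ.

```python
def ms_to_frames(offset_ms, sample_rate_hz):
    """Convert a duration in milliseconds to a number of sample frames."""
    return round(offset_ms * sample_rate_hz / 1000)


# An average offset of 115 ms at a sample frequency of 8 kHz:
print(ms_to_frames(115, 8000))  # 920
```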

The corpus comprises both stereo and mono recordings. Stereo has generally been chosen for dialogues and multilogues, since this may make it easier to distinguish between speakers.
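For users who want to process the two channels of a stereo recording separately, the sketch below shows how interleaved stereo sample data can be split into its two channels. It assumes that the raw sample data have already been read from the file (WAVE files store stereo frames as a left sample followed by a right sample). The function split_stereo is our own illustration, and the sketch makes no assumption about which speaker is on which channel.

```python
def split_stereo(frames, sample_width):
    """Split interleaved stereo sample data into (left, right) byte strings.

    `frames` holds raw stereo frames; `sample_width` is the size of one
    sample in bytes (1 for 8-bit A-law, 2 for 16-bit PCM).
    """
    frame_size = 2 * sample_width
    if len(frames) % frame_size != 0:
        raise ValueError("data does not contain a whole number of stereo frames")

    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + sample_width]
        right += frames[i + sample_width:i + frame_size]
    return bytes(left), bytes(right)
```

Because the function only rearranges bytes, it works the same way for PCM and A-law data; decoding A-law samples to linear values would be a separate step.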

The audio files all have the extension .wav. The recordings can be played with programs such as PRAAT and COREX, but also with most other media players, both on PC and on other platforms. Both PRAAT and COREX allow the user to view the orthographic transcription while playing the audio.