Frequency lists

A number of frequency lists were derived from the corpus data. They give the frequency of occurrence of word forms, POS tags, and lemmas, and of combinations of these. There is also a frequency list of all word forms with their phonetic transcriptions; it is based on the part of the data for which a manually verified phonetic transcription is available. The frequency lists can be found in the directory /data/lexicon/ of the annotation DVD; all files can be identified by the extension .frq.

Special codes may be attached to the word forms. A forward slash separates the word form from the code (e.g. wonderful/foreign). The following codes are used:

The following types of frequency list are distinguished: