The article presents basic corpus design criteria based on the characteristics of different types of corpora. These criteria cover the specifications of the corpus, hardware and software, data capture and mark-up of the corpus documents, corpus processing, and the final design of the corpus. Against these criteria, a 5.5-million-word corpus of Slovene military texts is presented. Although this corpus was designed for a very narrow purpose, many new military terms can be extracted from it by comparing it with the new Vojaški slovar (Military Vocabulary), published in 2002.