Ordered, computerized text collections (corpora) are becoming an indispensable source of linguistic data. Freely available corpora of the Slovene language do not exist. The article gives a historical overview of the development of computer corpora, their typology, and their fields of application. Two aspects of corpora are then discussed: the standardization of their encoding, and the tools for their development and exploitation. The second part of the article gives an overview of the MULTEXT-East project (Multilingual Text Tools and Corpora for Central and Eastern European Languages), which also includes the Slovene language. The presentation focuses on the corpus and the morphosyntactic descriptions developed in the project, and on its currently available results. Finally, some possibilities for developing the field of corpus linguistics in Slovenia are discussed.