Web texts account for a growing share of language production, both worldwide and in Slovenia. User-generated content is thus becoming an increasingly important source of knowledge and is shaping future language development. Harnessing this potential requires a thorough analysis of internet language use, which differs from traditional language production. The first step in this direction is the construction and analysis of the Janes corpus of internet Slovene, which is presented in this paper.