Web corpora are useful for producing language reference materials, conducting research in corpus linguistics, and developing language technology applications. Although web texts are directly accessible, building such corpora is complicated and expensive, which is why it is important to make them available to both the academic community and the general public. While there are no technical obstacles to dissemination, a number of legal restrictions must be taken into account, e.g. copyright, personal data protection, and the terms of use of various web service providers. In this article, we provide an overview of the relevant legal framework and of the de facto situation in Slovenia, and suggest a number of measures to enable free and open dissemination of corpora of internet Slovene.