This paper presents a study of collocations in Slovene user-generated content (UGC): tweets, forums and blog posts. Collocations of newly coined words are extracted using word sketches, while UGC-specific collocations of general vocabulary are extracted using a method for comparing the collocations of two corpora. In addition to analyzing the collocations, the paper identifies the key obstacles in the extraction process.