
Comparing the nonstandard language of Slovene, Croatian and Serbian tweets
Fišer, Darja (Author), Erjavec, Tomaž (Author), Ljubešić, Nikola (Author), Miličević, Maja (Author)

File:PDF presentation file (80.14 KB)
MD5:1FA99CE2BAA662F31A82BE0B0FE6BB87
Source URL:https://centerslo.si/simpozij-obdobja/zborniki/obdobja-34-1-del/

Abstract
In this paper we carry out a cross-lingual comparison of nonstandard features in the language of social media for Slovene, Croatian and Serbian. The goal of the analysis is twofold: (1) to establish the extent to which the observed phenomena are universal rather than language-specific, and (2) to propose an approach for automatically scoring the level of (non)standardness of user-generated content, which can be used as a separate annotation layer in corpora. Quantitative and qualitative analyses of the results show that most of the language used on Twitter is fairly standard, especially in Slovene and Croatian tweets. The prevalent characteristic of nonstandard Slovene tweets is nonstandard orthography, while nonstandard lexis is more typical of Serbian tweets, possibly due to a younger user profile.

Language:English
Keywords:user-generated content, nonstandard language, web corpora, corpus annotation, South Slavic languages
Typology:1.16 - Independent Scientific Component Part or a Chapter in a Monograph
Organization:FF - Faculty of Arts
Year:2015
Pages:225-231
Numbering:Part 1
PID:20.500.12556/RUL-166571
UDC:81'276=163.6=163.42=163.41:004.738.52
COBISS.SI-ID:59028578
Publication date in RUL:17.01.2025

This record is part of a monograph

Title:Slovnica in slovar - aktualni jezikovni opis
Editors:Mojca Smolej
Place of publishing:Ljubljana
Publisher:Znanstvena založba Filozofske fakultete
Year:2015
ISBN:978-961-237-787-8
COBISS.SI-ID:281920512
Collection title:Obdobja
Collection numbering:34
Collection ISSN:1408-211X

Secondary language

Language:Slovenian
Abstract:
In this paper we present a multilingual comparison of nonstandard linguistic features in social media for Slovene, Croatian and Serbian. The aim of the analysis is twofold: (1) to determine the extent to which the identified phenomena are universal to this type of communication and which features are language-specific, and (2) to propose an approach for automatically assessing the level of (non)standardness of user-generated web content, which can usefully serve as an additional label in corpus annotation. The quantitative and qualitative analyses of the results show that the language used on Twitter is in fact fairly standard, especially in Slovenia and Croatia. The prevalent characteristic of nonstandard Slovene tweets is nonstandard orthography, while nonstandard lexis is typical of Serbian tweets, which points to a younger profile of users of this social medium in Serbia.

Keywords:user-generated web content, nonstandard language, web corpora, corpus annotation, South Slavic languages
