Repozitorij Univerze v Ljubljani
Automatic genre identification : a survey
Kuzman, Taja (Author)
Ljubešić, Nikola (Author)
PDF - Presentation file (1.70 MB)
MD5: DA6048DD507252FA5D668607EA7898E9
URL - Source URL:
https://link.springer.com/article/10.1007/s10579-023-09695-8
Abstract
Automatic genre identification (AGI) is a text classification task focused on genres, i.e., text categories defined by the author’s purpose, common function of the text, and the text’s conventional form. Obtaining genre information has been shown to be beneficial for a wide range of disciplines, including linguistics, corpus linguistics, computational linguistics, natural language processing, information retrieval and information security. Consequently, in the past 20 years, numerous researchers have collected genre datasets with the aim to develop an efficient genre classifier. However, their approaches to the definition of genre schemata, data collection and manual annotation vary substantially, resulting in significantly different datasets. As most AGI experiments are dataset-dependent, a sufficient understanding of the differences between the available genre datasets is of great importance for the researchers venturing into this area. In this paper, we present a detailed overview of different approaches to each of the steps of the AGI task, from the definition of the genre concept and the genre schema, to the dataset collection and annotation methods, and, finally, to machine learning strategies. Special focus is dedicated to the description of the most relevant genre schemata and datasets, and details on the availability of all of the datasets are provided. In addition, the paper presents the recent advances in machine learning approaches to automatic genre identification, and concludes with proposing the directions towards developing a stable multilingual genre classifier.
Language:
English
Keywords:
computational linguistics, text genre, web genre, automatic genre identification, genre schemata, genre datasets, survey paper
Type of material:
Journal article
Typology:
1.01 - Original scientific article
Organization:
FRI - Faculty of Computer and Information Science
Publication status:
Published
Publication version:
Published version
Year of publication:
2025
Pages:
pp. 537–570
Numbering:
Vol. 59, iss. 1
PID:
20.500.12556/RUL-167876
UDC:
004.9
ISSN on article:
1574-0218
DOI:
10.1007/s10579-023-09695-8
COBISS.SI-ID:
173422083
Date of publication in RUL:
19 March 2025
Number of views:
350
Number of downloads:
116
This material is part of a journal
Title:
Language resources and evaluation
Publisher:
Springer Nature
ISSN:
1574-0218
COBISS.SI-ID:
516101145
Licenses
License:
CC BY 4.0, Creative Commons Attribution 4.0 International
Link:
http://creativecommons.org/licenses/by/4.0/deed.sl
Description:
This is the standard Creative Commons license that gives users the most freedom for further use of the work, provided that they credit the author.
Secondary language
Language:
Slovenian
Keywords:
računalniško jezikoslovje, besedilni žanri, avtomatsko prepoznavanje žanra, anketa
Projects
Funder:
EC - European Commission
Funding programme:
Connecting Europe Facility, CEF Telecom
Project number:
INEA/CEF/ICT/A2020/2278341
Funder:
ARRS - Agencija za raziskovalno dejavnost Republike Slovenije (Slovenian Research Agency)
Project number:
N6-0099
Title:
Jezikovna krajina sovražnega govora na družbenih omrežjih (Linguistic landscape of hate speech on social media)
Funder:
Flanders
Project number:
FWO-G070619N
Title:
Linguistic landscape of hate speech on social media
Funder:
ARRS - Agencija za raziskovalno dejavnost Republike Slovenije (Slovenian Research Agency)
Project number:
P6-0411
Title:
Jezikovni viri in tehnologije za slovenski jezik (Language resources and technologies for the Slovenian language)