
Kako dober je ChatGPT pri umeščanju sopomenk pod besedne pomene (How good is ChatGPT at placing synonyms under word senses)
Gapsa, Magdalena (Author), Arhar Holdt, Špela (Author), Kosem, Iztok (Author)

PDF - Presentation file (280.14 KB)
MD5: FF8234AEE10C8BB22C5754FB9F54C7B8
URL - Source URL: https://zenodo.org/records/13912515

Abstract
In this study, we examine how well ChatGPT-4 performs at cleaning a list of automatically extracted synonym candidates and placing the synonym material under word senses. As the gold standard, we take the lexicographic decisions made when the Thesaurus of Modern Slovene was upgraded to version 2.0. The paper analyses the results for 246 dictionary headwords. For 41.9% of the headwords, ChatGPT organised the data exactly as the lexicographers did; for the remaining 58.1%, its decisions differed. Of all headwords, 43.5% contained differences in the removal of unsuitable synonym candidates and 28.9% contained differences in the assignment of synonyms to senses (a headword can show both). When judging the relevance of synonym candidates, ChatGPT was more permissive than the gold standard (recall 0.33), while its precision was higher (0.75); these differences, however, are harder to explain. Differences in synonym placement (placement under a different sense for 14.6% of headwords, missing placement for 19.9%) are partly attributed to characteristics of the input data, such as the complexity of the task and the brevity of the semantic indicators. Future work will focus on testing an implementation of the automatic procedure to speed up lexicographic work.
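
The recall and precision figures above can be read as scores for detecting the candidates that should be removed. The sketch below shows one way such scores could be computed. It assumes, as our own reading rather than a statement from the paper, that the positive class is "candidate removed as irrelevant" and that counts are pooled over all headwords (micro-averaging). Under this framing, a permissive model that discards few candidates gets low recall, while the few it does discard can still be mostly correct, which gives higher precision. The function name `removal_precision_recall` and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple


def removal_precision_recall(
    model_removed: Dict[str, Set[str]],
    gold_removed: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall for removing synonym candidates.

    Assumption (not from the paper): the positive class is "candidate removed
    as irrelevant", and counts are pooled over all headwords.
    """
    tp = fp = fn = 0
    for headword in model_removed.keys() | gold_removed.keys():
        model = model_removed.get(headword, set())
        gold = gold_removed.get(headword, set())
        tp += len(model & gold)  # removed by both the model and the lexicographers
        fp += len(model - gold)  # removed only by the model
        fn += len(gold - model)  # removed only by the lexicographers
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for two hypothetical headwords; not taken from the study.
gold_removed = {"h1": {"a", "b", "c"}, "h2": {"x"}}
model_removed = {"h1": {"a", "d"}, "h2": set()}

precision, recall = removal_precision_recall(model_removed, gold_removed)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.25
```

In this toy case the model discards only two candidates and one of them is wrong (precision 0.50), while it misses three of the four candidates the lexicographers discarded (recall 0.25), so precision exceeds recall in the same direction as the reported scores.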

Language:Slovenian
Keywords:digital lexicography, ChatGPT, synonyms, word senses, Slovene
Work type:Other
Typology:1.08 - Published Scientific Conference Contribution
Organization:FRI - Faculty of Computer and Information Science
FF - Faculty of Arts
Publication status:Published
Publication version:Version of Record
Year:2024
Pages:144-162
PID:20.500.12556/RUL-164264
UDC:81'322
COBISS.SI-ID:212016643
Publication date in RUL:18.10.2024

Record is a part of a monograph

Title:Jezikovne tehnologije in digitalna humanistika: zbornik konference (Language Technologies and Digital Humanities: Conference Proceedings)
Editors:Špela Arhar Holdt, Tomaž Erjavec
Place of publishing:Ljubljana
Publisher:Inštitut za novejšo zgodovino (Institute of Contemporary History)
Year:2024
ISBN:978-961-7104-40-0
COBISS.SI-ID:211315971

Licences

License:CC BY-SA 4.0, Creative Commons Attribution-ShareAlike 4.0 International
Link:http://creativecommons.org/licenses/by-sa/4.0/
Description:This Creative Commons license is very similar to the regular Attribution license, but requires the release of all derivative works under this same license.

Secondary language

Language:English
Title:How good is ChatGPT at placing synonyms under word senses
Abstract:
In this study, we test how well ChatGPT-4 cleans a list of automatically retrieved synonym candidates and distributes the synonyms under the appropriate lexical senses. As the gold standard, we use the lexicographic decisions made when updating the Thesaurus of Modern Slovene to version 2.0. In this paper, we compare the results for 246 dictionary entries. For 41.9% of entries, ChatGPT processed the data in the same way as the lexicographers, while for 58.1% it made a different decision: 43.5% of entries contained differences in the removal of noisy data, and 28.9% in the mapping of synonyms to lexical senses. When assessing the relevance of synonym candidates, ChatGPT was more permissive than the gold standard (recall 0.33), while its precision was higher (0.75), but these differences are more difficult to explain. Differences in synonym placement (placement under a different sense in 14.6% of entries, missing placement in 19.9%) are partly attributed to features of the input data, such as task complexity and the brevity of semantic indicators. Future work will focus on validating the method for speeding up lexicographic work.

Keywords:digital lexicography, ChatGPT, synonyms, word senses, Slovene language
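
The two kinds of placement difference reported in the abstract (a synonym placed under a different sense, or not placed at all) can be made concrete with a small comparison routine. This is a minimal sketch under assumptions of our own: each headword's placement is a hypothetical mapping from synonym to sense label, only synonyms kept by the gold standard are compared, and an entry counts towards a percentage if it has at least one difference of that kind. The names `placement_differences` and `share_of_entries` and all data are illustrative, not taken from the paper.

```python
from typing import Dict

# Hypothetical data model: synonym -> label of the sense it is placed under.
Placement = Dict[str, str]


def placement_differences(model: Placement, gold: Placement) -> Dict[str, int]:
    """Compare the model's sense placement with the gold standard for one headword.

    Only synonyms present in the gold standard are compared; synonyms the model
    kept but the lexicographers removed belong to the removal evaluation instead.
    """
    counts = {"same_sense": 0, "different_sense": 0, "missing": 0}
    for synonym, gold_sense in gold.items():
        model_sense = model.get(synonym)
        if model_sense is None:
            counts["missing"] += 1          # model did not place this synonym at all
        elif model_sense == gold_sense:
            counts["same_sense"] += 1
        else:
            counts["different_sense"] += 1  # placed, but under another sense
    return counts


def share_of_entries(
    model_by_entry: Dict[str, Placement],
    gold_by_entry: Dict[str, Placement],
    kind: str,
) -> float:
    """Share of headwords with at least one difference of the given kind."""
    if not gold_by_entry:
        return 0.0
    flagged = sum(
        1
        for headword, gold in gold_by_entry.items()
        if placement_differences(model_by_entry.get(headword, {}), gold)[kind] > 0
    )
    return flagged / len(gold_by_entry)


# Toy data for four hypothetical headwords; not taken from the study.
gold_by_entry = {
    "h1": {"s1": "sense A", "s2": "sense B"},
    "h2": {"s3": "sense A", "s6": "sense B"},
    "h3": {"s4": "sense A"},
    "h4": {"s5": "sense B"},
}
model_by_entry = {
    "h1": {"s1": "sense A", "s2": "sense A"},  # s2 under a different sense
    "h2": {"s6": "sense A"},                   # s3 missing, s6 under a different sense
    "h3": {"s4": "sense A"},
    "h4": {"s5": "sense B"},
}

print(share_of_entries(model_by_entry, gold_by_entry, "different_sense"))  # 0.5
print(share_of_entries(model_by_entry, gold_by_entry, "missing"))          # 0.25
```

Because a single entry (here `h2`) can contain both a misplaced and a missing synonym, the two shares can overlap. This is consistent with the reported 14.6% and 19.9% summing to more than the 28.9% of entries with placement differences.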

Projects

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:P6-0411
Name:Jezikovni viri in tehnologije za slovenski jezik (Language Resources and Technologies for Slovene)

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:J7-3159
Name:Empirična podlaga za digitalno podprt razvoj pisne jezikovne zmožnosti (Empirical Foundations for Digitally Supported Development of Writing Skills)
