In this study, we examine how well ChatGPT-4 performs on two lexicographic tasks: (a) cleaning the list of automatically retrieved synonym candidates and assigning synonymic material to lexical senses, and (b) generating dictionary entries, including sense division, definitions, and examples, based on different input data. As a gold standard, we consider the lexicographic decisions recorded in the Digital Dictionary Database for Slovene. In the first experiment, we analyse the results for 246 dictionary entries and find that ChatGPT processed the data identically to lexicographers in 41.9 % of cases, while in 58.1 % of cases it made different decisions. When assessing the relevance of synonym candidates, ChatGPT was more permissive than the gold standard. Differences in synonym placement (assignment to a different sense in 14.6 % of entries, missing placement in 19.9 %) can be partly attributed to input data characteristics, such as task complexity and the brevity of semantic indicators. In the second experiment, we test ChatGPT's ability to autonomously generate dictionary entries for 116 headwords. The analysis of generated sense divisions and definitions reveals that the system performs moderately well: in 57 % of cases it identified all senses, almost 80 % of generated entries received an average score of 3.5 or higher, and 19 % received the highest score from both evaluators. The main challenges include excessive splitting of senses, failure to recognise figurative meanings, and reduced predictability of results. We conclude that ChatGPT has potential for speeding up manual lexicographic work if its results are properly monitored and refined.