Repository of the University of Ljubljana
Vektorske vložitve za prepoznavanje slovenskih glagolskih idiomov
ZELINKA, TILEN (Author), Robnik Šikonja, Marko (Mentor)
PDF - Presentation file (332,07 KB)
MD5: 14C777F11D93FFA50D4E7A6C2A8D3AE7
Abstract
Vektorske vložitve preslikajo besede v visokodimenzionalne vektorje realnih števil, pri čemer imajo besede s podobnimi pomeni podobne vektorje. Preučili smo problem avtomatske prepoznave slovenskih glagolskih idiomov z uporabo značilk, zgrajenih iz vektorskih vložitev skupin besed in vektorskih vložitev posameznih besed. V ta namen smo zgradili dve podatkovni množici, ki vsebujeta primere glagolskih idiomov in naključnih skupin besed, opisanih z zgrajenimi značilkami. Na teh množicah smo ocenili uspešnost klasifikacije glagolskih idiomov z metodo podpornih vektorjev, naključnih gozdov in logistične regresije. Vse tri metode so pri klasifikaciji dokaj uspešne, najbolje se je izkazala metoda naključnih gozdov. Zaradi časovne zahtevnosti in omejitev prepoznave na skupine besed, za katere so znane vektorske vložitve, pa bodo za praktično uporabo potrebne še dodatne izboljšave.
Language:
Slovenian
Keywords:
obdelava naravnega jezika, vektorske vložitve, stalne besedne zveze, strojno učenje
Work type:
Bachelor thesis/paper
Organization:
FRI - Faculty of Computer and Information Science
Year:
2019
PID:
20.500.12556/RUL-106895
Publication date in RUL:
25.03.2019
Views:
1420
Downloads:
314
Cite this work
ZELINKA, TILEN, 2019, Vektorske vložitve za prepoznavanje slovenskih glagolskih idiomov [online]. Bachelor’s thesis. [Accessed 12 April 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=106895
Secondary language
Language:
English
Title:
Word embeddings for detection of verbal idioms in Slovene
Abstract:
Word embeddings map words to high-dimensional vectors of real numbers such that words with similar meanings have similar vectors. We studied the automatic identification of verbal idioms in Slovene using features built from embeddings of single words and of groups of words. For this purpose, we built two data sets containing verbal idioms and random word groups, each described by these features. On these data sets we evaluated the classification of verbal idioms with support vector machines, random forests, and logistic regression. All three methods performed fairly well, with random forests performing best. However, because the approach is computationally expensive and can only identify word groups for which embeddings have been precomputed, it requires further improvement before it is practical.
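The abstract does not say exactly how the features are built from the two kinds of embeddings. As a rough illustration only, the sketch below assumes one plausible construction: compare the embedding of a whole word group with the average of its constituent word embeddings, and with each constituent separately, using cosine similarity. The function names, the two-word group size, and the toy 3-dimensional vectors are all hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def build_features(group_vec, word_vecs):
    """Hypothetical feature vector for one candidate word group.

    First feature: similarity between the group embedding and the average
    of its word embeddings. Remaining features: similarity between the
    group embedding and each word embedding in turn.
    """
    composed = mean_vector(word_vecs)
    features = [cosine(group_vec, composed)]
    features.extend(cosine(group_vec, w) for w in word_vecs)
    return features


# Toy, made-up embeddings: two constituent words and two candidate groups.
word_a = [1.0, 0.0, 0.0]
word_b = [0.0, 1.0, 0.0]
idiom_like_group = [0.0, 0.0, 1.0]    # group vector unrelated to its parts
literal_like_group = [1.0, 1.0, 0.0]  # group vector close to the sum of its parts

idiom_features = build_features(idiom_like_group, [word_a, word_b])
literal_features = build_features(literal_like_group, [word_a, word_b])
```

In this toy setup the idiom-like group scores low on every similarity while the literal-like group scores high, which is the kind of signal a classifier such as a random forest, a support vector machine, or logistic regression could be trained on. Whether the thesis uses these particular features is an assumption.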
Keywords:
natural language processing, word embeddings, multiword expressions, machine learning