Leksikalna gostota in variabilnost v strojnih prevodih
Brglez, Mojca (Author), Vintar, Špela (Mentor)

PDF - Presentation file (1.55 MB)
MD5: 9428B2F9A8CFA6969E222BC6228BFB56
DOCX - Appendix (83.32 KB)
MD5: CF62845281F3DC143AE1E52EB723B7B4
DOCX - Appendix (75.47 KB)
MD5: F61EF2342D6C8F4D9E403B15BFC8F927

Abstract
Over the last five years, neural machine translators have become the primary choice among translation systems. Research shows that they produce higher-quality translations, so they are quickly replacing older statistical translators. While they are said to reach human-level quality for some language combinations already, neural machine translation involving Slovenian remains fairly unexplored territory. In addition to the classical assessment of quality, this master's thesis addresses the question of the lexical richness of machine translations and of possible differences from human translations and from translations by statistical translators at the macro level. To find these differences, we use quantitative methods to analyse lexical density and lexical variability, first in human translations of a literary, a technical and a culinary text, and then in their machine translations produced by two neural translators and one statistical translator. We find that, in terms of lexical density, one of the neural translators (Google Translate) is closest to the human translations, and that it also produces the best translations of the three systems. In terms of lexical richness, the results of the quantitative methods show that machine translations actually exhibit even greater vocabulary variability than human translations. Through a more qualitative approach, however, we find that the results of quantitative methods, especially those concerning lexical richness, are not always reliable. For this reason, the findings should be investigated further, or more precise and carefully considered methods should be used.

Language:Slovenian
Keywords:machine translation, neural machine translators, lexical density, lexical variability, quantitative methods
Work type:Master's thesis/paper
Organization:FF - Faculty of Arts
Year:2020
PID:20.500.12556/RUL-119953
Publication date in RUL:14.09.2020

Secondary language

Language:English
Title:Lexical density and variability in machine translations
Abstract:
In the last five years, neural machine translators have become the primary choice among translation systems. Research shows they produce higher-quality translations, which is why they are quickly replacing older statistical systems. While they seem to have already achieved human parity for some language combinations, neural machine translation involving Slovenian is still relatively unexplored territory. In addition to the classical quality assessment, this thesis addresses the question of the lexical richness of machine translations and possible differences from human translations and from the output of statistical machine translators on a macro level. To discover these differences, we use quantitative methods to analyse lexical density and lexical diversity, first for human translations of a literary, a technical and a culinary text, then for their machine translations produced by two neural systems and one statistical machine translation system. With regard to lexical density, our findings show that one of the neural translators (Google Translate) is the closest to the human translation, and that it generally outperforms the other two systems in terms of translation quality. From the perspective of lexical richness, results obtained by quantitative methods show that machine translations exhibit greater variation of vocabulary than human translations. However, our qualitative analysis has shown that results obtained through quantitative methods, particularly those regarding lexical variety, are not always reliable. Thus the findings should be further investigated or replicated using more precise and targeted methods.

Keywords:machine translation, neural translation systems, lexical density, lexical variability, quantitative methods
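
The record does not state which formulas the thesis uses, so the following is only a minimal sketch of two common textbook measures: lexical density as the share of content words among all tokens, and lexical variability as the type-token ratio (distinct word forms divided by total tokens). The function names, the regex tokenizer and the tiny English function-word list are assumptions made for illustration; they are not taken from the thesis.

```python
import re

# Hypothetical, deliberately tiny function-word list (illustration only).
# A real study would more likely rely on part-of-speech tagging.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "is", "are", "to", "it"}


def tokenize(text):
    """Lowercase the text and return runs of Unicode letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lexical_density(tokens):
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = tokenize("The cat sat on the mat")
    print(lexical_density(sample))
    print(type_token_ratio(sample))
```

The type-token ratio falls as texts get longer, so values are only comparable between samples of similar length.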
