Lexical diversity in statistical and neural machine translation
Brglez, Mojca (Author)
Vintar, Špela (Author)
PDF - Presentation file (343.69 KB)
MD5: F4D87D51688865324A5170F8B8C3EFF9
Source URL: https://www.mdpi.com/2078-2489/13/2/93
Abstract
Neural machine translation systems have revolutionized translation processes in terms of quantity and speed in recent years, and they have even been claimed to achieve human parity. However, the quality of their output has also raised serious doubts and concerns, such as loss in lexical variation, evidence of “machine translationese”, and its effect on post-editing, which results in “post-editese”. In this study, we analyze the outputs of three English to Slovenian machine translation systems in terms of lexical diversity in three different genres. Using both quantitative and qualitative methods, we analyze one statistical and two neural systems, and we compare them to a human reference translation. Our quantitative analyses based on lexical diversity metrics show diverging results; however, translation systems, particularly neural ones, mostly exhibit larger lexical diversity than their human counterparts. Nevertheless, a qualitative method shows that these quantitative results are not always a reliable tool to assess true lexical diversity and that a lot of lexical “creativity”, especially by neural translation systems, is often unreliable, inconsistent, and misguided.
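The type-token ratio listed among the keywords can be illustrated with a minimal sketch. It assumes a simple regex tokenizer and lowercasing; the function name `type_token_ratio` and the two sample sentences are hypothetical, and this record does not say how the study itself tokenized or normalized its texts.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of distinct word forms (types) divided by the
    total number of words (tokens). Returns 0.0 for empty input."""
    # Assumption: words are runs of word characters, compared case-insensitively.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample sentences, not taken from the study's data.
reference = "the cat sat on the mat"
system_output = "the cat sat on the cat"

print(type_token_ratio(reference))      # 5 types over 6 tokens
print(type_token_ratio(system_output))  # 4 types over 6 tokens
```

A higher ratio indicates more varied vocabulary, but the value falls as texts get longer, which is one reason length-robust measures such as the measure of textual lexical diversity are used alongside it.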
Language:
English
Keywords:
machine translation, neural translation systems, lexical diversity, type-token ratio, measure of textual lexical diversity
Work type:
Article
Typology:
1.01 - Original Scientific Article
Organization:
FF - Faculty of Arts
Publication status:
Published
Publication version:
Version of Record
Year:
2022
Number of pages:
14 pp.
Numbering:
Vol. 13, iss. 2, art. 93
PID:
20.500.12556/RUL-137294
UDC:
81'25'322.4
ISSN on article:
2078-2489
DOI:
10.3390/info13020093
COBISS.SI-ID:
100548099
Publication date in RUL:
09.06.2022
Views:
966
Downloads:
119
Record is a part of a journal
Title:
Information
Shortened title:
Information
Publisher:
MDPI
ISSN:
2078-2489
COBISS.SI-ID:
18497046
Licences
License:
CC BY 4.0, Creative Commons Attribution 4.0 International
Link:
http://creativecommons.org/licenses/by/4.0/
Description:
This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.
Licensing start date:
15.02.2022
Secondary language
Language:
Slovenian
Keywords:
strojno prevajanje, nevronski prevajalniki, leksikalna diverziteta, razmerje različnic in pojavnic, merjenje besedilne leksikalne diverzitete
Projects
Funder:
ARRS - Slovenian Research Agency
Project number:
P6-0215
Name:
Slovenski jezik - bazične, kontrastivne in aplikativne raziskave