Automatic machine translation can be a valuable tool
for companies whose employees are not fluent in English, because it
lets them use software in their preferred language. While several companies
offer machine translation services, these often do not reach the desired
accuracy level when translating industry-specific texts. To address this
issue, some firms offer services that let customers tune a translator on their
own data. One of these is Microsoft’s Azure Custom Translator, which is
the basis for this paper. Since we cannot modify the underlying model itself,
this paper primarily focuses on gathering and processing the required training data.
Using LASER and Vecalign, parallel sentences are extracted from
professionally translated texts and aligned. These sentence pairs are then used to
train two separate versions of a custom translator, one based on a general
baseline model, and the other on a technology baseline model. To evaluate
our models, we employ the BLEU, chrF++, and BERTScore metrics
to compare them with Azure’s other options, as well as one of the leading
outside services. Our analysis shows that our model based on the technology
baseline performs comparably to the outside service. Finally, we use
the best model to develop a simple translation app.