We aim to learn comma placement using machine learning. Our approach is
based on adding new attributes derived from grammatical rules for the Slovenian
language, which provide more information and thus enable better learning,
i.e., higher precision and recall. We focus on placing all the commas in
the text. We extend existing research with additional learning methods,
different parameters, undersampling and knowledge-based attributes. For
testing we use the Šolar corpus and the improved Šolar corpus, and for
machine learning the WEKA toolkit. The best results were achieved with random
forest, alternating decision tree and decision table models.
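Undersampling is one of the modifications listed above. The abstract does not state how it was applied, so the following is only a minimal sketch of plain random undersampling, assuming (our assumption, not the paper's) that each candidate position in a sentence is one training example labelled "comma" or "none", and that the "none" class is far larger. The function name `undersample` and the labels are hypothetical. It randomly drops examples from the larger classes until every class matches the size of the smallest one.

```python
import random
from collections import defaultdict


def undersample(examples, labels, seed=0):
    """Randomly drop examples of the larger classes so that every class
    keeps as many examples as the smallest class (hypothetical helper)."""
    # Group examples by their class label, preserving first-seen order.
    by_class = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append(x)

    # Size of the smallest class is the target size for all classes.
    target = min(len(xs) for xs in by_class.values())

    rng = random.Random(seed)  # seeded for reproducible sampling
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sample without replacement from this class.
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical data: 10 candidate positions, only 2 followed by a comma.
    positions = list(range(10))
    tags = ["comma"] * 2 + ["none"] * 8
    print(undersample(positions, tags))
```

In WEKA itself a comparable effect is available through the supervised instance filter `SpreadSubsample`; whether the authors used that filter or another scheme is not stated here.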