Lexicon construction and corpus annotation of historical language with CoBaLT editor
Kentner, Tom (Author), Erjavec, Tomaž (Author), Žorga Dulmin, Maja (Author), Fišer, Darja (Author)

URL - Presentation file: http://aclweb.org/anthology-new/W/W12/W12-1001.pdf

called CoBaLT (Corpus-Based Lexicon Tool), developed to construct corpus-based computational lexica and to correct word-level annotations and transcription errors in corpora. The paper describes the tool as well as our experience in using it to annotate a reference corpus and compile a large lexicon of historical Slovene. The annotations used in our project are the modern-day word-form equivalent, lemma, part-of-speech tag and an optional gloss. The CoBaLT interface is word-form oriented and compact. It supports wildcard word searching and sorting by several criteria, which makes the editing process flexible and efficient. The tool accepts pre-annotated corpora in TEI P5 format and can export both the corpus and the lexicon in TEI P5 as well. The tool is implemented using the LAMP architecture and is freely available for research purposes.
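
The annotation layers and the wildcard search and sorting mentioned in the abstract can be illustrated with a minimal sketch. It is not CoBaLT's implementation (the tool itself is built on the LAMP stack); the `Annotation` class, the `wildcard_search` and `sort_entries` helpers, and the toy English word forms are all hypothetical, chosen only to show how a word-form-oriented record might be filtered and ordered.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One word-level record with the four layers named in the abstract."""
    word_form: str               # historical form as transcribed
    modern_form: str             # modern-day word-form equivalent
    lemma: str
    pos: str                     # part-of-speech tag
    gloss: Optional[str] = None  # optional gloss


def wildcard_search(entries: List[Annotation], pattern: str) -> List[Annotation]:
    """Return entries whose historical word form matches a shell-style pattern."""
    return [e for e in entries if fnmatchcase(e.word_form, pattern)]


def sort_entries(entries: List[Annotation], *keys: str) -> List[Annotation]:
    """Sort by one or more attribute names (fields that are never None)."""
    return sorted(entries, key=lambda e: tuple(getattr(e, k) for k in keys))


# Invented toy data, not taken from the corpus described in the paper.
ENTRIES = [
    Annotation("shoppe", "shop", "shop", "NOUN"),
    Annotation("olde", "old", "old", "ADJ", gloss="aged"),
    Annotation("shoppes", "shops", "shop", "NOUN"),
]

if __name__ == "__main__":
    for entry in wildcard_search(ENTRIES, "shop*"):
        print(entry.word_form, "->", entry.modern_form)
    for entry in sort_entries(ENTRIES, "lemma", "word_form"):
        print(entry.lemma, entry.word_form)
```

Sorting on a tuple of attributes is one simple way to model ordering by several criteria at once; a real editor would presumably delegate this to its database layer.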

Keywords: corpus annotation, historical corpora, historical language
Work type: Not categorized (r6)
Typology: 1.08 - Published Scientific Conference Contribution
Organization: FF - Faculty of Arts
COBISS.SI-ID: 51011682