Bilingual lexicon extraction from comparable corpora for closely related languages
Fišer, Darja (Author), Ljubešić, Nikola (Author)
URL (presentation file): http://lml.bas.bg/~iva/ranlp2011/RANLR2011_Proceedings.PDF
Abstract
In this paper we present a knowledge-light approach to extracting a bilingual lexicon for closely related languages from comparable corpora. While most related work uses an existing dictionary to translate context vectors, we instead exploit the similarities between the languages. We build a seed lexicon from words that are identical in both languages and then extend it with context-based cognates and translations of the most frequent words. We also use cognates to rerank translation candidates obtained via context similarity, and we extract translation equivalents for all content words, not just nouns as in most related work. The results are very encouraging, suggesting that other similar languages could benefit from the same approach. By enlarging the seed lexicon with cognates and translations of the most frequent words, and by reranking translation candidates with cognates, we improved average precision, measured as the mean reciprocal rank of the ten top-ranking translation candidates for nouns, verbs and adjectives, from a baseline of 0.592 to 0.797, with 46% recall on a gold standard of 1000 random entries from a traditional dictionary.
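The steps named in the abstract (identical-word seed lexicon, context vectors translated through the seed lexicon, cognate-based reranking, and mean reciprocal rank over the top ten candidates) can be illustrated with the minimal sketch below. It is not the authors' implementation, and every function name is hypothetical. It assumes context vectors are plain word-to-weight dictionaries and uses one minus normalized Levenshtein distance as a stand-in cognate measure, because the abstract does not name one. It also assumes an arbitrary 0.5 interpolation weight for reranking, and it averages MRR over the source words that received candidates. That last choice is only one possible reading of how precision and recall are reported separately.

```python
import math

# Hypothetical helpers illustrating the steps named in the abstract;
# a sketch under stated assumptions, not the authors' code.


def build_seed_lexicon(src_vocab: set[str], tgt_vocab: set[str]) -> dict[str, str]:
    """Seed lexicon from words spelled identically in both languages."""
    return {w: w for w in src_vocab & tgt_vocab}


def translate_vector(vec: dict[str, float], seed: dict[str, str]) -> dict[str, float]:
    """Map a source context vector into the target space via the seed lexicon.
    Dimensions with no seed translation are dropped."""
    out: dict[str, float] = {}
    for word, weight in vec.items():
        if word in seed:
            tgt = seed[word]
            out[tgt] = out.get(tgt, 0.0) + weight
    return out


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cognate_similarity(a: str, b: str) -> float:
    """Assumed stand-in cognate measure: 1 - normalized edit distance."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def rerank(source: str, candidates: list[tuple[str, float]], weight: float = 0.5) -> list[str]:
    """Interpolate context similarity with cognate similarity (weight is arbitrary)."""
    scored = [
        (tgt, (1 - weight) * ctx + weight * cognate_similarity(source, tgt))
        for tgt, ctx in candidates
    ]
    return [tgt for tgt, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


def mean_reciprocal_rank(ranked: dict[str, list[str]], gold: dict[str, set[str]], k: int = 10) -> float:
    """MRR over the top-k candidates; a word with no correct candidate contributes 0."""
    if not ranked:
        return 0.0
    total = 0.0
    for src, cands in ranked.items():
        for rank, tgt in enumerate(cands[:k], start=1):
            if tgt in gold.get(src, set()):
                total += 1.0 / rank
                break
    return total / len(ranked)


# Illustrative, made-up context scores for Slovene "mleko" (milk):
print(rerank("mleko", [("sir", 0.6), ("mlijeko", 0.55)]))  # ['mlijeko', 'sir']
```

In this toy example the Croatian cognate "mlijeko" overtakes "sir" (cheese) despite a lower context score. This is the kind of correction cognate-based reranking is meant to provide between closely related languages.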
Language: English
Keywords: corpus linguistics, web corpora, related languages, extraction, lexical semantics, comparable corpora, closely related languages, automatic bilingual lexicon extraction
Work type: Not categorized (r6)
Typology: 1.08 - Published Scientific Conference Contribution
Organization: FF - Faculty of Arts
Year: 2011
Number of pages: 7
UDC: 004.9:81'322
COBISS.SI-ID: 46844258