Izdelava korpusa starejših slovenskih besedil v okviru projekta IMPACT
Erjavec, Tomaž (Author), Vodopivec, Ines (Author), Oliver, Maša (Author)

PDF - Presentation file (169.70 KB)
MD5: D797A1AB3EDBE1886ED683C85010A716
URL - Source URL: https://centerslo.si/simpozij-obdobja/zborniki/obdobja-30/

Abstract
Since 2010, the Jožef Stefan Institute and the National and University Library in Ljubljana have been collaborating on the European project IMPACT (Improving Access to Text), whose goal is to develop technologies that give users and readers better access to the full text of digitised older printed texts in Slovene. The article presents the procedure for compiling a corpus of Slovene texts and highlights the problems encountered during the work, which stem, among other things, from the historical and structural characteristics of the Slovene language and of the script used in the past.

Language: Slovenian
Keywords: Slovene, older Slovene texts, digitisation, IMPACT project, archaic Slovene language, OCR
Work type: Article
Typology: 1.16 - Independent Scientific Component Part or a Chapter in a Monograph
Organization: FF - Faculty of Arts
Year: 2011
Pages: 121-127
PID: 20.500.12556/RUL-149206
UDC: 004.9:811.163.6"14/18"
COBISS.SI-ID: 25362727
Publication date in RUL: 05.09.2023

Record is a part of a monograph

Title: Meddisciplinarnost v slovenistiki
Editors: Simona Kranjc
Place of publishing: Ljubljana
Publisher: Znanstvena založba Filozofske fakultete
Year: 2011
ISBN: 978-961-237-461-7
COBISS.SI-ID: 258646784
Collection title: Obdobja
Collection numbering: 30

Secondary language

Language: English
Abstract:
Since 2010 the Jožef Stefan Institute and the National and University Library have been collaborating on the EU project IMPACT (Improving Access to Text), whose goal is to develop technologies that enable better full-text access to digitised historical printed Slovene texts. The paper presents the workflow of the corpus compilation and highlights the problems encountered, which stem in part from the historical and structural characteristics of Slovene.

Keywords: Slovene, digitisation, OCR, historical printed Slovene texts, historical Slovene, IMPACT project
