Repository of the University of Ljubljana
Carniolan Provincial Assembly : corpus improvements and enhancements
Pretnar Žagar, Ajda (Author)
Pahor de Maiti, Kristina (Author)
PDF - Presentation file (1.25 MB)
MD5: 51E40C830BDCD61D747CB03EB2DF8E00
URL - Source URL: https://journals.uio.no/dhnbpub/article/view/13202
Abstract
Historical parliamentary corpora offer crucial evidence for studying political discourse over time, yet their usability is often limited by poor OCR quality and incomplete metadata. This paper presents the enhancement of the Kranjska 1.0 corpus, a collection of Carniolan Provincial Assembly proceedings (1861–1913) in Slovenian and German, through a two-phase process aimed at improving textual accuracy and enriching speaker metadata. First, we conducted a manual correction campaign on a representative sample of transcripts, involving trained historians proficient in Gothic script and 19th-century politics. The corrections addressed both structural and textual errors in TEI-encoded XML files, providing a gold-standard dataset for future model training. Error analysis revealed recurring OCR issues, including segmentation problems, misattributed speakers, and systematic character-level noise. Second, we harmonised and expanded speaker metadata using multiple historical sources to unify name variants, resolve ambiguities, and document parliamentary terms, factions, and attendance. The resulting metadata enhance corpus usability and interpretability. This work lays the foundation for the next project phase, which explores the automatic correction of transcripts and metadata using Multimodal Large Language Models (MLLMs). By combining historical expertise with computational methods, we contribute to more accurate processing of historical texts and promote transparency and reusability in digital humanities research.
Language: English
Keywords: historical parliamentary proceedings, OCR correction, error analysis, metadata enrichment
Type of material: Other
Typology: 1.08 - Published scientific conference contribution
Organisation: FRI - Faculty of Computer and Information Science; FF - Faculty of Arts
Publication status: Published
Publication version: Published version
Year of publication: 2026
Number of pages: pp. 1-10
PID: 20.500.12556/RUL-180816
UDC: 004.89:328(497.12)"1861/1913"
ISSN on article: 2704-1441
DOI: 10.5617/dhnbpub.13202
COBISS.SI-ID: 271706883
Publication date in RUL: 17.03.2026
Views: 122
Downloads: 37
The material is part of the proceedings
Title: Lost in abundance
COBISS.SI-ID: 271613699
The material is part of the journal
Title: Digital humanities in the Nordic and Baltic countries publications : DHNB
Publisher: University of Oslo library
ISSN: 2704-1441
COBISS.SI-ID: 228389891
Licences
Licence: CC BY 4.0, Creative Commons Attribution 4.0 International
Link: http://creativecommons.org/licenses/by/4.0/deed.sl
Description: This is the standard Creative Commons licence that gives users the most freedom to reuse the work, provided they credit the author.
Secondary language
Language: Slovenian
Keywords: zgodovinski parlamentarni zapisniki, popravki napak OCR, analiza napak, obogatitev metapodatkov
Projects
Funder: ARIS - Slovenian Research and Innovation Agency
Project number: P6-0436-2022
Title: Digitalna humanistika: viri, orodja in metode (Digital humanities: resources, tools and methods)
Funder: ARIS - Slovenian Research and Innovation Agency
Project number: GC-0002-2024
Title: Veliki jezikovni modeli za digitalno humanistiko (Large language models for digital humanities)
Funder: EC - European Commission
Funding programme: HE
Project number: 101186647
Title: Centre of Excellence in Artificial Intelligence for Digital Humanities
Acronym: AI4DH