Uporaba šestih mer skladenjske kompleksnosti za primerjavo jezika v govornem in pisnem korpusu

Terčon, Luka

Terčon, Luka (Author)

	PDF - Presentation file (371,29 KB), MD5: FE8958003FE263317F201524B2906E1C
	URL - Source URL: https://zenodo.org/records/13912515

Abstract

There are numerous methods for measuring syntactic complexity in digitized language databases. Language corpora, especially those that contain syntactic annotations, allow analyses and comparisons of syntactic complexity to be carried out automatically and efficiently. In this paper, I present a method for automatically comparing two corpora (a corpus of written texts and a corpus of spoken texts) using six established measures of syntactic complexity. The results show that the syntactic structure of the language in the written corpus is somewhat more complex than in the spoken corpus. The differences are most pronounced in sentence length and in the depth of syntactic trees. An analysis of the correlation between the measures suggests that some of the measures used convey considerably different information about the syntactic structure of a given sentence than others.

Language:	Slovenian
Keywords:	syntactic complexity, written corpus, spoken corpus, complexity measures
Work type:	Other
Typology:	1.08 - Published Scientific Conference Contribution
Organization:	FRI - Faculty of Computer and Information Science, FF - Faculty of Arts
Publication status:	Published
Publication version:	Version of Record
Year:	2024
Number of pages:	pp. 668-686
PID:	20.500.12556/RUL-164265
UDC:	81'322
COBISS.SI-ID:	212016899
Publication date in RUL:	18.10.2024
Views:	55
Downloads:	9

Record is a part of a monograph

Title:	Jezikovne tehnologije in digitalna humanistika : zbornik konference
Editors:	Špela Arhar Holdt, Tomaž Erjavec
Place of publishing:	Ljubljana
Publisher:	Inštitut za novejšo zgodovino = Institute of Contemporary History
Year:	2024
ISBN:	978-961-7104-40-0
COBISS.SI-ID:	211315971

Licences

License:	CC BY-SA 4.0, Creative Commons Attribution-ShareAlike 4.0 International

Link:	http://creativecommons.org/licenses/by-sa/4.0/
Description:	This Creative Commons license is very similar to the regular Attribution license, but requires the release of all derivative works under this same license.

Secondary language

Language:	English
Title:	The use of six syntactic complexity measures for linguistic comparisons between a spoken and a written corpus
Abstract:	There are a number of methods for measuring syntactic complexity in digital language databases. Linguistic corpora, especially those containing syntactic annotations, enable researchers to automatically and efficiently conduct analyses and comparisons of syntactic complexity. In this paper, I present a method with which I automatically compare two corpora – one containing written texts and the other containing spoken texts – using six established measures of syntactic complexity. The results of this comparison indicate that the syntactic makeup of the language contained in the written corpus is slightly more complex than in the spoken corpus. The differences are most pronounced in sentence length and in syntactic tree depth. Additionally, an analysis of the correlation between the different measures suggests that some provide quite different information about the syntactic structure of a sentence compared to others.
Keywords:	syntactic complexity, written corpus, spoken corpus, complexity measures
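
Illustration (not part of the original record): the abstract names two of the six measures, sentence length and syntactic tree depth, and mentions a correlation analysis between measures. The following is a minimal Python sketch of how such a comparison could be set up, not the author's implementation. It assumes each sentence is given as a list of 1-based head indices with 0 marking the root, as in the HEAD column of CoNLL-U; the function names and the toy sentences are hypothetical, and the remaining four measures are not shown because the abstract does not name them.

```python
from math import sqrt


def sentence_length(heads):
    """Number of tokens in the sentence."""
    return len(heads)


def tree_depth(heads):
    """Longest path from the root to any token, counted in edges.

    `heads[i - 1]` is the head of token i (1-based); 0 marks the root.
    """
    def depth(token):
        steps = 0
        while heads[token - 1] != 0:
            token = heads[token - 1]
            steps += 1
            if steps > len(heads):  # guard against cyclic (malformed) input
                raise ValueError("head indices do not form a tree")
        return steps

    return max(depth(t) for t in range(1, len(heads) + 1))


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: two "written" and two "spoken" sentences.
written = [[2, 0, 2, 3, 4], [0, 1, 1]]
spoken = [[2, 0], [0]]

for name, corpus in (("written", written), ("spoken", spoken)):
    lengths = [sentence_length(s) for s in corpus]
    depths = [tree_depth(s) for s in corpus]
    print(name, "mean length:", mean(lengths), "mean depth:", mean(depths))

# Correlation between the two measures over all sentences.
all_sentences = written + spoken
r = pearson([sentence_length(s) for s in all_sentences],
            [tree_depth(s) for s in all_sentences])
print("length/depth correlation:", round(r, 3))
```

A low correlation between two measures would be one sign that they capture different aspects of syntactic structure, which is the kind of observation the abstract reports.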

Projects

Funder:	ARIS - Slovenian Research and Innovation Agency
Project number:	Z6-4617
Name:	Na drevesnici temelječ pristop k raziskavam govorjene slovenščine
