There are a number of methods for measuring syntactic complexity in digital
language databases. Linguistic corpora, especially those containing syntactic
annotations, enable researchers to automatically and efficiently conduct analyses
and comparisons of syntactic complexity. In this paper, I present a method with
which I automatically compare two corpora – one containing written texts and
the other containing spoken texts – using six established measures of syntactic
complexity. The results of this comparison indicate that the syntactic makeup
of the language contained in the written corpus is slightly more complex than in
the spoken corpus. The differences are most pronounced in sentence length and
in syntactic tree depth. Additionally, an analysis of the correlation between the
different measures suggests that some provide quite different information about
the syntactic structure of a sentence compared to others.