Automatic summarization of legal documents

Miščič, Andrej

Repository of the University of Ljubljana

Miščič, Andrej (Author), Žitnik, Slavko (Mentor)

PDF - Presentation file (1.42 MB)
MD5: 9E195FF050590DE548BC37F807DD200C

Abstract

The adoption of modern natural language processing is crucial for the legal industry to process large amounts of text data and provide efficient services. Legal research is one of the most impacted areas, as these methods allow legal practitioners to find relevant legislation and case law faster. To provide summaries of long legal documents, we tackle the task of automatic summarization of Slovene judicial decisions. We propose GloBerta-Sum, an extractive approach based on recently introduced Slovene pretrained language models. It exploits the structure of judicial decisions to deal with their length and is trained on the soft labels we propose to mitigate the effect of a high sentence compression ratio. We additionally combine GloBerta-Sum with an abstractive model to form a hybrid system capable of producing paraphrased summaries. We evaluate our approaches using automatic metrics and human evaluation. Results show that our approaches match the relevance of human-written summaries, although the generated summaries are somewhat less coherent and contain more redundant information. Nevertheless, we believe our work highlights the potential of using the proposed methodology to equip legal documents with summaries that allow legal practitioners to quickly assess their relevance.

Language:	English
Keywords:	automatic text summarization, extractive summarization, abstractive summarization, legal documents, natural language processing
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2022
PID:	20.500.12556/RUL-142575
COBISS.SI-ID:	130546947
Publication date in RUL:	14.11.2022
Views:	1501
Downloads:	214

Secondary language

Language:	Slovenian
Title:	Avtomatsko povzemanje pravnih besedil
Abstract:
The use of modern natural language processing approaches is crucial for the legal industry to process large amounts of text and provide efficient services. Legal research is the area on which these approaches have the greatest impact, as they allow legal practitioners to find relevant legislation and case law faster. With the aim of providing summaries of long legal texts, this work addresses the automatic summarization of Slovene judicial decisions. We propose GloBerta-Sum, an extractive approach based on recently introduced Slovene pretrained language models. To handle longer documents, our approach relies on the structure of judicial decisions. It is trained on soft labels, which mitigates the problems caused by the high ratio between the number of sentences in documents and in summaries. We additionally combine GloBerta-Sum with an abstractive model; the resulting hybrid approach is able to generate summaries by paraphrasing. Results show that our approaches generate summaries that are on par with manually written ones in terms of relevance, but can be somewhat less coherent and contain more redundant information. Nevertheless, we believe that our work demonstrates the possibility of using the proposed methodology to produce summaries that allow legal practitioners to review legal texts more quickly.
Keywords:	automatic text summarization, extractive summarization, abstractive summarization, legal texts, natural language processing
