Named entity recognition in pharmaceutical domain texts (Prepoznavanje imenskih entitet na domenskih besedilih iz farmacije)

KOVAČ KEBER, BENJAMIN

KOVAČ KEBER, BENJAMIN (Author), Žitnik, Slavko (Mentor)

PDF - Presentation file (2.42 MB)
MD5: 6AE85713445399234A53D0EA9234ABAB

Abstract

Named entity recognition is one of the tasks in natural language processing. It consists of tagging words and phrases with labels drawn from predefined named entity types. Example applications include content classification for news providers, efficient search algorithms, content recommendation, organization of articles, and customer support. We studied the problem of named entity recognition in domain texts from pharmacy. To this end, we applied four different methods and trained the models on two corpora (CHEMDNER and n2c2) that contain manually annotated named entities from the field of pharmacy (and chemistry). We also evaluated the models on texts that we annotated manually ourselves. The BERT model performed best. For practical use, however, the models will probably need some further work to improve them.

Language:	Slovenian
Keywords:	natural language processing, named entity recognition, pharmacy
Work type:	Bachelor thesis/paper
Typology:	2.11 - Undergraduate Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2023
PID:	20.500.12556/RUL-144589
COBISS.SI-ID:	144127491
Publication date in RUL:	02.03.2023
Views:	645
Downloads:	539

Secondary language

Language:	English
Title:	Named entity recognition in pharmaceutical domain texts
Abstract:	Named entity recognition is one of the tasks in natural language processing. It involves tagging words and phrases with labels of predefined named entity types. Use cases include content classification for news providers, efficient search algorithms, content recommendation, organization of research papers, and customer support. We studied the problem of named entity recognition on domain texts from pharmacy. For this purpose, we used four different named entity recognition methods and two corpora (CHEMDNER and n2c2) that contain manually annotated named entities from the pharmacy domain. We also evaluated the models on texts that we annotated manually. The BERT model performed best. For practical use, further effort will probably be needed to improve the model.
Keywords:	natural language processing, named entity recognition, pharmacy
