Repository of the University of Ljubljana
Knjižnica za tekstovno analitiko v programskem okolju Orange
Novak, David (Author), Zupan, Blaž (Mentor)
PDF - Presentation file (5.00 MB)
MD5: BE6EE6A0CFB92281906DBCAD72B2A6F5
PID: 20.500.12556/rul/45c0b1fd-237c-494a-9505-29c67835d8c3
Abstract
We developed a text analysis system and built it as an add-on for the Orange software environment. Orange combines a rich set of supervised and unsupervised machine learning methods, which makes it an excellent foundation for such a system. By reviewing the literature and open-source tools, we identified the fundamental methods used in this field and based the functionality of our library on them. We added widgets for retrieving data from online sources such as PubMed and the New York Times. We implemented preprocessing methods that include conversion of text to vectors, stop word removal, lemmatization, and stemming, and then supported the workflow with visualizations such as a word cloud. Our goal was to develop widgets that connect well with one another through visual programming, integrate well with the other widgets of the Orange system, and can be easily extended by developing new widgets.
Language: Slovenian
Keywords: text analysis, data preprocessing, visualization, visual programming
Work type: Master's thesis/paper
Organization: FRI - Faculty of Computer and Information Science
Year: 2016
PID: 20.500.12556/RUL-83811
Publication date in RUL: 30.06.2016
Views: 2044
Downloads: 553
Cite this work
NOVAK, DAVID, 2016, Knjižnica za tekstovno analitiko v programskem okolju Orange [online]. Master’s thesis. [Accessed 14 April 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=83811
Secondary language
Language: English
Title: Text mining library for Orange data mining suite
Abstract:
We have developed a text mining system that can be used as an add-on for Orange, a data mining platform. Orange includes a set of supervised and unsupervised machine learning methods that a typical text mining platform can build on, and it therefore offers an excellent foundation for development. We studied the field of text mining and reviewed several open-source toolkits to define the system's base components. We included widgets that retrieve data from remote repositories such as PubMed and the New York Times. Pre-processing includes transformation of documents to vectors, stop word removal, lemmatization, and stemming. The results can be visualized with widgets such as the word cloud. Our goal was to develop widgets that can be easily incorporated into existing Orange workflows, can be extended with additional widgets, and perform well in a visual programming environment.
Keywords: text mining, data pre-processing, visualization, visual programming
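The pre-processing steps named in the abstract (stop word removal, stemming, transformation of documents to vectors) can be illustrated with a minimal Python sketch. This is not the thesis's code and does not use the Orange API: the stop-word list, the suffix-stripping rule, and all function names are assumptions made for illustration, and lemmatization is left out because it needs a dictionary or a language model.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def bag_of_words(documents):
    """Return (vocabulary, vectors): one term-count vector per document."""
    token_lists = [preprocess(doc) for doc in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    docs = [
        "The widgets are mining texts",
        "Text mining in the widget and text clouds",
    ]
    vocab, vecs = bag_of_words(docs)
    print(vocab)
    print(vecs)
```

Each document becomes a vector of term counts over a shared vocabulary, which is the form that supervised and unsupervised learners typically expect. A production pipeline would swap the toy stemmer for an established algorithm such as Porter's and would likely weight the counts, for example with TF-IDF.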