Sferična metoda $k$-voditeljev : delo diplomskega seminarja
Lampič, Jan (Author), Knez, Marjetka (Mentor)

PDF - Presentation file (1.65 MB)
R - Appendix (2.07 KB)

Abstract
Rapid progress in data acquisition techniques has caused the volume of data to grow exponentially from day to day. An estimated 80% of the world's data is stored as unstructured text. Text mining has therefore become an attractive research field, as it seeks to extract valuable information from unstructured texts. A fundamental problem of text mining is document clustering. This final seminar paper presents the spherical $k$-means method, one of the most popular document clustering methods. To make the method easier to understand, the problems of clustering and document representation are described first. The main goal of the work is to derive the spherical $k$-means algorithm. To this end, the batch version of the algorithm is presented first, together with its weaknesses and computational improvements. This is followed by a description of the incremental version of the algorithm, which improves the results of the batch version. The final spherical $k$-means algorithm is obtained by combining the two. The work concludes with an example application of the spherical $k$-means algorithm to the problem of the authorship of the “Wizard of Oz” books. The algorithm identifies the author of each book on the basis of the frequency of the words that the author uses.

Language: Slovenian
Keywords: text mining, clustering, spherical k-means method, cosine similarity, bag-of-words model
Work type: Final seminar paper (mb14)
Typology: 2.11 - Undergraduate Thesis
Organization: FMF - Faculty of Mathematics and Physics
Year: 2018
UDC: 004
COBISS.SI-ID: 18432601

Secondary language

Language: English
Title: Spherical $k$-means algorithm
Abstract:
Rapid progress in digital data acquisition techniques has led to a huge volume of data. Approximately 80% of the world’s data is stored as unstructured text. Text mining has therefore become an exciting research field, as it tries to discover valuable information from unstructured texts. Clustering is one of the most interesting and important topics in text mining. This work presents one of the most popular document clustering algorithms, spherical $k$-means. First, the problems of clustering and document representation are described to make the method easier to understand. The main goal of this work is to derive the spherical $k$-means algorithm. For this purpose, the batch version of the algorithm is introduced first, together with its weaknesses and computational improvements. Next comes a description of the incremental version of the algorithm, which improves the results of the batch version. Finally, the batch and incremental iterations are combined to produce the spherical $k$-means algorithm. The work concludes with an example application of spherical $k$-means to the problem of the authorship of the “Wizard of Oz” books. The algorithm assigns authors to the books based on the frequency of the words they use.

Keywords: text mining, clustering, spherical k-means algorithm, cosine similarity, bag-of-words model
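
For orientation, the batch iteration mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the R code from the thesis appendix. It assumes that documents are given as bag-of-words term-frequency vectors, that similarity is cosine similarity, and that initial centroids are drawn at random from the documents. The function names (`normalize`, `dot`, `spherical_kmeans_batch`) and parameters are hypothetical, and the incremental refinement step is not included.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    """Inner product; for unit vectors this equals the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans_batch(docs, k, max_iter=100, seed=0):
    """Batch spherical k-means on term-frequency vectors (illustrative sketch).

    Returns (labels, centroids), where every centroid is a unit vector.
    """
    rng = random.Random(seed)
    X = [normalize(d) for d in docs]                  # project documents onto the unit sphere
    centroids = [list(x) for x in rng.sample(X, k)]   # assumed initialization: k random documents
    labels = [-1] * len(X)
    for _ in range(max_iter):
        # Assignment step: each document joins the centroid with the largest cosine similarity.
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in X]
        if new_labels == labels:
            break                                     # no document changed cluster
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:                               # an empty cluster keeps its old centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids
```

Called on four toy vectors such as `[[3, 1, 0, 0], [5, 2, 0, 0], [0, 0, 2, 4], [0, 0, 1, 3]]` with `k=2`, the sketch groups the first two and the last two documents together, since each pair shares its vocabulary.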
