Automatic prediction of company's characteristics based on their website
Anderle, Žan (Author), Demšar, Janez (Mentor)

PDF - Presentation file (1,00 MB)
MD5: 06E611D6AE2AC7A685CE9E845069572F
PID: 20.500.12556/rul/17a248c1-fcba-4209-9f4f-b367de3e35ec

Abstract
In this master's thesis we address the problem of predicting a company's characteristics (industry, age, number of employees) from its website. We propose several prediction models that treat the website in different ways. We show how to extract from a website the features that are useful for a specific prediction. In our case, the text of the entire website and the text found in meta tags prove the most useful. This yields two separate prediction models, which can be combined into a single joint prediction. We used such an ensemble model to predict a company's industry, where it achieved satisfactory results. We also tested prediction based on meta features of the website, which describe the website in a different way and so avoid computationally expensive text processing. We tested this model on the problem of predicting a company's age and number of employees; it did not achieve satisfactory results. We also examine the problem of obtaining a suitable dataset for developing prediction models that rely on websites. We find that this difficult step is crucial for achieving better results.
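As an illustration of the two text sources the abstract names (the text of the site and the text in meta tags), the following is a minimal sketch that splits a single HTML page into those two strings using only the Python standard library. The class name `PageTextExtractor`, the function `extract_texts`, and the choice of the `description` and `keywords` meta tags are assumptions made for this example; the record does not describe the extraction procedure actually used in the thesis.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collects visible text and meta tag text from one HTML page (illustrative)."""

    SKIPPED = {"script", "style"}               # content that is not visible text
    META_NAMES = {"description", "keywords"}    # assumed choice of meta tags

    def __init__(self):
        super().__init__()
        self.content_parts = []
        self.meta_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in self.META_NAMES and attr.get("content"):
                self.meta_parts.append(attr["content"].strip())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when we are outside <script> and <style> elements.
        if text and self._skip_depth == 0:
            self.content_parts.append(text)


def extract_texts(html):
    """Return (content_text, meta_text) for one page of HTML."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.content_parts), " ".join(parser.meta_parts)
```

Each of the two returned strings could then be fed to its own text classifier, one per text source.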

Language:Slovenian
Keywords:website classification, machine learning, website information
Work type:Master's thesis/paper
Organization:FRI - Faculty of Computer and Information Science
Year:2017
PID:20.500.12556/RUL-91343
Publication date in RUL:28.03.2017
Views:1060
Downloads:203

Secondary language

Language:English
Title:Automatic prediction of company's characteristics based on their website
Abstract:
Our main objective is to predict a company's characteristics (industry, age, number of employees) from the company's website. We present several prediction models, each of which extracts information from the website in a different way. We show which features to extract from a website so that they are useful for a specific prediction. We find that the website's content text and meta tag text are often the most relevant. Using these texts we obtain two separate prediction models, which can also be combined in an ensemble model. The ensemble was used to predict the company's industry and achieved satisfactory results. We also tested alternative ways of describing a website, using various metadata that can be extracted from it. This is useful when the computational cost of text analysis must be avoided. We used a model built on these features to predict the age and number of employees. The model was not particularly successful. We also discuss the problem of obtaining an appropriate dataset for developing the aforementioned prediction models. We find that solving this problem is crucial for achieving better results.
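The abstract does not say how the two text models are combined. One common option is soft voting, that is, a weighted average of the class probabilities produced by each model. The sketch below assumes this scheme; the function name `combine_predictions`, the `content_weight` parameter, and the industry labels in the usage note are hypothetical.

```python
def combine_predictions(content_probs, meta_probs, content_weight=0.5):
    """Soft-voting sketch: weighted average of two class-probability dicts.

    content_probs / meta_probs map a class label to the probability assigned
    by the content-text model and the meta-tag model respectively. This is an
    assumed combination rule, not necessarily the one used in the thesis.
    """
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must lie in [0, 1]")
    labels = set(content_probs) | set(meta_probs)
    combined = {
        label: content_weight * content_probs.get(label, 0.0)
        + (1.0 - content_weight) * meta_probs.get(label, 0.0)
        for label in labels
    }
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    best = max(sorted(combined), key=combined.get)
    return best, combined
```

For example, with equal weights, `combine_predictions({"retail": 0.6, "manufacturing": 0.4}, {"retail": 0.2, "manufacturing": 0.8})` selects `"manufacturing"`, because the meta-tag model's stronger confidence outweighs the content model's mild preference for `"retail"`.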

Keywords:website classification, machine learning, website information
