Self-supervised anomaly detection in production log streams

Martinčič, Tomaž

Martinčič, Tomaž (Author), Žitnik, Slavko (Mentor), García Faura, Álvaro (Co-mentor)

PDF - Presentation file (2.63 MB)
MD5: AAB334DAB176DB869367F71D92830D79

Abstract

Log-based anomaly detection is needed to analyze and interpret the vast amounts of log data that systems generate, uncover hidden patterns, and predict system anomalies, which in turn improves operational efficiency, helps ensure system security, and reduces potential downtime. Automatic anomaly detection with machine learning methods has seen active development in recent years. In this work, we extend LogBERT, a well-known method in the field, into a hierarchical transformer by including a pre-trained language model that produces semantic embeddings of log templates. This provides richer information and avoids the out-of-vocabulary problem faced by the original LogBERT method. We call the resulting method SemLogBERT. We found that the results reported for most state-of-the-art methods severely overestimate the models' performance. We evaluated LogBERT and SemLogBERT in a more realistic scenario, where SemLogBERT improved performance on some of the standard benchmark datasets.
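
The out-of-vocabulary problem mentioned in the abstract can be illustrated with a small sketch. This record does not describe the thesis implementation, so everything below is an assumption made for illustration: the example templates, the function names, and above all the toy bag-of-words `embed` function, which merely stands in for the pre-trained language model. The sketch only contrasts an index-based template vocabulary, where every unseen template collapses to one unknown id, with a text-based representation, where an unseen template still lands close to a similar known one.

```python
import math
import re
from collections import Counter

UNK = 0  # shared id for every template that was not seen during training


def build_vocab(templates):
    """Index-based vocabulary: each known template gets its own integer id."""
    return {t: i for i, t in enumerate(templates, start=1)}


def template_id(vocab, template):
    """Look up a template id; any unseen template falls back to UNK."""
    return vocab.get(template, UNK)


def embed(template):
    """Toy stand-in for a pre-trained language model (assumption):
    sparse word counts, with the <*> parameter placeholders dropped."""
    return Counter(re.findall(r"[a-z]+", template.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_known(template, known):
    """Return the known template whose embedding is closest to the query."""
    query = embed(template)
    return max(known, key=lambda k: cosine(query, embed(k)))


# Hypothetical templates; the last one changed after a software update.
KNOWN = [
    "Receiving block <*> src: <*> dest: <*>",
    "Deleting block <*> file <*>",
    "Connection to <*> timed out",
]
UNSEEN = "Connection to <*> timed out after <*> ms"

vocab = build_vocab(KNOWN)
print(template_id(vocab, UNSEEN))    # 0, i.e. UNK: the id carries no information
print(nearest_known(UNSEEN, KNOWN))  # Connection to <*> timed out
```

With integer ids, the changed template is indistinguishable from any other unseen template. With a text-based representation it stays close to its earlier form, which is the kind of information a downstream sequence model could use.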

Language:	English
Keywords:	Natural language processing, anomaly detection, production logs, machine learning, self-supervised learning
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2023
PID:	20.500.12556/RUL-149498
COBISS.SI-ID:	163944963
Publication date in RUL:	07.09.2023
Views:	770
Downloads:	107

Secondary language

Language:	Slovenian
Title:	Samonadzorovano odkrivanje anomalij v produkcijskih dnevniških zapisih
Abstract:	Solutions for automatic anomaly detection in system logs are needed to effectively analyze and interpret vast amounts of generated log data, uncover hidden patterns, and predict system anomalies, improving operational efficiency, ensuring system security, and reducing potential outages. Recently, there has been progress in automatic anomaly detection using machine learning methods. In this work, we extended LogBERT, a well-known method in this field, into a hierarchical transformer by including a pre-trained language model to obtain semantic embeddings of log templates. This provides richer information and avoids the problem of new templates faced by the original LogBERT method. We present a new method called SemLogBERT. We found that the results presented for most modern methods strongly overestimate their performance. We evaluated LogBERT and SemLogBERT in a more realistic scenario, where we improved the results on some of the standard benchmark datasets in this field.
Keywords:	Natural language processing, anomaly detection, production logs, machine learning, self-supervised machine learning
