Odkrivanje enačb za celoštevilska zaporedja z verjetnostnimi gramatikami : magistrsko delo

Gec, Boštjan

Odkrivanje enačb za celoštevilska zaporedja z verjetnostnimi gramatikami : magistrsko delo
ID Gec, Boštjan (Author), ID Todorovski, Ljupčo (Mentor) More about this mentor... This link opens in a new window

PDF - Presentation file, Download (724,86 KB)
MD5: C32830D6786386D353CA7C5C73DF25E2

Abstract

Algoritmi za odkrivanje enačb, ki uporabljajo verjetnostne gramatike, delujejo tako, da najprej vzorčijo strukture izrazov iz gramatike in nato na podlagi teh poiščejo enačbe, ki se najbolj prilegajo vhodnim podatkom. Strukture izrazov vzorčijo na podlagi verjetnosti, ki jih določa verjetnostna gramatika. Problem, ki ga srečamo pri tem je, da želimo tvoriti samo končne strukture in želimo imeti ustrezno verjetnostno porazdelitev na množici vseh možnih končnih struktur izrazov, ki jih tvori gramatika. Na srečo lahko v ta namen na verjetnostne gramatike gledamo kot na večtipske procese razvejanja. Za te obstaja izrek, ki pod določenimi pogoji pove, kdaj lahko ustrezno porazdelitev definiramo in kdaj ne. Poleg tega v magistrskem delu razvijem empirično okolje, ki omogoča uporabo omenjenih algoritmov za odkrivanje enačb v celoštevilskih zaporedjih iz Spletne enciklopedije celoštevilskih zaporedij (OEIS). Uporabo okolja ilustriram na odkrivanju enačb za štirinajst izbranih zaporedij iz OEIS.

Language:	Slovenian
Keywords:	odkrivanje enačb, simbolna regresija, strojno učenje, verjetnostne kontekstno-neodvisne gramatike, večtipski procesi razvejanja
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FMF - Faculty of Mathematics and Physics
Year:	2021
PID:	20.500.12556/RUL-131669
UDC:	519.2
COBISS.SI-ID:	79059971
Publication date in RUL:	01.10.2021
Views:	1613
Downloads:	125
Metadata:
:	Copy citation
Share:

Secondary language

Abstract:
Language:	English
Title:	Equation discovery for integer sequences with probabilistic grammars
Equation discovery algorithms that are based on probabilistic grammars sample arithmetic expressions from the grammar that are then fitted to the input data, to become equations that describe that data. The arithmetical expressions are generated according to the probabilities encoded in the probabilistic grammar. The problem we encounter in this approach is that we consider only finite expressions and we try to define the corresponding probabilistic distribution on the space of the candidate finite expressions. Fortunately, probabilistic grammars can be seen as multitype branching processes. I present and partly prove a theorem that holds for multitype branching processes that tells us whether the grammar properly define the corresponding distribution or not. Furthermore, in this master thesis I design an empirical framework for applying the aforementioned algorithm to the task of discovery of equations that hold for integer sequences from The On-Line Encyclopedia of Integer Sequences (OEIS). I illustrate the use of the framework on discovery of equations for fourteen selected sequences from OEIS.
Keywords:	equation discovery, symbolic regression, machine learning, probabilistic context-free grammars, multitype branching processes

Similar works from RUL:
Similar works from other Slovenian collections:

Secondary language

Similar documents