Equation discovery algorithms based on probabilistic grammars sample arithmetic expressions from a grammar and fit them to the input data, so that they become equations describing that data. The arithmetic expressions are generated according to the probabilities encoded in the probabilistic grammar. The difficulty with this approach is that only finite expressions are considered, so the grammar has to define a probability distribution on the space of finite candidate expressions. Fortunately, probabilistic grammars can be seen as multitype branching processes. I present and partly prove a theorem for multitype branching processes that tells us whether a grammar properly defines the corresponding distribution. Furthermore, in this master's thesis I design an empirical framework for applying the aforementioned approach to the discovery of equations that hold for integer sequences from The On-Line Encyclopedia of Integer Sequences (OEIS). I illustrate the use of the framework by discovering equations for fourteen selected sequences from the OEIS.
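To make the branching-process view concrete, the following is a minimal sketch, not the theorem or the framework from the thesis. It assumes a hypothetical toy grammar with a single nonterminal `E` and the rules `E -> E + E` (probability `p`) and `E -> x` (probability `1 - p`), and it estimates the probability that a derivation terminates in a finite expression by iterating the offspring generating function from zero, which converges to its smallest fixed point. The names `termination_probabilities` and `toy_grammar` are illustrative assumptions.

```python
# Hypothetical toy grammar (an assumption for illustration, not from the thesis):
#   E -> E '+' E   with probability p
#   E -> 'x'       with probability 1 - p
# A grammar maps each nonterminal to a list of (probability, right-hand side).
# Any symbol that is a key of the grammar is treated as a nonterminal.


def termination_probabilities(grammar, iterations=2000):
    """Iterate q <- f(q) starting from q = 0.

    f is the generating function of the branching process induced by the
    grammar; the iteration converges to its smallest fixed point, i.e. the
    probability that a derivation from each nonterminal is finite.
    """
    q = {nonterminal: 0.0 for nonterminal in grammar}
    for _ in range(iterations):
        new_q = {}
        for nonterminal, rules in grammar.items():
            total = 0.0
            for prob, rhs in rules:
                term = prob
                for symbol in rhs:
                    if symbol in grammar:  # terminals contribute a factor of 1
                        term *= q[symbol]
                total += term
            new_q[nonterminal] = total
        q = new_q
    return q


def toy_grammar(p):
    """Build the toy grammar for a given recursion probability p."""
    return {"E": [(p, ["E", "+", "E"]), (1.0 - p, ["x"])]}


subcritical = termination_probabilities(toy_grammar(0.4))["E"]
supercritical = termination_probabilities(toy_grammar(0.6))["E"]
print(subcritical, supercritical)
```

For this toy grammar the fixed-point equation is q = p q^2 + (1 - p), whose smallest non-negative root is min(1, (1 - p)/p). With p = 0.4 the value approaches 1, so the grammar defines a proper distribution on finite expressions; with p = 0.6 it approaches 2/3, so one third of the probability mass is lost to non-terminating derivations.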