Equation discovery searches for algebraic expressions that model the given data. In equation discovery tasks, strong emphasis is usually put on the generation of expressions. Traditionally, expressions have been generated using context-free grammars, evolutionary algorithms, and other approaches, but recently, generators based on deep learning have started to emerge. First attempts at generating discrete, structured data with deep generative models include variational autoencoders (VAE) for simple, unconstrained character sequences, and grammar VAEs, which employ context-free grammars to syntactically constrain the output of the decoder. In contrast, the hierarchical VAE (HVAE) proposed in this paper constrains the output of the decoder to binary expression trees. These trees are encoded and decoded with two simple extensions of gated recursive units. We conjecture that the HVAE can be trained more efficiently than sequential and grammar-based VAEs. Indeed, the experimental evaluation results show that the HVAE can be trained with less data and in a lower-dimensional latent space, while still significantly outperforming other approaches. The latter allows for efficient symbolic regression via Bayesian optimization in the latent space and the discovery of complex equations from data.
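The core encoding idea, a recursive gated unit applied bottom-up over a binary expression tree, can be sketched as follows. This is a minimal illustration only: the toy dimensionality, the four-symbol vocabulary, the averaging of child states, and the random weights standing in for trained parameters are all assumptions, not the paper's actual architecture or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8                          # toy hidden dimensionality (assumption)
SYMBOLS = ["+", "*", "x", "c"]   # toy operator/terminal vocabulary (assumption)

# Random parameters standing in for trained weights.
EMBED = {s: rng.normal(size=DIM) for s in SYMBOLS}
Wz, Wr, Wn = (rng.normal(scale=0.3, size=(DIM, DIM)) for _ in range(3))
Uz, Ur, Un = (rng.normal(scale=0.3, size=(DIM, DIM)) for _ in range(3))

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class Node:
    """A node of a binary expression tree; terminals have no children."""
    def __init__(self, symbol, left=None, right=None):
        self.symbol, self.left, self.right = symbol, left, right

def gru_cell(x, h):
    """Standard GRU update of hidden state h given input embedding x."""
    z = sigmoid(Wz @ x + Uz @ h)           # update gate
    r = sigmoid(Wr @ x + Ur @ h)           # reset gate
    n = np.tanh(Wn @ x + Un @ (r * h))     # candidate state
    return (1.0 - z) * n + z * h

def encode(node):
    """Encode children first, merge their codes, then apply the gated update."""
    if node is None:
        return np.zeros(DIM)               # empty child contributes a zero state
    h_children = 0.5 * (encode(node.left) + encode(node.right))
    return gru_cell(EMBED[node.symbol], h_children)

# Encode the tree for the expression (x + c) * x.
tree = Node("*", Node("+", Node("x"), Node("c")), Node("x"))
code = encode(tree)
print(code.shape)  # (8,)
```

Decoding would run the mirror-image process, expanding a latent code top-down into symbols and child states until terminals are produced; the bottom-up pass shown here is only the encoder half.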