Measuring Bloat, Overfitting and Functional Complexity in Genetic Programming

Authors

Leonardo Vanneschi
Mauro Castelli
Sara Silva

Abstract

Recent contributions clearly show that eliminating bloat in a genetic programming system does not necessarily eliminate overfitting, and vice versa. This fact seems to contradict a view shared by many researchers, known as the minimum description length principle, which states that the best model is the one that minimizes the amount of information needed to encode it. Another widely held view is that overfitting should be, in some sense, related to the functional complexity of the model. The goal of this paper is to define three measures that quantify, respectively, the bloat, overfitting and functional complexity of solutions, and to show their suitability on a set of test problems including a simple bidimensional symbolic regression test function and two real-life multidimensional regression problems. The experimental results are encouraging and should pave the way to further investigation. Advantages and drawbacks of the proposed measures are discussed, and ways to improve them are suggested. In the future, these measures should be useful to study and better understand the relationship between bloat, overfitting and functional complexity of solutions.

Keywords

Genetic Programming, Bloat, Overfitting, Complexity

Subject

Genetic Programming

Related Project

EnviGP - Improving Genetic Programming for the Environment and Other Applications

Conference

2010 Genetic and Evolutionary Computation Conference (GECCO 2010), July 2010