CISUC

Operator Equalisation, Bloat and Overfitting - A Study on Human Oral Bioavailability Prediction

Authors

Abstract

Operator equalisation was recently proposed as a new bloat control technique for genetic programming. By controlling the distribution of program lengths inside the population, it can bias the search towards smaller or larger programs. In this paper we propose a new implementation of operator equalisation and compare it to a previous version, using a hard real-world regression problem where bloat and overfitting are major issues. The results show that both implementations of operator equalisation are completely bloat-free, producing smaller individuals than standard genetic programming, without compromising the generalization ability. We also show that the new implementation of operator equalisation is more efficient and exhibits a more predictable and reliable behavior than the previous version. We advance some arguable ideas regarding the relationship between bloat and overfitting, and support them with our results.

Keywords

Genetic Programming, Bloat, Overfitting, Operator Equalisation, Real-World Application, Human Oral Bioavailability, Regression

Subject

Genetic Programming

Conference

2009 Genetic and Evolutionary Computation Conference (GECCO 2009), July 2009


Cited by

Year 2012 : 2 citations

 Harper R (2012). Spatial Co-Evolution - Quicker, Fitter and Less Bloated. In Genetic and Evolutionary Computation Conference (GECCO 2012), 759–766.

 Nguyen TH, Nguyen XH, McKay B, Nguyen QU (2012). Where should we stop? An investigation on early stopping for GP learning. In Proc 9th International Conference on Simulated Evolution and Learning (SEAL-2012), 391–399.

Year 2011 : 4 citations

 Ahn EY, Mullen T, Yen J (2011). Evolutionary based feature extraction with dynamic mutation. In Proc IEEE 2011 Congress on Evolutionary Computation, 409–416.

 Ahn EY, Mullen T, Yen J (2011). A two-population evolutionary algorithm for feature extraction: Combining filter and wrapper. In Proc IEEE 2011 Congress on Evolutionary Computation, 736–743.

 Gardner M-A, Gagné C, Parizeau M (2011). Bloat control in genetic programming with a histogram-based accept-reject method. In Proc 13th Genetic and Evolutionary Computation Conference, 187–188.

 Kronberger G, Kommenda M, Affenzeller M (2011). Overfitting detection and adaptive covariant parsimony pressure for symbolic regression. In Proc Genetic and Evolutionary Computation Conference (GECCO 2011), 631–638.