Automated Learning of RVM for Large Scale Text Sets: Divide to Conquer
Authors
Abstract
We investigate three methods for automated learning of Relevance Vector Machines (RVM) on large-scale text sets. The probabilistic Bayesian nature of RVM provides both predictive distributions on test instances and model-based selection, which yields a parsimonious solution. However, scaling up the algorithm is impractical in most digital information processing applications. We examine the properties of the baseline RVM algorithm and propose new scaling approaches based on choosing appropriate working sets that retain the most informative data. Incremental, ensemble and boosting algorithms are deployed to improve classification performance by taking advantage of the large training set available. Results on Reuters-21578 show performance gains while maintaining sparse solutions that can be deployed in distributed environments.