Zeke, Thunder, Joe, and I are comparing the performance of several hyperparameter optimization (i.e., autotuning) algorithms for a presentation at the Wolfram Data Summit in September.


Advances in neural networks and deep learning have renewed interest in algorithms that assist in tuning the hyperparameters of these models. Bergstra et al. have developed a statistical (Gaussian Process) approach to hyperparameter optimization that exceeds the performance of human experts on image and speech processing tasks.[2] Both the model dimension and the hyperparameter dimension are large enough to make exhaustive "grid" search and random search impractical.[1] Bergstra's Sequential Model-Based Global Optimization (SMBO) approach improves efficiency further by using an approximation of the model training results as a heuristic within the search for optimal hyperparameters. In this paper we will demonstrate these automatic hyperparameter optimization algorithms and compare them to the straightforward approach in which each hyperparameter is optimized independently of the others by hill-climbing in one dimension at a time. This approach should only perform well for convex loss (merit) functions, but models with a large number of internal degrees of freedom, such as neural nets and Bayesian models, should nonetheless be amenable to it. This paper will compare results for this sequential single-dimension "Manhattan search" approach with the more complex and efficient SMBO approach on several toy and real-world problems.
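
To make the comparison concrete, here is a minimal sketch (not the code we will actually present) of the sequential single-dimension "Manhattan search": hill-climb one hyperparameter at a time, holding the others fixed, and cycle until no single-dimension step improves the loss. The function name, step sizes, and toy loss below are purely illustrative.

```python
# Illustrative sketch of coordinate-wise "Manhattan search" over hyperparameters.
def manhattan_search(loss, start, steps, max_cycles=20):
    """loss: callable mapping a dict of hyperparameters to a float (lower is better)
    start: dict of initial hyperparameter values
    steps: dict of per-parameter step sizes"""
    params = dict(start)
    best = loss(params)
    for _ in range(max_cycles):
        improved = False
        for name, step in steps.items():
            for delta in (+step, -step):
                trial = dict(params, **{name: params[name] + delta})
                score = loss(trial)
                if score < best:          # keep the move only if it helps
                    params, best = trial, score
                    improved = True
        if not improved:                  # no single-dimension move helps: stop
            break
    return params, best

# Toy convex loss: a bowl centered at (learning_rate=0.1, momentum=0.9)
toy_loss = lambda p: (p["learning_rate"] - 0.1) ** 2 + (p["momentum"] - 0.9) ** 2
print(manhattan_search(toy_loss,
                       start={"learning_rate": 0.5, "momentum": 0.5},
                       steps={"learning_rate": 0.05, "momentum": 0.05}))
```

On the SMBO side, Bergstra's hyperopt library wraps the surrogate-model loop behind a single call; a toy use, assuming hyperopt is installed (this is not our benchmark code), looks like:

```python
from hyperopt import fmin, tpe, hp

# Minimize a toy one-dimensional objective with the Tree-structured Parzen
# Estimator (TPE) surrogate; max_evals bounds the number of model trainings.
best = fmin(fn=lambda x: (x - 0.1) ** 2,
            space=hp.uniform("x", 0.0, 1.0),
            algo=tpe.suggest,
            max_evals=50)
print(best)
```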


[1] Snoek, Larochelle, and Adams, "Practical Bayesian Optimization of Machine Learning Algorithms," 2012.

[2] Bergstra, Bardenet, Bengio, and Kégl, "Algorithms for Hyper-Parameter Optimization," 2011.
