Efficient active learning with generalized linear models
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:267-274, 2007.
Abstract
Active learning can significantly reduce the amount of training data required to fit parametric statistical models for supervised learning tasks. Here we present an efficient algorithm for choosing the optimal (most informative) query when the output labels are related to the inputs by a generalized linear model (GLM). The algorithm is based on a Laplace approximation of the posterior distribution of the GLM's parameters. It requires only low-rank matrix manipulations and a single two-dimensional search to choose the optimal query, and has complexity $O(n^2)$ (with $n$ the dimension of the feature space), making active learning with GLMs feasible even in high-dimensional feature spaces. In certain cases the two-dimensional search may be reduced to a one-dimensional search, further improving the algorithm's efficiency. Simulation results show that the model parameters can be estimated much more efficiently using the active learning technique than by using randomly chosen queries. We compute the asymptotic posterior covariance semi-analytically and demonstrate that the algorithm empirically achieves this asymptotic convergence rate, which is generally better than the convergence rate in the random-query setting. Finally, we generalize the approach to efficiently handle both output history effects (for applications to time-series models of autoregressive type) and slow, non-systematic drifts in the model parameters.
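To make the abstract's ingredients concrete, here is a rough sketch (not the authors' implementation) pairing a Laplace-approximated Gaussian posterior with greedy query selection for a Bernoulli GLM with a logit link. The function names (`select_query`, `laplace_update`), the candidate-pool setting, and the plug-in information score evaluated at the posterior mean are assumptions introduced here for illustration; the sketch does show the two features the abstract highlights, namely that each candidate's score depends on the query only through two scalars, and that the posterior update is a rank-one, $O(n^2)$ operation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def select_query(pool, mu, C):
    """Score each candidate query x in the pool by an information proxy,
    0.5 * log(1 + lambda(mu.x) * x^T C x), with lambda(a) = sigma(a)(1 - sigma(a))
    the logistic GLM's information weight at the posterior mean.  The score
    depends on x only through the two scalars (mu.x, x^T C x): the quantities
    behind the paper's two-dimensional search."""
    proj = pool @ mu                                   # mu.x for every candidate
    quad = np.einsum('ij,jk,ik->i', pool, C, pool)     # x^T C x for every candidate
    lam = sigmoid(proj) * (1.0 - sigmoid(proj))
    return int(np.argmax(0.5 * np.log1p(lam * quad)))

def laplace_update(x, y, mu, C):
    """One-step Gaussian (Laplace-style) update of the posterior N(mu, C)
    after observing label y in {0, 1} at query x.  Sherman-Morrison turns the
    rank-one precision update into an O(n^2) covariance update."""
    p = sigmoid(mu @ x)
    lam = p * (1.0 - p)                  # observed-information weight
    Cx = C @ x
    C_new = C - np.outer(Cx, Cx) * (lam / (1.0 + lam * (x @ Cx)))
    mu_new = mu + C_new @ x * (y - p)    # single Newton/gradient step
    return mu_new, C_new

# Toy usage: actively query a simulated logistic GLM.
rng = np.random.default_rng(0)
n, m = 20, 500
theta_true = rng.normal(size=n)
pool = rng.normal(size=(m, n))
mu, C = np.zeros(n), np.eye(n)           # Gaussian prior on the parameters
for _ in range(100):
    i = select_query(pool, mu, C)
    y = float(rng.random() < sigmoid(pool[i] @ theta_true))  # simulated label
    mu, C = laplace_update(pool[i], y, mu, C)
```

In this toy loop the posterior covariance shrinks fastest along directions probed by informative queries, which is the qualitative behavior the simulations in the paper compare against random querying.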