Abstract
To find the hyperparameters of a machine learning model, search algorithms such as grid search or random search are run over the space of possible hyperparameter values. These algorithms select the solution that minimizes a given cost function; for language models, perplexity is among the most widely used. In this study, we propose a fractional nonlinear programming model that finds the hyperparameter setting with the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved with the well-known simplex algorithm. To the best of our knowledge, this is the first attempt in the language modeling literature to use mathematical optimization techniques to minimize perplexity. We apply our model to select the hyperparameters of a language model, compare it against grid search, and show that it attains lower perplexity values. The experiment is performed on a real-world dataset from SwiftKey to validate the proposed approach.
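The abstract does not reproduce the optimization model itself. As background, the standard route from a linear-fractional program to a linear program is the Charnes–Cooper transformation; the sketch below illustrates that classic technique under generic data $(c, d, A, b, \alpha, \beta)$ and is not the authors' exact formulation. Starting from

\[
\min_{x}\ \frac{c^{\top}x + \alpha}{d^{\top}x + \beta}
\quad\text{subject to}\quad Ax \le b,\ \ x \ge 0,
\]

and assuming $d^{\top}x + \beta > 0$ over the feasible region, the substitution $t = 1/(d^{\top}x + \beta)$, $y = t\,x$ yields the equivalent linear program

\[
\min_{y,\,t}\ c^{\top}y + \alpha t
\quad\text{subject to}\quad Ay - bt \le 0,\ \ d^{\top}y + \beta t = 1,\ \ y \ge 0,\ \ t \ge 0,
\]

which the simplex algorithm solves directly; an optimal $x$ is recovered as $y/t$.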
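For comparison, the grid-search baseline mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming a held-out perplexity evaluator; the parameter grid and the train_and_score callback are hypothetical placeholders, not the authors' code or data.

import itertools
import math

def perplexity(token_log_probs):
    # Perplexity of held-out text from per-token natural-log probabilities.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical hyperparameter grid for an n-gram language model.
GRID = {
    "order": [2, 3, 4],           # n-gram order
    "discount": [0.1, 0.5, 0.9],  # smoothing discount
}

def grid_search(train_and_score):
    # Exhaustively evaluate every combination in GRID and keep the
    # hyperparameters with the lowest held-out perplexity.
    best_params, best_ppl = None, float("inf")
    keys = list(GRID)
    for values in itertools.product(*(GRID[k] for k in keys)):
        params = dict(zip(keys, values))
        ppl = train_and_score(params)  # trains a model, returns its perplexity
        if ppl < best_ppl:
            best_params, best_ppl = params, ppl
    return best_params, best_ppl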
Notes
SwiftKey is an input method that uses artificial intelligence techniques to predict the next word the user intends to type. A free download is available at https://swiftkey.com/en.
Acknowledgements
Mehdi Toloo would like to thank the European Social Fund (CZ.1.07/2.3.00/20.0296) and the Czech Science Foundation (GAČR 17-23495S).
Cite this article
Rahnama, A.H.A., Toloo, M. & Zaidenberg, N.J. An LP-based hyperparameter optimization model for language modeling. J Supercomput 74, 2151–2160 (2018). https://doi.org/10.1007/s11227-018-2236-6