An LP-based hyperparameter optimization model for language modeling

The Journal of Supercomputing

Abstract

To find hyperparameters for a machine learning model, algorithms such as grid search or random search are applied over the space of possible values of the model's hyperparameters. These search algorithms select the solution that minimizes a specific cost function. In language modeling, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved with the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find the hyperparameters of a language model, compare it to the grid search algorithm, and show that it yields lower perplexity values. We validate the proposed approach on a real-world dataset from SwiftKey.
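The key step in the abstract, approximating a fractional program by a linear one, follows a classical pattern. The paper's exact nonlinear model is not reproduced on this page, but for the *linear* fractional case the standard Charnes–Cooper transformation illustrates the idea (generic symbols, not the paper's notation):

```latex
% Linear fractional program:
\min_{x} \; \frac{c^\top x + \alpha}{d^\top x + \beta}
\quad \text{s.t.} \quad A x \le b, \; d^\top x + \beta > 0.
% Substituting t = 1/(d^\top x + \beta) and y = t x yields an
% equivalent linear program solvable by the simplex algorithm:
\min_{y,\,t} \; c^\top y + \alpha t
\quad \text{s.t.} \quad A y \le b t, \; d^\top y + \beta t = 1, \; t \ge 0.
```

An optimal solution of the LP recovers an optimal solution of the fractional program via \(x = y / t\).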
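The baseline the abstract compares against, grid search minimizing perplexity, can be sketched as follows. The toy `perplexity` function and the grid values below are hypothetical stand-ins for a real language-model evaluation, not the paper's model:

```python
import itertools
import math

# Hypothetical stand-in: a toy "language model" whose perplexity depends
# on two smoothing hyperparameters. A real evaluation would compute
# perplexity of a held-out corpus under the model.
def perplexity(alpha, beta):
    # Convex toy objective with its minimum at alpha=0.3, beta=0.7.
    return math.exp((alpha - 0.3) ** 2 + (beta - 0.7) ** 2)

# Grid search: evaluate every combination of candidate values and keep
# the one with the lowest perplexity.
alphas = [0.1, 0.2, 0.3, 0.4]
betas = [0.5, 0.6, 0.7, 0.8]
best = min(itertools.product(alphas, betas), key=lambda p: perplexity(*p))
print(best)  # → (0.3, 0.7)
```

Grid search evaluates the cost function at every grid point, so its cost grows exponentially with the number of hyperparameters; the paper's contribution is to replace this exhaustive search with a single optimization model.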


Notes

  1. SwiftKey is an input method that employs various artificial intelligence technologies to predict the next word the user intends to type. A free download is available at: https://swiftkey.com/en.


Acknowledgements

Mehdi Toloo would like to thank the European Social Fund (CZ.1.07/2.3.00/20.0296) and the Czech Science Foundation (GAČR 17-23495S).

Author information

Corresponding author

Correspondence to Mehdi Toloo.

About this article

Cite this article

Rahnama, A.H.A., Toloo, M. & Zaidenberg, N.J. An LP-based hyperparameter optimization model for language modeling. J Supercomput 74, 2151–2160 (2018). https://doi.org/10.1007/s11227-018-2236-6
