Tuning word2vec for large scale recommendation systems

BP Chamberlain, E Rossi, D Shiebler… - Proceedings of the 14th ACM Conference on Recommender Systems, 2020 - dl.acm.org
Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. Because Word2vec is often used off the shelf, we address the question of whether its default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first show the importance of hyperparameter optimization: unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization produces hyperparameter settings that are too computationally expensive to be feasible for large-scale recommendation tasks. We therefore show that hyperparameter optimization constrained by a runtime budget still achieves an average 138% improvement in hit rate. Furthermore, to make hyperparameter optimization applicable to large-scale recommendation problems where the target dataset is too large to search over, we investigate whether hyperparameter settings learned on samples generalize. We show that hyperparameters found by constrained optimization on only a 10% sample of the data still yield an average 91% improvement in hit rate over the default parameters when applied to the full datasets. Finally, applying hyperparameters learned with constrained optimization on a sample to the Who To Follow recommendation service at Twitter increased follow rates by 15%.
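The workflow described above can be sketched as follows. The steps are: search word2vec hyperparameters on a data sample, skip configurations that exceed a runtime budget, score the rest by hit rate, and reuse the winner on the full data. This is a minimal illustration under stated assumptions, not the authors' method. Random search stands in for whatever optimizer the paper uses, and the search space is illustrative. `relative_cost`, `sample_sequences`, and `constrained_random_search` are hypothetical helpers. `relative_cost` in particular is a crude proxy for training time, not a measured runtime. The `percent_improvement` helper shows the arithmetic behind the reported figures: a 221% improvement means the tuned hit rate is 3.21 times the default.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Illustrative search space (an assumption, not the paper's grid) over common
# word2vec hyperparameters.
SEARCH_SPACE: Dict[str, Sequence[float]] = {
    "vector_size": [32, 64, 128, 256],
    "window": [1, 3, 5, 10],
    "negative": [1, 5, 10, 20],
    "ns_exponent": [-0.5, 0.0, 0.5, 0.75, 1.0],
    "epochs": [5, 10, 20, 50],
}

Params = Dict[str, float]


def hit_rate_at_k(recs: Dict[str, List[str]], held_out: Dict[str, str], k: int = 10) -> float:
    """Fraction of users whose held-out item appears in their top-k recommendations."""
    if not held_out:
        return 0.0
    hits = sum(1 for user, item in held_out.items() if item in recs.get(user, [])[:k])
    return hits / len(held_out)


def percent_improvement(tuned: float, default: float) -> float:
    """Relative gain in percent; 221% means the tuned score is 3.21x the default."""
    return (tuned - default) / default * 100.0


def relative_cost(params: Params) -> float:
    """Hypothetical runtime proxy: training work grows with epochs, context window,
    negative samples (plus the one positive pair) and embedding width."""
    return params["epochs"] * params["window"] * (params["negative"] + 1) * params["vector_size"]


def sample_sequences(sequences: List[List[str]], fraction: float, seed: int = 0) -> List[List[str]]:
    """Hypothetical helper: uniform random sample of training sequences (0.1 = 10%)."""
    k = max(1, round(fraction * len(sequences)))
    return random.Random(seed).sample(sequences, k)


def constrained_random_search(
    objective: Callable[[Params], float],
    budget: float,
    n_trials: int = 50,
    seed: int = 0,
) -> Tuple[Optional[Params], float]:
    """Random search that only evaluates configurations within the cost budget."""
    rng = random.Random(seed)
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        if relative_cost(params) > budget:
            continue  # too expensive: skip without training
        score = objective(params)  # e.g. train on a sample, return validation hit rate
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In a real pipeline, `objective` would train a word2vec model on the output of `sample_sequences` and return `hit_rate_at_k` on held-out interactions. For example, it could use gensim's `Word2Vec`, whose `vector_size`, `window`, `negative`, `ns_exponent`, and `epochs` parameters match the keys above. The returned parameters would then be used to train on the full dataset.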