Bandit based Optimization of Multiple Objectives on a Music Streaming Platform

Published: 20 August 2020

Abstract

Recommender systems powering online multi-stakeholder platforms often face the challenge of jointly optimizing multiple objectives in an attempt to efficiently match suppliers and consumers. Examples of such objectives include user behavioral metrics (e.g., clicks, streams, dwell time), supplier exposure objectives (e.g., diversity), and platform-centric objectives (e.g., promotions). Jointly optimizing multiple metrics in online recommender systems remains a challenging task. Recent work has demonstrated the prowess of contextual bandits in powering recommendation systems that serve recommendations of interest to users. This paper extends contextual bandits to the multi-objective setting in order to power recommendations on multi-stakeholder platforms.
Specifically, in a contextual bandit setting, we learn a recommendation policy that optimizes multiple objectives simultaneously and fairly. This multi-objective online optimization problem is formalized using the Generalized Gini Index (GGI) aggregation function, which combines and balances multiple objectives. We propose an online gradient ascent learning algorithm that maximizes the long-term vectorial rewards for the different objectives, scalarized by the GGI function. Through extensive experiments on simulated data and large-scale music recommendation data from Spotify, a streaming platform, we show that the proposed algorithm learns a policy that balances the disparate objectives better than state-of-the-art approaches.
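To illustrate the aggregation step the abstract describes: GGI sorts the objective rewards from worst to best and applies non-increasing weights, so balanced reward vectors score higher than lopsided ones with the same total. A minimal sketch (the default weights w_i = 1/2^i are one common choice from the multi-objective bandit literature, not necessarily the exact weights used in the paper):

```python
import numpy as np

def ggi(rewards, weights=None):
    """Generalized Gini Index of a vector of objective rewards.

    Components are sorted in ascending order (worst objective first)
    and combined with non-increasing weights, so improving the
    worst-off objective raises the score the most.
    """
    r = np.sort(np.asarray(rewards, dtype=float))  # worst objective first
    if weights is None:
        weights = 0.5 ** np.arange(r.size)  # decreasing weights: 1, 1/2, 1/4, ...
    return float(np.dot(weights, r))

# A balanced reward vector outscores a lopsided one with the same sum:
print(ggi([0.5, 0.5]))  # 0.75
print(ggi([1.0, 0.0]))  # 0.5
```

This concavity in the sorted components is what lets a gradient-based learner trade a small loss on a strong objective for a larger gain on a weak one, which is the fairness behavior the paper optimizes for.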




Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN:9781450379984
DOI:10.1145/3394486

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. marketplace
  2. multi-objective bandits
  3. recommendations

Qualifiers

  • Research-article

Conference

KDD '20

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Article Metrics

  • Downloads (last 12 months): 263
  • Downloads (last 6 weeks): 29
Reflects downloads up to 16 Nov 2024

Cited By

  • Investigating Nudges toward Related Sellers on E-commerce Marketplaces: A Case Study on Amazon. Proceedings of the ACM on Human-Computer Interaction 8(CSCW2), 1-31. DOI: 10.1145/3686994. Online publication date: 8-Nov-2024.
  • Multi-Objective Recommendation via Multivariate Policy Learning. Proceedings of the 18th ACM Conference on Recommender Systems, 712-721. DOI: 10.1145/3640457.3688132. Online publication date: 8-Oct-2024.
  • Optimal Baseline Corrections for Off-Policy Contextual Bandits. Proceedings of the 18th ACM Conference on Recommender Systems, 722-732. DOI: 10.1145/3640457.3688105. Online publication date: 8-Oct-2024.
  • Fairness and Transparency in Music Recommender Systems: Improvements for Artists. Proceedings of the 18th ACM Conference on Recommender Systems, 1368-1375. DOI: 10.1145/3640457.3688024. Online publication date: 8-Oct-2024.
  • Handling Varied Objectives by Online Decision Making. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2130-2140. DOI: 10.1145/3637528.3671812. Online publication date: 25-Aug-2024.
  • On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1222-1233. DOI: 10.1145/3637528.3671687. Online publication date: 25-Aug-2024.
  • Multi-Task Neural Linear Bandit for Exploration in Recommender Systems. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5723-5730. DOI: 10.1145/3637528.3671649. Online publication date: 25-Aug-2024.
  • Practical Bandits: An Industry Perspective. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 1132-1135. DOI: 10.1145/3616855.3636449. Online publication date: 4-Mar-2024.
  • Ad-load Balancing via Off-policy Learning in a Content Marketplace. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 586-595. DOI: 10.1145/3616855.3635846. Online publication date: 4-Mar-2024.
  • DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation. Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, 33-49. DOI: 10.1007/978-3-031-70381-2_3. Online publication date: 22-Aug-2024.
