DOI: 10.1145/3340531.3411962

Evaluating Stochastic Rankings with Expected Exposure

Published: 19 October 2020

Abstract

We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a distribution over rankings instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including ad hoc retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress.
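The central quantity in the abstract is easy to make concrete. Below is a minimal sketch, not the authors' implementation: it assumes an RBP-style browsing model in which a user examines rank k with probability gamma^(k-1) (the patience value gamma = 0.5 and the helper name expected_exposure are illustrative choices, not from the paper). It shows how a deterministic ranker overexposes one of two equally relevant items, while a stochastic ranker that randomizes the tie equalizes their expected exposure.

```python
import numpy as np

GAMMA = 0.5  # assumed patience parameter for an RBP-style browsing model


def expected_exposure(rankings, probs, n_items, gamma=GAMMA):
    """Expected exposure of each item under a distribution over rankings.

    The exposure of rank k (1-indexed) is gamma**(k-1): the probability
    that a user browsing top-down examines position k before stopping.
    """
    eps = np.zeros(n_items)
    for ranking, p in zip(rankings, probs):
        for k, item in enumerate(ranking):
            # k is 0-indexed here, so gamma**k equals gamma**(rank - 1).
            eps[item] += p * gamma**k
    return eps


# Three items; items 1 and 2 share the same relevance grade.
# A deterministic ranker always breaks the tie the same way ...
det = expected_exposure([[0, 1, 2]], [1.0], n_items=3)
# ... while a stochastic ranker randomizes over the tied orderings.
sto = expected_exposure([[0, 1, 2], [0, 2, 1]], [0.5, 0.5], n_items=3)

print(det)  # [1.0, 0.5, 0.25]   -> item 1 overexposed relative to item 2
print(sto)  # [1.0, 0.375, 0.375] -> equal expected exposure for the tied items
```

The evaluation methodology then scores a system by how far its realized exposure vector deviates from a target vector in which all items of the same relevance grade receive equal exposure, for example via a squared distance.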

Supplementary Material

MP4 File (3340531.3411962.mp4)




Information

Published In

cover image ACM Conferences
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN:9781450368599
DOI:10.1145/3340531
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. diversity
  2. evaluation
  3. fairness
  4. learning to rank

Qualifiers

  • Research-article

Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 157
  • Downloads (last 6 weeks): 17
Reflects downloads up to 12 Nov 2024


Cited By

  • (2024) The Implementation of Recommender Systems for Mental Health Recovery Narratives: Evaluation of Use and Performance. JMIR Mental Health, 11, e45754. DOI: 10.2196/45754. Online publication date: 29-Mar-2024
  • (2024) Properties of Group Fairness Measures for Rankings. ACM Transactions on Social Computing. DOI: 10.1145/3674883. Online publication date: 27-Aug-2024
  • (2024) Towards Group-aware Search Success. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 123-131. DOI: 10.1145/3664190.3672526. Online publication date: 2-Aug-2024
  • (2024) Overcoming Diverse Undesired Effects in Recommender Systems: A Deontological Approach. ACM Transactions on Intelligent Systems and Technology, 15(4), 1-23. DOI: 10.1145/3643857. Online publication date: 1-Feb-2024
  • (2024) Counterfactual Explanation for Fairness in Recommendation. ACM Transactions on Information Systems, 42(4), 1-30. DOI: 10.1145/3643670. Online publication date: 29-Jan-2024
  • (2024) On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1222-1233. DOI: 10.1145/3637528.3671687. Online publication date: 25-Aug-2024
  • (2024) Recommend Me? Designing Fairness Metrics with Providers. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2389-2399. DOI: 10.1145/3630106.3659044. Online publication date: 3-Jun-2024
  • (2024) PreFAIR: Combining Partial Preferences for Fair Consensus Decision-making. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 1133-1149. DOI: 10.1145/3630106.3658961. Online publication date: 3-Jun-2024
  • (2024) A Framework for Exploring the Consequences of AI-Mediated Enterprise Knowledge Access and Identifying Risks to Workers. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 207-220. DOI: 10.1145/3630106.3658900. Online publication date: 3-Jun-2024
  • (2024) Wise Fusion: Group Fairness Enhanced Rank Fusion. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 163-174. DOI: 10.1145/3627673.3679649. Online publication date: 21-Oct-2024
