AMRank: An adversarial Markov ranking model combining short- and long-term returns

Published: 01 January 2023

Abstract

Learning to rank (LTR) is a method of ranking search results using machine learning techniques. Reinforcement-learning-based ranking models have recently achieved some success on LTR tasks. However, these models suffer from drawbacks such as high-variance gradient estimates and training inefficiency, which pose great challenges to the convergence and accuracy of the ranking model. Combining short- and long-term returns, this paper proposes AMRank, an adversarial Markov ranking model that is based on reinforcement learning and formalizes the ranking task as a Markov decision process. To address the aforementioned weaknesses, AMRank introduces a sequence discriminator that outputs a long-term return with smaller variance and enables single-step updates, together with a document discriminator that yields a short-term return. The two discriminators are trained simultaneously before the decision is made. During training, the policy network serves as the generator, sampling candidate documents to produce negative samples. At each decision step, the discriminators output returns based on the environment state and the policy, and the parameters of the policy network are then updated using the policy gradient method. Experimental results on three LETOR benchmark datasets, OHSUMED, MQ2007 and MQ2008, demonstrate that the proposed AMRank outperforms the baseline models on the document ranking task.
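
To make the training procedure concrete, below is a minimal sketch, in PyTorch-style Python, of one MDP ranking episode with a policy network acting as the generator and two discriminators supplying short- and long-term returns. All class names, feature dimensions, the GRU-based sequence discriminator, and the weighted combination of the two returns are illustrative assumptions rather than the paper's exact architecture, and adversarial training of the discriminators themselves is omitted for brevity.

```python
# Minimal sketch of an adversarial MDP ranking episode in the spirit of AMRank.
# All module names, dimensions, and the way short- and long-term returns are
# combined are illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn

FEAT_DIM, HIDDEN = 46, 64  # LETOR-style feature size (assumed)


class PolicyNet(nn.Module):
    """Generator: scores the remaining candidate documents at each step."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(FEAT_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, 1))

    def forward(self, docs):                 # docs: (n_candidates, FEAT_DIM)
        return self.mlp(docs).squeeze(-1)    # unnormalized scores, shape (n,)


class DocDiscriminator(nn.Module):
    """Yields a short-term return for the single document just selected."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(FEAT_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, 1), nn.Sigmoid())

    def forward(self, doc):                  # doc: (FEAT_DIM,)
        return self.mlp(doc).squeeze(-1)


class SeqDiscriminator(nn.Module):
    """Yields a long-term return for the partial ranking built so far."""

    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(FEAT_DIM, HIDDEN, batch_first=True)
        self.head = nn.Sequential(nn.Linear(HIDDEN, 1), nn.Sigmoid())

    def forward(self, ranked):               # ranked: (1, t, FEAT_DIM)
        _, h = self.gru(ranked)
        return self.head(h[-1]).squeeze()


def rank_episode(policy, d_doc, d_seq, docs, alpha=0.5):
    """One MDP episode: sample a full ranking, build the policy-gradient loss."""
    remaining = list(range(docs.size(0)))
    ranked, log_probs, returns = [], [], []
    while remaining:
        probs = torch.softmax(policy(docs[remaining]), dim=0)
        idx = torch.multinomial(probs, 1).item()
        log_probs.append(torch.log(probs[idx]))
        ranked.append(remaining.pop(idx))
        # Combine short-term (document) and long-term (sequence) returns.
        r_short = d_doc(docs[ranked[-1]])
        r_long = d_seq(docs[ranked].unsqueeze(0))
        returns.append(alpha * r_short + (1.0 - alpha) * r_long)
    # REINFORCE-style single-step update: each action weighted by its own return.
    return -(torch.stack(log_probs) * torch.stack(returns).detach()).sum()


if __name__ == "__main__":
    policy, d_doc, d_seq = PolicyNet(), DocDiscriminator(), SeqDiscriminator()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    docs = torch.randn(10, FEAT_DIM)         # stand-in candidate documents
    loss = rank_episode(policy, d_doc, d_seq, docs)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"policy loss: {loss.item():.4f}")
```

In this sketch the combined returns are detached so that only the policy network receives gradients in this step; in the full adversarial setup described in the abstract, the two discriminators would be trained simultaneously, using documents sampled by this policy as negative examples.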

Highlights

AMRank, a novel document ranking model that combines MDP and GAN, is proposed.
AMRank employs long- and short-term returns to improve decision making.
A sequence discriminator is presented to generate long-term returns.
AMRank realizes single-step updates and outputs returns with lower variance (see the sketch after this list).
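
For readers who want the update rule spelled out, the following sketch shows one standard way a single-step policy-gradient update could combine the two discriminator outputs; the weighting coefficient and the notation D_doc and D_seq are assumptions, since the abstract does not give the paper's formulas.

```latex
% Illustrative only: \alpha, D_doc and D_seq are assumed notation, not the paper's.
R_t = \alpha \, D_{\mathrm{doc}}(s_t, a_t) + (1 - \alpha)\, D_{\mathrm{seq}}(s_t, a_t),
\qquad
\nabla_{\theta} J(\theta) \approx \sum_{t} R_t \, \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t).
```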

Published In

Expert Systems with Applications: An International Journal, Volume 211, Issue C, January 2023, 1635 pages

Publisher

Pergamon Press, Inc., United States


Author Tags

1. Document ranking
2. Learning to rank
3. Reinforcement learning
4. Discriminator
