Modeling actions of PubMed users with n-gram language models

Published: 01 August 2009

Abstract

Transaction logs from online search engines are valuable for two reasons. First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies, which focus on general Web search, our work examines user activities with a highly specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide a basis for improving users' search experience.
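The modeling idea in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the single-letter action codes (`Q` for a query, `R` for viewing a result, `L` for following a related-article link), the toy sessions, and the add-k smoothing are all assumptions chosen for brevity, and the class and method names are hypothetical. The sketch encodes each session as a string of action symbols, estimates an n-gram model from counts, and then evaluates it by perplexity and by predicting the next action.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # session start and end markers


class NGramActionModel:
    """Add-k smoothed n-gram model over sequences of action symbols."""

    def __init__(self, n=2, k=1.0):
        self.n = n                              # order of the model
        self.k = k                              # add-k smoothing constant (assumption)
        self.ngram_counts = defaultdict(Counter)  # context -> next-symbol counts
        self.context_counts = Counter()           # context -> total count
        self.vocab = {EOS}                         # symbols the model can emit

    def _pad(self, session):
        return [BOS] * (self.n - 1) + list(session) + [EOS]

    def fit(self, sessions):
        for session in sessions:
            symbols = self._pad(session)
            self.vocab.update(session)
            for i in range(self.n - 1, len(symbols)):
                context = tuple(symbols[i - self.n + 1:i])
                self.ngram_counts[context][symbols[i]] += 1
                self.context_counts[context] += 1
        return self

    def prob(self, context, symbol):
        """P(symbol | context) with add-k smoothing over the vocabulary."""
        context = tuple(context)
        seen = self.ngram_counts.get(context, Counter())[symbol]
        total = self.context_counts[context]
        return (seen + self.k) / (total + self.k * len(self.vocab))

    def perplexity(self, sessions):
        """2 ** (average negative log2 probability per predicted symbol)."""
        log_prob, count = 0.0, 0
        for session in sessions:
            symbols = self._pad(session)
            for i in range(self.n - 1, len(symbols)):
                context = symbols[i - self.n + 1:i]
                log_prob += math.log2(self.prob(context, symbols[i]))
                count += 1
        return 2 ** (-log_prob / count)

    def predict_next(self, history):
        """Most probable next symbol given the actions observed so far."""
        padded = [BOS] * (self.n - 1) + list(history)
        context = padded[len(padded) - (self.n - 1):]
        return max(sorted(self.vocab), key=lambda s: self.prob(context, s))


# Toy sessions with hypothetical action codes; real logs would be far larger.
train_sessions = ["QRQR", "QRR", "QLR", "QRQ"]
model = NGramActionModel(n=2, k=1.0).fit(train_sessions)

print(model.predict_next("Q"))        # most likely action after a query
print(model.perplexity(["QRR", "QLR"]))
```

Lower perplexity on held-out sessions indicates that the model assigns higher probability to what users actually did. The sketch assumes every action in the evaluated sessions was seen in training; a production toolkit would use a stronger smoothing method and handle unseen symbols explicitly.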

Published In

Information Retrieval, Volume 12, Issue 4
Aug 2009
72 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

1. Query log analysis
2. Search behavior
