
Modeling actions of PubMed users with n-gram language models

Authors:

W. John Wilbur

Information Retrieval, Volume 12, Issue 4

Pages 487 - 503

https://doi.org/10.1007/s10791-008-9067-7

Published: 01 August 2009

Abstract

Transaction logs from online search engines are valuable for two reasons. First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies, which focus on general Web search, our work examines user activities with a highly specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide a basis for improving users' search experience.
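The approach summarized in the abstract (actions encoded as strings, modeled with n-grams, scored by perplexity and next-action prediction) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the one-character action codes (`Q` for issuing a query, `R` for viewing a record), the toy sessions, the bigram order, and the add-k smoothing are not taken from the article, which may use a different action alphabet, model order, and smoothing method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical boundary markers for session start and end.
BOS, EOS = "<s>", "</s>"


def train_bigram(sessions):
    """Count bigrams over sessions; each session is a string of action codes."""
    counts = defaultdict(Counter)
    vocab = {EOS}  # EOS can be predicted; BOS only appears as context
    for session in sessions:
        symbols = [BOS] + list(session) + [EOS]
        vocab.update(session)
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    return counts, sorted(vocab)


def prob(counts, vocab, prev, nxt, k=1.0):
    """Add-k smoothed estimate of P(nxt | prev) (smoothing choice is an assumption)."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + k) / (total + k * len(vocab))


def perplexity(counts, vocab, sessions, k=1.0):
    """Per-action perplexity of the model on a list of sessions."""
    log_sum, n = 0.0, 0
    for session in sessions:
        symbols = [BOS] + list(session) + [EOS]
        for prev, nxt in zip(symbols, symbols[1:]):
            log_sum += math.log2(prob(counts, vocab, prev, nxt, k))
            n += 1
    return 2 ** (-log_sum / n)


def predict_next(counts, vocab, prev, k=1.0):
    """Return the most probable next action given the previous one."""
    return max(vocab, key=lambda a: prob(counts, vocab, prev, a, k))


# Toy sessions with hypothetical codes: Q = query, R = record view.
sessions = ["QRQR", "QRR", "QQR"]
counts, vocab = train_bigram(sessions)
print("perplexity:", perplexity(counts, vocab, sessions))
print("most likely action after Q:", predict_next(counts, vocab, "Q"))
```

In practice the perplexity would be measured on held-out sessions rather than on the training data, and a higher-order model with a stronger smoothing method would likely be preferred; the sketch only shows how the two evaluation measures relate to the same conditional probabilities.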




Published In

Information Retrieval, Volume 12, Issue 4

Aug 2009

72 pages

ISSN: 1386-4564

Copyright © 2009 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States

