Application of automatic topic identification on Excite Web search engine data logs

Published: 01 September 2005

Abstract

The analysis of contextual information in search engine query logs enhances the understanding of Web users' search patterns. Obtaining contextual information from Web search engine logs is difficult, since users submit only a few queries and search on multiple topics. Identifying topic changes within a search session is an important branch of search engine user behavior analysis. The purpose of this study is to investigate the properties of a specific topic identification methodology in detail and to test its validity. The topic identification algorithm performs inconsistently in various cases. These cases are explored, and the reasons underlying the inconsistent performance of automatic topic identification are investigated using statistical analysis and experimental design techniques.

Published In

Information Processing and Management: an International Journal  Volume 41, Issue 5
September 2005
313 pages

Publisher

Pergamon Press, Inc.

United States

Publication History

Published: 01 September 2005

Author Tags

  1. Dempster-Shafer Theory
  2. Genetic algorithm
  3. Search engine
  4. Session identification
  5. Topic identification

Qualifiers

  • Article

Cited By

  • (2022) Analyzing the generalizability of the network-based topic emergence identification method. Semantic Web, 13:3 (423-439). DOI: 10.3233/SW-212951. Online publication date: 1-Jan-2022
  • (2020) What Can Task Teach Us About Query Reformulations? Advances in Information Retrieval (636-650). DOI: 10.1007/978-3-030-45439-5_42. Online publication date: 14-Apr-2020
  • (2018) Impact of Domain and User's Learning Phase on Task and Session Identification in Smart Speaker Intelligent Assistants. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (1193-1202). DOI: 10.1145/3269206.3271803. Online publication date: 17-Oct-2018
  • (2014) Character n-gram application for automatic new topic identification. Information Processing and Management: an International Journal, 50:6 (821-856). DOI: 10.1016/j.ipm.2014.06.005. Online publication date: 1-Nov-2014
  • (2013) Discovering tasks from search engine query logs. ACM Transactions on Information Systems, 31:3 (1-43). DOI: 10.1145/2493175.2493179. Online publication date: 5-Aug-2013
  • (2011) Identifying task-based sessions in search engine query logs. Proceedings of the fourth ACM international conference on Web search and data mining (277-286). DOI: 10.1145/1935826.1935875. Online publication date: 9-Feb-2011
  • (2010) Parallel browsing behavior on the web. Proceedings of the 21st ACM conference on Hypertext and hypermedia (13-18). DOI: 10.1145/1810617.1810622. Online publication date: 13-Jun-2010
  • (2010) Identifying the optimal set of parameters for new topic identification through experimental design. Expert Systems with Applications: An International Journal, 37:12 (7947-7968). DOI: 10.1016/j.eswa.2010.04.040. Online publication date: 1-Dec-2010
  • (2008) Beyond the session timeout. Proceedings of the 17th ACM conference on Information and knowledge management (699-708). DOI: 10.1145/1458082.1458176. Online publication date: 26-Oct-2008
  • (2007) Using Monte-Carlo simulation for automatic new topic identification of search engine transaction logs. Proceedings of the 39th conference on Winter simulation: 40 years! The best is yet to come (2306-2314). DOI: 10.5555/1351542.1351950. Online publication date: 9-Dec-2007
