research-article

Exploiting real-time information retrieval in the microblogosphere

Authors:

Jianwu Yang

JCDL '12: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

Pages 267 - 276

https://doi.org/10.1145/2232817.2232867

Published: 10 June 2012

Abstract

Information-seeking behavior in microblogging environments such as Twitter differs from traditional web search. The best-performing microblog retrieval techniques exploit both the semantic and the temporal aspects of documents. In this paper, we present an approach comprising query modeling, document modeling, and temporal re-ranking that finds the most recent information relevant to a query. For query modeling, we introduce a two-stage pseudo-relevance feedback query expansion to overcome the severe vocabulary mismatch that affects short-message retrieval in microblogs. For document modeling, we propose two ways to expand documents with the help of shortened URLs. For temporal re-ranking, we propose several methods for evaluating the temporal aspects of documents. Experimental results show that our approach yields significant improvements over baseline systems. Specifically, the proposed system achieves further increases of 26.37% in P@30 and 9.94% in MAP over the best-performing result on highrel (the highly relevant judgment set) in the TREC'11 Real-Time Search Task.
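The abstract names the three components but does not give their formulas, so the Python sketch below shows only one plausible way to wire them together; it is not the authors' implementation. It assumes Dirichlet-smoothed query likelihood (Zhai and Lafferty) as the base scorer; an RM3-style relevance-model feedback step (Lavrenko and Croft), applied twice, as a stand-in for the two-stage pseudo-relevance feedback; appending the pre-fetched title of the page behind each shortened URL as a stand-in for URL-based document expansion; and an exponential recency prior in the spirit of Li and Croft's time-based language models for temporal re-ranking. All function names, parameters (`mu`, `k1`, `k2`, `n_terms`, `alpha`, `lam`, `beta`), the `url_titles` map, and the example tweets are hypothetical.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")


def tokenize(text):
    """Lowercase and keep word, #hashtag and @mention tokens (no stemming or stopwords)."""
    return re.findall(r"[#@]?\w+", text.lower())


def build_index(raw_docs, url_titles=None):
    """raw_docs maps doc_id -> (text, timestamp). url_titles is a HYPOTHETICAL,
    pre-fetched map from a shortened URL to the linked page's title; when present,
    the title is appended to the tweet (one simple form of document expansion)."""
    url_titles = url_titles or {}
    docs, coll = {}, Counter()
    for d_id, (text, ts) in raw_docs.items():
        extra = [url_titles[u] for u in URL_RE.findall(text) if u in url_titles]
        tokens = tokenize(URL_RE.sub(" ", text) + " " + " ".join(extra))
        tf = Counter(tokens)
        docs[d_id] = {"tf": tf, "len": max(1, len(tokens)), "time": ts}
        coll.update(tf)
    n = sum(coll.values())
    return docs, {w: c / n for w, c in coll.items()}


def retrieve(query, docs, coll_prob, mu=100.0):
    """Weighted query likelihood with Dirichlet smoothing; returns (doc_id, score) best-first."""
    ranking = []
    for d_id, d in docs.items():
        score = sum(
            qw * math.log((d["tf"].get(w, 0) + mu * coll_prob.get(w, 1e-9)) / (d["len"] + mu))
            for w, qw in query.items()
        )
        ranking.append((d_id, score))
    return sorted(ranking, key=lambda x: x[1], reverse=True)


def expand(query, ranking, docs, k=3, n_terms=5, alpha=0.5):
    """One pseudo-relevance feedback round (RM3-style): weight terms of the top-k docs
    by each doc's normalized likelihood, keep n_terms, and interpolate with the query."""
    top = ranking[:k]
    best_score = max(s for _, s in top)
    doc_w = {d_id: math.exp(s - best_score) for d_id, s in top}
    z = sum(doc_w.values())
    fb = Counter()
    for d_id, w_d in doc_w.items():
        d = docs[d_id]
        for term, tf in d["tf"].items():
            fb[term] += (w_d / z) * tf / d["len"]
    best = dict(fb.most_common(n_terms))
    fb_total = sum(best.values())
    q_total = sum(query.values())
    new_q = Counter({t: alpha * w / q_total for t, w in query.items()})
    if fb_total > 0:
        for t, w in best.items():
            new_q[t] += (1 - alpha) * w / fb_total
    return dict(new_q)


def two_stage_prf(query_text, docs, coll_prob, k1=3, k2=3):
    """Stage 1 expands the raw query; stage 2 re-retrieves with it and expands again."""
    q0 = {t: 1.0 for t in tokenize(query_text)}
    q1 = expand(q0, retrieve(q0, docs, coll_prob), docs, k=k1)
    return expand(q1, retrieve(q1, docs, coll_prob), docs, k=k2)


def temporal_rerank(ranking, docs, query_time, lam=0.01, beta=1.0):
    """Add beta * log of an exponential recency prior lam * exp(-lam * age);
    the constant log(lam) is dropped because it does not change the order."""
    rescored = [
        (d_id, s - beta * lam * max(0.0, query_time - docs[d_id]["time"]))
        for d_id, s in ranking
    ]
    return sorted(rescored, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    raw = {  # hypothetical tweets: (text, timestamp)
        "t1": ("Egypt protests continue in Cairo http://t.co/a1", 150.0),
        "t2": ("Cairo crowds gather in Tahrir square", 180.0),
        "t3": ("My cat sleeps all day", 100.0),
    }
    titles = {"http://t.co/a1": "Tahrir square protests news"}  # hypothetical resolved title
    docs, coll_prob = build_index(raw, titles)
    q = two_stage_prf("egypt protests", docs, coll_prob)
    print(temporal_rerank(retrieve(q, docs, coll_prob), docs, query_time=200.0))
```

In this toy run the expanded query picks up "tahrir", "square" and "cairo", which lets the second tweet match even though it shares no term with the original query. Because tweet-length documents under `mu=100` differ only slightly in text score, the recency term `beta * lam * age` easily outweighs those differences here and moves the newer tweet "t2" ahead of "t1"; in practice `lam` and `beta` would need tuning so that recency reorders near-ties without overriding relevance.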

Index Terms

Exploiting real-time information retrieval in the microblogosphere
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

A unified relevance model for opinion retrieval
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

Representing the information need is the greatest challenge for opinion retrieval. Typical queries for opinion retrieval are composed of either just content words, or content words with a small number of cue "opinion" words. Both are inadequate for ...
Query Representation through Lexical Association for Information Retrieval

A user query for information retrieval (IR) applications may not contain the most appropriate terms (words) as actually intended by the user. This is usually referred to as the term mismatch problem and is a crucial research issue in IR. Using the ...
An empirical study of query expansion and cluster-based retrieval in language modeling approach
Special issue: AIRS2005: Information retrieval research in Asia

The term mismatch problem in information retrieval is a critical problem, and several techniques have been developed, such as query expansion, cluster-based retrieval and dimensionality reduction to resolve this issue. Of these techniques, this paper ...

Published In

JCDL '12: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

June 2012

458 pages

ISBN:9781450311540

DOI:10.1145/2232817

General Chairs:
Karim B. Boughida (The George Washington University, USA)
Barrie Howard (The Library of Congress, USA)

Program Chairs:
Michael L. Nelson (Old Dominion University, USA)
Herbert Van de Sompel (Los Alamos National Laboratory, USA)
Ingeborg Sølvberg (Norwegian University of Science & Technology, Norway)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

JCDL '12

Sponsor:

JCDL '12: 12th ACM/IEEE-CS Joint Conference on Digital Libraries

June 10 - 14, 2012

Washington, DC, USA

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Bibliometrics & Citations

Article Metrics

Total Citations: 23

Total Downloads: 362

Downloads (last 12 months): 3

Downloads (last 6 weeks): 0

Reflects downloads up to 12 Feb 2025.

Cited By

Wang Y, Wang Z, Zhang H, Liu Z (2023). Microblog Retrieval Based on Concept-Enhanced Pre-Training Model. ACM Transactions on Knowledge Discovery from Data 17(3):1-32. Online publication date: 22-Feb-2023.
https://dl.acm.org/doi/10.1145/3552311

Wang Y, Huang H, Feng C (2021). Query Expansion With Local Conceptual Word Embeddings in Microblog Retrieval. IEEE Transactions on Knowledge and Data Engineering 33(4):1737-1749. Online publication date: 1-Apr-2021.
https://doi.org/10.1109/TKDE.2019.2945764

Zobaed S, Salehi M, Buyya R (2021). SAED: Edge-Based Intelligence for Privacy-Preserving Enterprise Search on the Cloud. 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 366-375. Online publication date: May-2021.
https://doi.org/10.1109/CCGrid51090.2021.00046

Kataoka D, Tajima K (2018). SNS Retrieval Based on User Profile Estimation Using Transfer Learning from Web Search. 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pages 278-285. Online publication date: Dec-2018.
https://doi.org/10.1109/WI.2018.00-79

Park J, Lee O, Jung J (2018). Spatio-temporal query contextualization for microtext retrieval in social media. Concurrency and Computation: Practice and Experience 30(15). Online publication date: 28-Feb-2018.
https://doi.org/10.1002/cpe.4458

Chy A, Ullah M, Aono M (2017). Microblog Retrieval Using Ensemble of Feature Sets through Supervised Feature Selection. IEICE Transactions on Information and Systems E100.D(4):793-806. Online publication date: 2017.
https://doi.org/10.1587/transinf.2016DAP0032

Kataoka D, Kato M, Yamamoto T, Ohshima H, Tanaka K (2017). Context-aware relevance feedback over SNS graph data. In Sheth A, Ngonga A, Wang Y, Chang E, Ślęzak D, Franczyk B, Alt R, Tao X (eds.), Proceedings of the International Conference on Web Intelligence, pages 823-830. Online publication date: 23-Aug-2017.
https://dl.acm.org/doi/10.1145/3106426.3106527

Wang Y, Huang H, Feng C (2017). Query Expansion Based on a Feedback Concept Model for Microblog Retrieval. In Barrett R, Cummings R, Agichtein E, Gabrilovich E (eds.), Proceedings of the 26th International Conference on World Wide Web, pages 559-568. Online publication date: 3-Apr-2017.
https://dl.acm.org/doi/10.1145/3038912.3052710

Albishre K, Li Y, Xu Y (2017). Effective pseudo-relevance for Microblog retrieval. Proceedings of the Australasian Computer Science Week Multiconference, pages 1-6. Online publication date: 30-Jan-2017.
https://dl.acm.org/doi/10.1145/3014812.3014865

Park J, Lee O, Han J, Lee E, Jung J, Carratore L, Piccialli F (2017). Spatio-Temporal Contextualization of Queries for Microtexts in Social Media: Mathematical Modeling. Procedia Computer Science 113:525-530. Online publication date: 2017.
https://doi.org/10.1016/j.procs.2017.08.317
