research-article

GTE-Rank

Published: 01 March 2016

Abstract

  • We propose a novel temporal re-ranking algorithm.
  • We devise and provide new datasets for time-sensitive evaluation purposes.
  • We conduct comparative experiments (including algorithms with a temporal focus).
  • We investigate the effectiveness of GRank by running a crowdsourcing experiment.
  • We build a prototype system that can be tested by the research community.

In the web environment, most queries issued by users are temporally implicit by nature, i.e., they contain no explicit date. Inferring the temporal intents behind such queries can improve the temporal dimension of web search results. Previous work on this problem has mostly focused on news queries, for which retrieving the most recent results related to the query is usually sufficient to meet the user's information needs. However, few works have studied the importance of time in queries such as "Philip Seymour Hoffman", whose results may not require recency at all. In this work, we focus on this type of query, which we call time-sensitive queries, whose results should preferably come from a diversified time span rather than necessarily the most recent one. Unlike related work, we follow a content-based approach to identify the most important time periods of the query and integrate time into a re-ranking model that boosts the retrieval of documents whose contents match the query's time periods. For that purpose, we define a linear combination of topical and temporal scores, which reflects the relevance of any web document in both the topical and temporal dimensions, thereby helping to improve the effectiveness of ranked results across different types of queries. Our approach relies on a novel temporal similarity measure that determines the most important dates for a query while filtering out non-relevant ones. Through extensive experimental evaluation over web corpora, we show that our model offers promising results compared to baseline approaches.
As a result of our investigation, we publicly provide a set of web services and a web search interface so that the system can be graphically explored by the research community.
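The abstract describes ranking by a linear combination of a topical and a temporal score, with a similarity measure used to keep only the dates most relevant to the query. The Python below is a minimal sketch of that scoring scheme under stated assumptions, not the authors' implementation: query-date similarities are taken as given (they stand in for the paper's temporal similarity measure, which is not reproduced here), `threshold` is a hypothetical cut-off for discarding non-relevant dates, a document's temporal score is assumed to be the mean similarity of its surviving dates, and `alpha` is a hypothetical weight, so that score(d) = alpha · topical(d) + (1 − alpha) · temporal(d). All scores in the usage example are made up.

```python
from typing import Dict, List, Tuple

# (doc_id, topical score in [0, 1], dates mentioned in the document)
Doc = Tuple[str, float, List[str]]


def relevant_dates(date_sims: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only candidate dates whose query-date similarity reaches `threshold`.

    `date_sims` is assumed to be precomputed; it stands in for the paper's
    temporal similarity measure, which this sketch does not implement.
    """
    return {date: sim for date, sim in date_sims.items() if sim >= threshold}


def temporal_score(doc_dates: List[str], rel: Dict[str, float]) -> float:
    """Assumed aggregation: mean similarity of the document's relevant dates (0 if none)."""
    matched = [rel[d] for d in doc_dates if d in rel]
    return sum(matched) / len(matched) if matched else 0.0


def rerank(docs: List[Doc], date_sims: Dict[str, float],
           alpha: float = 0.5, threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Re-rank by alpha * topical + (1 - alpha) * temporal, highest score first."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rel = relevant_dates(date_sims, threshold)
    scored = [(doc_id, alpha * topical + (1 - alpha) * temporal_score(dates, rel))
              for doc_id, topical, dates in docs]
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up inputs: "2020" falls below the threshold and is filtered out.
    date_sims = {"1967": 0.9, "2014": 0.8, "2020": 0.1}
    docs = [("d1", 0.9, ["2020"]), ("d2", 0.7, ["1967", "2014"]), ("d3", 0.8, [])]
    ranking = rerank(docs, date_sims, alpha=0.5, threshold=0.3)
    print([doc_id for doc_id, _ in ranking])  # ['d2', 'd1', 'd3']
```

Setting `alpha` to 1 falls back to a purely topical ranking (d1, d3, d2 in the toy data); with `alpha` at 0.5, d2 moves to the top only because its dates survive the filter.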

    Information

    Published In

    Information Processing and Management: an International Journal, Volume 52, Issue 2
    March 2016
    186 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Temporal information retrieval
    2. Temporal query understanding
    3. Temporal re-ranking
    4. Time-sensitive queries

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 0
    • Downloads (last 6 weeks): 0

    Reflects downloads up to 24 Sep 2024.

    Cited By

    • (2024) Is this news article still relevant? Ranking by contemporary relevance in archival search. International Journal on Digital Libraries, 25:2, pp. 197-216. DOI: 10.1007/s00799-023-00377-y. Online publication date: 1-Jun-2024.
    • (2023) A survey on narrative extraction from textual data. Artificial Intelligence Review, 56:8, pp. 8393-8435. DOI: 10.1007/s10462-022-10338-7. Online publication date: 1-Aug-2023.
    • (2022) Ranking Models for the Temporal Dimension of Text. ACM Transactions on Information Systems, 41:2, pp. 1-34. DOI: 10.1145/3565481. Online publication date: 4-Oct-2022.
    • (2022) Semantic Modelling of Document Focus-Time for Temporal Information Retrieval. Companion Proceedings of the Web Conference 2022, pp. 896-902. DOI: 10.1145/3487553.3524668. Online publication date: 25-Apr-2022.
    • (2021) Estimating Contemporary Relevance of Past News. Proceedings of the 2021 ACM/IEEE Joint Conference on Digital Libraries, pp. 70-79. DOI: 10.1109/JCDL52503.2021.00019. Online publication date: 27-Sep-2021.
    • (2020) A Framework for Event-oriented Text Retrieval Based on Temporal Aspects. Proceedings of the 2020 12th International Conference on Machine Learning and Computing, pp. 39-46. DOI: 10.1145/3383972.3384051. Online publication date: 15-Feb-2020.
    • (2018) Understanding the use of Temporal Expressions on Persian Web Search. Companion Proceedings of The Web Conference 2018, pp. 1743-1748. DOI: 10.1145/3184558.3191635. Online publication date: 23-Apr-2018.
    • (2017) Towards Efficient Framework for Time-Aware Spatial Keyword Queries on Road Networks. ACM Transactions on Information Systems (TOIS), 36:3, pp. 1-48. DOI: 10.1145/3143802. Online publication date: 3-Nov-2017.
    • (2017) Identifying top relevant dates for implicit time sensitive queries. Information Retrieval, 20:4, pp. 363-398. DOI: 10.1007/s10791-017-9302-1. Online publication date: 1-Aug-2017.