
Adapting pivoted document-length normalization for query size: Experiments in Chinese and English

Authors: Tze Leung Chung, Robert Wing Pong Luk, Kam Fai Wong, Kui Lam Kwok, Dik Lun Lee

ACM Transactions on Asian Language Information Processing (TALIP), Volume 5, Issue 3

Pages 245 - 263

https://doi.org/10.1145/1194936.1194941

Published: 01 September 2006


Abstract

The vector space model (VSM) is one of the most widely used information retrieval (IR) models in both academia and industry. In the NTCIR-3 evaluation workshop it was less effective than other retrieval models on the Chinese ad hoc retrieval tasks, but in the NTCIR-4 and NTCIR-5 workshops it was comparable to them. We did not know whether the lower performance was due to the VSM's inherent deficiencies or to a less effective normalization of document length. Hence we evaluated the VSM with various pivoted normalizations of document length on the NTCIR-3 collection for confirmation. We found that the VSM's retrieval effectiveness with pivoted normalization was comparable to that of other competitive retrieval models (for example, 2-Poisson), and that its retrieval speed with pivoted normalization was similar to that of competitive retrieval models (2-Poisson). We proposed a novel adaptive scheme that automatically estimates the (near) best parameters for pivoted document-length normalization based on query size; the new normalization is called adaptive pivoted document-length normalization. This scheme achieved good retrieval effectiveness, sometimes for short (title) queries and sometimes for long queries, without manual adjustment of parameter values. We found that unique, adaptive pivoted normalization can enhance fixed pivoted normalizations for different test collections (TREC-5 and TREC-6). We also evaluated the VSM with adaptive pivoted normalization under pseudo-relevance feedback (PRF) and found that this type of VSM performs similarly to the competitive retrieval models (2-Poisson) with PRF. Hence, we conclude that the VSM with unique (adaptive) pivoted document-length normalization is effective for Chinese IR and that its retrieval effectiveness is comparable to that of other competitive retrieval models, with or without PRF, for the reference test collections used in this evaluation.
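To make the idea in the abstract concrete, the following is a minimal Python sketch of pivoted document-length normalization with a slope chosen from the query size. The normalization factor follows the commonly cited pivoted form, (1 - slope) * pivot + slope * document length. Everything else is an assumption made for illustration: the function names, the linear mapping from query size to slope, and the example slope values (0.2 and 0.4) are hypothetical and are not taken from the article, whose actual estimation procedure is not given in the abstract.

```python
def pivoted_norm(doc_length: float, pivot: float, slope: float) -> float:
    """Pivoted normalization factor: (1 - slope) * pivot + slope * doc_length.

    `pivot` is typically the average document length in the collection.
    """
    if not 0.0 <= slope <= 1.0:
        raise ValueError("slope must lie in [0, 1]")
    return (1.0 - slope) * pivot + slope * doc_length


def slope_for_query(query_size: int,
                    short_slope: float = 0.2,
                    long_slope: float = 0.4,
                    long_query_size: int = 30) -> float:
    """HYPOTHETICAL mapping from query size (number of terms) to slope.

    Linear interpolation between two assumed slopes, clamped to the range
    [1, long_query_size]. This is a placeholder, not the article's estimator.
    """
    clamped = min(max(query_size, 1), long_query_size)
    frac = (clamped - 1) / (long_query_size - 1)
    return short_slope + frac * (long_slope - short_slope)


def score(query_terms: list[str], doc_tf: dict[str, int],
          doc_length: float, pivot: float) -> float:
    """Toy score: summed term frequency of query terms over the pivoted norm."""
    slope = slope_for_query(len(query_terms))
    norm = pivoted_norm(doc_length, pivot, slope)
    return sum(doc_tf.get(term, 0) for term in query_terms) / norm
```

In this sketch a document whose length equals the pivot receives the same normalization factor whatever the slope, while longer documents are penalized more as the slope grows. That is the lever an adaptive scheme can move when the query gets longer or shorter.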

