Abstract
Novelty detection aims to retrieve new information and filter out redundancy from a set of sentences that are relevant to a specific topic. In TREC2003, the authors tried an approach to novelty detection based on semantic distance computation. The motivation is to expand a sentence by introducing semantic information. The semantic distance between sentences is computed by combining WordNet with statistical information. Novelty detection is treated as a binary classification problem: a sentence is either new or not. The feature vector, used in the vector space model for classification, consists of various factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the relevant context preceding it. New sentences are then detected with Winnow and support vector machine classifiers, respectively. Several experiments are conducted to examine the relationship between the different factors and performance. The results indicate that semantic computation is promising for novelty detection. The ratio of new sentence size to relevant size is further studied for different relevant document sizes. The ratio is found to decrease at a certain rate (about 0.86). Another group of experiments is then performed under the supervision of this ratio. These experiments demonstrate that the ratio helps improve novelty detection performance.
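As a rough illustration of the two distance features named in the abstract, the following Python sketch computes, for each sentence, its distance to the topic and its smallest distance to the sentences preceding it. It is a sketch under stated assumptions, not the authors' method: the `distance` function uses plain Jaccard distance over word sets as a stand-in for the WordNet-based semantic distance, and the names `tokens`, `distance`, and `novelty_features` are hypothetical.

```python
from typing import List, Set, Tuple


def tokens(sentence: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for real tokenization."""
    return {w.strip(".,;:!?\"'()").lower() for w in sentence.split()} - {""}


def distance(a: Set[str], b: Set[str]) -> float:
    """Jaccard distance in [0, 1]. Assumption: this replaces the
    WordNet-plus-statistics semantic distance described in the abstract."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def novelty_features(topic: str, sentences: List[str]) -> List[Tuple[float, float]]:
    """For each sentence, return (distance to topic,
    minimum distance to any earlier sentence)."""
    topic_tokens = tokens(topic)
    seen: List[Set[str]] = []
    features: List[Tuple[float, float]] = []
    for sentence in sentences:
        current = tokens(sentence)
        to_topic = distance(current, topic_tokens)
        # With no earlier context, treat the sentence as maximally distant.
        to_context = min((distance(current, prev) for prev in seen), default=1.0)
        features.append((to_topic, to_context))
        seen.append(current)
    return features
```

A sentence that repeats an earlier one gets a context distance of 0.0 and would be a natural candidate for the "not new" class. In the setting the abstract describes, such values would be only two components of a larger feature vector passed to a Winnow or support vector machine classifier.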
Additional information
Partly supported by the National Basic Research 973 Program of China under Grant No.2004CB318109, and by the National Center of Computer Network and Information Security Management of China under Grant No.2004-Research-1-917-A-007.
Cite this article
Zhang, HP., Sun, J., Wang, B. et al. Computation on Sentence Semantic Distance for Novelty Detection. J Comput Sci Technol 20, 331–337 (2005). https://doi.org/10.1007/s11390-005-0331-7