DOI: 10.5555/1708124.1708131

Random walks for text semantic similarity

Published: 07 August 2009

Abstract

Many tasks in NLP stand to benefit from robust measures of semantic similarity for units above the level of individual words. Rich semantic resources such as WordNet provide local semantic information at the lexical level. However, effectively combining this information to compute scores for phrases or sentences is an open problem. Our algorithm aggregates local relatedness information via a random walk over a graph constructed from an underlying lexical resource. The stationary distribution of the graph walk forms a "semantic signature" that can be compared to another such distribution to get a relatedness score for texts. On a paraphrase recognition task, the algorithm achieves an 18.5% relative reduction in error rate over a vector-space baseline. We also show that the graph walk similarity between texts has complementary value as a feature for recognizing textual entailment, improving on a competitive baseline system.
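
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy word graph, the restart probability of 0.15, the fixed iteration count, and the use of cosine similarity to compare signatures are all assumptions made here for illustration. The paper builds its graph from a lexical resource such as WordNet, and this page does not say how the two distributions are compared. The sketch runs a random walk with restart from the words of each text and treats the resulting distribution over graph nodes as that text's "semantic signature".

```python
import math

# Hypothetical toy graph of related words (an assumption for illustration;
# the paper derives its graph from a lexical resource such as WordNet).
EDGES = [
    ("car", "automobile"), ("car", "vehicle"), ("automobile", "vehicle"),
    ("vehicle", "truck"), ("vehicle", "object"),
    ("banana", "fruit"), ("apple", "fruit"), ("fruit", "food"),
    ("food", "object"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def semantic_signature(words, adj, restart=0.15, iterations=100):
    """Approximate the stationary distribution of a random walk that
    restarts at the text's words with probability `restart` each step."""
    known = [w for w in words if w in adj]
    if not known:
        raise ValueError("none of the words appear in the graph")
    # Restart distribution: uniform over the text's words found in the graph.
    teleport = {n: 0.0 for n in adj}
    for w in known:
        teleport[w] += 1.0 / len(known)
    dist = dict(teleport)
    for _ in range(iterations):
        nxt = {n: restart * teleport[n] for n in adj}
        for n, neighbours in adj.items():
            # Spread the remaining mass evenly over the node's neighbours.
            share = (1.0 - restart) * dist[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        dist = nxt
    return dist


def cosine(p, q):
    """Cosine similarity between two distributions over the same nodes."""
    dot = sum(p[n] * q[n] for n in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


ADJ = build_adjacency(EDGES)
sig_a = semantic_signature(["car", "truck"], ADJ)
sig_b = semantic_signature(["automobile", "vehicle"], ADJ)
sig_c = semantic_signature(["banana", "apple"], ADJ)
print("a vs b:", cosine(sig_a, sig_b))
print("a vs c:", cosine(sig_a, sig_c))
```

In this toy setting, texts "a" and "b" share no words, yet their signatures overlap because the walk spreads probability mass to neighbouring nodes. That is the kind of aggregation of local relatedness the abstract describes.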

Published In

TextGraphs-4: Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
August 2009
104 pages
ISBN: 9781932432541

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Cited By

  • (2016) Conceptual feature generation for textual information using a conceptual network constructed from Wikipedia. Expert Systems: The Journal of Knowledge Engineering, 33:1 (92-106). DOI: 10.1111/exsy.12133. Online publication date: 1-Feb-2016
  • (2015) From senses to texts. Artificial Intelligence, 228:C (95-128). DOI: 10.1016/j.artint.2015.07.005. Online publication date: 1-Nov-2015
  • (2014) Improving search over Electronic Health Records using UMLS-based query expansion through random walks. Journal of Biomedical Informatics, 51:C (100-106). DOI: 10.1016/j.jbi.2014.04.013. Online publication date: 1-Oct-2014
  • (2013) Random walks down the mention graphs for event coreference resolution. ACM Transactions on Intelligent Systems and Technology, 4:4 (1-20). DOI: 10.1145/2508037.2508055. Online publication date: 8-Oct-2013
  • (2013) Grounding linked open data in wordnet. Proceedings of the 12th international conference on Web and Wireless Geographical Information Systems (1-15). DOI: 10.1007/978-3-642-37087-8_1. Online publication date: 4-Apr-2013
  • (2012) Random walk weighting over sentiwordnet for sentiment polarity detection on Twitter. Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis (3-10). DOI: 10.5555/2392963.2392969. Online publication date: 12-Jul-2012
  • (2012) Test collection recycling for semantic text similarity. Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services (286-289). DOI: 10.1145/2428736.2428784. Online publication date: 3-Dec-2012
  • (2012) Experiments on pseudo relevance feedback using graph random walks. Proceedings of the 19th international conference on String Processing and Information Retrieval (193-198). DOI: 10.1007/978-3-642-34109-0_20. Online publication date: 21-Oct-2012
  • (2010) Word sense induction & disambiguation using hierarchical random graphs. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (745-755). DOI: 10.5555/1870658.1870731. Online publication date: 9-Oct-2010
  • (2010) Multi-prototype vector-space models of word meaning. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (109-117). DOI: 10.5555/1857999.1858012. Online publication date: 2-Jun-2010
