Random walks for text semantic similarity

Authors:

Anna N. Rafferty,

Christopher D. Manning

TextGraphs-4: Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing

Pages 23 - 31

Published: 07 August 2009

Abstract

Many tasks in NLP stand to benefit from robust measures of semantic similarity for units above the level of individual words. Rich semantic resources such as WordNet provide local semantic information at the lexical level. However, effectively combining this information to compute scores for phrases or sentences is an open problem. Our algorithm aggregates local relatedness information via a random walk over a graph constructed from an underlying lexical resource. The stationary distribution of the graph walk forms a "semantic signature" that can be compared to another such distribution to get a relatedness score for texts. On a paraphrase recognition task, the algorithm achieves an 18.5% relative reduction in error rate over a vector-space baseline. We also show that the graph walk similarity between texts has complementary value as a feature for recognizing textual entailment, improving on a competitive baseline system.
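
The idea in the abstract (walk a lexical graph starting from a text's words, take the stationary distribution as a "semantic signature", then compare two signatures) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy edge list stands in for a resource such as WordNet, the walk is a random walk with restart (personalized-PageRank style) with a restart probability of 0.15, and cosine similarity is used as one possible way to compare signatures. The function names `build_graph`, `semantic_signature`, and `cosine` are hypothetical.

```python
import math

# Toy undirected graph standing in for a lexical resource.
# These edges are invented for illustration, not taken from WordNet.
EDGES = [
    ("car", "automobile"), ("car", "vehicle"), ("automobile", "vehicle"),
    ("vehicle", "truck"), ("drive", "car"), ("drive", "truck"),
    ("banana", "fruit"), ("apple", "fruit"), ("fruit", "food"), ("food", "eat"),
]


def build_graph(edges):
    """Return an adjacency map {node: set of neighbours} for undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_signature(graph, words, restart=0.15, tol=1e-12, max_iter=1000):
    """Stationary distribution of a random walk with restart seeded by `words`.

    Assumption: at each step the walker jumps back to one of the text's words
    with probability `restart`, otherwise moves to a uniformly chosen neighbour.
    """
    nodes = sorted(graph)
    seeds = [w for w in words if w in graph]
    if not seeds:
        raise ValueError("none of the words appear in the graph")

    # Restart distribution: uniform over the text's in-graph words.
    teleport = {n: 0.0 for n in nodes}
    for w in seeds:
        teleport[w] += 1.0 / len(seeds)

    dist = dict(teleport)
    for _ in range(max_iter):
        nxt = {n: restart * teleport[n] for n in nodes}
        for n in nodes:
            # Spread the non-restart mass evenly over the neighbours of n.
            share = (1.0 - restart) * dist[n] / len(graph[n])
            for m in graph[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - dist[n]) for n in nodes)
        dist = nxt
        if delta < tol:  # stop once the distribution has stopped changing
            break
    return dist


def cosine(p, q):
    """Cosine similarity between two signatures over the same node set."""
    dot = sum(p[n] * q[n] for n in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    g = build_graph(EDGES)
    sig_a = semantic_signature(g, ["car", "drive"])
    sig_b = semantic_signature(g, ["automobile", "truck"])
    sig_c = semantic_signature(g, ["banana", "eat"])
    print("related texts:  ", cosine(sig_a, sig_b))
    print("unrelated texts:", cosine(sig_a, sig_c))
```

In this toy graph the two texts about vehicles share no words, yet their signatures overlap because the walk spreads probability mass to neighbouring nodes; a plain bag-of-words comparison of the same two texts would find no overlap at all.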

Index Terms

Random walks for text semantic similarity
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Semantic networks
    2. Natural language processing

Published In

TextGraphs-4: Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing

August 2009

104 pages

ISBN:9781932432541

General Chairs:
Monojit Choudhury, Microsoft Research (India)
Samer Hassan, University of North Texas
Animesh Mukherjee, Indian Institute of Technology (India)
Smaranda Muresan, Rutgers University

Publisher

Association for Computational Linguistics

United States

Qualifiers

Research-article

Cited By

Jadidinejad A, Mahmoudi F, Meybodi M (2016) Conceptual feature generation for textual information using a conceptual network constructed from Wikipedia. Expert Systems: The Journal of Knowledge Engineering 33:1 (92-106). doi:10.1111/exsy.12133. Online publication date: 1-Feb-2016
https://dl.acm.org/doi/10.1111/exsy.12133
Pilehvar M, Navigli R (2015) From senses to texts. Artificial Intelligence 228:C (95-128). doi:10.1016/j.artint.2015.07.005. Online publication date: 1-Nov-2015
https://dl.acm.org/doi/10.1016/j.artint.2015.07.005
Martinez D, Otegi A, Soroa A, Agirre E (2014) Improving search over Electronic Health Records using UMLS-based query expansion through random walks. Journal of Biomedical Informatics 51:C (100-106). doi:10.1016/j.jbi.2014.04.013. Online publication date: 1-Oct-2014
https://dl.acm.org/doi/10.1016/j.jbi.2014.04.013
Chen B, Su J, Tan C (2013) Random walks down the mention graphs for event coreference resolution. ACM Transactions on Intelligent Systems and Technology 4:4 (1-20). doi:10.1145/2508037.2508055. Online publication date: 8-Oct-2013
https://dl.acm.org/doi/10.1145/2508037.2508055
Ballatore A, Bertolotto M, Wilson D (2013) Grounding linked open data in wordnet. Proceedings of the 12th international conference on Web and Wireless Geographical Information Systems (1-15). doi:10.1007/978-3-642-37087-8_1. Online publication date: 4-Apr-2013
https://dl.acm.org/doi/10.1007/978-3-642-37087-8_1
Montejo-Ráez A, Martínez-Cámara E, Martín-Valdivia M, Ureña-López L, Balahur A, Montoyo A, Martínez-Barco P, Boldrini E (2012) Random walk weighting over sentiwordnet for sentiment polarity detection on Twitter. Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis (3-10). doi:10.5555/2392963.2392969. Online publication date: 12-Jul-2012
https://dl.acm.org/doi/10.5555/2392963.2392969
Rahutomo F, Kitasuka T, Aritsugi M, Pardede E (2012) Test collection recycling for semantic text similarity. Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services (286-289). doi:10.1145/2428736.2428784. Online publication date: 3-Dec-2012
https://dl.acm.org/doi/10.1145/2428736.2428784
de Groc C, Tannier X (2012) Experiments on pseudo relevance feedback using graph random walks. Proceedings of the 19th international conference on String Processing and Information Retrieval (193-198). doi:10.1007/978-3-642-34109-0_20. Online publication date: 21-Oct-2012
https://dl.acm.org/doi/10.1007/978-3-642-34109-0_20
Klapaftis I, Manandhar S, Li H, Màrquez L (2010) Word sense induction & disambiguation using hierarchical random graphs. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (745-755). doi:10.5555/1870658.1870731. Online publication date: 9-Oct-2010
https://dl.acm.org/doi/10.5555/1870658.1870731
Reisinger J, Mooney R, Kaplan R (2010) Multi-prototype vector-space models of word meaning. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (109-117). doi:10.5555/1857999.1858012. Online publication date: 2-Jun-2010
https://dl.acm.org/doi/10.5555/1857999.1858012
