DOI: 10.1145/2428736.2428784
research-article

Test collection recycling for semantic text similarity

Published: 03 December 2012

Abstract

Semantic text similarity (STS) systems are evaluated against dedicated test collections. These collections consist of text pairs that share the same meaning even though they are expressed in different forms. Such collections are scarce compared with information retrieval (IR) test collections. This paper investigates whether IR test collections can be reused for STS tasks. Text pairs are derived from the relevant pairs in IR test collections. Latent semantic analysis (LSA) and explicit semantic analysis (ESA) are used to evaluate the Glasgow test collections provided by the ACM SIGIR community. The Jaccard index measures the lexical similarity of each pair. Recall measures the retrievability of the recycled test collections, with two existing test collections, the Microsoft Research Paraphrase Corpus and the Microsoft Research Video Description Corpus, serving as evaluation baselines. The evaluation yields a promising outcome: the evaluated test collections have a low Jaccard index, and their recall values fall between those of the two baselines.
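The abstract names two measurements, the Jaccard index for lexical overlap and recall for retrievability, but gives no formulas. The Python sketch below shows one plausible way to compute them. It assumes lowercase whitespace tokenization, interprets recall as recall@k (a pair counts as retrieved when its partner text ranks among the top k candidates), and substitutes a plain bag-of-words cosine, the hypothetical `cosine_bow`, for the LSA and ESA models used in the paper. All function names and the toy sentence pairs are illustrative assumptions, not the authors' implementation or data.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    """Jaccard index: size of the token-set intersection over size of the union."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of raw term-count vectors.

    Hypothetical stand-in for the LSA/ESA similarity evaluated in the paper.
    """
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * cb[term] for term, count in ca.items())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(pairs: Sequence[Pair], sim: Callable[[str, str], float], k: int = 1) -> float:
    """Assumed recall@k: share of pairs whose partner text ranks in the top k.

    Each left-hand text is used as a query against all right-hand texts.
    """
    if not pairs:
        return 0.0
    targets = [b for _, b in pairs]
    hits = 0
    for i, (query, _) in enumerate(pairs):
        # Rank every candidate by similarity to the query; ties keep input order.
        ranked = sorted(range(len(targets)), key=lambda j: sim(query, targets[j]), reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy pairs invented for illustration; not data from the paper.
    pairs = [
        ("the cat sat on the mat", "a cat rested on a mat"),
        ("stock prices fell sharply", "share prices dropped sharply"),
    ]
    for a, b in pairs:
        print(f"Jaccard = {jaccard(a, b):.3f}")  # 0.429, then 0.333
    print(f"recall@1 = {recall_at_k(pairs, cosine_bow, k=1):.2f}")  # 1.00
```

Because the similarity model is passed in as `sim`, an LSA or ESA scorer could replace `cosine_bow` without touching the metric code. Under this reading, a recycled collection is attractive when its pairs share few words (low Jaccard index) while its recall stays within the range set by the paraphrase baselines.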

Information

Published In

ACM Other conferences
IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
December 2012
432 pages
ISBN:9781450313063
DOI:10.1145/2428736
Sponsors

  • @WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. semantic text similarity
  2. test collection

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Dec 2024.

Cited By

  • (2024) Social Robot Detection based on User Behavioral Representation. Information Sciences, article 121537. DOI: 10.1016/j.ins.2024.121537. Online publication date: Oct-2024.
  • (2024) Harnessing the Power of AI-Instructor Collaborative Grading Approach: Topic-Based Effective Grading for Semi Open-Ended Multipart Questions. Computers and Education: Artificial Intelligence, article 100339. DOI: 10.1016/j.caeai.2024.100339. Online publication date: Dec-2024.
  • (2024) Leveraging Sentiment Analysis of Drugs Review-Based Drugs Recommender System. Innovations in Data Analytics, pp. 229-238. DOI: 10.1007/978-981-97-4928-7_18. Online publication date: 10-Sep-2024.
  • (2024) Mining Literary Trends: A Tool for Digital Library Analysis. Linking Theory and Practice of Digital Libraries, pp. 342-359. DOI: 10.1007/978-3-031-72437-4_20. Online publication date: 26-Sep-2024.
  • (2024) Gesture-Based Machine Learning for Enhanced Autonomous Driving: A Novel Dataset and System Integration Approach. HCI International 2024 Posters, pp. 247-256. DOI: 10.1007/978-3-031-61963-2_24. Online publication date: 8-Jun-2024.
  • (2023) Automatic Classification for Unlabeled Email Messages into Folders. Highlights in Science, Engineering and Technology, 34, pp. 120-126. DOI: 10.54097/hset.v34i.5432. Online publication date: 28-Feb-2023.
  • (2023) Semantic Image Captioning using Cosine Similarity Ranking with Semantic Search. Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing, pp. 220-223. DOI: 10.1145/3607947.3607987. Online publication date: 28-Sep-2023.
  • (2023) Bubbles bursting: Investigating and measuring the personalisation of social media searches. Telematics and Informatics, 82, article 101999. DOI: 10.1016/j.tele.2023.101999. Online publication date: Aug-2023.
  • (2023) An approach towards removal of data heterogeneity in SDN-based IoT framework. Internet of Things, 22, article 100763. DOI: 10.1016/j.iot.2023.100763. Online publication date: Jul-2023.
  • (2023) Analyzing research diversity of scholars based on multi-dimensional calculation of knowledge entities. Scientometrics. DOI: 10.1007/s11192-023-04821-3. Online publication date: 16-Sep-2023.