DOI: 10.1145/3478905.3478981

Comparison between Calculation Methods for Semantic Text Similarity based on Siamese Networks

Published: 28 September 2021

Abstract

In the era of information explosion, people are eager to obtain content that meets their own needs and interests from massive amounts of information. Correctly and effectively understanding the needs of Internet users is therefore an urgent problem, and the semantic text similarity task is useful in many of the relevant application scenarios. To measure semantic text similarity with a text-matching model, several Siamese networks are constructed in this paper. Specifically, we first use the STS Benchmark dataset, take GloVe, BERT, and DistilBERT as initial models, and add deep neural networks for training and fine-tuning, fully utilizing the advantages of the existing models. Next, we test several similarity calculation methods to quantify the semantic similarity of sentence pairs. Moreover, the Pearson and Spearman correlation coefficients are used as evaluation indicators to compare the sentence embeddings produced by the different models. Finally, the experimental results show that the Siamese network based on the BERT model performs best overall, with the highest accuracy rate reaching 84.5%, and that among the similarity calculation methods, cosine similarity usually obtains the best accuracy. In the future, this model can be applied to semantic text similarity tasks by matching texts between users' needs and a knowledge base, improving machines' language understanding ability while meeting the diverse needs of users.
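
The pipeline the abstract describes can be summarized as: encode each sentence of a pair with a shared (Siamese) encoder, score the pair with a similarity function such as cosine similarity, and evaluate the predicted scores against the STS Benchmark gold scores with the Pearson and Spearman correlation coefficients. Below is a minimal sketch of that pipeline, not the authors' implementation: the Hugging Face `transformers` checkpoint name, the mean-pooling strategy, the alternative distance-based measures, and the toy sentence pairs are all assumptions made for illustration.

```python
# Minimal sketch of a Siamese sentence-pair scorer (not the paper's code).
# Assumptions: bert-base-uncased as the encoder, mean pooling over token
# embeddings, cosine similarity as the default scoring function, and a toy
# dev set standing in for STS Benchmark pairs (gold scores on a 0-5 scale).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from scipy.stats import pearsonr, spearmanr

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    """Encode a list of sentences and mean-pool the last hidden states."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def score(pairs, metric="cosine"):
    """Score sentence pairs; the same (Siamese) encoder embeds both sides."""
    a = embed([s1 for s1, s2 in pairs])
    b = embed([s2 for s1, s2 in pairs])
    if metric == "cosine":
        return F.cosine_similarity(a, b).tolist()
    if metric == "euclidean":                              # one alternative measure
        return (-torch.norm(a - b, p=2, dim=1)).tolist()
    if metric == "manhattan":                              # another alternative
        return (-torch.norm(a - b, p=1, dim=1)).tolist()
    raise ValueError(f"unknown metric: {metric}")

# Hypothetical STS-style pairs with gold similarity scores in [0, 5].
dev = [("A man is playing a guitar.", "A person plays the guitar.", 4.6),
       ("Two dogs run across a field.", "Dogs are running outside.", 3.8),
       ("A woman is slicing onions.", "The stock market fell sharply.", 0.2)]

pred = score([(s1, s2) for s1, s2, _ in dev], metric="cosine")
gold = [g for _, _, g in dev]
print("Pearson: ", pearsonr(pred, gold)[0])
print("Spearman:", spearmanr(pred, gold)[0])
```

Because Pearson correlation is invariant to linear rescaling of the predictions and Spearman correlation to any monotone rescaling, the raw cosine scores can be compared directly with the 0-5 gold scores, which is how correlation-based STS evaluation is typically carried out.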

Supplementary Material

p389-wang-supplement (p389-wang-supplement.pptx)
Presentation slides


Cited By

  • (2022) Siamese Neural Networks on the Trail of Similarity in Bugs in 5G Mobile Network Base Stations. Electronics 11(22), 3664. DOI: 10.3390/electronics11223664. Online publication date: 9 November 2022.


Published In

DSIT 2021: 2021 4th International Conference on Data Science and Information Technology
July 2021
481 pages
ISBN: 9781450390248
DOI: 10.1145/3478905
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 September 2021


Author Tags

  1. Semantic text similarity
  2. Siamese network
  3. sentence embedding
  4. similarity calculation methods

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DSIT 2021

Acceptance Rates

Overall acceptance rate: 114 of 277 submissions, 41%

Article Metrics

  • Downloads (last 12 months): 32
  • Downloads (last 6 weeks): 0

Reflects downloads up to 30 September 2024.

