Abstract
Recent progress in training word embeddings with deep learning has motivated research on semantic representations of longer texts, such as sentences, paragraphs, and chapters. Existing methods typically compute sentence embeddings from word weights and word vectors, but in doing so they discard the word order and syntactic structure of the sentence. This paper proposes SynTree-WordVec, a method for deriving sentence embeddings that merges word vectors with the syntactic structure produced by the Stanford parser. Experimental results show its potential to overcome the shortcomings of existing methods: compared to traditional weighting-based sentence embedding methods, it achieves better or comparable performance on various text similarity tasks, especially at low embedding dimensions.
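The abstract does not give the authors' exact combination scheme, so the following is a purely illustrative sketch of the general idea of merging a constituency parse with word vectors: here each word's vector is weighted by its depth in a toy parse tree before averaging. The tree encoding, the depth-based weighting rule, and all names are hypothetical assumptions, not the SynTree-WordVec algorithm itself.

```python
import numpy as np

# Toy word vectors; in practice these would be pretrained embeddings
# such as word2vec or GloVe.
vecs = {
    "the": np.array([0.1, 0.2]),
    "cat": np.array([0.9, 0.1]),
    "sleeps": np.array([0.3, 0.8]),
}

# Toy constituency parse encoded as nested tuples (label, children...),
# with bare strings as leaves, mimicking Stanford-parser output:
# (S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))
tree = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBZ", "sleeps")))


def leaf_depths(node, depth=0, out=None):
    """Collect (word, depth) pairs for every leaf of the parse tree."""
    if out is None:
        out = []
    if isinstance(node, str):  # leaf: an actual word
        out.append((node, depth))
    else:  # internal node: (label, children...)
        for child in node[1:]:
            leaf_depths(child, depth + 1, out)
    return out


def sentence_embedding(tree, vecs):
    """Hypothetical depth-weighted average of word vectors:
    words embedded deeper in the tree receive smaller weights."""
    pairs = leaf_depths(tree)
    weights = np.array([1.0 / d for _, d in pairs])
    weights /= weights.sum()  # normalize so the weights sum to 1
    return sum(w * vecs[word] for (word, _), w in zip(pairs, weights))
```

In this toy example all three words happen to sit at the same depth, so the result reduces to a plain average; on deeper, unbalanced parses the weighting would differentiate head words from heavily embedded modifiers, which is the kind of structural signal a syntax-aware method can exploit.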
Supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61877031 and No. 61876074.
Copyright information
© 2020 Springer Nature Switzerland AG
Wang, Y., Zhong, M., Tao, L., Wu, S. (2020). Computing Sentence Embedding by Merging Syntactic Parsing Tree and Word Embedding. In: Sun, X., Wang, J., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2020. Lecture Notes in Computer Science(), vol 12239. Springer, Cham. https://doi.org/10.1007/978-3-030-57884-8_2
Print ISBN: 978-3-030-57883-1
Online ISBN: 978-3-030-57884-8