STMAP: A novel semantic text matching model augmented with embedding perturbations

Published: 01 January 2024

Abstract

Semantic text matching models have achieved outstanding performance, but traditional methods struggle with few-shot learning, and conventional data augmentation techniques can suffer from semantic deviation. To address these problems, we propose STMAP, which approaches the task from a data augmentation perspective, perturbing embeddings with Gaussian noise and a noise-mask signal. We also employ an adaptive optimization network to dynamically weight the multiple training targets generated by the augmentation. We evaluated our model on four English datasets, MRPC, SciTail, SICK, and RTE, achieving scores of 90.3%, 94.2%, 88.9%, and 68.8%, respectively; our model obtained state-of-the-art (SOTA) results on three of them. We further assessed our approach on three Chinese datasets, achieving an average improvement of 1.3% over the baseline model. In the few-shot learning experiments, our model outperformed the baseline by 5%, notably when the data volume was reduced by around 40%. Ablation experiments further validated the effectiveness of STMAP. Our source code is available at https://github.com/wangyanhao0517/STMAP.
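The key idea is to perturb the encoder's token embeddings rather than the raw text, so the augmented views stay close to the original semantics. Below is a minimal PyTorch sketch of that idea, assuming additive Gaussian noise and a token-level noise mask; the function name, tensor shapes, and hyperparameters are illustrative assumptions, not the released implementation.

```python
import torch

def perturb_embeddings(emb: torch.Tensor, sigma: float = 0.01, mask_prob: float = 0.1):
    # Hypothetical sketch, not the authors' code.
    # emb: (batch, seq_len, hidden) token embeddings from the encoder.

    # View 1: additive Gaussian noise on every embedding dimension.
    gauss_view = emb + torch.randn_like(emb) * sigma

    # View 2: "noise mask" -- randomly zero out whole token embeddings.
    keep = (torch.rand(emb.shape[:2], device=emb.device) > mask_prob).float()
    mask_view = emb * keep.unsqueeze(-1)

    return gauss_view, mask_view
```

Each perturbed view can then be passed through the same matching head as the clean embeddings, yielding the multiple training targets the abstract refers to.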

Highlights

Proposing the Semantic Text Matching model Augmented with Perturbations (STMAP).
Introducing a noise-perturbation augmentation scheme based on Gaussian noise.
Proposing a multi-task adaptive optimization scheme (a hedged weighting sketch follows this list).
STMAP demonstrates strong generalization capabilities.
STMAP achieves state-of-the-art results and performs well in few-shot scenarios.
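The page does not spell out how the adaptive optimization network weights the per-view losses. One plausible realization, shown here purely as a sketch, is learnable uncertainty-based weighting in the style of Kendall et al. (2018); the class name, task count, and weighting rule are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class AdaptiveLossWeighting(nn.Module):
    # Hypothetical adaptive weighting of per-view training losses.
    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # One learnable log-variance per training target
        # (e.g., clean, Gaussian-noised, and masked views).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # losses: list of scalar losses, one per augmented view.
        total = torch.zeros((), device=self.log_vars.device)
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total
```

In use, something like `AdaptiveLossWeighting(3)([loss_clean, loss_gauss, loss_mask])` would combine the targets into one scalar, letting gradient descent down-weight noisier views automatically.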


Cited By

  • (2024) Enhancing Chinese abbreviation prediction with LLM generation and contrastive evaluation. Information Processing and Management: an International Journal 61(4). https://doi.org/10.1016/j.ipm.2024.103768. Online publication date: 18-Jul-2024.

Published In

Information Processing and Management: an International Journal, Volume 61, Issue 1
Jan 2024
823 pages

Publisher

Pergamon Press, Inc.
United States

Author Tags

1. Semantic text matching
2. Data augmentation
3. Embedding perturbations
4. Adaptive networks
5. Few-shot

Qualifiers

• Research-article
