Self-adapted Positional Encoding in the Transformer Encoder for Named Entity Recognition

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14259)


Abstract

The task of named entity recognition (NER) is fundamental to natural language processing (NLP), as it underpins downstream applications such as question answering, text summarization, and machine translation. The Transformer architecture has gained popularity in NLP because it can model distant contextual dependencies in parallel. Positional encoding is crucial in Transformer-based NER models for capturing the sequential nature of natural language, yet most approaches use a fixed mathematical formula to assign a unique vector to each position, yielding a hard-coded encoding. To address this issue, a self-adapted positional encoding module called the self-adapter is proposed for the Transformer model. The self-adapter incorporates two information fusers that enhance the representational ability of the embeddings: the first integrates information across different positions, strengthening the representation over varying ranges, while the second integrates information across dimensions within a single position. In addition, the calculation of the attention score is modified to make better use of the self-adapter. A mathematical analysis based on Fourier series is presented to demonstrate the effectiveness of the proposed method. This approach allows the positional encoding to be adjusted dynamically, adapting to varying contextual inputs and capturing word relationships more flexibly. The model is evaluated on four NER datasets, one English and three Chinese. The results show that the self-adapter substantially improves the Transformer’s performance on the NER task.
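For context, the "fixed mathematical formula" the abstract contrasts against is the standard sinusoidal encoding of the original Transformer, $PE_{(pos,2i)} = \sin\!\left(pos / 10000^{2i/d_{model}}\right)$ and $PE_{(pos,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{model}}\right)$, which assigns every position the same vector regardless of the input. The paper's concrete fuser designs are not given in the abstract, so the sketch below is only a hypothetical illustration of the idea described: a learnable module that adapts a fixed base table before it is added to the token embeddings. The class name, the depthwise convolution as the cross-position fuser, and the position-wise linear layer as the cross-dimension fuser are all assumptions, not the authors' architecture.

```python
import math

import torch
import torch.nn as nn


class SelfAdapterSketch(nn.Module):
    """Hypothetical self-adapted positional encoding.

    Starts from the fixed sinusoidal table and passes it through two
    learnable "information fusers" before adding it to the embeddings.
    Both fuser designs are illustrative assumptions.
    """

    def __init__(self, d_model: int, max_len: int = 512, kernel_size: int = 3):
        super().__init__()
        assert d_model % 2 == 0, "sinusoidal base table assumes an even d_model"
        # Fixed sinusoidal base table:
        # pe[pos, 2i] = sin(pos / 10000^(2i/d)), pe[pos, 2i+1] = cos(...).
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32)
            * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

        # Fuser 1 (assumed design): depthwise 1-D convolution that mixes
        # information across neighbouring positions.
        self.cross_position = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size // 2, groups=d_model,
        )
        # Fuser 2 (assumed design): position-wise linear layer that mixes
        # information across embedding dimensions.
        self.cross_dimension = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: token embeddings of shape (batch, seq_len, d_model)."""
        seq_len = x.size(1)
        p = self.pe[:seq_len].unsqueeze(0)            # (1, seq_len, d_model)
        p = self.cross_position(p.transpose(1, 2)).transpose(1, 2)
        p = self.cross_dimension(p)
        return x + p                                  # adapted encoding added to embeddings


# Usage: encode a batch of 8 sequences of length 40.
enc = SelfAdapterSketch(d_model=128)
out = enc(torch.randn(8, 40, 128))
print(out.shape)  # torch.Size([8, 40, 128])
```

Because the fusers are learnable, the resulting encoding is trained with the rest of the model rather than fixed in advance, which is the property the abstract argues for; the specific choice of convolution and linear layer here is just one plausible realisation.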



Author information

Corresponding author

Correspondence to Haixian Zhang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Huangliang, K., Li, X., Yin, T., Peng, B., Zhang, H. (2023). Self-adapted Positional Encoding in the Transformer Encoder for Named Entity Recognition. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14259. Springer, Cham. https://doi.org/10.1007/978-3-031-44223-0_43

  • DOI: https://doi.org/10.1007/978-3-031-44223-0_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44222-3

  • Online ISBN: 978-3-031-44223-0

  • eBook Packages: Computer Science, Computer Science (R0)
