Abstract
The task of named entity recognition (NER) is fundamental to natural language processing (NLP), as it forms the basis for downstream applications such as question answering, text summarization, and machine translation. The Transformer architecture has gained popularity in NLP because it can model distant contextual dependencies in parallel. Although positional encoding is crucial in Transformer-based NER models for capturing the sequential nature of natural language and improving accuracy, most approaches are hard-coded: a fixed mathematical formula assigns a unique vector to each position. To address this issue, a self-adapted positional encoding module called the self-adapter is proposed for the Transformer model. The self-adapter incorporates two information fusers that enhance the representational ability of the embeddings. The first fuser integrates information across different positions, improving the representation over different ranges; the second fuser integrates information across dimensions within a single position, yielding a richer per-position representation. In addition, the calculation of the attention score is modified to make better use of the self-adapter. A mathematical analysis based on Fourier series is presented to demonstrate the effectiveness of the proposed method. This approach allows the positional encoding to be adjusted dynamically, adapting to varying contextual inputs and capturing word relationships more flexibly. The model is evaluated on four NER datasets, one English and three Chinese. The results show that the self-adapter substantially improves the Transformer's performance on the NER task.
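The abstract does not spell out the internal structure of the self-adapter, so the following PyTorch sketch is only illustrative. It assumes the cross-position fuser is a depthwise 1-D convolution over positions and the cross-dimension fuser is a position-wise linear layer, with the adapted encoding added back to the token embeddings; the modified attention-score calculation mentioned in the abstract is not shown, and all names here are hypothetical rather than the authors' implementation.

    # Minimal sketch of a self-adapted positional encoding ("self-adapter").
    # Assumptions (not specified in the abstract): fuser 1 is a depthwise 1-D
    # convolution across positions, fuser 2 is a position-wise linear layer, and
    # the adapted encoding is simply added to the token embeddings.
    import math
    import torch
    import torch.nn as nn

    def sinusoidal_encoding(seq_len: int, d_model: int) -> torch.Tensor:
        """Fixed sinusoidal positional encoding (Vaswani et al., 2017)."""
        position = torch.arange(seq_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        return pe

    class SelfAdapter(nn.Module):
        """Adapts a fixed positional encoding to the current input sequence."""

        def __init__(self, d_model: int, kernel_size: int = 3):
            super().__init__()
            # Fuser 1: mixes information across neighbouring positions.
            self.position_fuser = nn.Conv1d(d_model, d_model, kernel_size,
                                            padding=kernel_size // 2,
                                            groups=d_model)
            # Fuser 2: mixes information across embedding dimensions per position.
            self.dimension_fuser = nn.Linear(d_model, d_model)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model) token embeddings.
            pe = sinusoidal_encoding(x.size(1), x.size(2)).to(x.device)
            h = pe.unsqueeze(0) + x          # condition the encoding on the input
            h = self.position_fuser(h.transpose(1, 2)).transpose(1, 2)
            h = self.dimension_fuser(h)
            adapted_pe = self.norm(h)
            return x + adapted_pe            # position-aware embeddings

    if __name__ == "__main__":
        adapter = SelfAdapter(d_model=128)
        tokens = torch.randn(2, 20, 128)     # (batch, seq_len, d_model)
        print(adapter(tokens).shape)         # torch.Size([2, 20, 128])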
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Huangliang, K., Li, X., Yin, T., Peng, B., Zhang, H. (2023). Self-adapted Positional Encoding in the Transformer Encoder for Named Entity Recognition. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14259. Springer, Cham. https://doi.org/10.1007/978-3-031-44223-0_43
DOI: https://doi.org/10.1007/978-3-031-44223-0_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44222-3
Online ISBN: 978-3-031-44223-0
eBook Packages: Computer Science, Computer Science (R0)