Abstract
Bidirectional Encoder Representations from Transformers (BERT) is a pre-training model that uses the encoder component of a bidirectional transformer to convert an input sentence, or an input sentence pair, into word embeddings. BERT has greatly improved the performance of a wide range of natural language processing systems. For a real task, however, it is necessary to consider how BERT is used depending on the type of task. The standard method for document classification with BERT is to treat the word embedding of the special token [CLS] as the feature vector of the document and to fine-tune the entire classifier, including the pre-trained model. In contrast, our proposed method normalizes two feature vectors, namely the mean of the word embeddings output by BERT for the document and the bag-of-words vector of the document, concatenates them, and uses the concatenated vector as the feature vector of the document.
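As a minimal sketch of the feature construction described above (not the authors' implementation), the following Python code builds the concatenated document vector. The choice of the Hugging Face transformers library, the bert-base-uncased model, and L2 normalization are assumptions made for illustration; the abstract only specifies normalizing each feature vector before concatenation.

```python
# Sketch of the concatenated document representation described in the abstract.
# Assumptions (not taken from the paper): Hugging Face `transformers` and
# scikit-learn as libraries, "bert-base-uncased" as the pre-trained model,
# and L2 normalization of each feature vector.
import numpy as np
import torch
from transformers import BertModel, BertTokenizer
from sklearn.feature_extraction.text import CountVectorizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

documents = [
    "BERT improves many NLP systems.",
    "Bag-of-words features are still useful for classification.",
]

# Bag-of-words feature vectors for the document collection.
bow = CountVectorizer().fit_transform(documents).toarray().astype(np.float32)

def l2_normalize(v: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

features = []
with torch.no_grad():
    for doc, bow_vec in zip(documents, bow):
        inputs = tokenizer(doc, return_tensors="pt", truncation=True, max_length=512)
        outputs = bert(**inputs)
        # Mean of the word embeddings output by BERT for the document
        # (average of the last hidden states over all tokens).
        mean_emb = outputs.last_hidden_state[0].mean(dim=0).numpy()
        # Normalize each feature vector, then concatenate them.
        features.append(np.concatenate([l2_normalize(mean_emb),
                                        l2_normalize(bow_vec)]))

features = np.stack(features)  # one feature vector per document
```

The resulting vectors could then be passed to a conventional classifier (for example, logistic regression), without fine-tuning the BERT model itself.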
Notes
- 2. The language model is also a type of pre-training model.
- 3. Scaled Dot-Product Attention.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tanaka, H., Shinnou, H., Cao, R., Bai, J., Ma, W. (2020). Document Classification by Word Embeddings of BERT. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_13
DOI: https://doi.org/10.1007/978-981-15-6168-9_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6167-2
Online ISBN: 978-981-15-6168-9