research-article
Just Accepted

Exploring Thematic Diversity in Classical Chinese Poetry: A Novel Dataset and a BERT-enhanced Ensemble Learning Approach

Online AM: 07 August 2024

Abstract

Classical Chinese poetry, an essential part of cultural heritage, exhibits rich thematic diversity that is often overlooked in natural language processing research. To address this gap, we explore the classification of thematic categories within this literary domain. We curate a dataset of 2,918 annotated poems spanning seven common themes and propose a BERT-based ensemble learning approach for effective classification. Although the method integrates existing models, it achieves accuracy and F1 scores above 72% on the 7-class task, surpassing established baselines and providing a reference point for future research. The experimental findings show that ensemble strategies improve on the performance of individual base models and highlight the potential of the MLP-based ensemble technique. The study contributes to a deeper understanding of thematic categories and textual features in classical Chinese poetry and offers an automated classification system for classical Chinese poems.
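To make the idea of combining base models concrete, the following is a minimal sketch of one common ensemble strategy, soft voting, together with the two metrics the abstract reports. It is not the paper's method: the abstract does not specify the base models, the combination rule, or the F1 averaging scheme, so the averaging rule, the choice of macro-averaged F1, the helper names (`soft_vote`, `macro_f1`, `peaked`), and the toy probabilities are all assumptions made for illustration. An MLP-based ensemble of the kind the abstract highlights would replace the fixed averaging step with a small learned network over the same per-class probabilities.

```python
from typing import List, Sequence

NUM_CLASSES = 7  # the abstract describes a 7-class task


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across base models, then take the argmax.

    prob_sets[m][i][c] is model m's probability that poem i has theme c.
    """
    n_models = len(prob_sets)
    predictions = []
    for i in range(len(prob_sets[0])):
        avg = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(NUM_CLASSES)
        ]
        predictions.append(max(range(NUM_CLASSES), key=avg.__getitem__))
    return predictions


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of poems whose predicted theme matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for c in range(NUM_CLASSES):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / NUM_CLASSES


def peaked(cls: int, confidence: float = 0.7) -> List[float]:
    """Hypothetical model output: most of the mass on one class."""
    rest = (1.0 - confidence) / (NUM_CLASSES - 1)
    return [confidence if c == cls else rest for c in range(NUM_CLASSES)]


# Toy data: seven poems, one per theme; each model errs on a different poem.
y_true = [0, 1, 2, 3, 4, 5, 6]
model_a = [peaked(c) for c in [0, 1, 2, 3, 4, 5, 0]]
model_b = [peaked(c) for c in [1, 1, 2, 3, 4, 5, 6]]
model_c = [peaked(c) for c in [0, 1, 3, 3, 4, 5, 6]]

ensemble_pred = soft_vote([model_a, model_b, model_c])
print("single-model accuracy:", accuracy(y_true, soft_vote([model_a])))
print("ensemble accuracy:", accuracy(y_true, ensemble_pred))
print("ensemble macro-F1:", macro_f1(y_true, ensemble_pred))
```

Because the three toy models make their mistakes on different poems, the averaged probabilities recover the correct theme in every case. This illustrates in miniature why an ensemble can outperform each of its base models.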

      Published In

      Journal on Computing and Cultural Heritage  Just Accepted
      EISSN:1556-4711
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Online AM: 07 August 2024
      Accepted: 04 June 2024
      Revised: 30 April 2024
      Received: 26 May 2023

      Author Tags

      1. Thematic Classification
      2. Classical Chinese Poetry
      3. Ensemble Learning
      4. Pre-trained Language Model
