DOI: 10.1145/3436369.3436462
research-article

Chinese Word Segmentation Based on Bi-GRU Integrating Dictionary Information

Published: 11 January 2021

Abstract

Chinese word segmentation (CWS) is an essential pre-processing step for Chinese language processing tasks. To date, various models based on deep neural networks have been applied extensively to CWS, and most of them learn from large-scale labeled data. However, these models typically handle rare words and out-of-vocabulary (OOV) words poorly. In this paper, we use character embeddings and bigram embeddings as the inputs of a Bi-GRU model and construct a feature vector to capture the dictionary information of characters. We then use a multi-head attention mechanism to weight the different features, and the weighted features serve as the inputs of a parallel Bi-GRU network. To evaluate the proposed model, we conducted experiments on the PKU and MSR datasets. The experimental results show that our model achieves state-of-the-art performance compared with several baselines.
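
The abstract does not spell out how the bigram inputs or the dictionary feature vector are built, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each character is paired with its right neighbour to form a bigram, and that the dictionary feature is a binary vector marking whether a lexicon word of length 2 to `max_len` starts or ends at that character. The names `bigrams`, `dict_features`, `lexicon`, `max_len`, and the pad token are all hypothetical.

```python
from typing import List, Set


def bigrams(sentence: str, pad: str = "</s>") -> List[str]:
    """Pair each character with its right neighbour (assumed scheme).

    The last character is paired with a hypothetical pad token.
    """
    chars = list(sentence)
    return [
        chars[i] + (chars[i + 1] if i + 1 < len(chars) else pad)
        for i in range(len(chars))
    ]


def dict_features(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[List[int]]:
    """Build one binary feature vector per character (assumed layout).

    Each vector has 2 * (max_len - 1) slots:
      slot n - 2                  -> an n-character lexicon word STARTS here
      slot (max_len - 1) + n - 2  -> an n-character lexicon word ENDS here
    for n = 2 .. max_len.
    """
    width = max_len - 1
    feats = [[0] * (2 * width) for _ in sentence]
    for i in range(len(sentence)):
        for n in range(2, max_len + 1):
            if i + n <= len(sentence) and sentence[i:i + n] in lexicon:
                feats[i][n - 2] = 1                  # word starts at position i
                feats[i + n - 1][width + n - 2] = 1  # word ends at position i+n-1
    return feats


# Toy lexicon, purely illustrative.
lexicon = {"北京", "大学", "北京大学", "大学生"}
sentence = "北京大学生"
bigram_inputs = bigrams(sentence)
feature_vectors = dict_features(sentence, lexicon)
```

In a model of the kind the abstract describes, vectors like `feature_vectors` would sit alongside the character and bigram embeddings as per-character inputs; how the paper actually combines them through multi-head attention is not stated here and is left out of the sketch.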



    Published In

    ICCPR '20: Proceedings of the 2020 9th International Conference on Computing and Pattern Recognition
    October 2020
    552 pages
    ISBN:9781450387835
    DOI:10.1145/3436369
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    In-Cooperation

    • Beijing University of Technology

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Chinese word segmentation
    2. bigram embedding
    3. deep learning
    4. feature vector
    5. multi-head attention

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCPR 2020
