Abstract
Chinese Spelling Check (CSC) aims to detect and correct spelling errors in Chinese text. Most Chinese spelling errors are misused characters that are semantically, phonetically, or graphically similar to the intended ones. Previous state-of-the-art work on CSC maps misspelled sentences to correct sentences directly. However, spelling errors, especially consecutive incorrect characters, often distort the meaning of the surrounding context, which makes it difficult for CSC models to make correct modifications based on the erroneous contextual information. To address this issue, we propose a simple but effective pipeline for CSC that searches for the most appropriate candidate sentence to serve as the original correct sentence. Specifically, candidate sentences are generated from possibly erroneous characters using the confusion set. We then reevaluate the candidate sentences to find the best one in terms of character probabilities and similarity to the original erroneous characters. In addition, we extend the widely used confusion set (the code and data are available at https://github.com/zuoyecihua/CSC). Simply applying the confusion set as a filter brings a large performance improvement. The experimental results show that our approach outperforms previous methods and performs well on bi-gram errors.
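The generate-then-reevaluate idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the confusion set, the similarity values, the bigram table standing in for a language model's character probabilities, and the `weight` parameter are all hypothetical and chosen only to make the example runnable.

```python
import math
from itertools import product

# Hypothetical toy confusion set: character -> {similar character: similarity in (0, 1]}.
CONFUSION = {"平": {"苹": 0.9, "评": 0.8}}

# Hypothetical bigram probabilities standing in for a real language model.
BIGRAM_PROB = {
    ("吃", "苹"): 0.6, ("苹", "果"): 0.9,
    ("吃", "平"): 0.05, ("平", "果"): 0.05,
    ("吃", "评"): 0.01, ("评", "果"): 0.01,
}
DEFAULT_PROB = 0.1  # assumed probability for bigrams missing from the table


def sentence_log_prob(sentence):
    """Sum of log bigram probabilities over adjacent character pairs."""
    pairs = zip(sentence, sentence[1:])
    return sum(math.log(BIGRAM_PROB.get(p, DEFAULT_PROB)) for p in pairs)


def generate_candidates(sentence, suspect_positions):
    """Expand each suspect position with its confusion-set alternatives."""
    options = []
    for i, ch in enumerate(sentence):
        if i in suspect_positions:
            options.append([ch] + sorted(CONFUSION.get(ch, {})))
        else:
            options.append([ch])
    return ["".join(chars) for chars in product(*options)]


def similarity_score(original, candidate):
    """Log-similarity penalty for every character that was replaced."""
    return sum(
        math.log(CONFUSION[o][c])
        for o, c in zip(original, candidate)
        if o != c
    )


def reevaluate(sentence, suspect_positions, weight=1.0):
    """Pick the candidate with the best combined probability and similarity."""
    candidates = generate_candidates(sentence, suspect_positions)
    return max(
        candidates,
        key=lambda c: sentence_log_prob(c) + weight * similarity_score(sentence, c),
    )


print(reevaluate("我吃平果", {2}))
```

In this toy setup the suspect character 平 is replaced because the candidate containing 苹 has a much higher character probability and only a small similarity penalty. A real system would presumably take the suspect positions and probabilities from a trained model rather than from fixed tables.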
This work is supported by the National Natural Science Foundation of China (No. 61672276, No. 51975294).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, S., Shang, L. (2022). Improve Chinese Spelling Check by Reevaluation. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13282. Springer, Cham. https://doi.org/10.1007/978-3-031-05981-0_19
DOI: https://doi.org/10.1007/978-3-031-05981-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05980-3
Online ISBN: 978-3-031-05981-0
eBook Packages: Computer Science (R0)