Improve Chinese Spelling Check by Reevaluation

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2022)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 13282))

Abstract

Chinese Spelling Check (CSC) aims to detect and correct spelling errors in Chinese text. Most Chinese spelling errors are misuses of semantically, phonetically, or graphically similar characters. Previous state-of-the-art work on CSC maps misspelled sentences to correct sentences directly. However, spelling errors, especially consecutive incorrect characters, often distort the meaning of the surrounding context, which makes it difficult for CSC models to make correct modifications based on erroneous contextual information. To address this issue, we propose a simple but effective pipeline for CSC that searches for the candidate sentence most likely to be the original correct sentence. Specifically, candidate sentences are generated from possible error characters using the confusion set. We then reevaluate the candidate sentences to find the best one in terms of character probabilities and similarity to the original error characters. In addition, we extend the widely used confusion set; simply applying the confusion set as a filter brings a large performance improvement. The code and data are available at https://github.com/zuoyecihua/CSC. Experimental results show that our approach outperforms previous methods and performs well on bi-gram errors.
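The generate-then-reevaluate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the confusion set, the bigram log-probabilities (a stand-in for a real language model), the similarity values, and the weight `alpha` are all hypothetical toy choices made only for illustration.

```python
import math
from itertools import product

# Hypothetical toy confusion set (not the paper's extended set).
CONFUSION_SET = {"汽": {"气", "器"}, "气": {"汽", "器"}}

# Hypothetical bigram log-probabilities standing in for a language model.
BIGRAM_LOGP = {("天", "气"): math.log(0.5), ("气", "很"): math.log(0.2)}
DEFAULT_LOGP = math.log(1e-4)  # assumed score for unseen bigrams


def generate_candidates(sentence, suspect_positions):
    """Replace each suspect character with itself or a confusion-set member."""
    options = []
    for i, ch in enumerate(sentence):
        if i in suspect_positions:
            options.append([ch] + sorted(CONFUSION_SET.get(ch, ())))
        else:
            options.append([ch])
    return ["".join(chars) for chars in product(*options)]


def sentence_log_prob(sentence):
    """Sum toy bigram log-probabilities over adjacent character pairs."""
    return sum(BIGRAM_LOGP.get(p, DEFAULT_LOGP) for p in zip(sentence, sentence[1:]))


def log_similarity(original, candidate):
    """Toy similarity: unchanged = 1.0, confusion-set swap = 0.5, else 1e-6."""
    total = 0.0
    for o, c in zip(original, candidate):
        if o == c:
            sim = 1.0
        elif c in CONFUSION_SET.get(o, ()):
            sim = 0.5
        else:
            sim = 1e-6
        total += math.log(sim)
    return total


def reevaluate(sentence, suspect_positions, alpha=1.0):
    """Pick the candidate with the best combined probability and similarity."""
    candidates = generate_candidates(sentence, suspect_positions)
    return max(
        candidates,
        key=lambda c: sentence_log_prob(c) + alpha * log_similarity(sentence, c),
    )


print(reevaluate("今天天汽很好", {3}))
```

In this toy setup, candidates are restricted to confusion-set substitutions, so the confusion set also acts as a filter on which corrections are considered. A real system would presumably take the character probabilities from a trained model rather than a hand-written table.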

This work is supported by the National Natural Science Foundation of China (No. 61672276, No. 51975294).

Notes

  1. https://github.com/BYVoid/OpenCC

Author information

Corresponding author

Correspondence to Lin Shang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, S., Shang, L. (2022). Improve Chinese Spelling Check by Reevaluation. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13282. Springer, Cham. https://doi.org/10.1007/978-3-031-05981-0_19

  • DOI: https://doi.org/10.1007/978-3-031-05981-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05980-3

  • Online ISBN: 978-3-031-05981-0

  • eBook Packages: Computer Science, Computer Science (R0)
