Abstract
This paper proposes a sentence alignment method, based on a maximum entropy model with anchor sentences, for aligning ancient and modern Chinese sentences in historical classics. The method selects as anchor sentence pairs those sentence pairs that share the same phrase at the beginning or the end of the sentence, or that contain the same time phrase. These anchor pairs divide each paragraph into several sections. The sentences in each section are then aligned by a dynamic programming algorithm according to the entropy calculated by the maximum entropy model. The maximum entropy model uses three features: an improved Chinese co-occurrence character feature, a length feature, and a sentence alignment mode feature. The co-occurrence character feature is improved by giving characters different weights according to their position, based on how much they contribute to aligning sentences. In an experiment on ShiJi, the proposed method reaches a precision of 95.9% and a recall of 95.6%, significantly outperforming other sentence alignment methods.
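The pipeline described in the abstract (anchor pairs split the paragraph into sections, then dynamic programming aligns each section) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The scoring function below is a simple position-weighted character overlap standing in for the maximum entropy model, and the alignment modes, their penalties, the edge width, and all function names are hypothetical values chosen only for illustration. Anchor pairs are assumed to be given as sentence index pairs rather than detected from shared phrases.

```python
from typing import Callable, List, Tuple

# A bead is ((src_start, src_end), (tgt_start, tgt_end)) with half-open ranges.
Bead = Tuple[Tuple[int, int], Tuple[int, int]]

# Hypothetical alignment modes (source count, target count) with made-up penalties.
MODES = {(1, 1): 0.0, (1, 2): -0.3, (2, 1): -0.3, (1, 0): -1.0, (0, 1): -1.0}


def weighted_cooccurrence(src: str, tgt: str, edge: int = 2,
                          edge_weight: float = 2.0) -> float:
    """Weighted share of source characters that also occur in the target.

    Characters within `edge` positions of either end of the source count
    `edge_weight` times as much. This is a stand-in score, not a trained
    maximum entropy model.
    """
    if not src or not tgt:
        return 0.0
    tgt_chars = set(tgt)
    total = matched = 0.0
    for i, ch in enumerate(src):
        w = edge_weight if i < edge or i >= len(src) - edge else 1.0
        total += w
        if ch in tgt_chars:
            matched += w
    return matched / total


def align(src: List[str], tgt: List[str],
          score: Callable[[str, str], float] = weighted_cooccurrence) -> List[Bead]:
    """Align one section by dynamic programming over the modes in MODES."""
    n, m = len(src), len(tgt)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg:
                continue
            for (di, dj), penalty in MODES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = best[i][j] + penalty + score("".join(src[i:ni]),
                                                 "".join(tgt[j:nj]))
                if s > best[ni][nj]:
                    best[ni][nj] = s
                    back[ni][nj] = (i, j)
    # Trace back from the end of the section to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    return beads[::-1]


def align_with_anchors(src: List[str], tgt: List[str],
                       anchors: List[Tuple[int, int]]) -> List[Bead]:
    """Split the paragraph at anchor pairs, then align each section separately."""
    beads: List[Bead] = []
    si = tj = 0
    for ai, aj in sorted(anchors):
        for (a, b), (c, d) in align(src[si:ai], tgt[tj:aj]):
            beads.append(((a + si, b + si), (c + tj, d + tj)))
        beads.append(((ai, ai + 1), (aj, aj + 1)))  # the anchor pair is a 1-1 bead
        si, tj = ai + 1, aj + 1
    for (a, b), (c, d) in align(src[si:], tgt[tj:]):
        beads.append(((a + si, b + si), (c + tj, d + tj)))
    return beads
```

Splitting at anchors keeps each dynamic programming table small and prevents an alignment error in one section from propagating past the next anchor pair. In a fuller version, the stand-in score would be replaced by the model's output over the co-occurrence, length, and alignment mode features.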
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61402068) and the Support Program of Outstanding Young Scholar in Liaoning Universities (No. LJQ2015004).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Che, C., Guo, W., Zhang, J. (2016). Sentence Alignment Method Based on Maximum Entropy Model Using Anchor Sentences. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2016, NLP-NABD 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_7
DOI: https://doi.org/10.1007/978-3-319-47674-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)