Sentence Alignment Method Based on Maximum Entropy Model Using Anchor Sentences

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2016, CCL 2016)

Abstract

This paper proposes a sentence alignment method, based on a maximum entropy model and anchor sentences, for aligning ancient and modern Chinese sentences in historical classics. The method selects as anchor sentence pairs those sentence pairs that share the same phrase at the beginning or end of the sentence, or that contain the same time phrase, and uses these anchors to divide each paragraph into several sections. The sentences in each section are then aligned with a dynamic programming algorithm according to the entropy calculated by the maximum entropy model. The model uses three features: an improved Chinese co-occurrence character feature, a length feature, and a sentence alignment mode feature. The co-occurrence character feature is improved by weighting characters differently by position, according to their contribution to sentence alignment. In an experiment on ShiJi, the proposed method reaches a precision of 95.9% and a recall of 95.6%, significantly outperforming other sentence alignment methods.
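
The pipeline described in the abstract (anchor pairs split the paragraph into sections, then each section is aligned by dynamic programming) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration only: the set of alignment modes in `MODES`, the `skip_penalty` value, the prefix/suffix length `k`, the greedy anchor search, and above all `cooccurrence_score`, a hypothetical stand-in that weights the first and last characters of a sentence more heavily in place of a trained maximum entropy model. Time-phrase anchors and the length feature are omitted.

```python
from typing import Callable, List, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

# Alignment modes (n source sentences, m target sentences).
# This set is an assumption; the abstract does not list the modes used.
MODES = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]


def cooccurrence_score(src: str, tgt: str, edge_weight: float = 2.0) -> float:
    """Hypothetical stand-in for the model score: the weighted share of source
    characters that also occur in the target, with the first and last
    characters weighted by edge_weight."""
    if not src or not tgt:
        return 0.0
    tgt_chars = set(tgt)
    total = hit = 0.0
    for i, ch in enumerate(src):
        w = edge_weight if i in (0, len(src) - 1) else 1.0
        total += w
        if ch in tgt_chars:
            hit += w
    return hit / total


def align(src: List[str], tgt: List[str],
          score: Callable[[str, str], float] = cooccurrence_score,
          skip_penalty: float = -0.5) -> List[Bead]:
    """Dynamic programming over alignment modes, maximising the summed score."""
    n, m = len(src), len(tgt)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg:
                continue
            for di, dj in MODES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    gain = skip_penalty  # sentence left unaligned
                else:
                    gain = score("".join(src[i:ni]), "".join(tgt[j:nj]))
                if best[i][j] + gain > best[ni][nj]:
                    best[ni][nj] = best[i][j] + gain
                    back[ni][nj] = (i, j)
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):  # trace the best path back to the start
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


def find_anchors(src: List[str], tgt: List[str], k: int = 2) -> List[Tuple[int, int]]:
    """Greedy, order-preserving anchor pairs: sentences that share the same
    first k or last k characters."""
    anchors, next_j = [], 0
    for i, s in enumerate(src):
        for j in range(next_j, len(tgt)):
            t = tgt[j]
            if len(s) >= k and len(t) >= k and (s[:k] == t[:k] or s[-k:] == t[-k:]):
                anchors.append((i, j))
                next_j = j + 1
                break
    return anchors


def align_with_anchors(src: List[str], tgt: List[str], k: int = 2) -> List[Bead]:
    """Split the paragraph at anchor pairs and align each section separately."""
    beads: List[Bead] = []
    pi = pj = 0
    for ai, aj in find_anchors(src, tgt, k) + [(len(src), len(tgt))]:
        for s_idx, t_idx in align(src[pi:ai], tgt[pj:aj]):
            beads.append(([pi + x for x in s_idx], [pj + y for y in t_idx]))
        if ai < len(src):  # a real anchor, not the end-of-paragraph sentinel
            beads.append(([ai], [aj]))
        pi, pj = ai + 1, aj + 1
    return beads
```

Splitting at anchors keeps each dynamic programming table small and stops an alignment error in one section from propagating into the next. In this sketch a wrongly chosen anchor cannot be recovered from, so a real system would need a stricter anchor test than a shared prefix or suffix.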

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61402068) and the Support Program of Outstanding Young Scholar in Liaoning Universities (No. LJQ2015004).

Author information

Corresponding author

Correspondence to Chao Che.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Che, C., Guo, W., Zhang, J. (2016). Sentence Alignment Method Based on Maximum Entropy Model Using Anchor Sentences. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_7

  • DOI: https://doi.org/10.1007/978-3-319-47674-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47673-5

  • Online ISBN: 978-3-319-47674-2

  • eBook Packages: Computer Science, Computer Science (R0)
