DOI: 10.3115/981574.981576
Article

Aligning sentences in bilingual corpora using lexical information

Published: 22 June 1993

Abstract

In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and consider only sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment and finds the alignment that maximizes the probability of generating the corpus under this model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, a significant improvement over previous results. The algorithm is language independent.
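The abstract leaves the search procedure implicit. Below is a minimal sketch of one standard way to find a maximum-probability alignment, assuming a dynamic-programming search over "beads" (small groups of source and target sentences that align to each other) with a word-based cost. It is not Chen's model. The bead types, the priors in `BEAD_PRIORS`, the NULL-word probability `FLOOR`, and the IBM Model 1-style scoring of target words are all illustrative assumptions. The translation table `trans_prob` is supplied by hand here, whereas the abstract describes building it on the fly during alignment.

```python
import math

# Illustrative assumptions, not the paper's parameters: bead types
# (source sentences, target sentences) and their prior probabilities.
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.01,
    (0, 1): 0.01,
    (2, 1): 0.045,
    (1, 2): 0.045,
}

# Assumed probability that the empty (NULL) source word generates a target word.
FLOOR = 1e-4


def bead_cost(src_words, tgt_words, trans_prob, prior):
    """Negative log-probability of one bead under a toy word-to-word model.

    trans_prob maps (source_word, target_word) to t(target | source).
    Each target word is scored IBM Model 1-style: its translation
    probabilities are averaged over the bead's source words plus NULL.
    Source words are assumed to be generated independently of the
    alignment, so their cost is identical for every complete alignment
    and is omitted.
    """
    cost = -math.log(prior)
    for f in tgt_words:
        total = FLOOR + sum(trans_prob.get((e, f), 0.0) for e in src_words)
        cost -= math.log(total / (len(src_words) + 1))
    return cost


def align(src, tgt, trans_prob):
    """Return the minimum-cost (maximum-probability) sequence of beads.

    src and tgt are lists of sentences, each sentence a list of words.
    Each returned bead is a pair (source indices, target indices).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in BEAD_PRIORS.items():
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or best[pi][pj] == inf:
                    continue
                src_words = [w for s in src[pi:i] for w in s]
                tgt_words = [w for s in tgt[pj:j] for w in s]
                cost = best[pi][pj] + bead_cost(src_words, tgt_words, trans_prob, prior)
                if cost < best[i][j]:
                    best[i][j] = cost
                    back[i][j] = (di, dj)
    # Follow the back-pointers from (n, m) to (0, 0) to recover the beads.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


src = [["the", "house"], ["it", "is", "red"], ["thank", "you"]]
tgt = [["la", "maison"], ["elle", "est", "rouge"], ["merci"]]
# Hypothetical hand-made translation probabilities t(target | source).
trans_prob = {
    ("the", "la"): 0.5, ("house", "maison"): 0.9,
    ("it", "elle"): 0.6, ("is", "est"): 0.8, ("red", "rouge"): 0.9,
    ("thank", "merci"): 0.7, ("you", "merci"): 0.2,
}
print(align(src, tgt, trans_prob))
# [([0], [0]), ([1], [1]), ([2], [2])]
```

Each bead cost is a negative log-probability, so minimizing the summed cost is equivalent to maximizing the product of bead probabilities. Every target word in a 0:1 bead falls back to `FLOOR`, so sentence pairs whose words translate each other are strongly preferred over insertions and deletions.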

References

[1] (Bellman, 1957) Richard Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
[2] (Brown et al., 1990) Peter F. Brown, John Cocke, Stephen A. DellaPietra, Vincent J. DellaPietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. A statistical approach to machine translation. Computational Linguistics, 16(2):79–85, June 1990.
[3] (Brown et al., 1991a) Peter F. Brown, Stephen A. DellaPietra, Vincent J. DellaPietra, and Robert L. Mercer. Word sense disambiguation using statistical methods. In Proceedings of the 29th Annual Meeting of the ACL, pages 265–270, Berkeley, CA, June 1991.
[4] (Brown et al., 1991b) Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer. Aligning sentences in parallel corpora. In Proceedings of the 29th Annual Meeting of the ACL, pages 169–176, Berkeley, CA, June 1991.
[5] (Brown et al., 1993) Peter F. Brown, Stephen A. DellaPietra, Vincent J. DellaPietra, and Robert L. Mercer. The mathematics of machine translation: Parameter estimation. Computational Linguistics, 1993. To appear.
[6] (Catizone et al., 1989) Roberta Catizone, Graham Russell, and Susan Warwick. Deriving translation data from bilingual texts. In Proceedings of the First International Acquisition Workshop, Detroit, Michigan, August 1989.
[7] (Chen, 1993) Stanley F. Chen. Aligning sentences in bilingual corpora using lexical information. Technical Report TR-12-93, Harvard University, 1993.
[8] (Dagan et al., 1991) Ido Dagan, Alon Itai, and Ulrike Schwall. Two languages are more informative than one. In Proceedings of the 29th Annual Meeting of the ACL, pages 130–137, 1991.
[9] (Dempster et al., 1977) A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(B):1–38, 1977.
[10] (Gale and Church, 1991) William A. Gale and Kenneth W. Church. A program for aligning sentences in bilingual corpora. In Proceedings of the 29th Annual Meeting of the ACL, Berkeley, CA, June 1991.
[11] (Gale et al., 1992) William A. Gale, Kenneth W. Church, and David Yarowsky. Using bilingual materials to develop word sense disambiguation methods. In Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation, pages 101–112, Montréal, Canada, June 1992.
[12] (Kay, 1991) Martin Kay. Text-translation alignment. In ACH/ALLC '91: "Making Connections" Conference Handbook, Tempe, Arizona, March 1991.
[13] (Klavans and Tzoukermann, 1990) Judith Klavans and Evelyne Tzoukermann. The bicord system. In COLING-90, pages 174–179, Helsinki, Finland, August 1990.
[14] (Sadler, 1989) V. Sadler. The Bilingual Knowledge Bank - A New Conceptual Basis for MT. BSO/Research, Utrecht, 1989.
[15] (Warwick and Russell, 1990) Susan Warwick and Graham Russell. Bilingual concordancing and bilingual lexicography. In EURALEX 4th International Congress, Málaga, Spain, 1990.

Published In

ACL '93: Proceedings of the 31st annual meeting on Association for Computational Linguistics
June 1993
320 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Cited By

  • (2023) North Korean Neural Machine Translation through South Korean Resources. ACM Transactions on Asian and Low-Resource Language Information Processing, 22:9, pp. 1–22. DOI: 10.1145/3608947. Online publication date: 22-Sep-2023
  • (2019) Learning to align question and answer utterances in customer service conversation with recurrent pointer networks. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, pp. 134–141. DOI: 10.1609/aaai.v33i01.3301134. Online publication date: 27-Jan-2019
  • (2019) Leveraging Additional Resources for Improving Statistical Machine Translation on Asian Low-Resource Languages. ACM Transactions on Asian and Low-Resource Language Information Processing, 18:3, pp. 1–22. DOI: 10.1145/3314936. Online publication date: 17-Jun-2019
  • (2014) An Efficient Framework for Extracting Parallel Sentences from Non-Parallel Corpora. Fundamenta Informaticae, 130:2, pp. 179–199. DOI: 10.5555/2608442.2608445. Online publication date: 1-Apr-2014
  • (2011) Building a web-based parallel corpus and filtering out machine-translated text. Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, pp. 136–144. DOI: 10.5555/2024236.2024259. Online publication date: 24-Jun-2011
  • (2011) An Expectation Maximization algorithm for textual unit alignment. Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, pp. 128–135. DOI: 10.5555/2024236.2024258. Online publication date: 24-Jun-2011
  • (2011) Extracting parallel paragraphs and sentences from english-persian translated documents. Proceedings of the 7th Asia conference on Information Retrieval Technology, pp. 574–583. DOI: 10.1007/978-3-642-25631-8_52. Online publication date: 18-Dec-2011
  • (2010) Fast-Champollion. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 710–718. DOI: 10.5555/1944566.1944647. Online publication date: 23-Aug-2010
  • (2010) Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 81–89. DOI: 10.5555/1944566.1944576. Online publication date: 23-Aug-2010
  • (2010) Improving corpus comparability for bilingual lexicon extraction from comparable corpora. Proceedings of the 23rd International Conference on Computational Linguistics, pp. 644–652. DOI: 10.5555/1873781.1873854. Online publication date: 23-Aug-2010