DOI: 10.3115/980845.980889

An experiment in hybrid dictionary and statistical sentence alignment

Published: 10 August 1998

Abstract

The task of aligning sentences in parallel corpora of two languages has been well studied using purely statistical or purely linguistic models. We developed a linguistic method based on lexical matching with a bilingual dictionary and two statistical methods based on sentence length ratios and sentence offset probabilities. This paper seeks to further our knowledge of the alignment task by comparing the performance of the alignment models when used separately and together, i.e., as a hybrid system. Our results show that, for our English-Japanese corpus of newspaper articles, the hybrid system using lexical matching and sentence length ratios outperforms the pure methods.
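
The hybrid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Dice-style lexical overlap, the Gaussian-shaped length score, the weighted sum, the skip penalty, and every name and default value below (length_score, lexical_score, hybrid_score, align, mean_ratio, std, weight, skip_penalty) are assumptions chosen for illustration. Sentences are taken to be pre-tokenized lists, and the bilingual dictionary is a hypothetical mapping from a source word to a set of target translations.

```python
import math

def length_score(src_len, tgt_len, mean_ratio=1.0, std=0.5):
    """Score how typical the target/source length ratio is (assumed Gaussian shape)."""
    if src_len == 0 or tgt_len == 0:
        return 0.0
    z = (tgt_len / src_len - mean_ratio) / std
    return math.exp(-0.5 * z * z)

def lexical_score(src_tokens, tgt_tokens, dictionary):
    """Dice-style overlap: source words with a dictionary translation in the target."""
    if not src_tokens or not tgt_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    matched = sum(1 for w in src_tokens if dictionary.get(w, set()) & tgt_set)
    return 2.0 * matched / (len(src_tokens) + len(tgt_tokens))

def hybrid_score(src_tokens, tgt_tokens, dictionary, weight=0.5):
    """Weighted sum of the lexical and length scores (the weighting is an assumption)."""
    lex = lexical_score(src_tokens, tgt_tokens, dictionary)
    length = length_score(len(src_tokens), len(tgt_tokens))
    return weight * lex + (1.0 - weight) * length

def align(src_sents, tgt_sents, dictionary, skip_penalty=-0.1):
    """Dynamic programming over 1:1 matches and 1:0 / 0:1 skips.

    Returns the list of zero-based (source_index, target_index) 1:1 pairs
    on the highest-scoring path.
    """
    n, m = len(src_sents), len(tgt_sents)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            moves = []
            if i > 0 and j > 0:  # align source i-1 with target j-1
                s = hybrid_score(src_sents[i - 1], tgt_sents[j - 1], dictionary)
                moves.append((best[i - 1][j - 1] + s, (i - 1, j - 1)))
            if i > 0:  # leave source sentence i-1 unaligned
                moves.append((best[i - 1][j] + skip_penalty, (i - 1, j)))
            if j > 0:  # leave target sentence j-1 unaligned
                moves.append((best[i][j - 1] + skip_penalty, (i, j - 1)))
            if moves:
                best[i][j], back[i][j] = max(moves)
    # Walk back from the end, keeping only the diagonal (1:1) steps.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((pi, pj))
        i, j = pi, pj
    return pairs[::-1]

# Toy data: romanized Japanese tokens and a tiny hypothetical dictionary.
toy_dictionary = {
    "cat": {"neko"}, "sleeps": {"nemuru"},
    "markets": {"shijou"}, "fell": {"sagatta"}, "today": {"kyou"},
}
english = [["the", "cat", "sleeps"], ["markets", "fell", "today"]]
japanese = [["neko", "ga", "nemuru"], ["kyou", "shijou", "ga", "sagatta"]]
print(align(english, japanese, toy_dictionary))
```

Setting weight to 1.0 or 0.0 in hybrid_score reduces the sketch to a purely lexical or purely length-based scorer, which mirrors the separate-versus-combined comparison the abstract describes. The sentence offset probabilities mentioned in the abstract are not modeled here.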

Published In

ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1
August 1998
768 pages

Sponsors

  • Government of Canada
  • Université de Montréal

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
