DOI: 10.3115/976909.979656

A portable algorithm for mapping bitext correspondence

Published: 07 July 1997

Abstract

The first step in most empirical work in multilingual NLP is to construct maps of the correspondence between texts and their translations (bitext maps). The Smooth Injective Map Recognizer (SIMR) algorithm presented here is a generic pattern recognition algorithm that is particularly well suited to mapping bitext correspondence. SIMR is faster and significantly more accurate than other bitext mapping algorithms in the literature. The algorithm is robust enough to use on noisy texts, such as those resulting from OCR input, and on translations that are not very literal. SIMR encapsulates its language-specific heuristics, so it can be ported to any language pair with minimal effort.
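The abstract describes SIMR only at a high level. The Python sketch below is not the author's implementation; it is a toy illustration of two ideas the abstract names, under stated assumptions. First, a bitext map is assumed to be a set of points (x, y), where x indexes a token in one text and y indexes its counterpart in the translation. Second, the language-specific heuristic is assumed to be a pluggable predicate. Here it is a hypothetical cognate test based on the Longest Common Subsequence Ratio (LCSR), with an illustrative threshold of 0.7. Third, "smooth injective" is approximated as "no two points share a coordinate, and the points lie close to their least-squares line". All function names and thresholds are hypothetical.

```python
"""Toy sketch, NOT the SIMR implementation: a pluggable language-specific
matching heuristic plus a generic test on chains of points in bitext space.
All names and thresholds here are hypothetical."""
import math
from typing import Callable, List, Sequence, Tuple

# A point of correspondence: (token index in text 1, token index in text 2).
Point = Tuple[int, int]
# Hypothetical language-specific heuristic: do these two tokens correspond?
MatchPredicate = Callable[[str, str], bool]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: len(LCS(a, b)) / max(len(a), len(b))."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def cognate_predicate(threshold: float = 0.7) -> MatchPredicate:
    """Hypothetical orthographic-cognate test; the threshold is illustrative."""
    return lambda a, b: lcsr(a.lower(), b.lower()) >= threshold


def candidate_points(text1: Sequence[str], text2: Sequence[str],
                     match: MatchPredicate) -> List[Point]:
    """Language-independent step: every token pair the predicate accepts."""
    return [(i, j) for i, a in enumerate(text1)
            for j, b in enumerate(text2) if match(a, b)]


def is_injective(chain: Sequence[Point]) -> bool:
    """True if no two points share an x coordinate or a y coordinate."""
    xs = [x for x, _ in chain]
    ys = [y for _, y in chain]
    return len(set(xs)) == len(xs) and len(set(ys)) == len(ys)


def rms_residual(chain: Sequence[Point]) -> float:
    """RMS vertical distance of the points from their least-squares line."""
    n = len(chain)
    if n < 2:
        return math.inf  # a line needs at least two points
    mx = sum(x for x, _ in chain) / n
    my = sum(y for _, y in chain) / n
    sxx = sum((x - mx) ** 2 for x, _ in chain)
    if sxx == 0:
        return math.inf  # all x equal: no line y = a + b*x fits
    slope = sum((x - mx) * (y - my) for x, y in chain) / sxx
    intercept = my - slope * mx
    return math.sqrt(sum((y - (intercept + slope * x)) ** 2 for x, y in chain) / n)


def is_smooth_injective(chain: Sequence[Point], max_rms: float = 1.0) -> bool:
    """Hypothetical acceptance test: injective and close to a straight line."""
    return is_injective(chain) and rms_residual(chain) <= max_rms


if __name__ == "__main__":
    en = "the canadian parliament debated the budget".split()
    fr = "le parlement canadien a débattu le budget".split()
    pts = candidate_points(en, fr, cognate_predicate())
    print(pts)                          # [(1, 2), (2, 1), (5, 6)]
    print(is_injective(pts))            # True
    print(round(rms_residual(pts), 3))  # 0.906
```

In the toy example, the adjective-noun inversion (canadian parliament / parlement canadien) produces the crossing points (1, 2) and (2, 1). The injectivity test accepts them because no coordinate repeats, whereas a strict monotonicity test would reject them. Passing `match` in as a parameter is one way to mirror the abstract's point about encapsulation: the generic chain test stays unchanged when the language pair, and therefore the predicate, changes.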

Information

Published In

ACL '97/EACL '97: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
July 1997
543 pages

Sponsors

  • Directorate General XIII (European Commission)
  • Universidad Complutense de Madrid
  • Universidad Autónoma de Madrid
  • Universidad Nacional de Educación a Distancia
  • Universidad Politécnica de Madrid

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 44
  • Downloads (last 6 weeks): 14
Reflects downloads up to 18 Nov 2024

Cited By

  • (2012) Extraction of bilingual cognates from Wikipedia. Proceedings of the 10th international conference on Computational Processing of the Portuguese Language, pp. 63-72. DOI: 10.1007/978-3-642-28885-2_7. Online publication date: 17-Apr-2012.
  • (2010) Improving corpus comparability for bilingual lexicon extraction from comparable corpora. Proceedings of the 23rd International Conference on Computational Linguistics, pp. 644-652. DOI: 10.5555/1873781.1873854. Online publication date: 23-Aug-2010.
  • (2009) Chinese-Uyghur sentence alignment. Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora, pp. 38-45. DOI: 10.5555/1690339.1690350. Online publication date: 6-Aug-2009.
  • (2008) Learning Spanish-Galician translation equivalents using a comparable corpus and a bilingual dictionary. Proceedings of the 9th international conference on Computational linguistics and intelligent text processing, pp. 423-433. DOI: 10.5555/1787578.1787622. Online publication date: 17-Feb-2008.
  • (2006) Multilingual collocation extraction. Proceedings of the Workshop on Multilingual Language Resources and Interoperability, pp. 40-49. DOI: 10.5555/1613162.1613168. Online publication date: 23-Jul-2006.
  • (2001) Automatic verb classification using multilingual resources. Proceedings of the 2001 workshop on Computational Natural Language Learning, Volume 7. DOI: 10.5555/1117822.1455625. Online publication date: 6-Jul-2001.
  • (2001) Empirically estimating order constraints for content planning in generation. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, pp. 172-179. DOI: 10.3115/1073012.1073035. Online publication date: 6-Jul-2001.
  • (2000) Chinese-Korean word alignment based on linguistic comparison. Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, pp. 392-399. DOI: 10.3115/1075218.1075268. Online publication date: 3-Oct-2000.
  • (1999) Bitext maps and alignment via pattern recognition. Computational Linguistics 25:1, pp. 107-130. DOI: 10.5555/973215.973218. Online publication date: 1-Mar-1999.
  • (1999) Encoding a parallel corpus for automatic terminology extraction. Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics, pp. 275-276. DOI: 10.3115/977035.977083. Online publication date: 8-Jun-1999.
