Matching Graph, a Method for Extracting Parallel Information from Comparable Corpora

Authors:

Somayeh Bakhshaei,

Reza Safabakhsh,

Shahram Khadivi

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 19, Issue 1

Article No.: 11, Pages 1 - 29

https://doi.org/10.1145/3329713

Published: 25 July 2019

Abstract

Comparable corpora are a valuable alternative to expensive parallel corpora. They contain informative parallel fragments that are useful resources for a range of natural language processing tasks. In this work, a generative model is proposed for the efficient extraction of parallel fragments from a pair of comparable documents. The core of the proposed model is a graph called the Matching Graph. Because the Matching Graph can be trained on a small initial seed, it is well suited to language pairs with scarce resources. Experiments show that the Matching Graph performs significantly better than other recently published models. According to experiments on the English-Persian and Arabic-Persian language pairs, the extracted parallel fragments can be used instead of parallel data for training statistical machine translation systems. In the best case, the extracted fragments recover about 90% of the information of a statistical machine translation system trained on a parallel corpus. Moreover, using the extracted fragments as additional training data for statistical machine translation systems improves the BLEU score by about 2% for English-Persian and about 1% for Arabic-Persian translation.

Index Terms

Matching Graph, a Method for Extracting Parallel Information from Comparable Corpora
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Dictionaries
    2. Retrieval tasks and goals
      1. Information extraction
2. Mathematics of computing
  1. Probability and statistics
    1. Nonparametric statistics

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 19, Issue 1

January 2020

345 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3338846

Editor:
Imed Zitouni
Microsoft, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2019

Accepted: 01 April 2019

Revised: 01 February 2019

Received: 01 July 2018

Published in TALLIP Volume 19, Issue 1

Qualifiers

Research-article
Research
Refereed
