Matching Graph, a Method for Extracting Parallel Information from Comparable Corpora

Published: 25 July 2019

Abstract

Comparable corpora are a valuable alternative to expensive parallel corpora. They contain informative parallel fragments that are useful resources for a range of natural language processing tasks. In this work, a generative model is proposed for the efficient extraction of parallel fragments from a pair of comparable documents. The core of the proposed model is a graph called the Matching Graph. Because the Matching Graph can be trained on a small initial seed, it is well suited to language pairs that suffer from scarce resources. Experiments show that the Matching Graph performs significantly better than other recently published models. According to experiments on the English-Persian and Arabic-Persian language pairs, the extracted parallel fragments can be used in place of parallel data for training statistical machine translation systems. In the best case, the extracted fragments recover about 90% of the information of a statistical machine translation system trained on a parallel corpus. Moreover, using the extracted fragments as additional training data for statistical machine translation systems improves the BLEU score by about 2% for English-Persian and about 1% for Arabic-Persian translation.

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 1
January 2020
345 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3338846
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2019
Accepted: 01 April 2019
Revised: 01 February 2019
Received: 01 July 2018
Published in TALLIP Volume 19, Issue 1

Author Tags

  1. English
  2. Information extraction
  3. Persian
  4. Arabic
  5. comparable corpora
  6. generative model
  7. natural language processing
  8. parallel fragments
  9. statistical machine translation

Qualifiers

  • Research-article
  • Research
  • Refereed
