A Study of Analogical Density in Various Corpora at Various Granularity
Figure 1. The eight equivalent forms of a same analogy A : B :: C : D.
Figure 2. Examples of analogies in sentences, with equivalent analogies derived from the properties of analogies mentioned in Section 3.1.
Figure 3. Average token and type lengths (in characters) on the English part of the Tatoeba corpus after tokenisation using BPE and unigram with different vocabulary sizes. We do not report figures for vocabulary sizes of 4 k and above because only BPE is able to produce a tokenisation with those settings.
Figure 4. Number of clusters (a) and analogies (b) extracted from the corpora in English; analogical density (c) and proportion of sentences appearing in an analogy (d) for the same corpora. Note the logarithmic scale on the ordinate for analogical density (micro (μ): 10^−6, nano (n): 10^−9, pico (p): 10^−12, femto (f): 10^−15, atto (a): 10^−18, zepto (z): 10^−21, yocto (y): 10^−24). The tokenisation schemes on the abscissae are sorted by average token length in ascending order.
Figure 5. Same as Figure 4, but for the Tatoeba corpus in English with and without masking. There are two masking methods: least frequent (Tatoeba-least) and most frequent (Tatoeba-most). The tokenisation schemes on the abscissae are sorted by average token length in ascending order. Caution: the maximum values of the ordinates differ from those in Figure 4.
Figure 6. Analogical density against the average number of tokens per line (under the respective tokenisation schemes) for the Tatoeba and Multi30K corpora in several languages. The three plots are without masking (a), with masking of the least frequent tokens (b), and with masking of the most frequent tokens (c).
Abstract
1. Introduction
- World-knowledge or pragmatic sort, as in Indonesia : Jakarta :: Brazil : Brasilia (state/capital);
- Semantic sort, as in glove : hand :: envelope : letter (container/content);
- Grammatical sort, as in child : children :: man : men (singular/plural); and
- Formal sort, or level of form, as in he : her :: dance : dancer (suffixing with r); a toy sketch of solving such formal analogies by string manipulation is given after this list.
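The last sort above operates directly on surface forms. As a toy illustration (this is not the general string analogy algorithm of [15]; the function below is a hypothetical helper limited to pure suffixing), such analogies can sometimes be solved by copying the suffixing operation of the first pair onto the third term:

```python
def solve_suffixing_analogy(a: str, b: str, c: str):
    """Solve A : B :: C : D for D in the toy case where B = A + suffix.
    Returns None when this simple pattern does not apply."""
    if b.startswith(a):
        return c + b[len(a):]          # copy the suffixing operation onto C
    return None

# A formal (surface-level) analogy: he : her :: dance : dancer
print(solve_suffixing_analogy("he", "her", "dance"))        # -> dancer

# A grammatical analogy that is not formal: child : children :: man : men.
# The toy rule produces "manren", showing that this analogy only holds at
# the grammatical level, not at the level of surface forms.
print(solve_suffixing_analogy("child", "children", "man"))  # -> manren
```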
1.1. Motivation and Justification
1.2. Contributions
- We introduce a precise notion of analogical density and measure the analogical density of various corpora;
- We characterise texts that are more likely to have a higher analogical density;
- We investigate the effect of using different tokenisation schemes and the effect of masking tokens by their frequency on the analogical density of various corpora;
- We investigate the impact of the average length of sentences on the analogical density of corpora; and
- Based on previously mentioned results, we propose general rules to increase the analogical density of a given corpus.
1.3. Organisation of the Paper
2. Number of Analogies in a Text and Analogical Density
2.1. Analogical Density
- symmetry of conformity: A : B :: C : D ⇔ C : D :: A : B;
- exchange of the means: A : B :: C : D ⇔ A : C :: B : D. Together, these two properties generate the eight equivalent forms of an analogy shown in Figure 1 (see the sketch after this list).
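Since an analogy should not be counted more than once when measuring density, it is useful to see how many equivalent statements these two properties generate. A minimal sketch (illustrative only; the term names are arbitrary) that computes the closure of one analogy under symmetry of conformity and exchange of the means, recovering the eight equivalent forms of Figure 1:

```python
def equivalent_forms(a, b, c, d):
    """Closure of A : B :: C : D under symmetry of conformity
    (A : B :: C : D <=> C : D :: A : B) and exchange of the means
    (A : B :: C : D <=> A : C :: B : D)."""
    seen = {(a, b, c, d)}
    frontier = [(a, b, c, d)]
    while frontier:
        A, B, C, D = frontier.pop()
        for form in ((C, D, A, B),   # symmetry of conformity
                     (A, C, B, D)):  # exchange of the means
            if form not in seen:
                seen.add(form)
                frontier.append(form)
    return seen

forms = equivalent_forms("slow", "slower", "high", "higher")
print(len(forms))  # 8 when the four terms are all distinct
for A, B, C, D in sorted(forms):
    print(f"{A} : {B} :: {C} : {D}")
```

Counting only one representative per such equivalence class avoids inflating the number of analogies found in a corpus.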
2.2. Proportion of Sentences Appearing in Analogy
2.3. Meaning of the Measure and Gauging
2.4. Restrictions
3. Analogy
3.1. Properties of Analogy
- Reflexivity of conformity: A : B :: A : B is always a valid analogy for any A and B. Example: slow : slower :: slow : slower.
- Symmetry of conformity: let A : B :: C : D be a valid analogy; then, equivalently, C : D :: A : B is also valid. Example: slow : slower :: high : higher ⇔ high : higher :: slow : slower.
- Exchange of the means: let A : B :: C : D be a valid analogy; then, equivalently, A : C :: B : D is also valid. Example: slow : slower :: high : higher ⇔ slow : high :: slower : higher.
3.2. Ratio between Strings
3.3. Conformity of Ratios between Strings
3.4. Analogical Cluster: Cluster of Similar Ratios
- horizontally: between and ;
- vertically: between and .
4. Survey on the Data
- Tatoeba (available at: tatoeba.org, accessed on 20 September 2020) is a collection of sentences and their translations provided through collaborative work online (crowd-sourcing). It covers hundreds of languages. However, the amount of data is not balanced across languages, as it depends on the number of members who are native speakers of each language. Sentences in the Tatoeba corpus are usually short and mostly concern daily-life conversation. Table 3 shows the statistics of the Tatoeba corpus used in the experiments.
- Multi30K (available at: github.com/multi30k/dataset, accessed on 20 September 2020) [22,23,24] is a collection of image descriptions (captions) provided in several languages. This dataset is mainly used for multilingual image description and multimodal machine translation tasks. It is an extension of Flickr30K [25], and more data are added from time to time, for example, from the COCO dataset (available at: cocodataset.org, accessed on 20 September 2020). Table 4 shows the statistics of the Multi30K corpus.
- CommonCrawl (available at: commoncrawl.org, accessed on 20 September 2020) is a crawled web archive and dataset. Because it is a web archive, this corpus covers a wide range of topics. In this paper, we used the version provided as training data for the Shared Task: Machine Translation of WMT-2015 (available at: statmt.org/wmt15/translation-task.html, accessed in September 2020). Table 5 shows the statistics of the CommonCrawl corpus.
- Europarl (available at: statmt.org/europarl/, accessed in September 2020) [14] is a corpus of transcriptions of the proceedings of the European Parliament in 11 European languages. It was first introduced for Statistical Machine Translation and is still used as a standard corpus for machine translation tasks. In this paper, we use version 7. Table 6 shows the statistics of the Europarl corpus. A sketch of how the statistics reported in Tables 3–6 can be computed is given after this list.
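For reference, here is a minimal sketch of how the figures in Tables 3–6 can be computed from a whitespace-tokenised text file. This is an illustrative reconstruction, not the paper's actual preprocessing script; the file name is hypothetical, and we read "Hapax size (%)" as the proportion of types that occur exactly once.

```python
from collections import Counter
from statistics import mean

def corpus_statistics(path):
    """Compute the statistics reported in Tables 3-6 for one corpus file
    (one sentence per line, tokens separated by whitespace)."""
    with open(path, encoding="utf-8") as f:
        lines = [line.split() for line in f]
    tokens = [tok for line in lines for tok in line]
    freq = Counter(tokens)                        # type -> frequency
    hapaxes = sum(1 for c in freq.values() if c == 1)
    return {
        "# of lines": len(lines),
        "# of tokens": len(tokens),
        "# of types": len(freq),
        "Avg. tokens per line": mean(len(line) for line in lines),
        "Avg. token length": mean(len(tok) for tok in tokens),
        "Avg. type length": mean(len(t) for t in freq),
        "Type-Token Ratio": len(freq) / len(tokens),
        "Hapax size (%)": 100 * hapaxes / len(freq),
    }

# Hypothetical usage:
# print(corpus_statistics("tatoeba.en.txt"))
```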
Aligning Sentences across Languages
5. Tokenisation
5.1. Character
5.2. Sub-Word
5.2.1. Token Length
5.2.2. Sampling
5.3. Word
6. Masking
- least frequent: tokens belonging to the N least frequent types (note that tokens are repeated in a corpus, while types are counted only once) are masked with one and the same label, while all other tokens are kept as they are. In this paper, we ranked the types according to their frequency in the corpus. We then masked all tokens in the corpus belonging to the least frequent types whose accumulated frequency amounts to half of the total number of tokens in the corpus. All other tokens are kept. If several types share the same rank (frequency), we keep randomly picking one of them until the accumulated frequency reaches half of the total number of tokens.
- most frequent: same as above, but masking the tokens belonging to the N most frequent types instead (the opposite of the least frequent case). A sketch of this masking procedure is given after this list.
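A minimal sketch of this frequency-based masking, assuming sentences are given as lists of tokens (word or sub-word). The "..." label and the 50% threshold follow the description above and the masking examples in the tables at the end of the paper; the tie handling mimics the random picking among equally frequent types:

```python
from collections import Counter
import random

def mask_by_frequency(sentences, least_frequent=True, ratio=0.5, label="..."):
    """Mask every token whose type belongs to the least (or most) frequent
    types whose accumulated frequency reaches `ratio` of all tokens."""
    freq = Counter(tok for sent in sentences for tok in sent)
    total = sum(freq.values())
    types = list(freq)
    random.shuffle(types)   # random order among types of equal frequency
    types.sort(key=lambda t: freq[t], reverse=not least_frequent)  # stable sort keeps ties shuffled
    masked_types, accumulated = set(), 0
    for t in types:
        if accumulated >= ratio * total:
            break
        masked_types.add(t)
        accumulated += freq[t]
    return [[label if tok in masked_types else tok for tok in sent]
            for sent in sentences]

# Hypothetical usage on a tokenised corpus:
# sentences = [line.split() for line in open("tatoeba.en.txt", encoding="utf-8")]
# masked = mask_by_frequency(sentences, least_frequent=True)
```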
7. Results and Analysis
7.1. Effect of Tokenisation on Analogical Density
7.2. Impact of Average Length of Sentences on Analogical Density
8. Further Discussion
8.1. Vocabulary Size of Sub-Word Tokenisation
8.2. Masking Ratio
8.3. The Level of Analogy: Surface Form and Distributional Semantics
9. Conclusions
- Corpora with a higher Type–Token Ratio tend to have higher analogical densities.
- We naturally found that analogical density goes down from the character level to the word level. However, this is not true when tokens are masked based on their frequencies.
- Masking tokens with lower frequencies leads to higher analogical densities.
- Use sub-word tokenisation, and vary the size of the vocabulary to maximise the Type–Token Ratio.
- If the task allows for it, mask tokens with lower frequencies and vary the threshold to maximise the Type–Token Ratio again. A sketch combining both recommendations is given after this list.
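The two recommendations can be combined into a small search over tokenisation settings. The sketch below uses the sentencepiece library to train BPE and unigram models at several vocabulary sizes and keeps the setting with the highest Type–Token Ratio; the library choice, file name, and vocabulary sizes are illustrative assumptions, not a prescription from this paper. Masking could be added as an extra loop over thresholds, reusing the masking sketch from Section 6.

```python
import sentencepiece as spm

def type_token_ratio(sentences):
    tokens = [tok for sent in sentences for tok in sent]
    return len(set(tokens)) / len(tokens)

best = None
for vocab_size in (100, 500, 1000, 2000):              # illustrative sizes
    for model_type in ("bpe", "unigram"):
        prefix = f"{model_type}_{vocab_size}"
        spm.SentencePieceTrainer.train(
            input="corpus.txt",                        # hypothetical corpus file
            model_prefix=prefix,
            vocab_size=vocab_size,
            model_type=model_type,
        )
        sp = spm.SentencePieceProcessor(model_file=f"{prefix}.model")
        with open("corpus.txt", encoding="utf-8") as f:
            sentences = [sp.encode(line.strip(), out_type=str) for line in f]
        ttr = type_token_ratio(sentences)
        if best is None or ttr > best[0]:
            best = (ttr, model_type, vocab_size)

print("Best Type-Token Ratio:", best)
```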
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hathout, N. Acquisition of morphological families and derivational series from a machine readable dictionary. arXiv 2009, arXiv:0905.1609. [Google Scholar]
- Lavallée, J.F.; Langlais, P. Morphological acquisition by formal analogy. In Morpho Challenge 2009; Knowledge 4 All Foundation Ltd.: Surrey, UK, 2009. [Google Scholar]
- Blevins, J.P.; Blevins, J. (Eds.) Analogy in Grammar: Form and Acquisition. Oxford Scholarship Online. 2009. Available online: https://oxford.universitypressscholarship.com/view/10.1093/acprof:oso/9780199547548.001.0001/acprof-9780199547548 (accessed on 25 July 2021).
- Fam, R.; Lepage, Y. A study of the saturation of analogical grids agnostically extracted from texts. In Proceedings of the Computational Analogy Workshop at the 25th International Conference on Case-Based Reasoning (ICCBR-CA-17), Trondheim, Norway, 26–28 June 2017; pp. 11–20. Available online: http://ceur-ws.org/Vol-2028/paper1.pdf (accessed on 25 July 2021).
- Wang, W.; Fam, R.; Bao, F.; Lepage, Y.; Gao, G. Neural Morphological Segmentation Model for Mongolian. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN-2019), Budapest, Hungary, 14–19 July 2019; pp. 1–7. [Google Scholar]
- Langlais, P.; Patry, A. Translating Unknown Words by Analogical Learning. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL-07), Prague, Czech Republic, 28–30 June 2007; pp. 877–886. Available online: https://aclanthology.org/D07-1092 (accessed on 25 July 2021).
- Lindén, K. Entry Generation by Analogy—Encoding New Words for Morphological Lexicons. North. Eur. J. Lang. Technol. 2009, 1, 1–25. [Google Scholar] [CrossRef]
- Fam, R.; Purwarianti, A.; Lepage, Y. Plausibility of word forms generated from analogical grids in Indonesian. In Proceedings of the 16th International Conference on Computer Applications (ICCA-18), Beirut, Lebanon, 25–26 July 2018; pp. 179–184. [Google Scholar]
- Hathout, N.; Namer, F. Automatic Construction and Validation of French Large Lexical Resources: Reuse of Verb Theoretical Linguistic Descriptions. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.549.5396&rep=rep1&type=pdf (accessed on 25 July 2021).
- Hathout, N. Acquisition of the Morphological Structure of the Lexicon Based on Lexical Similarity and Formal Analogy. In Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing, Manchester, UK, 24 August 2008; pp. 1–8. Available online: https://aclanthology.org/W08-2001 (accessed on 25 July 2021).
- Lepage, Y.; Denoual, E. Purest ever example-based machine translation: Detailed presentation and assessment. Mach. Transl. 2005, 19, 251–282. [Google Scholar] [CrossRef]
- Takezawa, T.; Sumita, E.; Sugaya, F.; Yamamoto, H.; Yamamoto, S. Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversations in the Real World. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Spain, 29–31 May 2002; Available online: http://www.lrec-conf.org/proceedings/lrec2002/pdf/305.pdf (accessed on 25 July 2021).
- Lepage, Y. Lower and Higher Estimates of the Number of “True Analogies” between Sentences Contained in a Large Multilingual Corpus. Available online: https://aclanthology.org/C04-1106.pdf (accessed on 25 July 2021).
- Koehn, P. Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Proceedings: The Tenth Machine Translation Summit; AAMT: Phuket, Thailand, 2005; pp. 79–86. [Google Scholar]
- Lepage, Y. Solving Analogies on Words: An Algorithm. In Proceedings of the 17th International Conference on Computational Linguistics (COLING 1998), Montreal, QC, Canada, 10–14 August 1998; Volume 1, pp. 728–734. [Google Scholar]
- Stroppa, N.; Yvon, F. An Analogical Learner for Morphological Analysis. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), Ann Arbor, MI, USA, 29–30 June 2005; pp. 120–127. [Google Scholar]
- Langlais, P.; Yvon, F. Scaling up Analogical Learning. In Proceedings of the Coling 2008: Companion Volume: Posters, Manchester, UK, 18–22 August 2008; pp. 51–54. [Google Scholar]
- Beesley, K.R. Consonant Spreading in Arabic Stems. In Proceedings of the 17th International Conference on Computational Linguistics (COLING 1998), Montreal, QC, Canada, 10–14 August 1998; Volume 1. Available online: https://aclanthology.org/C98-1018.pdf (accessed on 25 July 2021).
- Wintner, S. Chapter Morphological Processing of Semitic Languages. In Natural Language Processing of Semitic Languages; Springer: Berlin/Heidelberg, Germany, 2014; pp. 43–66. [Google Scholar]
- Gil, D. From Repetition to Reduplication in Riau Indonesian. In Studies on Reduplication; De Gruyter: Berlin, Germany, 2011; pp. 31–64. [Google Scholar]
- Lepage, Y. Analogies Between Binary Images: Application to Chinese Characters. In Computational Approaches to Analogical Reasoning: Current Trends; Prade, H., Richard, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 25–57. [Google Scholar]
- Elliott, D.; Frank, S.; Sima’an, K.; Specia, L. Multi30K: Multilingual English-German Image Descriptions. In Proceedings of the 5th Workshop on Vision and Language, Berlin, Germany, 7–12 August 2016; pp. 70–74. [Google Scholar]
- Elliott, D.; Frank, S.; Barrault, L.; Bougares, F.; Specia, L. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description. In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers, Copenhagen, Denmark, 7–8 September 2017; pp. 215–233. [Google Scholar]
- Barrault, L.; Bougares, F.; Specia, L.; Lala, C.; Elliott, D.; Frank, S. Findings of the Third Shared Task on Multimodal Machine Translation. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Brussels, Belgium, 31 October–1 November 2018; pp. 304–323. [Google Scholar]
- Young, P.; Lai, A.; Hodosh, M.; Hockenmaier, J. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2014, 2, 67–78. [Google Scholar] [CrossRef]
- Kudo, T. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 66–75. Available online: https://aclanthology.org/P18-1007 (accessed on 25 July 2021).
- Sennrich, R.; Haddow, B.; Birch, A. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 1715–1725. Available online: https://aclanthology.org/P16-1162 (accessed on 25 July 2021).
- Provilkov, I.; Emelianenko, D.; Voita, E. BPE-Dropout: Simple and Effective Subword Regularization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Stroudsburg, PA, USA, 5–10 July 2020; pp. 1882–1892. Available online: https://aclanthology.org/2020.acl-main.170 (accessed on 25 July 2021).
- Koehn, P.; Knight, K. Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, Austin, TX, USA, 30 July–3 August 2000; pp. 711–715. [Google Scholar]
| Prefix | Symbol | Factor |
|---|---|---|
| centi | c | 10^−2 |
| milli | m | 10^−3 |
| micro | μ | 10^−6 |
| nano | n | 10^−9 |
| pico | p | 10^−12 |
| femto | f | 10^−15 |
| atto | a | 10^−18 |
| zepto | z | 10^−21 |
| yocto | y | 10^−24 |
| Corpus | en | fr | de | cs | pl | fi |
|---|---|---|---|---|---|---|
| Tatoeba | √ | √ | √ | | √ | √ |
| Multi30K | √ | √ | √ | √ | | |
| CommonCrawl | √ | √ | √ | √ | | |
| Europarl | √ | √ | √ | | √ | √ |
Table 3. Statistics of the Tatoeba corpus used in the experiments.

| | en | fr | de | cs | pl | fi |
|---|---|---|---|---|---|---|
| # of lines | 7964 | 7964 | 7964 | - | 7964 | 7964 |
| # of tokens | 51,279 | 54,430 | 50,375 | - | 41,892 | 39,907 |
| # of types | 4152 | 5740 | 5639 | - | 7796 | 8634 |
| Avg. tokens per line | 6.44 ± 2.80 | 6.83 ± 3.20 | 6.33 ± 2.85 | - | 5.26 ± 2.44 | 5.01 ± 2.10 |
| Avg. token length | 3.34 ± 2.14 | 3.69 ± 2.52 | 4.04 ± 2.59 | - | 4.26 ± 2.89 | 4.75 ± 3.15 |
| Avg. type length | 6.32 ± 2.27 | 7.12 ± 2.49 | 7.55 ± 3.01 | - | 7.28 ± 2.44 | 8.09 ± 2.86 |
| Type–Token Ratio | 0.08 | 0.11 | 0.11 | - | 0.19 | 0.22 |
| Hapax size (%) | 48.80 | 56.17 | 55.68 | - | 62.11 | 66.50 |
Table 4. Statistics of the Multi30K corpus.

| | en | fr | de | cs | pl | fi |
|---|---|---|---|---|---|---|
| # of lines | 30,014 | 30,014 | 30,014 | 30,014 | - | - |
| # of tokens | 392,978 | 471,352 | 374,490 | 308,367 | - | - |
| # of types | 10,373 | 11,376 | 19,112 | 22,787 | - | - |
| Avg. tokens per line | 13.09 ± 4.10 | 15.70 ± 5.91 | 12.48 ± 4.23 | 10.27 ± 3.60 | - | - |
| Avg. token length | 3.85 ± 2.40 | 3.93 ± 2.47 | 4.86 ± 2.97 | 4.34 ± 2.71 | - | - |
| Avg. type length | 6.92 ± 2.41 | 7.41 ± 2.42 | 9.91 ± 3.91 | 7.52 ± 2.40 | - | - |
| Type–Token Ratio | 0.03 | 0.02 | 0.05 | 0.07 | - | - |
| Hapax size (%) | 41.94 | 42.15 | 58.05 | 53.50 | - | - |
Table 5. Statistics of the CommonCrawl corpus.

| | en | fr | de | cs | pl | fi |
|---|---|---|---|---|---|---|
| # of lines | 27,379 | 27,379 | 27,379 | 27,379 | - | - |
| # of tokens | 582,530 | 647,747 | 539,655 | 548,214 | - | - |
| # of types | 27,880 | 33,592 | 43,073 | 47,788 | - | - |
| Avg. tokens per line | 21.28 ± 13.08 | 23.66 ± 16.71 | 19.71 ± 15.15 | 20.02 ± 15.43 | - | - |
| Avg. token length | 4.40 ± 2.84 | 4.54 ± 3.20 | 5.29 ± 4.08 | 4.70 ± 3.14 | - | - |
| Avg. type length | 7.30 ± 3.06 | 7.53 ± 3.03 | 9.34 ± 4.55 | 7.53 ± 2.81 | - | - |
| Type–Token Ratio | 0.05 | 0.05 | 0.08 | 0.09 | - | - |
| Hapax size (%) | 49.15 | 49.91 | 56.93 | 53.09 | - | - |
Table 6. Statistics of the Europarl corpus (version 7).

| | en | fr | de | cs | pl | fi |
|---|---|---|---|---|---|---|
| # of lines | 186,303 | 186,303 | 186,303 | - | 186,303 | 186,303 |
| # of tokens | 5,596,191 | 6,195,568 | 5,401,483 | - | 4,805,319 | 3,963,855 |
| # of types | 39,042 | 52,662 | 108,526 | - | 108,777 | 188,718 |
| Avg. tokens per line | 30.04 ± 15.51 | 33.26 ± 17.42 | 28.99 ± 15.12 | - | 25.79 ± 13.60 | 21.28 ± 10.97 |
| Avg. token length | 4.54 ± 2.98 | 4.66 ± 3.25 | 5.55 ± 4.04 | - | 5.70 ± 3.85 | 6.90 ± 4.54 |
| Avg. type length | 8.37 ± 3.37 | 8.67 ± 3.11 | 12.66 ± 5.32 | - | 9.61 ± 3.24 | 12.83 ± 4.78 |
| Type–Token Ratio | 0.01 | 0.01 | 0.02 | - | 0.02 | 0.05 |
| Hapax size (%) | 36.08 | 36.12 | 50.37 | - | 39.81 | 52.62 |
| Corpus | Lang. | Example Sentences |
|---|---|---|
| Tatoeba | en | the store is closing at 7. |
| | fr | le magasin ferme à 7 heures. |
| | de | der laden schließt um sieben. |
| | pl | sklep jest zamknięty od 19. |
| | fi | kauppa menee kiinni kello seitsemän. |
| Multi30K | en | a boy in white plays baseball. |
| | fr | un garçon en blanc joue au baseball. |
| | de | ein weiß gekleideter junge spielt baseball. |
| | cs | chlapec v bílém hraje baseball. |
| CommonCrawl | en | remember—this is a brief selection of photographs—much more you can see the individual entries on the blog. |
| | fr | n’ oubliez pas—il s’ agit d’ une brève sélection de photographies—bien plus que vous pouvez voir les entrées individuelles sur le blog. |
| | de | denken sie daran—das ist eine kleine auswahl von fotografien—viel mehr können sie die einzelnen einträge auf dem blog zu sehen. |
| | cs | pamatujte si—to je stručný výběr fotografií—mnohem více můžete vidět jednotlivé položky na blogu. |
| Europarl | en | (de) madam president, in terms of european integration, it is without doubt a good thing that one of the new eu countries, in this case the czech republic, held the council presidency. |
| | fr | (de) madame la présidente, en termes d’ intégration européenne, il est certes une bonne chose que la présidence du conseil revienne à l’ un des nouveaux états membres de l’ ue, en l’ espèce, la république tchèque. |
| | de | (de) frau präsidentin ! im sinne der europäischen integration war es zweifellos begrüßenswert, dass mit tschechien eines der neuen eu-länder die ratspräsidentschaft innehatte. |
| | pl | (de) pani przewodnicząca ! jeżeli chodzi o integrację europejską, bez wątpienia dobrą rzeczą jest to, że jeden z nowych krajów, a mianowicie republika czeska, sprawował prezydencję rady. |
| | fi | (de) arvoisa puhemies, euroopan yhdentymisen kannalta on epäilemättä hyvä asia, että yksi eu:n uusista jäsenvaltioista, tässä tapauksessa tšekin tasavalta, toimi neuvoston puheenjohtajana. |
| Original | the store is closing at 7. |
|---|---|
| character | t h e _ s t o r e _ i s _ c l o s i n g _ a t _ 7 _. |
| unigram | _the _stor e _is _c l o s ing _at _ 7 _. |
| BPE | _the _st ore _is _cl os ing _at _ 7 _. |
| word | the store is closing at 7. |
| Tokenisation | Masking | Example Sentence |
|---|---|---|
| word | - | the store is closing at 7 . |
| | least | the ... is ... ... ... . |
| | most | ... store ... closing at 7 ... |
| BPE | - | _the _st ore _is _cl os ing _at _ 7 _. |
| | least | _the ... ... _is ... ... ing ... ... ... _. |
| | most | ... _st ore ... _cl os ... _at _ 7 ... |
| Ratio (%) | Masked Sentence |
|---|---|
| 0 | the store is closing at 7 . |
| 25 | ... store is closing at 7 ... |
| 50 | ... store ... closing at 7 ... |
| 75 | ... store ... closing ... ... ... |
| 100 | ... ... ... ... ... ... ... |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).