Estimating Sentence-like Structure in Synthetic Languages Using Information Topology
<p>Figure 1. Information flow can be considered in terms of the incremental changes in relative distributions over time. Here, we visualize the concept of multiple curved Riemannian manifold spaces formed in a language sequence. Can this probabilistic structure be used to reveal some aspects of the language sequence structure?</p>
<p>Figure 2. A view of contrasting pairs of probability distribution trajectories from a natural sequence. Each curve represents points on distributions of successive points plotted against each other. Hence, changes in the underlying probability characteristics can be visualized across the sequence and can be considered in terms of traversing a Riemannian manifold. Can this view of information flow be used to analyze structure in synthetic language sequences?</p>
<p>Figure 3. The cumulative incremental Wasserstein distance for n-grams is shown for a range of sentences in the Brown News corpus, with each sentence boundary marked by a vertical red line. The curvature increases rapidly within each sentence and then tapers off to a limit before the next sentence begins. This behavior can be understood in terms of the tangent angle of the Wasserstein distance, which measures the decreasing change in incremental information as each sentence progresses. The x-axis is in information-carrying symbols, and the y-axis is the cumulative incremental Wasserstein distance.</p>
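The cumulative incremental Wasserstein distance described in this caption can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the sliding `window` size, the use of single symbols rather than n-grams, and the discrete closed-form W1 (sum of absolute CDF differences over a shared ordered support) are all assumptions made for the sketch.

```python
from collections import Counter

def w1_distance(p, q, support):
    """Wasserstein-1 distance between two PMFs over a shared ordered support:
    for 1-D discrete distributions this is the sum of |CDF_p - CDF_q|."""
    cdf_p = cdf_q = 0.0
    total = 0.0
    for s in support:
        cdf_p += p.get(s, 0.0)
        cdf_q += q.get(s, 0.0)
        total += abs(cdf_p - cdf_q)
    return total

def pmf(symbols):
    """Empirical probability mass function of a list of symbols."""
    n = len(symbols)
    return {s: c / n for s, c in Counter(symbols).items()}

def cumulative_incremental_w1(sequence, window=8):
    """Slide a window along the symbol sequence, estimate a PMF at each
    position, and accumulate the W1 distance between successive PMFs,
    giving a monotonically non-decreasing curve like the one in Figure 3."""
    support = sorted(set(sequence))
    cumulative, curve = 0.0, []
    prev = pmf(sequence[:window])
    for i in range(1, len(sequence) - window + 1):
        cur = pmf(sequence[i:i + window])
        cumulative += w1_distance(prev, cur, support)
        curve.append(cumulative)
        prev = cur
    return curve
```

Because each increment is a non-negative distance, the resulting curve can only flatten, never decrease, which is what makes its tangent angle usable as a saturation signal.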
<p>Figure 4. A method of analyzing structure in synthetic language is shown using information topology. In this measure, the elliptical region indicates the end of a sentence, found as the constrained limit between the information flow and the decreasing change in curvature of the information flow. This is given by the probabilistic curvature measurements of the cumulative incremental tangent angle of the estimated Wasserstein-1 distance (y-axis) and the cumulative incremental information (x-axis). The results are shown for a range of known sentences in the Brown News corpus. The red hatched region defines the bound of the information flow and predicts the sentence end-points.</p>
<p>Figure 5. A diagrammatic representation of the information topological algorithm measuring the probabilistic curvature between short segments of symbolic sequences. The curvature diminishes to a bound on the information flow, predicting the sentence end-point.</p>
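The boundary-prediction step sketched in this diagram can be illustrated in code. This is a simplified stand-in for the paper's algorithm: it predicts a sentence end wherever the tangent angle of the cumulative incremental W1 curve drops below a bound (the curve has flattened, i.e., little new incremental information is arriving). The threshold `angle_bound` and the refractory length `min_len` are illustrative values, not parameters from the paper.

```python
import math

def tangent_angles(curve):
    """Angle (radians) of the local tangent of a cumulative-distance curve,
    assuming unit spacing between successive symbols on the x-axis."""
    return [math.atan(b - a) for a, b in zip(curve, curve[1:])]

def predict_boundaries(curve, angle_bound=0.05, min_len=3):
    """Predict sentence end-points where the tangent angle falls below
    `angle_bound`, mirroring the bounded (red hatched) region in the figures.
    `min_len` suppresses repeated detections within a short span."""
    boundaries, last = [], -min_len
    for i, theta in enumerate(tangent_angles(curve)):
        if theta < angle_bound and i - last >= min_len:
            boundaries.append(i)
            last = i
    return boundaries
```

On a curve that rises steeply and then plateaus, the plateau points fall below the angle bound and are flagged as candidate sentence end-points.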
<p>Figure 6. The probabilistic curvature measurements of the cumulative incremental tangent angle Wasserstein-1 distance (y-axis) and the cumulative incremental information (x-axis) are shown for 200 known sentences in the Brown News corpus. The clustering shows evidence of the expected information change for each sentence. The red hatched region defines the bounds of the information flow and predicts the sentence end-points.</p>
<p>Figure 7. The trajectories of the probabilistic curvature measurements of the cumulative incremental tangent angle Wasserstein distance (y-axis) and the cumulative incremental information (x-axis) are shown for 10 known sentences in the Brown News corpus. The sentence end-points are detected when the trajectory crosses into the red hatched region. The results indicate the potential of the approach for determining the sentence bounds.</p>
<p>Figure 8. A performance criterion for the proposed information topological sentence model is obtained by comparing the distribution of sentence lengths produced by the proposed sentence-bound model against an estimated probabilistic synthetic language model based on a Zipf–Mandelbrot–Li distribution of sentence length [<a href="#B44-entropy-24-00859" class="html-bibr">44</a>]. The estimated model's distribution is reasonably similar to the actual distribution obtained from the Brown News corpus (1000-sentence result shown).</p>
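The comparison in this figure can be sketched as follows. The functional form `p(k) ∝ 1/(k + beta)**alpha` is the standard Zipf–Mandelbrot shape applied to sentence length; the parameter values `alpha` and `beta` are illustrative, not the fitted values from the paper, and the Kolmogorov–Smirnov statistic is used here as one plausible way to score the distance between the model and empirical distributions.

```python
def zml_pmf(max_len, alpha=1.1, beta=2.7):
    """Zipf-Mandelbrot-Li-style PMF over sentence length k = 1..max_len:
    p(k) proportional to 1 / (k + beta)**alpha, normalized to sum to 1.
    alpha and beta are illustrative, not the paper's fitted values."""
    weights = [1.0 / (k + beta) ** alpha for k in range(1, max_len + 1)]
    z = sum(weights)
    return [w / z for w in weights]

def ks_statistic(p, q):
    """Kolmogorov-Smirnov statistic between two PMFs on the same support:
    the maximum absolute difference between their running CDFs."""
    cdf_p = cdf_q = 0.0
    d = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        d = max(d, abs(cdf_p - cdf_q))
    return d
```

A smaller KS statistic between the model-predicted and empirical sentence-length distributions indicates a closer match, which is the sense in which the estimated model is "reasonably similar" to the corpus data.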
Abstract
1. Introduction
2. Analyzing Language Using Information Topology
2.1. Statistical Manifolds
2.2. Contrasting Distributions on a Riemannian Manifold
2.3. Normalized Ollivier–Ricci Curvature
2.4. Information Topology Manifold
3. Information-Theoretic Sentences
3.1. Incremental Relative Information
3.2. Curvature of Incremental Tangent Normalized Wasserstein Distance
Algorithm 1: Proposed information topology SLU estimation algorithm
3.3. F-Measure Performance Analysis
3.4. An Information-Theoretic Performance Measure
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Lengyel, G.; Nagy, M.; Fiser, J. Statistically defined visual chunks engage object-based attention. Nat. Commun. 2021, 12, 1–12. [Google Scholar] [CrossRef] [PubMed]
- Rogers, L.L.; Park, S.H.; Vickery, T.J. Visual statistical learning is modulated by arbitrary and natural categories. Psychon. Bull. Rev. 2021, 28, 1281–1288. [Google Scholar] [CrossRef] [PubMed]
- Frank, S.L.; Bod, R.; Christiansen, M.H. How hierarchical is language use? Proc. R. Soc. B Biol. Sci. 2012, 279, 4522–4531. [Google Scholar] [CrossRef]
- Poeppel, D.; Emmorey, K.; Hickok, G.; Pylkkänen, L. Towards a New Neurobiology of Language. J. Neurosci. 2012, 32, 14125–14131. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Koedinger, K.R.; Anderson, J.R. Abstract planning and perceptual chunks: Elements of expertise in geometry. Cogn. Sci. 1990, 14, 511–550. [Google Scholar] [CrossRef]
- Guoxiang, D.; Linlin, J. The lexical approach for language teaching based on the corpus language analysis. In Proceedings of the 2011 IEEE 3rd International Conference on Communication Software and Networks, Xi’an, China, 27–29 May 2011; pp. 665–668. [Google Scholar]
- Nishida, H. The influence of chunking on reading comprehension: Investigating the acquisition of chunking skill. J. Asia TEFL 2013, 10, 163–183. [Google Scholar]
- Krishnamurthy, R. Language as chunks, not words. In Proceedings of the JALT2002 Conference Proceedings: Waves of the Future; Swanson, M., Hill, K., Eds.; JALT: Tokyo, Japan, 2003; pp. 288–294. [Google Scholar]
- Ma, L.; Li, Y. On the Cognitive Characteristics of Language Chunks. In Proceedings of the International Conference on Social Science, Education Management and Sports Education, Beijing, China, 10–11 April 2015; Atlantis Press: Amsterdam, The Netherlands, 2015; pp. 198–200. [Google Scholar]
- Jia, L.; Duan, G. Role of the prefabricated chunks in the working memory of oral interpretation. In Proceedings of the 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet), Yichang, China, 21–23 April 2012; pp. 541–543. [Google Scholar]
- Levinson, S.C. Turn-taking in human communication–origins and implications for language processing. Trends Cogn. Sci. 2016, 20, 6–14. [Google Scholar] [CrossRef] [Green Version]
- Reed, C.M.; Durlach, N.I. Note on information transfer rates in human communication. Presence 1998, 7, 509–518. [Google Scholar] [CrossRef]
- Pal, S.; Naskar, S.K.; Bandyopadhyay, S. A hybrid word alignment model for phrase-based statistical machine translation. In Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, Bulgaria, 8 August 2013; pp. 94–101. [Google Scholar]
- Liu, Y.; Stolcke, A.; Shriberg, E.; Harper, M. Comparing and combining generative and posterior probability models: Some advances in sentence boundary detection in speech. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 64–71. [Google Scholar]
- Ruppenhofer, J.; Rehbein, I. Detecting the boundaries of sentence-like units on spoken German. In Proceedings of the Preliminary 15th Conference on Natural Language Processing (KONVENS 2019), Erlangen, Germany, 9–11 October 2019; Friedrich-Alexander-Universität Erlangen-Nürnberg; German Society for Computational Linguistics & Language Technology: Erlangen, Germany, 2019; pp. 130–139. [Google Scholar]
- Matusov, E.; Mauser, A.; Ney, H. Automatic sentence segmentation and punctuation prediction for spoken language translation. In Proceedings of the Third International Workshop on Spoken Language Translation, Kyoto, Japan, 27–28 November 2006. [Google Scholar]
- Gotoh, Y.; Renals, S. Information extraction from broadcast news. Philos. Trans. R. Soc. London. Ser. A Math. Phys. Eng. Sci. 2000, 358, 1295–1310. [Google Scholar] [CrossRef] [Green Version]
- Read, J.; Dridan, R.; Oepen, S.; Solberg, L.J. Sentence boundary detection: A long solved problem? In Proceedings of the COLING 2012: Posters, Mumbai, India, 8–15 December 2012; pp. 985–994. [Google Scholar]
- Sanchez, G. Sentence Boundary Detection in Legal Text. In Proceedings of the Natural Legal Language Processing Workshop 2019, Minneapolis, Minnesota, 7 June 2019; Association for Computational Linguistics: Minneapolis, Minnesota, 2019; pp. 31–38. [Google Scholar]
- Griffis, D.; Shivade, C.; Fosler-Lussier, E.; Lai, A.M. A quantitative and qualitative evaluation of sentence boundary detection for the clinical domain. AMIA Summits Transl. Sci. Proc. 2016, 2016, 88. [Google Scholar]
- Kolár, J.; Liu, Y. Automatic sentence boundary detection in conversational speech: A cross-lingual evaluation on English and Czech. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010; pp. 5258–5261. [Google Scholar]
- Jelinek, F. Continuous Speech Recognition by Statistical Methods. Proc. IEEE 1976, 64, 532–556. [Google Scholar] [CrossRef]
- Wallach, H.M. Conditional Random Fields: An Introduction; Technical Report MS-CIS-04-21; Department of Computer and Information Science, University of Pennsylvania: Philadelphia, PA, USA, 2004. [Google Scholar]
- Kreuzthaler, M.; Schulz, S. Detection of sentence boundaries and abbreviations in clinical narratives. BMC Med. Inform. Decis. Mak. 2015, 15 (Suppl. 2), S4. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wanjari, N.; Dhopavkar, G.; Zungre, N.B. Sentence boundary detection for Marathi language. Procedia Comput. Sci. 2016, 78, 550–555. [Google Scholar] [CrossRef] [Green Version]
- Ramesh, V.; Kolonin, A. Interpretable natural language segmentation based on link grammar. In Proceedings of the 2020 Science and Artificial Intelligence Conference (S.A.I.ence), Novosibirsk, Russia, 14–15 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 25–32. [Google Scholar]
- Mori, S.; Nobuyasu, I.; Nishimura, M. An automatic sentence boundary detector based on a structured language model. In Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP 2002), Denver, CO, USA, 16–20 September 2002. [Google Scholar]
- Liu, Y.; Shriberg, E. Comparing evaluation metrics for sentence boundary detection. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’07, Honolulu, HI, USA, 15–20 April 2007; IEEE: Piscataway, NJ, USA, 2007; Volume 4, pp. IV–185. [Google Scholar]
- Back, A.D.; Wiles, J. An Information Theoretic Approach to Symbolic Learning in Synthetic Languages. Entropy 2022, 24, 259. [Google Scholar] [CrossRef]
- Piantadosi, S.T.; Fedorenko, E. Infinitely productive language can arise from chance under communicative pressure. J. Lang. Evol. 2017, 2, 141–147. [Google Scholar] [CrossRef]
- Back, A.D.; Angus, D.; Wiles, J. Transitive Entropy—A Rank Ordered Approach for Natural Sequences. IEEE J. Sel. Top. Signal Process. 2020, 14, 312–321. [Google Scholar] [CrossRef]
- Sandler, W.; Meir, I.; Padden, C.; Aronoff, M. The emergence of grammar: Systematic structure in a new language. Proc. Natl. Acad. Sci. USA 2005, 102, 2661–2665. [Google Scholar] [CrossRef] [Green Version]
- Nowak, M.; Plotkin, J.; Jansen, V. The evolution of syntactic communication. Nature 2000, 404, 495–498. [Google Scholar] [CrossRef]
- Amari, S.I. Information geometry of the EM and em algorithms for neural networks. Neural Netw. 1995, 8, 1379–1408. [Google Scholar] [CrossRef]
- Cichocki, A.; Amari, S.I. Families of Alpha- Beta- and Gamma- Divergences: Flexible and Robust Measures of Similarities. Entropy 2010, 12, 1532–1568. [Google Scholar] [CrossRef] [Green Version]
- Shannon, C.E. A Mathematical Theory of Communication (Parts I and II). Bell Syst. Tech. J. 1948, XXVII, 379–423. [Google Scholar] [CrossRef] [Green Version]
- Wang, Q.; Suen, C.Y. Analysis and Design of a Decision Tree Based on Entropy Reduction and Its Application to Large Character Set Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 406–417. [Google Scholar] [CrossRef]
- Kim, J.; André, E. Emotion Recognition Based on Physiological Changes in Music Listening. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 2067–2083. [Google Scholar] [CrossRef] [PubMed]
- Shore, J.E.; Gray, R. Minimum Cross-Entropy Pattern Classification and Cluster Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1982, 4, 11–17. [Google Scholar] [CrossRef] [PubMed]
- Shekar, B.H.; Kumari, M.S.; Mestetskiy, L.; Dyshkant, N. Face recognition using kernel entropy component analysis. Neurocomputing 2011, 74, 1053–1057. [Google Scholar] [CrossRef]
- Hampe, J.; Schreiber, S.; Krawczak, M. Entropy-based SNP selection for genetic association studies. Hum. Genet. 2003, 114, 36–43. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Xiang, Y.; Deng, H.; Sun, Z. An Entropy-based Index for Fine-scale Mapping of Disease Genes. J. Genet. Genom. 2007, 34, 661–668. [Google Scholar] [CrossRef]
- Gianvecchio, S.; Wang, H. An Entropy-Based Approach to Detecting Covert Timing Channels. IEEE Trans. Dependable Secur. Comput. 2011, 8, 785–797. [Google Scholar] [CrossRef]
- Back, A.D.; Angus, D.; Wiles, J. Determining the Number of Samples Required to Estimate Entropy in Natural Sequences. IEEE Trans. Inf. Theory 2019, 65, 4345–4352. [Google Scholar] [CrossRef]
- Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
- Rao, C. Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81. [Google Scholar]
- Amari, S. Differential geometry of curved exponential families-curvatures and information loss. Ann. Stat. 1982, 10, 357–385. [Google Scholar] [CrossRef]
- Amari, S.I. Information Geometry and Its Applications; Applied Mathematical Sciences; Springer: New York, NY, USA; Tokyo, Japan, 2016; Volume 194. [Google Scholar]
- Shannon, C.E. A Mathematical Theory of Communication (Part III). Bell Syst. Tech. J. 1948, XXVII, 623–656. [Google Scholar] [CrossRef]
- Sluis, R.A.; Angus, D.; Wiles, J.; Back, A.; Gibson, T.A.; Liddle, J.; Worthy, P.; Copland, D.; Angwin, A.J. An Automated Approach to Examining Pausing in the Speech of People with Dementia. Am. J. Alzheimer’s Dis. Other Dementias 2020, 35, 1533317520939773. [Google Scholar] [CrossRef] [PubMed]
- Ollivier, Y. A visual introduction to Riemannian curvatures and some discrete generalizations. In Analysis and Geometry of Metric Measure Spaces: Lecture Notes of the 50th Séminaire de Mathématiques Supérieures (SMS), Montréal, 2011; Dafni, G., McCann, R., Stancu, A., Eds.; AMS: Providence, RI, USA, 2013; pp. 197–219. [Google Scholar]
- Ni, C.C.; Lin, Y.Y.; Gao, J.; Gu, D.; Saucan, E. Ricci Curvature of the Internet Topology. In Proceedings of the IEEE Conference on Computer Communications INFOCOM 2015, Hong Kong, China, 26 April–1 May 2015; IEEE Computer Society: Washington, DC, USA, 2015. [Google Scholar]
- Sandhu, R.; Georgiou, T.; Reznik, E.; Zhu, L.; Kolesov, I.; Senbabaoglu, Y.; Tannenbaum, A. Graph Curvature for Differentiating Cancer Networks. Sci. Rep. 2015, 5, 12323. [Google Scholar] [CrossRef] [Green Version]
- Whidden, C.; Matsen IV, F.A. Ricci-Ollivier Curvature of the Rooted Phylogenetic Subtree-Prune-Regraft Graph. arXiv 2015, arXiv:1504.00304. [Google Scholar]
- Back, A.D.; Wiles, J. Entropy Estimation Using a Linguistic Zipf-Mandelbrot-Li Model for Natural Sequences. Entropy 2021, 23, 1100. [Google Scholar] [CrossRef]
- Calhoun, S. The centrality of metrical structure in signaling information structure: A probabilistic perspective. Language 2010, 86, 1–42. [Google Scholar] [CrossRef]
- Chater, N.; Manning, C.D. Probabilistic models of language processing and acquisition. Trends Cogn. Sci. 2006, 10, 335–344. [Google Scholar] [CrossRef]
- Courville, A.C.; Daw, N.D.; Touretzky, D.S. Bayesian theories of conditioning in a changing world. Trends Cogn. Sci. 2006, 10, 294–300. [Google Scholar] [CrossRef]
- Meyniel, F.; Dehaene, S. Brain networks for confidence weighting and hierarchical inference during probabilistic learning. Proc. Natl. Acad. Sci. USA 2017, 114, E3859–E3868. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kiss, T.; Strunk, J. Unsupervised multilingual sentence boundary detection. Comput. Linguist. 2006, 32, 485–525. [Google Scholar] [CrossRef]
- Choi, S.; Cichocki, A.; Park, H.M.; Lee, S.Y. Blind source separation and independent component analysis: A review. Neural Inf. Process.-Lett. Rev. 2005, 6, 1–57. [Google Scholar]
- Francis, W.N.; Kucera, H. Brown Corpus Manual—Manual of Information to Accompany A Standard Corpus of Present-Day Edited American English, for Use with Digital Computers; Department of Linguistics, Brown University: Providence, RI, USA, 1979. [Google Scholar]
- Local, J.; Kelly, J. Projection and ’silences’: Notes on phonetic and conversational structure. Hum. Stud. 1986, 9, 185–204. [Google Scholar] [CrossRef]
- Moon, T.K. The expectation-maximization algorithm. IEEE Signal Process. Mag. 1996, 13, 47–60. [Google Scholar] [CrossRef]
- Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286. [Google Scholar] [CrossRef] [Green Version]
- Chinchor, N.; Dungca, G. Four scorers and seven years ago: The scoring method for MUC-6. In Proceedings of the Sixth Message Understanding Conference (MUC-6): Proceedings of a Conference Held in Columbia, Maryland, Columbia, MD, USA, 6–8 November 1995. [Google Scholar]
- Makhoul, J.; Kubala, F.; Schwartz, R.; Weischedel, R. Performance Measures For Information Extraction. In Proceedings of the DARPA Broadcast News Workshop, Washington, DC, USA, 28 February–3 March 1999; pp. 249–252. [Google Scholar]
- Van Rijsbergen, C.J. Information Retrieval, 2nd ed.; Butterworths: London, UK, 1979. [Google Scholar]
- Chawla, N.V. Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook; Springer: New York, NY, USA, 2009; pp. 875–886. [Google Scholar]
- Kulkarni, A.; Chong, D.; Batarseh, F.A. 5—Foundations of data imbalance and solutions for a data democracy. In Data Democracy; Batarseh, F.A., Yang, R., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 83–106. [Google Scholar]
- Nechaev, Y.; Ruan, W.; Kiss, I. Towards NLU model robustness to ASR errors at scale. In Proceedings of the KDD 2021 Workshop on Data-Efficient Machine Learning, Singapore, 15 August 2021. [Google Scholar]
- Li, W. Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Trans. Inf. Theory 1992, 38, 1842–1845. [Google Scholar] [CrossRef] [Green Version]
- Li, W. Zipf’s Law Everywhere. Glottometrics 2002, 5, 14–21. [Google Scholar]
- Montemurro, M.A. Beyond the Zipf-Mandelbrot law in quantitative linguistics. Physica A 2001, 300, 567–578. [Google Scholar] [CrossRef] [Green Version]
- Mandelbrot, B. The Fractal Geometry of Nature; W. H. Freeman: New York, NY, USA, 1983. [Google Scholar]
| Model | |
|---|---|
| KS | 78.91 |
| BW | 68.09 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite

Back, A.D.; Wiles, J. Estimating Sentence-like Structure in Synthetic Languages Using Information Topology. Entropy 2022, 24, 859. https://doi.org/10.3390/e24070859