Abstract
Distributional models of natural language use vectors to provide a contextual foundation for meaning representation. These models rely on large quantities of real data, such as corpora of documents, and have found applications in natural language tasks such as word similarity, disambiguation, indexing, and search. Compositional distributional models extend the distributional ones from words to phrases and sentences. Logical operators are usually treated as noise by these models, and no systematic treatment has been provided so far. In this paper, we show how skew lattices and their encoding in upper triangular matrices provide a logical foundation for compositional distributional models. In this setting, one can model commutative as well as non-commutative logical operations of conjunction and disjunction. We provide theoretical foundations, a case study, and experimental results for an entailment task on real data.
K. Cvetko-Vah acknowledges the financial support from the Slovenian Research Agency (research core funding No. P1-0222). M. Sadrzadeh, D. Kartsaklis and B. Blundell acknowledge financial support from AFOSR International Scientific Collaboration Grant FA9550-14-1-0079.
References
Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (LiCS 2004). IEEE Computer Science Press (2004). arXiv:quant-ph/0402130
Berendsen, J., Jansen, D.N., Schmaltz, J., Vaandrager, F.W.: The axiomatization of override and update. J. Appl. Log. 8, 141–150 (2010)
Chomsky, N.: Three models for the description of language. IRE Trans. Inf. Theory 2, 113–124 (1956)
Coecke, B., Sadrzadeh, M., Clark, S.: Mathematical foundations for a compositional distributional model of meaning. Lambek Festschrift, Linguist. Anal. 36, 345–384 (2010)
Cvetko-Vah, K.: Skew lattices of matrices in rings. Algebra Univers. 53, 471–479 (2005)
Cvetko-Vah, K., Salibra, A.: The connection of skew Boolean algebras and discriminator varieties to church algebras. Algebra Univers. 73, 369–390 (2015)
Cvetko-Vah, K., Leech, J., Spinks, M.: Skew lattices and binary operations on functions. J. Appl. Log. 11, 253–265 (2013)
Firth, J.: A synopsis of linguistic theory 1930–1955. In: Studies in Linguistic Analysis (1957)
Galatos, N., Jipsen, P., Kowalski, T., Ono, H.: Residuated Lattices: An Algebraic Glimpse at Substructural Logics. Studies in Logic and the Foundations of Mathematics, vol. 151. Elsevier, Amsterdam (2007)
Harris, Z.: Distributional structure. Word 10, 146–162 (1954)
Jordan, P.: Über nichtkommutative Verbände. Arch. Math. 2, 56–59 (1949)
Kartsaklis, D., Sadrzadeh, M.: A compositional distributional inclusion hypothesis. In: Amblard, M., de Groote, P., Pogodalla, S., Retoré, C. (eds.) LACL 2016. LNCS, vol. 10054, pp. 116–133. Springer, Heidelberg (2016). doi:10.1007/978-3-662-53826-5_8
Kartsaklis, D., Sadrzadeh, M.: Distributional inclusion hypothesis for tensor-based composition. In: COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, Osaka, Japan, 11–16 December 2016, pp. 2849–2860. ACL (2016)
Kotlerman, L., Dagan, I., Szpektor, I., Zhitomirsky-Geffet, M.: Directional distributional similarity for lexical inference. Nat. Lang. Eng. 16(4), 359–389 (2010)
Lambek, J.: Type grammar revisited. In: Lecomte, A., Lamarche, F., Perrier, G. (eds.) LACL 1997. LNCS, vol. 1582, pp. 1–27. Springer, Heidelberg (1999). doi:10.1007/3-540-48975-4_1
Lambek, J.: The mathematics of sentence structure. Am. Math. Mon. 65, 154–170 (1958)
Leech, J.: Skew lattices in rings. Algebra Univers. 26, 48–72 (1989)
Leech, J.: Skew Boolean algebras. Algebra Univers. 27, 497–506 (1990)
Leech, J.: Normal skew lattices. Semigroup Forum 44, 1–8 (1992)
Leech, J.: Recent developments in the theory of skew lattices. Semigroup Forum 52, 7–24 (1996)
Lin, D.: Automatic retrieval and clustering of similar words. In: Proceedings of the 17th International Conference on Computational Linguistics, vol. 2, pp. 768–774. Association for Computational Linguistics (1998)
Lin, D.: An information-theoretic definition of similarity. In: Proceedings of the International Conference on Machine Learning, pp. 296–304 (1998)
Rubenstein, H., Goodenough, J.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)
Schuetze, H.: Automatic word sense discrimination. Comput. Linguist. 24(1), 97–123 (1998)
Weeds, J., Weir, D., McCarthy, D.: Characterising measures of lexical distributional similarity. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING 2004. Association for Computational Linguistics (2004)
A Appendix
A.1 Normalisation Schemes
The raw co-occurrence counts are normalised using two measures:
- Probability Ratio
$$\begin{aligned} \frac{P(w,f)}{P(w)P(f)} \end{aligned}$$where P(w, f) is the probability that word w and feature f have occurred together, and P(w) and P(f) are the probabilities of occurrence of w and f, respectively. This measure tells us how often w and f were observed together in comparison to how often they would have occurred were they independent.
- Positive Pointwise Mutual Information (PPMI)
$$\begin{aligned} \max \left( \log \frac{P(w,f)}{P(w)P(f)}, \; 0\right) \end{aligned}$$This is the positive version of the logarithm of the probability ratio, where the negative logarithmic values are sent to 0.
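As an illustration of the two normalisation schemes above, the following sketch computes both measures from a raw word-by-feature count matrix. It is a minimal NumPy-based example; the function name and the array layout (rows are words, columns are features) are assumptions of this sketch, not part of the paper.

```python
import numpy as np

def normalise(counts):
    """Normalise a raw word-by-feature co-occurrence matrix.

    Returns the probability-ratio matrix P(w,f) / (P(w) P(f)) and its
    PPMI version max(log(ratio), 0), as described above.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wf = counts / total                              # joint P(w, f)
    p_w = counts.sum(axis=1, keepdims=True) / total    # marginal P(w)
    p_f = counts.sum(axis=0, keepdims=True) / total    # marginal P(f)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_wf > 0, p_wf / (p_w * p_f), 0.0)
        # log only where the ratio is positive; zero cells stay 0
        log_ratio = np.log(ratio, where=ratio > 0, out=np.zeros_like(ratio))
    ppmi = np.maximum(log_ratio, 0.0)
    return ratio, ppmi
```

For instance, a diagonal count matrix `[[10, 0], [0, 10]]` gives a ratio of 2 on the diagonal (the pairs co-occur twice as often as independence would predict) and a PPMI of log 2.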
A.2 Formulae for Computing Entailment
APinc is the average precision applied to feature inclusion. It measures a ranked version of feature inclusion on vectors \(\overrightarrow{u}\) and \(\overrightarrow{v}\), with features ranked from highest to lowest weight. Here, \(f_r\) is the feature of \(\overrightarrow{u}\) with rank r in its feature list \(F(\overrightarrow{u})\); P(r) is the precision at rank r, which measures how many of the top r features of \(\overrightarrow{u}\) are included in the features of \(\overrightarrow{v}\); and \(\textit{rel}'(f_r)\) is a relevance measure reflecting how important \(f_r\) is in \(\overrightarrow{v}\).
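The ranked inclusion just described can be sketched in code. The following follows the APinc definitions of Kotlerman et al. [14], under the assumption that vectors are represented as feature-to-weight dictionaries; this representation and the function name are illustrative choices of the sketch, not the paper's data structures.

```python
def apinc(u, v):
    """Degree to which u's features are included in v's (APinc).

    u, v: dicts mapping feature -> weight.  Follows Kotlerman et al.:
    APinc = (sum over ranks r of P(r) * rel'(f_r)) / |F(u)|.
    """
    fu = sorted(u, key=u.get, reverse=True)          # F(u), heaviest first
    fv = sorted(v, key=v.get, reverse=True)          # F(v)
    rank_v = {f: r + 1 for r, f in enumerate(fv)}

    def rel(f):
        # rel'(f) = 1 - rank(f, F(v)) / (|F(v)| + 1) if f in F(v), else 0
        return 1.0 - rank_v[f] / (len(fv) + 1) if f in rank_v else 0.0

    score, included = 0.0, 0
    for r, f in enumerate(fu, start=1):
        if f in rank_v:
            included += 1
        p_at_r = included / r                        # precision P(r)
        score += p_at_r * rel(f)
    return score / len(fu) if fu else 0.0
```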
BAPinc balances APinc with the LIN degree of similarity between the vectors. BAPinc was developed in [14] after the realisation that APinc returns poor results when the vectors have radically different numbers of non-zero features; the LIN measure was included to balance out the extra dimensions of the longer vector.
LIN is a similarity measure between vectors and was defined in [22]. It can be replaced with any other similarity measure, such as the cosine measure.
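Assuming vectors are again given as feature-to-weight dictionaries, LIN and the balanced combination can be sketched as follows. The geometric-mean form of BAPinc follows its definition in [14]; the helper names, and passing the APinc value in as a parameter, are illustrative choices of this sketch.

```python
import math

def lin(u, v):
    """LIN similarity of two feature->weight dicts (after Lin [22]):
    sum of shared-feature weights over the sum of all weights."""
    shared = set(u) & set(v)
    num = sum(u[f] + v[f] for f in shared)
    den = sum(u.values()) + sum(v.values())
    return num / den if den else 0.0

def bapinc(u, v, apinc_score):
    """Balanced APinc: the geometric mean of LIN and APinc.

    `apinc_score` is the APinc value already computed for (u, v).
    """
    return math.sqrt(lin(u, v) * apinc_score)
```

As in the definition, a low LIN score pulls down an APinc score that was inflated by a short feature list, and vice versa.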
SAPinc is a measure developed in [12], based on BAPinc, but for dense vectors. Whereas APinc and BAPinc were developed to compute the degree of entailment between word vectors, which are usually sparse since they live in high-dimensional spaces (e.g. 5,000 dimensions), SAPinc was developed to deal with phrase and sentence vectors. These are obtained by composing word vectors of lower dimensionality (e.g. 300), where the compositional operators accumulate information and return dense results.
For SAPinc, P(r) and \(rel'(f_r)\) are defined differently, so as to handle dense vectors.
For more explanation of these measures, please see [12, 13].
A.3 Experimental Results for a Second Sample
The results of the experiment of Sect. 6, with the PPMI and probability-ratio matrices on the second 1000-entry sample of the dataset, are presented in Fig. 7.
Similar to the results presented in the paper, the non-commutative operation performs better at recognising the non-commutative conjunctive entailments.
© 2017 Springer-Verlag GmbH Germany

Cvetko-Vah, K., Sadrzadeh, M., Kartsaklis, D., Blundell, B. (2017). Non-commutative Logic for Compositional Distributional Semantics. In: Kennedy, J., de Queiroz, R. (eds) Logic, Language, Information, and Computation. WoLLIC 2017. Lecture Notes in Computer Science, vol 10388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55386-2_8

Print ISBN: 978-3-662-55385-5
Online ISBN: 978-3-662-55386-2