The Discrete Basis Problem

Pauli Miettinen¹,
Taneli Mielikäinen¹,
Aristides Gionis¹,
Gautam Das² &
Heikki Mannila¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4213)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Matrix decomposition methods represent a data matrix as a product of two smaller matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the observed data can be expressed as combinations of the basis vectors. Decomposition methods have been studied extensively, but many of them return real-valued matrices, and if the original data is binary, such basis vectors are hard to interpret. We describe a matrix decomposition formulation, the Discrete Basis Problem. The problem seeks a Boolean decomposition of a binary matrix, which lets the user interpret the basis vectors easily. We show that the problem is computationally difficult and give a simple greedy algorithm for solving it. We present experimental results for the algorithm. The method gives intuitively appealing basis vectors. On the other hand, continuous decomposition methods often give better reconstruction accuracy. We discuss the reasons for this behavior.
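The abstract states the problem only in words. As a rough illustration, the sketch below assumes a common notation for Boolean decompositions (an assumption, not taken from this page): a binary n×m data matrix C is approximated by the Boolean product S∘B of an n×k usage matrix S and a k×m basis matrix B, where (S∘B)_ij is the OR over l of (S_il AND B_lj), and the error is the number of entries in which C and S∘B disagree. The function `greedy_boolean_decomposition` is a hypothetical, deliberately naive greedy heuristic whose candidate basis vectors are simply the distinct rows of C; it is not the algorithm described in the paper. NumPy is assumed to be available.

```python
import numpy as np


def boolean_product(S, B):
    """Boolean matrix product: (S o B)[i, j] = OR over l of (S[i, l] AND B[l, j])."""
    # The integer product counts the l with S[i, l] = B[l, j] = 1; any count > 0 means the OR is true.
    return (S.astype(int) @ B.astype(int)) > 0


def reconstruction_error(C, S, B):
    """Number of entries in which C differs from the Boolean product S o B."""
    return int(np.sum(C.astype(bool) != boolean_product(S, B)))


def greedy_boolean_decomposition(C, k):
    """Hypothetical naive greedy heuristic, NOT the paper's algorithm.

    Candidate basis vectors are the distinct rows of C. In each of at most k rounds,
    the candidate that lowers the total error the most is appended to B; each row of C
    uses it (S[i, l] = 1) only if doing so lowers the error in that row.
    """
    C = C.astype(bool)
    n, m = C.shape
    S = np.zeros((n, 0), dtype=bool)
    B = np.zeros((0, m), dtype=bool)
    approx = np.zeros((n, m), dtype=bool)  # current Boolean product S o B
    candidates = np.unique(C.astype(np.uint8), axis=0).astype(bool)
    for _ in range(k):
        before = (C != approx).sum(axis=1)  # per-row error of the current approximation
        best = None
        for b in candidates:
            # Per-row error if b were OR-ed into every row of the approximation.
            after = (C != (approx | b)).sum(axis=1)
            use = (before - after) > 0  # rows that actually benefit from b
            gain = int((before - after)[use].sum())
            if best is None or gain > best[0]:
                best = (gain, b, use)
        if best is None or best[0] <= 0:
            break  # no candidate improves the reconstruction
        _, b, use = best
        B = np.vstack([B, b])
        S = np.hstack([S, use[:, None]])
        approx |= use[:, None] & b
    return S, B


C = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 1]], dtype=bool)
S, B = greedy_boolean_decomposition(C, k=2)
print(B.astype(int))
print(reconstruction_error(C, S, B))  # 0 for this example
```

In this toy matrix the heuristic picks the basis vectors 1100 and 0011, and the last row of C is recovered as their Boolean OR, so the reconstruction is exact. Restricting candidates to rows of C keeps the sketch short; it can miss basis vectors that do not appear verbatim as rows.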

Author information

Authors and Affiliations

HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, P.O. Box 68, FIN-00014, Finland
Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis & Heikki Mannila
Computer Science and Engineering Department, University of Texas at Arlington, Arlington, TX, 76019, USA
Gautam Das

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt, Germany
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H. (2006). The Discrete Basis Problem. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_33

DOI: https://doi.org/10.1007/11871637_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)
