Dimensionality Reduction in Boolean Data: Comparison of Four BMF Methods

Eduard Bartl¹⁶,
Radim Belohlavek¹⁶,
Petr Osicka¹⁶ &
…
Hana Řezanková¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7627))

Included in the following conference series:

International Workshop on Clustering High-Dimensional Data

1219 Accesses

Abstract

We compare four methods for Boolean matrix factorization (BMF). The oldest of these methods is the 8M method implemented in the BMDP statistical software package developed in the 1960s. The three other methods were developed recently. All the methods compute from an input object-attribute matrix I two matrices, namely an object-factor matrix A and a factor-attribute matrix B in such a way that the Boolean matrix product of A and B is approximately equal to I. Such decompositions are utilized directly in Boolean factor analysis or indirectly as a dimensionality reduction method for Boolean data in machine learning. While some comparison of the BMF methods with matrix decomposition methods designed for real valued data exists in the literature, a mutual comparison of the various BMF methods is a severely neglected topic. In this paper, we compare the four methods on real datasets. In particular, we observe the reconstruction ability of the first few computed factors as well as the number of computed factors necessary to fully reconstruct the input matrix, i.e. the approximation to the Boolean rank of I computed by the methods. In addition, we present some general remarks on all the methods being compared.

E. Bartl and H. Rezankova acknowledge support by Grant No. 202/10/0262 of the Czech Science Foundation. R. Belohlavek and P. Osicka acknowledge support by the ESF project No. CZ.1.07/2.3.00/20.0059, the project is co-financed by the European Social Fund and the state budget of the Czech Republic.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 34.99; Price excludes VAT (USA)

Softcover Book: USD 44.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Revisiting data reduction for boolean matrix factorization algorithms based on formal concept analysis

Article 11 June 2024

FastStep: Scalable Boolean Matrix Decomposition

Reducing Negative Impact of Noise in Boolean Matrix Factorization with Association Rules

Notes

1.
We thank P. Miettinen for providing us with the datasets DBLP, DNA, and Paleo.

References

Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2007). http://www.ics.uci.edu/~mlearn/MLRepository.html
Belohlavek, R.: Optimal decompositions of matrices with entries from residuated lattices. J. Logic Comput., 7 September 2011. doi:10.1093/logcom/exr023
Belohlavek, R., Vychodil, V.: Factor analysis of incidence data via novel decomposition of matrices. In: Ferré, S., Rudolph, S. (eds.) ICFCA 2009. LNCS (LNAI), vol. 5548, pp. 83–97. Springer, Heidelberg (2009)
Chapter Google Scholar
Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20 (2010)
Article MathSciNet MATH Google Scholar
Berry, M.W., Browne, M., Langville, A.N., Pauca, V.P., Plemmons, R.J.: Algorithms and applications for approximate nonnegative matrix factorization. Comput. Stat. Data Anal. 52, 155–173 (2007)
Article MathSciNet MATH Google Scholar
Cudeck, R., MacCallum, R.C. (eds.): Factor Analysis at 100: Historical Developments and Future Directions. Lawrence Erlbaum Associates Inc., Hillsdale (2007)
Google Scholar
Fortelius, M., et al.: Neogene of the old world database of fossil mammals (NOW) (2003). http://www.helsinki.fi/science/now/
Frolov, A.A., Húsek, D., Polyakov, P.A.: Boolean factor analysis by Hopfield-like autoassociative memory. IEEE Trans. Neural Networks 18(3), 698–707 (2007)
Article Google Scholar
Ganter, B., Wille, R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin (1999)
Book MATH Google Scholar
Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 278–289. Springer, Heidelberg (2004)
Chapter Google Scholar
Golub, G.A., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1995)
MATH Google Scholar
Harman, H.H.: Modern Factor Analysis, 2nd edn. The Univ. Chicago Press, Chicago (1970)
MATH Google Scholar
Kim, K.H.: Boolean Matrix Theory and Applications. M. Dekker, New York (1982)
MATH Google Scholar
Lee, D., Seung, H.: Learning parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)
Article Google Scholar
Leeuw, J.D.: Principal component analysis of binary data. Application to roll-call analysis (2003). http://gifi.stat.ucla.edu
Lu, H., Vaidya, J., Atluri, V.: Optimal Boolean matrix decomposition: application to role engineering. In: Proceedings of IEEE ICDE 2008, pp. 297–306 (2008)
Google Scholar
McDonald, R.P.: Factor Analysis and Related Methods. Lawrence Erlbaum Associates Inc., McHorney (1985)
Google Scholar
Mickey, M.R., Mundle, P., Engelman, L.: Boolean factor analysis. In: Dixon, W.J. (ed.) BMDP Statistical Software Manual, vol. 2, pp. 849–860. University of California Press, Berkeley (1990). http://www.statistical-solutions-software.com/products-page/bmdp-statistical-software/
Google Scholar
Miettinen, P.: Sparse Boolean matrix factorizations. In: Proceedings of 10th IEEE International Conference on Data Minig (ICDM2010), pp. 935–940 (2010)
Google Scholar
Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The discrete basis problem. IEEE Trans. Knowl. Data Eng. 20(10), 1348–1362 (2008). preliminary version in PKDD 2006, pp. 335–346
Article Google Scholar
Monson, D.S., Pullman, J.N.: A survey of clique and biclique coverings and factorizations of (0,1)-matrices. Bull. ICA 14, 17–86 (1995)
MathSciNet MATH Google Scholar
Myllykangas, S., Himberg, J., Böhling, T., Nagy, B., Hollmén, J., Knuutila, S.: DNA copy number amplification profiling of human neoplasms. Oncogene 25(55), 7324–7332 (2006)
Article Google Scholar
Nau, D.S.: Specificity covering: immunological and other applications, computational complexity and other mathematical properties, and a computer program. A.M. Thesis, Technical report CS-1976-7, Computer Sci. Dept., Duke Univ., Durham, N.C. (1976)
Google Scholar
Nau, D.S., Markowsky, G., Woodbury, M.A., Amos, D.B.: A mathematical analysis of human leukocyte antigen serology. Math. Biosci. 40, 243–270 (1978)
Article MathSciNet MATH Google Scholar
Outrata, J.: Boolean factor analysis for data preprocessing in machine learning. In: Proceedins of ICML 2010, Washington, D.C., USA, pp. 899–902 (2010)
Google Scholar
Orlitsky, S.A.: Semi-parametric exponential family PCA. In: Saul, L.K., et al. (eds.) Advances in Neural Information Processing Systems 17. MIT Press, Cambridge (2005). http://books.nips.cc/papers/files/nips17/NIPS2004_0152.pdf
Google Scholar
Seppänen, J.K., Bingham, E., Mannila, H.: A simple algorithm for topic identification in 0–1 data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 423–434. Springer, Heidelberg (2003)
Chapter Google Scholar
Schein, A., Saul, L., Ungar, L.: A generalized linear model for principal component analysis of binary data. In: Proceedings of International Workshop on Artificial Intelligence and Statistics, pp. 14–21 (2003)
Google Scholar
Stockmeyer, L.J.: The set basis problem is NP-complete. IBM Research Report RC5431, Yorktown Heights, NY (1975)
Google Scholar
Tang, F., Tao, H.: Binary principal component analysis. In: Proceedings of British Machine Vision Conference 2006, pp. 377–386 (2006)
Google Scholar
Tatti, N., Mielikäinen, T., Gionis, A., Mannila, H.: What is the dimension of your binary data? In: The 2006 IEEE Conference on Data Mining (ICDM 2006), pp. 603–612. IEEE Computer Society (2006)
Google Scholar
Vaidya, J., Atluri, V., Guo, Q.: The role mining problem: finding a minimal descriptive set of roles. In: ACM Symposium on Access Control Models and Technologies, pp. 175–184, June 2007
Google Scholar
Zivkovic, Z., Verbeek, J.: Transformation invariant component analysis for binary images. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 1, pp. 254–259 (2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Data Analysis and Modeling Laboratory (DAMOL), Department of Computer Science, Palacky Univeristy, Olomouc, Czech Republic
Eduard Bartl, Radim Belohlavek & Petr Osicka
Department of Statistics and Probability, Faculty of Informatics and Statistics, University of Economics, Prague, Czech Republic
Hana Řezanková

Authors

Eduard Bartl
View author publications
You can also search for this author in PubMed Google Scholar
Radim Belohlavek
View author publications
You can also search for this author in PubMed Google Scholar
Petr Osicka
View author publications
You can also search for this author in PubMed Google Scholar
Hana Řezanková
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Petr Osicka .

Editor information

Editors and Affiliations

DIBRIS, University of Genoa DIBRIS, Genoa, Italy
Francesco Masulli
University of Naples "Parthenope", Naples, Italy
Alfredo Petrosino
DIBRIS, University of Genoa, Genoa, Italy
Stefano Rovetta

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bartl, E., Belohlavek, R., Osicka, P., Řezanková, H. (2015). Dimensionality Reduction in Boolean Data: Comparison of Four BMF Methods. In: Masulli, F., Petrosino, A., Rovetta, S. (eds) Clustering High--Dimensional Data. CHDD 2012. Lecture Notes in Computer Science(), vol 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_8

Download citation

DOI: https://doi.org/10.1007/978-3-662-48577-4_8
Published: 25 November 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48576-7
Online ISBN: 978-3-662-48577-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics