Nothing Special   »   [go: up one dir, main page]

Skip to main content

Dimensionality Reduction in Boolean Data: Comparison of Four BMF Methods

  • Conference paper
  • First Online:
Clustering High--Dimensional Data (CHDD 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7627))

Included in the following conference series:

Abstract

We compare four methods for Boolean matrix factorization (BMF). The oldest of these methods is the 8M method implemented in the BMDP statistical software package developed in the 1960s. The three other methods were developed recently. All the methods compute from an input object-attribute matrix I two matrices, namely an object-factor matrix A and a factor-attribute matrix B in such a way that the Boolean matrix product of A and B is approximately equal to I. Such decompositions are utilized directly in Boolean factor analysis or indirectly as a dimensionality reduction method for Boolean data in machine learning. While some comparison of the BMF methods with matrix decomposition methods designed for real valued data exists in the literature, a mutual comparison of the various BMF methods is a severely neglected topic. In this paper, we compare the four methods on real datasets. In particular, we observe the reconstruction ability of the first few computed factors as well as the number of computed factors necessary to fully reconstruct the input matrix, i.e. the approximation to the Boolean rank of I computed by the methods. In addition, we present some general remarks on all the methods being compared.

E. Bartl and H. Rezankova acknowledge support by Grant No. 202/10/0262 of the Czech Science Foundation. R. Belohlavek and P. Osicka acknowledge support by the ESF project No. CZ.1.07/2.3.00/20.0059, the project is co-financed by the European Social Fund and the state budget of the Czech Republic.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 34.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 44.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    We thank P. Miettinen for providing us with the datasets DBLP, DNA, and Paleo.

References

  1. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2007). http://www.ics.uci.edu/~mlearn/MLRepository.html

  2. Belohlavek, R.: Optimal decompositions of matrices with entries from residuated lattices. J. Logic Comput., 7 September 2011. doi:10.1093/logcom/exr023

  3. Belohlavek, R., Vychodil, V.: Factor analysis of incidence data via novel decomposition of matrices. In: Ferré, S., Rudolph, S. (eds.) ICFCA 2009. LNCS (LNAI), vol. 5548, pp. 83–97. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  4. Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  5. Berry, M.W., Browne, M., Langville, A.N., Pauca, V.P., Plemmons, R.J.: Algorithms and applications for approximate nonnegative matrix factorization. Comput. Stat. Data Anal. 52, 155–173 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  6. Cudeck, R., MacCallum, R.C. (eds.): Factor Analysis at 100: Historical Developments and Future Directions. Lawrence Erlbaum Associates Inc., Hillsdale (2007)

    Google Scholar 

  7. Fortelius, M., et al.: Neogene of the old world database of fossil mammals (NOW) (2003). http://www.helsinki.fi/science/now/

  8. Frolov, A.A., Húsek, D., Polyakov, P.A.: Boolean factor analysis by Hopfield-like autoassociative memory. IEEE Trans. Neural Networks 18(3), 698–707 (2007)

    Article  Google Scholar 

  9. Ganter, B., Wille, R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin (1999)

    Book  MATH  Google Scholar 

  10. Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 278–289. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  11. Golub, G.A., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1995)

    MATH  Google Scholar 

  12. Harman, H.H.: Modern Factor Analysis, 2nd edn. The Univ. Chicago Press, Chicago (1970)

    MATH  Google Scholar 

  13. Kim, K.H.: Boolean Matrix Theory and Applications. M. Dekker, New York (1982)

    MATH  Google Scholar 

  14. Lee, D., Seung, H.: Learning parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)

    Article  Google Scholar 

  15. Leeuw, J.D.: Principal component analysis of binary data. Application to roll-call analysis (2003). http://gifi.stat.ucla.edu

  16. Lu, H., Vaidya, J., Atluri, V.: Optimal Boolean matrix decomposition: application to role engineering. In: Proceedings of IEEE ICDE 2008, pp. 297–306 (2008)

    Google Scholar 

  17. McDonald, R.P.: Factor Analysis and Related Methods. Lawrence Erlbaum Associates Inc., McHorney (1985)

    Google Scholar 

  18. Mickey, M.R., Mundle, P., Engelman, L.: Boolean factor analysis. In: Dixon, W.J. (ed.) BMDP Statistical Software Manual, vol. 2, pp. 849–860. University of California Press, Berkeley (1990). http://www.statistical-solutions-software.com/products-page/bmdp-statistical-software/

    Google Scholar 

  19. Miettinen, P.: Sparse Boolean matrix factorizations. In: Proceedings of 10th IEEE International Conference on Data Minig (ICDM2010), pp. 935–940 (2010)

    Google Scholar 

  20. Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The discrete basis problem. IEEE Trans. Knowl. Data Eng. 20(10), 1348–1362 (2008). preliminary version in PKDD 2006, pp. 335–346

    Article  Google Scholar 

  21. Monson, D.S., Pullman, J.N.: A survey of clique and biclique coverings and factorizations of (0,1)-matrices. Bull. ICA 14, 17–86 (1995)

    MathSciNet  MATH  Google Scholar 

  22. Myllykangas, S., Himberg, J., Böhling, T., Nagy, B., Hollmén, J., Knuutila, S.: DNA copy number amplification profiling of human neoplasms. Oncogene 25(55), 7324–7332 (2006)

    Article  Google Scholar 

  23. Nau, D.S.: Specificity covering: immunological and other applications, computational complexity and other mathematical properties, and a computer program. A.M. Thesis, Technical report CS-1976-7, Computer Sci. Dept., Duke Univ., Durham, N.C. (1976)

    Google Scholar 

  24. Nau, D.S., Markowsky, G., Woodbury, M.A., Amos, D.B.: A mathematical analysis of human leukocyte antigen serology. Math. Biosci. 40, 243–270 (1978)

    Article  MathSciNet  MATH  Google Scholar 

  25. Outrata, J.: Boolean factor analysis for data preprocessing in machine learning. In: Proceedins of ICML 2010, Washington, D.C., USA, pp. 899–902 (2010)

    Google Scholar 

  26. Orlitsky, S.A.: Semi-parametric exponential family PCA. In: Saul, L.K., et al. (eds.) Advances in Neural Information Processing Systems 17. MIT Press, Cambridge (2005). http://books.nips.cc/papers/files/nips17/NIPS2004_0152.pdf

    Google Scholar 

  27. Seppänen, J.K., Bingham, E., Mannila, H.: A simple algorithm for topic identification in 0–1 data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 423–434. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  28. Schein, A., Saul, L., Ungar, L.: A generalized linear model for principal component analysis of binary data. In: Proceedings of International Workshop on Artificial Intelligence and Statistics, pp. 14–21 (2003)

    Google Scholar 

  29. Stockmeyer, L.J.: The set basis problem is NP-complete. IBM Research Report RC5431, Yorktown Heights, NY (1975)

    Google Scholar 

  30. Tang, F., Tao, H.: Binary principal component analysis. In: Proceedings of British Machine Vision Conference 2006, pp. 377–386 (2006)

    Google Scholar 

  31. Tatti, N., Mielikäinen, T., Gionis, A., Mannila, H.: What is the dimension of your binary data? In: The 2006 IEEE Conference on Data Mining (ICDM 2006), pp. 603–612. IEEE Computer Society (2006)

    Google Scholar 

  32. Vaidya, J., Atluri, V., Guo, Q.: The role mining problem: finding a minimal descriptive set of roles. In: ACM Symposium on Access Control Models and Technologies, pp. 175–184, June 2007

    Google Scholar 

  33. Zivkovic, Z., Verbeek, J.: Transformation invariant component analysis for binary images. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 1, pp. 254–259 (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Petr Osicka .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bartl, E., Belohlavek, R., Osicka, P., Řezanková, H. (2015). Dimensionality Reduction in Boolean Data: Comparison of Four BMF Methods. In: Masulli, F., Petrosino, A., Rovetta, S. (eds) Clustering High--Dimensional Data. CHDD 2012. Lecture Notes in Computer Science(), vol 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-48577-4_8

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48576-7

  • Online ISBN: 978-3-662-48577-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics