DOI: 10.1145/1835804.1835919
Research article

Mixture models for learning low-dimensional roles in high-dimensional data

Published: 25 July 2010

Abstract

Archived data often describe entities that participate in multiple roles, and each of these roles may influence different aspects of the data. For example, a register transaction collected at a retail store may have been initiated by a customer who is a woman, a mother, an avid reader, and an action movie fan. Each of these roles can influence different parts of the customer's purchase. The fact that the customer is a mother may strongly influence the purchase of a toddler-sized pair of pants but have no influence on the purchase of an action-adventure novel. The fact that the customer is an action movie fan and an avid reader may influence the purchase of the novel but have no effect on the purchase of a shirt.
In this paper, we present a generic Bayesian framework for capturing exactly this situation. In our framework, multiple roles are assumed to exist, and each data point corresponds to an entity (such as a retail customer, an email, or a news article) that selects several roles. These roles then compete to influence the attributes associated with the data point. We develop robust MCMC algorithms for learning the models under the framework.
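To make the generative story in the abstract concrete, here is a minimal illustrative sketch in Python. It is not the authors' model and not their MCMC learning algorithm. It assumes binary attributes, independent Bernoulli role selection, and a simple competition rule in which one active role, chosen uniformly at random, controls each attribute. The names generate_entity, select_prob, attr_prob, and default_prob are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def generate_entity(
    select_prob: List[float],
    attr_prob: List[List[float]],
    default_prob: List[float],
    rng: random.Random,
) -> Tuple[List[int], List[Optional[int]], List[int]]:
    """Draw one binary data point from a toy multi-role generative process.

    Hypothetical parameters (illustrative only, not taken from the paper):
      select_prob[k]:  probability that the entity adopts role k
      attr_prob[k][j]: probability that role k sets attribute j to 1
      default_prob[j]: background probability used when no role is active
    """
    n_roles = len(select_prob)
    n_attrs = len(default_prob)

    # Step 1: the entity selects a subset of roles via independent Bernoulli draws.
    active = [k for k in range(n_roles) if rng.random() < select_prob[k]]

    owners: List[Optional[int]] = []
    point: List[int] = []
    for j in range(n_attrs):
        if active:
            # Step 2: the active roles compete for attribute j; in this sketch
            # the winner is simply chosen uniformly at random (an assumption).
            winner = rng.choice(active)
            p = attr_prob[winner][j]
        else:
            # No role was selected, so a background distribution is used.
            winner = None
            p = default_prob[j]
        # Step 3: only the winning role influences the value of attribute j.
        owners.append(winner)
        point.append(1 if rng.random() < p else 0)
    return active, owners, point


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical roles: 0 = "parent" (toddler clothing), 1 = "reader" (novels).
    select_prob = [0.5, 0.5]
    attr_prob = [[0.9, 0.05], [0.05, 0.8]]
    default_prob = [0.1, 0.1]
    for _ in range(3):
        print(generate_entity(select_prob, attr_prob, default_prob, rng))
```

This sketch covers only the forward, generative direction. In a model of this kind, learning would mean working backward from observed points to the roles each entity selected and the role that controls each attribute, which is the task the abstract assigns to MCMC.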

Supplementary Material

JPG File (kdd2010_somaiya_mml_01.jpg)
MOV File (kdd2010_somaiya_mml_01.mov)

Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. high-dimensional data
  2. mcmc
  3. mixture models

Conference

KDD '10
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 23 Nov 2024.

Cited By

  • (2013) Guided learning for role discovery (GLRD). Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 113-121. DOI: 10.1145/2487575.2487620. Online publication date: 11-Aug-2013
  • (2013) Mixed Membership Subspace Clustering. 2013 IEEE 13th International Conference on Data Mining, pp. 221-230. DOI: 10.1109/ICDM.2013.109. Online publication date: Dec-2013
  • (2012) A global local modeling of internet usage in large mobile societies. Proceedings of the 7th ACM workshop on Performance monitoring and measurement of heterogeneous wireless and wired networks, pp. 69-76. DOI: 10.1145/2387191.2387202. Online publication date: 21-Oct-2012
  • (2012) RolX. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1231-1239. DOI: 10.1145/2339530.2339723. Online publication date: 12-Aug-2012
  • (2012) Multi-view clustering using mixture models in subspace projections. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 132-140. DOI: 10.1145/2339530.2339553. Online publication date: 12-Aug-2012
