Authors:
Sahar Yazdi 1; Fatma Najar 2 and Nizar Bouguila 1
Affiliations:
1 Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, QC, Canada
2 City University of New York (CUNY), John Jay College, New York, NY, U.S.A.
Keyword(s):
Conditional Naive Bayes Model (CNB), Latent Dirichlet Allocation (LDA), LD-CNB Model, LGD-CNB Model, LBL-CNB Model.
Abstract:
As big data continues to grow in prevalence, information retrieval techniques become increasingly important. Numerous models have been developed to uncover the latent structure within data, aiming to extract relevant information or to categorize related patterns. However, data is not uniformly distributed, and a substantial portion often contains empty or missing values, leading to the challenge of "data sparsity". Traditional probabilistic models, while effective at revealing latent structures, lack mechanisms to address data sparsity. To overcome this challenge, we explore generalized forms of the Dirichlet distribution as priors for hierarchical Bayesian models, namely the generalized Dirichlet distribution (LGD-CNB model) and the Beta-Liouville distribution (LBL-CNB model). Our study evaluates the performance of these models in two sets of experiments, employing Gaussian and discrete distributions as examples of exponential family distributions. Results demonstrate that using the GD and BL distributions as priors enhances the model learning process and surpasses the performance of the LD-CNB model in each case.
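To make the two priors named in the abstract concrete, the sketch below draws samples from a generalized Dirichlet (GD) distribution via its standard stick-breaking construction, and from a Beta-Liouville (BL) distribution as a Dirichlet direction scaled by a Beta-distributed magnitude. This is a minimal illustration of the distributions themselves, not the authors' LGD-CNB/LBL-CNB implementation; the function names and parameter values are hypothetical.

```python
import numpy as np

def sample_generalized_dirichlet(alpha, beta, rng):
    """One draw from a GD distribution via stick-breaking:
    nu_i ~ Beta(alpha_i, beta_i), x_i = nu_i * prod_{j<i} (1 - nu_j)."""
    nu = rng.beta(alpha, beta)                      # independent Beta variates
    stick = np.concatenate(([1.0], np.cumprod(1.0 - nu)[:-1]))
    x = nu * stick                                  # first D components
    return np.append(x, 1.0 - x.sum())              # last component closes the simplex

def sample_beta_liouville(alpha_vec, a, b, rng):
    """One draw from a BL distribution: a Dirichlet direction on the
    simplex, scaled by a Beta(a, b) total mass."""
    y = rng.dirichlet(alpha_vec)                    # direction
    u = rng.beta(a, b)                              # magnitude of the first D components
    return u * y

rng = np.random.default_rng(0)
gd = sample_generalized_dirichlet(np.array([2.0, 3.0]), np.array([4.0, 1.0]), rng)
bl = sample_beta_liouville(np.array([1.0, 2.0, 3.0]), 2.0, 5.0, rng)
```

Unlike the Dirichlet, the GD has a separate beta_i shape for each stick break, and the BL decouples direction from magnitude, which is what gives both families the extra flexibility the abstract attributes to them.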