BiTTM: A Core Biterms-Based Topic Model for Targeted Analysis
Figure 1. The graphical model of TTM.
Figure 2. The graphical model of BTM.
Figure 3. The graphical model of BiTTM.
Figure 4. Effect of various targets on topic coherence. The x-axis shows the word frequency of targets, and the y-axis represents the percentage of documents containing a target.
Figure 5. Time cost in datasets of different sizes.
Figure 6. Time cost in datasets with different document lengths.
Abstract
1. Introduction
2. Related Work
2.1. Targeted Topic Models
- Draw .
- For each relevant topic:
  - (a) Draw .
  - (b) For each word:
    - i. Draw .
  - (c) Draw .
- For each document:
  - (a) Draw .
  - (b) Draw relevance status r based on the keyword indicator x and .
  - (c) If the document is relevant:
    - i. Draw .
    - ii. Draw .
  - (d) If the document is irrelevant:
    - i. Draw .
2.2. BTM
- Draw .
- For each topic:
  - (a) Draw .
- For each biterm:
  - (a) Draw .
  - (b) Draw .
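The list above loses its mathematical symbols in extraction, so as a point of reference, BTM's generative process [2] can be sketched as follows. All hyperparameter values (`K`, `V`, `alpha`, `beta`, `n_biterms`) are illustrative, not settings from the paper.

```python
import numpy as np

# A minimal generative sketch of BTM: one corpus-level topic distribution,
# one word distribution per topic, and one topic drawn per biterm.
rng = np.random.default_rng(0)
K, V, n_biterms = 3, 20, 100                  # topics, vocabulary, biterm count
alpha, beta = 1.0, 0.1                        # symmetric Dirichlet priors

theta = rng.dirichlet([alpha] * K)            # corpus-level topic distribution
phi = rng.dirichlet([beta] * V, size=K)       # one word distribution per topic

biterms = []
for _ in range(n_biterms):
    z = rng.choice(K, p=theta)                # draw one topic per biterm
    w_i = rng.choice(V, p=phi[z])             # both words of the biterm are
    w_j = rng.choice(V, p=phi[z])             #   drawn from the same topic
    biterms.append((w_i, w_j))
```

The key difference from LDA is visible here: the topic is drawn once per biterm from a corpus-level distribution, not per word from a document-level one, which is why BTM copes with short texts.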
2.3. Other Topic Models
3. BiTTM
3.1. Core Biterms
- Synonyms. For example, if the supplied query keyword is “bath”, relevant documents containing words representing similar semantics, such as “shower”, may be missed.
- Words referring to the same targeted aspect in a particular domain. For example, when the domain is confined to Amazon reviews of baby products, the keywords “crib” and “bed” represent the same aspect, although they are not exactly synonyms.
- Words describing the same event. Users often use diverse words to refer to the same event, especially in social networks. For example, considering the Twitter dataset of Oscars, both “mistake” and “oscarsfail” are used to describe the event of a wrong envelope for the Best Picture Award.
Algorithm 1: Preprocessing based on biterms
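Only the title of Algorithm 1 survives here, so the following is a hedged sketch of biterm-based preprocessing, not the paper's actual procedure: biterms are extracted as unordered word pairs co-occurring within a window, and candidate core biterms are those containing the target keyword. The `window` and `min_count` parameters are illustrative assumptions; the real algorithm also covers semantically related words (the "bath"/"shower" cases above), which this sketch does not.

```python
from collections import Counter

def extract_biterms(tokens, window=15):
    """All unordered word pairs co-occurring within a sliding window."""
    biterms = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                biterms.append(tuple(sorted((tokens[i], tokens[j]))))
    return biterms

def core_biterms(docs, keyword, min_count=2, window=15):
    """Illustrative selection: keep frequent biterms containing the keyword."""
    counts = Counter(b for doc in docs for b in extract_biterms(doc, window))
    return {b for b, c in counts.items() if keyword in b and c >= min_count}
```

Because selection operates on biterms rather than whole documents, a document that never contains the keyword can still contribute words that co-occur with it elsewhere, which is the motivation for core biterms over document pre-filtering.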
3.2. Model Description & Generative Process
- Draw .
- Draw .
- For each target-relevant topic:
  - (a) Draw .
  - (b) For each word:
    - Draw .
  - (c) Draw .
- For each biterm:
  - (a) Draw .
  - (b) Compute r based on x and .
  - (c) If the biterm is relevant to the target:
    - Draw .
    - Draw .
    - Draw .
  - (d) If the biterm is irrelevant:
    - Draw .
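With the symbols elided above, the per-biterm branching can still be sketched: a Bernoulli draw decides whether a biterm is relevant to the target; relevant biterms draw a topic from the target-relevant topics, while irrelevant ones fall back on a single background word distribution. All names and values below are illustrative, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(1)
K, V = 4, 30                              # relevant topics, vocabulary size
p_rel = 0.6                               # Bernoulli relevance prob (illustrative)

theta = rng.dirichlet([1.0] * K)          # distribution over relevant topics
phi = rng.dirichlet([0.1] * V, size=K)    # relevant topic-word distributions
phi_bg = rng.dirichlet([0.1] * V)         # single irrelevant (background) topic

def generate_biterm():
    """Return (w_i, w_j, z); z == -1 marks an irrelevant biterm."""
    if rng.random() < p_rel:              # biterm judged relevant to the target
        z = rng.choice(K, p=theta)
        return rng.choice(V, p=phi[z]), rng.choice(V, p=phi[z]), int(z)
    return rng.choice(V, p=phi_bg), rng.choice(V, p=phi_bg), -1
```

Routing all irrelevant biterms to one background distribution keeps the K relevant topics from absorbing off-target words, mirroring TTM's relevant/irrelevant split but at biterm rather than document granularity.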
3.3. Inference
4. Experimental Results
4.1. Baselines and Metrics
- TTM. The Targeted Topic Model is the first method for focused analysis that extracts related topics according to a target keyword provided by users. We select TTM rather than APSUM as the baseline among specialised topic models for targeted analysis because TTM outperforms APSUM in terms of topic coherence when the number of topics is less than 50 [3]. For targeted analysis of fine-grained topics, we believe the number of topics in a given corpus is usually less than 50. Moreover, TTM serves as the most valuable comparison because APSUM is not specifically designed for targeted analysis.
- BTM. As our model is developed based on biterms, we also compare with two variations of BTM adapted for targeted analysis. BTM is a state-of-the-art topic model for short texts that also applies to long texts [2]. As a typical full-analysis model, BTM aims to find all topics (or all aspects) in the entire corpus. We then use a filtering strategy to eliminate topics that do not contain the target keywords. This approach is named BTM for simplicity.
- BTM-PD. This is another variation of BTM, which applies the pre-filtering strategy to perform focused analysis. We use only the subset of documents containing the target keywords to model topics. As discussed before, the pre-filtering strategy is handicapped by the variability of target keywords: relevant documents that do not contain the exact keyword may be filtered out, so topics may be missed.
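The two BTM adaptations can be contrasted in a few lines. Here `docs` is a list of token lists and `topics` a list of ranked word lists; both are hypothetical stand-ins for the actual pipeline.

```python
def pre_filter(docs, keyword):
    """BTM-PD: model topics only on documents that literally contain the keyword."""
    return [doc for doc in docs if keyword in doc]

def post_filter(topics, keyword, top_n=10):
    """BTM: fit on the whole corpus, then discard topics whose top-N
    words do not include the keyword."""
    return [t for t in topics if keyword in t[:top_n]]
```

Pre-filtering shrinks the corpus before modeling and so silently drops relevant documents phrased without the exact keyword; post-filtering models everything and pays the full-corpus cost before discarding most of the output.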
4.2. Datasets & Experimental Settings
4.3. Quantitative Evaluation
4.4. Time Efficiency Analysis
4.5. Qualitative Evaluation
4.5.1. Discovering Relevant Topics
4.5.2. Handling Semantically Approximate Targets
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
References
- Wang, S.; Chen, Z.; Fei, G.; Liu, B.; Emery, S. Targeted Topic Modeling for Focused Analysis. In Proceedings of the ACM SIGKDD International Conference, San Francisco, CA, USA, 13–17 August 2016; pp. 1235–1244.
- Cheng, X.; Yan, X.; Lan, Y.; Guo, J. BTM: Topic Modeling over Short Texts. IEEE Trans. Knowl. Data Eng. 2014, 26, 2928–2941.
- Rakesh, V.; Ding, W.; Ahuja, A.; Rao, N.; Sun, Y.; Reddy, C.K. A Sparse Topic Model for Extracting Aspect-Specific Summaries from Online Reviews. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, 23–27 April 2018; pp. 1573–1582.
- Kim, H.; Choi, D.; Drake, B.L.; Endert, A.; Park, H. TopicSifter: Interactive Search Space Reduction through Targeted Topic Modeling. In Proceedings of the 14th IEEE Conference on Visual Analytics Science and Technology, IEEE VAST 2019, Vancouver, BC, Canada, 20–25 October 2019; pp. 35–45.
- He, J.; Li, L.; Wang, Y.; Wu, X. Hierarchical features-based targeted aspect extraction from online reviews. Intell. Data Anal. 2021, 25, 205–223.
- Nguyen, T.; Pham, T.; Le, H.; Nguyen, T.; Bui, H.; Ha, Q. A Targeted Topic Model based Multi-Label Deep Learning Classification Framework for Aspect-based Opinion Mining. In Proceedings of the 12th International Conference on Knowledge and Systems Engineering, KSE 2020, Can Tho City, Vietnam, 12–14 November 2020; pp. 165–170.
- Li, S.; Zhang, Y.; Pan, R.; Mao, M.; Yang, Y. Recurrent Attentional Topic Model. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 3223–3229.
- Cai, G.; Peng, L.; Wang, Y. Topic Detection and Evolution Analysis on Microblog; Springer: Berlin/Heidelberg, Germany, 2014; pp. 67–77.
- Ye, C.; Liu, D.; Chen, N.; Lin, L. Mapping the topic evolution using citation-topic model and social network analysis. In Proceedings of the International Conference on Fuzzy Systems and Knowledge Discovery, Zhangjiajie, China, 15–17 August 2016; pp. 2648–2653.
- Xia, Y.; Tang, N.; Hussain, A.; Cambria, E. Discriminative Bi-Term Topic Model for Headline-Based Social News Clustering. In Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2015, Hollywood, FL, USA, 18–20 May 2015; pp. 311–316.
- Amara, A.; Taieb, M.A.H.; Aouicha, M.B. Multilingual topic modeling for tracking COVID-19 trends based on Facebook data analysis. Appl. Intell. 2021, 51, 3052–3073.
- Hu, Y.; Tai, C.; Liu, K.E.; Cai, C. Identification of highly-cited papers using topic-model-based and bibliometric features: The consideration of keyword popularity. J. Inf. 2020, 14, 101004.
- Zhang, W.; Wang, J. Integrating Topic and Latent Factors for Scalable Personalized Review-based Rating Prediction. IEEE Trans. Knowl. Data Eng. 2016, 28, 3013–3027.
- Wang, H.; Li, W. Relational Collaborative Topic Regression for Recommender Systems. IEEE Trans. Knowl. Data Eng. 2015, 27, 1343–1355.
- Zhang, Y.; Chen, W.; Zha, H.; Gu, X. A Time-Topic Coupled LDA Model for IPTV User Behaviors. IEEE Trans. Broadcast. 2015, 61, 56–65.
- Hu, C.; Hu, Y.; Xu, W.; Shi, P.; Fu, S. Understanding Popularity Evolution Patterns of Hot Topics Based on Time Series Features. In Proceedings of the Web Technologies and Applications—APWeb 2014 Workshops, SNA, NIS, and IoTS, Changsha, China, 5 September 2014; pp. 58–68.
- Feuerriegel, S.; Ratku, A.; Neumann, D. Analysis of How Underlying Topics in Financial News Affect Stock Prices Using Latent Dirichlet Allocation. In Proceedings of the Hawaii International Conference on System Sciences, HICSS 2016, Koloa, HI, USA, 5–8 January 2016; pp. 1072–1081.
- Viermetz, M.; Skubacz, M.; Ziegler, C.N.; Seipel, D. Tracking Topic Evolution in News Environments. In Proceedings of the IEEE Conference on E-Commerce Technology and the Fifth IEEE Conference on Enterprise Computing, E-Commerce and E-Services, Washington, DC, USA, 21–24 July 2008; pp. 215–220.
- Phuong, D.V.; Phuong, T.M. A keyword-topic model for contextual advertising. In Proceedings of the Symposium on Information and Communication Technology 2012, SoICT ’12, Halong City, Vietnam, 23–24 August 2012; pp. 63–70.
- Kalyanam, J.; Mantrach, A.; Saez-Trumper, D.; Vahabi, H.; Lanckriet, G. Leveraging Social Context for Modeling Topic Evolution. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 517–526.
- Sordo, M.; Ogihara, M.; Wuchty, S. Analysis of the Evolution of Research Groups and Topics in the ISMIR Conference. In Proceedings of the 16th International Society for Music Information Retrieval Conference, ISMIR 2015, Málaga, Spain, 26–30 October 2015; pp. 204–210.
- Zhao, B.; Xu, W.; Ji, G.; Tan, C. Discovering Topic Evolution Topology in a Microblog Corpus. In Proceedings of the Third International Conference on Advanced Cloud and Big Data, Yangzhou, Jiangsu, China, 30 October–1 November 2015; pp. 7–14.
- Gou, Z.; Li, Y. A method of query expansion based on topic models and user profile for search in folksonomy. J. Intell. Fuzzy Syst. 2021, 41, 1701–1711.
- Sperrle, F.; Schäfer, H.; Keim, D.A.; El-Assady, M. Learning Contextualized User Preferences for Co-Adaptive Guidance in Mixed-Initiative Topic Model Refinement. Comput. Graph. Forum 2021, 40, 215–226.
- Lin, T.; Tian, W.; Mei, Q.; Cheng, H. The dual-sparse topic model: Mining focused topics and focused terms in short text. In Proceedings of the 23rd International World Wide Web Conference, WWW ’14, Seoul, Korea, 7–11 April 2014; pp. 539–550.
- Chien, J.T.; Chang, Y.L. Bayesian Sparse Topic Model. J. Signal Process. Syst. 2014, 74, 375–389.
- Slutsky, A.; Hu, X.; An, Y. Learning Focused Hierarchical Topic Models with Semi-Supervision in Microblogs. In Proceedings of the Advances in Knowledge Discovery and Data Mining—19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, 19–22 May 2015; Part II, pp. 598–609.
- Chen, X.; Zhou, M.; Carin, L. The contextual focused topic model. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 96–104.
- Pu, X.; Jin, R.; Wu, G.; Han, D.; Xue, G.R. Topic Modeling in Semantic Space with Keywords. In Proceedings of the ACM International on Conference on Information and Knowledge Management, Melbourne, VIC, Australia, 19–23 October 2015; pp. 1141–1150.
- Williamson, S.; Wang, C.; Heller, K.A.; Blei, D.M. The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling. In Proceedings of the International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 1151–1158.
- Zhu, B.; Cai, Y.; Zhang, H. Sparse Biterm Topic Model for Short Texts. In Proceedings of the Web and Big Data—5th International Joint Conference, APWeb-WAIM 2021, Guangzhou, China, 23–25 August 2021; Part I; Lecture Notes in Computer Science; Hou, U.L., Spaniol, M., Sakurai, Y., Chen, J., Eds.; Springer: Cham, Switzerland, 2021; Volume 12858, pp. 227–241.
- Shi, L.; Du, J.; Kou, F. A sparse topic model for bursty topic discovery in social networks. Int. Arab J. Inf. Technol. 2020, 17, 816–824.
- Wang, C.; Blei, D.M. Decoupling sparsity and smoothness in the discrete hierarchical Dirichlet process. In Proceedings of the International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 1982–1989.
- Griffiths, T.L.; Steyvers, M. Finding scientific topics. Proc. Natl. Acad. Sci. USA 2004, 101 (Suppl. S1), 5228.
- Mimno, D.; Wallach, H.M.; Talley, E.; Leenders, M.; Mccallum, A. Optimizing semantic coherence in topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, UK, 27–31 July 2011; pp. 262–272.
- Yao, L.; Zhang, Y.; Wei, B.; Qian, H.; Wang, Y. Incorporating Probabilistic Knowledge into Topic Models. In Proceedings of the Advances in Knowledge Discovery and Data Mining—19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, 19–22 May 2015; Part II, pp. 586–597.
- Arora, S.; Ge, R.; Halpern, Y.; Mimno, D.; Moitra, A.; Sontag, D.; Wu, Y.; Zhu, M. A Practical Algorithm for Topic Modeling with Provable Guarantees. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 280–288.
- Li, C.; Wang, H.; Zhang, Z.; Sun, A.; Ma, Z. Topic Modeling for Short Texts with Auxiliary Word Embeddings. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, 17–21 July 2016; pp. 165–174.
- Allahyari, M.; Kochut, K. Automatic Topic Labeling Using Ontology-Based Topic Models. In Proceedings of the IEEE International Conference on Machine Learning and Applications, ICMLA 2015, Miami, FL, USA, 9–11 December 2015; pp. 259–264.
- Huang, J.; Peng, M.; Wang, H.; Cao, J.; Gao, W.; Zhang, X. A probabilistic method for emerging topic tracking in Microblog stream. World Wide Web 2017, 20, 325–350.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Bollegala, D.; Hayashi, K.; Kawarabayashi, K. Think Globally, Embed Locally—Locally Linear Meta-embedding of Words. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018; pp. 3970–3976.
- Zhang, P.; Wang, S.; Li, D.; Li, X.; Xu, Z. Combine Topic Modeling with Semantic Embedding: Embedding Enhanced Topic Model. IEEE Trans. Knowl. Data Eng. 2020, 32, 2322–2335.
- Li, S.; Pan, R.; Luo, H.; Liu, X.; Zhao, G. Adaptive cross-contextual word embedding for word polysemy with unsupervised topic modeling. Knowl. Based Syst. 2021, 218, 106827.
- Inoue, S.; Aida, T.; Komachi, M.; Asai, M. Modeling Text using the Continuous Space Topic Model with Pre-Trained Word Embeddings. In Proceedings of the ACL-IJCNLP 2021 Student Research Workshop, ACL 2021, Online, 5–10 July 2021; Kabbara, J., Lin, H., Paullada, A., Vamvas, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 138–147.
- Gupta, P.; Chaudhary, Y.; Schütze, H. Multi-source Neural Topic Modeling in Multi-view Embedding Spaces. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, 6–11 June 2021; Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 4205–4217.
Example | Category | Candidate Keywords
---|---|---
Example 1 | Synonyms | Two candidate query keywords: “bath” and “shower”.
Example 2 | Domain restriction | Two candidate query keywords, “crib” and “bed”, in the Amazon review dataset Baby.
Example 3 | Event description | Two candidate query keywords, “mistake” and “oscarsfail”, in the Twitter dataset Oscars.
Notation | Meaning
---|---
B | the set of (core) biterms
W | the set of words
D | the set of documents
 | the Bernoulli distribution over biterm b
 | Beta prior of , Dirichlet prior of
 | topic-word distribution of the irrelevant topic
 | topic-word distribution of the kth relevant topic
 | Beta prior of
 | word smoothing prior, weak word smoothing prior
 | Bernoulli distribution of the word selector
 | target indicator, word, topic, status
 | word selector of word w under topic k; the sum of word selectors
 | word selector of word w; the sum of word selectors
 | the sum of word selectors excluding word w
 | the sum of word selectors under topic k excluding word w
 | the number of times word w is relevant (irrelevant), excluding biterm
 | the number of relevant (irrelevant) biterms, excluding biterm
 | the mth word in biterm , where
 | the number of times the word is assigned to topic k with relevant status, excluding
 | the total number of words assigned to topic k with relevant status, excluding
 | the number of times the word is assigned to topic k with irrelevant status, excluding biterm
 | the total number of words assigned to topic k with irrelevant status, excluding biterm
 | the number of times word w is relevant
 | the total number of relevant words, excluding word w
Type | Domain | Source | Length | Size (KB)
---|---|---|---|---
short | cigar | Twitter | 2.947836 | 641
 | ecig | Twitter | 3.499578 | 708
 | Oscars | Twitter | 3.906165 | 565
medium | baby | Amazon | 28.07813 | 141
 | camera | Amazon | 79.08307 | 1285
 | computer | Amazon | 80.9001 | 1295
long | home | Amazon | 179.4867 | 619
 | food | Amazon | 258.4938 | 195
 | care | Amazon | 675.1493 | 1523
Datasets (BiTTM) | TopM = 5 | TopM = 10 | TopM = 15 | TopM = 20 | TopM = 25 | TopM = 30
---|---|---|---|---|---|---
cigar | −43.85160224 | −213.938738 | −515.6799558 | −942.5318026 | −1493.639111 | −2170.963155 |
ecig | −47.18005764 | −225.7818028 | −537.0456906 | −975.5867746 | −1540.959263 | −2233.885475 |
Oscars | −41.56838081 | −203.696938 | −495.7966598 | −913.9376539 | −1451.063314 | −2102.622643 |
baby | −18.77161638 | −97.92745779 | −258.704099 | −512.2443085 | −862.3028153 | −1314.514524 |
camera | −11.60928852 | −57.83209167 | −151.6040027 | −295.3511581 | −493.2409214 | −748.8210303 |
computer | −14.82764795 | −66.09833632 | −163.5166346 | −310.8723517 | −517.6532656 | −789.6438756 |
home | −9.146932399 | −45.84159746 | −117.7837843 | −239.7455224 | −418.9237371 | −656.914665 |
food | −7.47342957 | −42.52194503 | −109.7998671 | −220.2148208 | −370.0466113 | −563.1454128 |
care | −11.52868834 | −49.17233659 | −116.4456707 | −224.1156048 | −375.6415389 | −585.8790988 |
Datasets (TTM) | TopM = 5 | TopM = 10 | TopM = 15 | TopM = 20 | TopM = 25 | TopM = 30
---|---|---|---|---|---|---
cigar | −34.68247162 | −197.0478114 | −493.1432304 | −921.1591797 | −1481.235217 | −2165.174583 |
ecig | −34.67967533 | −200.4202621 | −502.3021289 | −935.8301943 | −1495.777983 | −2182.933002 |
Oscars | −31.17239749 | −178.8829525 | −456.1151923 | −858.0192159 | −1383.122495 | −2032.967068 |
baby | −16.2223621 | −101.2752879 | −281.5019366 | −562.4756204 | −937.8116571 | −1440.487996 |
camera | −11.28057698 | −62.01834755 | −164.0824617 | −318.5467601 | −529.0200582 | −801.4709905 |
computer | −13.16215736 | −68.04826101 | −174.2534566 | −332.9993329 | −549.2408003 | −824.3810116 |
home | −9.934943893 | −51.89310174 | −131.4846734 | −254.4960003 | −428.6019462 | −643.0140473 |
food | −11.33222981 | −54.36687511 | −128.2609949 | −242.0789656 | −399.9506445 | −600.8773534 |
care | −9.739145131 | −53.9780039 | −133.7955495 | −254.3544155 | −432.013784 | −666.1259446 |
Type | Datasets | Targets | BiTTM P@5 | BiTTM P@10 | BiTTM P@20 | TTM P@5 | TTM P@10 | TTM P@20 | BTM-PD P@5 | BTM-PD P@10 | BTM-PD P@20 | BTM P@5 | BTM P@10 | BTM P@20
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
short | cigar | ashtray | 0.92 | 0.84 | 0.6 | 0.92 | 0.66 | 0.46 | 0.6 | 0.6 | 0.39 | 0.2 | 0.2 | 0.135
 | | place | 0.64 | 0.56 | 0.43 | 0.52 | 0.44 | 0.29 | 0.48 | 0.4 | 0.26 | 0.22 | 0.17 | 0.135
 | ecig | smokeless | 0.72 | 0.64 | 0.45 | 0.56 | 0.5 | 0.45 | 0.44 | 0.4 | 0.31 | 0.3 | 0.24 | 0.19
 | | warning | 0.6 | 0.54 | 0.54 | 0.48 | 0.46 | 0.37 | 0.36 | 0.3 | 0.25 | 0.4 | 0.24 | 0.215
 | Oscars | mistake | 0.6 | 0.56 | 0.49 | 0.4 | 0.48 | 0.45 | 0.2 | 0.32 | 0.28 | 0.28 | 0.18 | 0.14
 | | oscarsfail | 0.64 | 0.52 | 0.44 | 0.36 | 0.4 | 0.31 | 0.24 | 0.18 | 0.15 | 0.26 | 0.18 | 0.125
medium | baby | rinses | 0.52 | 0.38 | 0.3 | 0.2 | 0.26 | 0.24 | 0.08 | 0.1 | 0.11 | 0.12 | 0.12 | 0.085
 | | shower | 0.52 | 0.48 | 0.38 | 0.44 | 0.4 | 0.32 | 0.2 | 0.36 | 0.31 | 0.24 | 0.19 | 0.145
 | camera | portable | 0.52 | 0.48 | 0.4 | 0.4 | 0.44 | 0.29 | 0.32 | 0.28 | 0.23 | 0.1 | 0.09 | 0.08
 | | price | 0.52 | 0.48 | 0.51 | 0.48 | 0.44 | 0.42 | 0.44 | 0.34 | 0.26 | 0.08 | 0.11 | 0.09
 | computer | display | 0.48 | 0.74 | 0.72 | 0.4 | 0.62 | 0.62 | 0.28 | 0.32 | 0.32 | 0.1 | 0.15 | 0.165
 | | keyboard | 0.52 | 0.66 | 0.59 | 0.44 | 0.6 | 0.52 | 0.32 | 0.34 | 0.33 | 0.18 | 0.15 | 0.14
long | home | clean | 0.76 | 0.8 | 0.69 | 0.68 | 0.72 | 0.64 | 0.32 | 0.34 | 0.32 | 0.25 | 0.19 | 0.18
 | | kitchen | 0.72 | 0.72 | 0.66 | 0.68 | 0.8 | 0.66 | 0.64 | 0.52 | 0.44 | 0.22 | 0.18 | 0.16
 | food | disease | 0.6 | 0.6 | 0.6 | 0.48 | 0.46 | 0.42 | 0.44 | 0.4 | 0.38 | 0.22 | 0.23 | 0.19
 | | microwave | 0.84 | 0.58 | 0.47 | 0.68 | 0.64 | 0.44 | 0.32 | 0.32 | 0.26 | 0.18 | 0.17 | 0.125
 | care | diabetic | 0.56 | 0.62 | 0.53 | 0.48 | 0.44 | 0.25 | 0.28 | 0.28 | 0.23 | 0.12 | 0.13 | 0.12
 | | infant | 0.48 | 0.54 | 0.51 | 0.36 | 0.3 | 0.29 | 0.2 | 0.14 | 0.11 | 0.18 | 0.15 | 0.105
 | | average score | 0.62 | 0.60 | 0.52 | 0.50 | 0.50 | 0.41 | 0.34 | 0.33 | 0.27 | 0.20 | 0.17 | 0.14
 | | improvement by BiTTM | | | | +0.12 | +0.09 | +0.10 | +0.28 | +0.27 | +0.24 | +0.42 | +0.43 | +0.38
Domain | Size (KB) | BiTTM | BTM-PD | TTM | BTM |
---|---|---|---|---|---|
cigar | 641 | 6.012 | 0.378 | 60.994 | 314.940 |
ecig | 708 | 2.960 | 2.900 | 107.008 | 489.180 |
Oscars | 565 | 3.620 | 3.934 | 58.042 | 418.560 |
baby | 141 | 16.011 | 9.753 | 24.246 | 1153.740 |
camera | 1285 | 81.086 | 164.834 | 811.684 | 15,997.380 |
computer | 1295 | 96.978 | 147.023 | 713.068 | 17,559.660 |
home | 619 | 37.083 | 90.283 | 129.481 | 7116.300 |
food | 195 | 9.166 | 31.833 | 18.534 | 2496.600 |
care | 1523 | 135.040 | 218.065 | 1516.927 | 18,767.580 |
Datasets: Food. Target: Disease
BiTTM: Tea | BiTTM: SFA | BiTTM: Research | BiTTM: Risk | BiTTM: Prevention | TTM: Tea | TTM: SFA | TTM: Research | TTM: Risk | BTM-PD: Tea | BTM-PD: SFA | BTM-PD: Research | BTM: Tea | BTM: SFA | BTM: Research
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
tea | fat | study | disease | cherry | disease | fat | heart | risk | tea | protein | cherry | tea | fat | study |
effect | cancer | increase | risk | reduce | tea | saturated | study | disease | green | oil | chocolate | antioxidant | saturated | disease |
benefit | people | research | heart | tart | flavour | tart | cancer | cherry | weight | fat | tart | disease | disease | risk |
work | health | small | high | prevent | bad | chip | reduce | star | fat | palm | study | heart | risk | heart |
lower | fruit | antioxidant | green | include | price | sweetener | order | bag | loss | saturated | disease | fruit | oil | reduce |
taste | blood | good | cell | body | higher | follow | brand | measure | study | quality | dark | rich | coconut | show |
long | sugar | eat | level | find | company | thing | vegetable | calorie | increase | eat | cocoa | provide | find | cherry |
kind | animal | amount | show | product | back | production | cook | state | body | diet | sweet | health | study | health |
pure | saturated | diet | day | result | nutrient | add | large | price | drink | disease | cancer | substitute | increase | food |
rich | drink | water | sweet | add | simply | expensive | drink | shipping | calorie | gram | product | vegetable | health | cancer |
BiTTM | TTM | BTM-PD | BTM | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Target: Bath | |||||||||||
Blanket | Spout | Protection | Sentiment | Blanket | Spout | Sentiment | - | - | Blanket | Spout-1 | Spout-2 |
cover | spout | cute | fit | spout | fit | spout | cover | spout | faucet | ||
pull | head | play | time | cover | cute | bath | bath | cute | easy | ||
product | stay | tub | daughter | snap | son | shower | spout | head | spout | ||
shower | bath | put | faucet | bright | whale | tub | faucet | fit | cute | ||
time | whale | easy | give | hate | quickly | cover | head | cover | tub | ||
child | month | buy | nice | read | shower | whale | protect | faucet | cover | ||
recommend | crib | thing | problem | realize | bend | fit | tub | tub | product | ||
blanket | easily | kid | bump | mobile | bear | faucet | whale | son | bath | ||
lot | side | protect | diaper | break | front | knob | fit | whale | head | ||
start | year | bumper | find | parent | touch | perfect | time | bath | whale | ||
Target: shower | |||||||||||
blanket | spout | cute | son | shower | spout | cover | spout | spout | blanket | spout | spout |
buy | shower | thing | pull | gift | tub | fit | shower | shower | buy | cover | fit |
cover | head | easy | bath | bath | kid | cute | cover | cover | gift | fit | shower |
put | stay | faucet | time | buy | top | whale | cute | whale | swaddle | faucet | cute |
big | kid | product | daughter | hole | easy | face | faucet | bath | receive | head | pull |
pretty | gift | side | play | blanket | head | easily | head | tub | friend | shower | stay |
safe | perfect | problem | nice | time | remove | couple | pull | pull | child | product | cover |
stroller | soft | install | protect | thing | worry | snap | tub | mold | day | tub | son |
change | worry | child | find | trip | picture | high | whale | kid | hold | time | tub |
quality | bag | car | fall | totally | face | expect | bath | faucet | shower | pull | whale |
BiTTM | TTM | BTM-PD | BTM | |||
---|---|---|---|---|---|---|
Target: Oscarsfail | ||||||
Beginning | Correction | Discussion | Discussion | Discussion | Discussion | Discussion |
lalaland | moonlight | oscar | oscarsfail | win | win | moonlight |
hollywood | oscarsfail | envelopegate | oscar | oscarsfail | vote | bestpicture |
winner | award | picture | envelopegate | lalaland | lalaland | lalaland |
cast | mistake | actor | majorgroup | vote | moonlight | russians |
actress | vote | reaction | pick | short | popular | hack |
emma | variety | electoral | scene | white | electoral | election |
people | moment | time | hollywood | rating | oscarsfail | envelope |
white | mahershala | movie | hack | helmets | lrihendry | votetrumppics |
barryjenkins | violadavis | word | moana | moonlight | word | neontaster |
bestpicture | black | affleck | auliicravalho | documentary | oscar | oscarsfail |
Target: mistake | ||||||
lalaland | moonlight | win | win | lalaland | lalaland | moonlight |
realize | winner | mistake | moonlight | mistake | moonlight | winner |
announce | real | academy | picture | moonlight | picture | picture |
award | oscar | black | moment | crew | win | announce |
producer | night | film | realize | realize | realize | lalaland |
moment | people | reaction | lalaland | cast | mistake | real |
congrat | oscarsfail | give | watch | moment | moment | abc |
abc | movie | congratulation | mistake | watch | crew | watch |
thr | theellenshow | time | cast | win | watch | mistake |
happen | russians | support | crew | picture | cast | realize |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, J.; Chen, L.; Li, L.; Wu, X. BiTTM: A Core Biterms-Based Topic Model for Targeted Analysis. Appl. Sci. 2021, 11, 10162. https://doi.org/10.3390/app112110162