DOI: 10.1007/978-3-030-73280-6_27
Article

Empirical Study of Tweets Topic Classification Using Transformer-Based Language Models

Published: 07 April 2021

Abstract

Social media opens up a great opportunity for policymakers to analyze and understand a large volume of online content for decision-making purposes. People’s opinions and experiences on social media platforms such as Twitter are extremely significant because of their volume, variety, and veracity. However, processing and retrieving useful information from natural-language content is very challenging because of its ambiguity and complexity. Recent advances in Natural Language Understanding (NLU), most notably the Transformer architecture, solve sequence-to-sequence modeling tasks while handling long-range dependencies efficiently, and Transformer-based models are setting new performance benchmarks across a wide variety of NLU tasks. In this paper, we apply transformer-based sequence modeling to topic classification of short texts, namely tourist/user-posted tweets. Multiple BERT-like state-of-the-art sequence modeling approaches are investigated on topic/target classification tasks using the Great Barrier Reef tweet dataset, and the findings can be valuable for researchers working on classification with large datasets and a large number of target classes.
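The long-range-dependency handling mentioned above comes from the Transformer's scaled dot-product attention (Vaswani et al., "Attention is all you need"), in which every token attends to every other token in one step. The NumPy snippet below is a minimal illustrative sketch of that mechanism, not the authors' code; the toy shapes (3 tokens, 4-dimensional embeddings) are assumptions for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Each query row attends to all key rows, so a dependency between
    distant tokens is modeled in a single matrix multiplication rather
    than through many recurrent steps.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V, weights                       # weighted sum of values

# Toy self-attention: queries, keys, and values all come from the same
# 3-token, 4-dimensional embedding matrix X.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(X, X, X)
```

In a BERT-like classifier this operation is stacked in multiple layers and heads, and a task-specific head over the pooled output produces the topic label.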



Published In

Intelligent Information and Database Systems: 13th Asian Conference, ACIIDS 2021, Phuket, Thailand, April 7–10, 2021, Proceedings
Apr 2021
880 pages
ISBN: 978-3-030-73279-0
DOI: 10.1007/978-3-030-73280-6

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Transformer
  2. Natural language processing
  3. Topic classification
  4. Target classification
  5. Deep learning
