DOI: 10.1007/978-3-030-73280-6_27
Article

Empirical Study of Tweets Topic Classification Using Transformer-Based Language Models

Published: 07 April 2021

Abstract

Social media opens up a great opportunity for policymakers to analyze and understand a large volume of online content for decision-making purposes. People’s opinions and experiences on social media platforms such as Twitter are extremely significant because of their volume, variety, and veracity. However, processing and retrieving useful information from natural-language content is very challenging because of its ambiguity and complexity. Recent advances in Natural Language Understanding (NLU), most notably the Transformer architecture, solve sequence-to-sequence modeling tasks while handling long-range dependencies efficiently, and Transformer-based models are setting new performance benchmarks across a wide variety of NLU tasks. In this paper, we apply transformer-based sequence modeling to topic classification of short texts, namely tourist/user-posted tweets. Multiple BERT-like state-of-the-art sequence modeling approaches are investigated on topic/target classification tasks using the Great Barrier Reef tweet dataset, and the findings can be valuable for researchers working on classification with large datasets and a large number of target classes.
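The long-range-dependency handling mentioned above comes from the Transformer's scaled dot-product attention (Vaswani et al., "Attention is all you need"), in which every token attends to every other token in one step. The NumPy snippet below is a minimal illustrative sketch of that mechanism, not the authors' code; the toy shapes (3 tokens, 4-dimensional embeddings) are assumptions for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Each query row attends to all key rows, so a dependency between
    distant tokens is modeled in a single matrix multiplication rather
    than through many recurrent steps.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V, weights                       # weighted sum of values

# Toy self-attention: queries, keys, and values all come from the same
# 3-token, 4-dimensional embedding matrix X.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(X, X, X)
```

In a BERT-like classifier this operation is stacked in multiple layers and heads, and a task-specific head over the pooled output produces the topic label.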



Published In

Intelligent Information and Database Systems: 13th Asian Conference, ACIIDS 2021, Phuket, Thailand, April 7–10, 2021, Proceedings
Apr 2021
880 pages
ISBN: 978-3-030-73279-0
DOI: 10.1007/978-3-030-73280-6

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Transformer
  2. Natural language processing
  3. Topic classification
  4. Target classification
  5. Deep learning
