DOI: 10.1145/3269206.3271717
research-article

Learning under Feature Drifts in Textual Streams

Published: 17 October 2018

Abstract

Large volumes of textual streams are generated nowadays, especially on social networks like Twitter and Facebook. As the discussion topics and user opinions on those topics change drastically with time, those streams undergo changes in data distribution, leading to changes in the concept to be learned, a phenomenon called concept drift. One particular type of drift that has not yet attracted much attention is feature drift, i.e., changes in the features that are relevant for the learning task at hand. In this work, we propose an approach for handling feature drifts in textual streams. Our approach integrates i) an ensemble-based mechanism that accurately predicts the feature/word values for the next time point by taking into account that different features might be subject to different temporal trends, and ii) a sketch-based feature space maintenance mechanism that allows for memory-bounded maintenance of the feature space over the stream. Experiments with textual streams from the sentiment analysis, email preference, and spam detection domains demonstrate that our approach achieves significantly better or competitive performance compared to baselines.
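
The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not say which sketch structure or which base forecasters the authors use, so everything below is an assumption made for illustration: the class names SpaceSaving and EnsembleForecaster, the choice of Space-Saving counting as the memory-bounded sketch, the three base forecasters (last value, moving average, exponential smoothing), the inverse-error weighting, and the toy windows of tokens are all hypothetical and are not taken from the paper.

```python
from collections import deque


class SpaceSaving:
    """Bounded-memory frequent-word counter (Space-Saving style; an assumed stand-in)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def add(self, word):
        if word in self.counts:
            self.counts[word] += 1
        elif len(self.counts) < self.capacity:
            self.counts[word] = 1
        else:
            # Evict the word with the smallest count; the newcomer inherits count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[word] = self.counts.pop(victim) + 1


class EnsembleForecaster:
    """Error-weighted mix of three simple one-step forecasters (assumed design)."""

    def __init__(self, window=3, alpha=0.5):
        self.history = deque(maxlen=window)
        self.alpha = alpha
        self.smoothed = None
        self.errors = [0.0, 0.0, 0.0]  # cumulative absolute error per base forecaster

    def _base_predictions(self):
        if not self.history:
            return [0.0, 0.0, 0.0]
        last = self.history[-1]
        mean = sum(self.history) / len(self.history)
        return [last, mean, self.smoothed]

    def predict(self):
        preds = self._base_predictions()
        # Forecasters with a smaller past error get a larger weight.
        weights = [1.0 / (1.0 + e) for e in self.errors]
        return sum(w * p for w, p in zip(weights, preds)) / sum(weights)

    def update(self, value):
        if self.history:
            for i, p in enumerate(self._base_predictions()):
                self.errors[i] += abs(p - value)
        self.history.append(value)
        if self.smoothed is None:
            self.smoothed = value
        else:
            self.smoothed = self.alpha * value + (1 - self.alpha) * self.smoothed


# Toy stream: each inner list is the bag of tokens seen in one time window.
windows = [["good", "good", "bad"], ["good", "meh"], ["good", "good", "good"]]
sketch = SpaceSaving(capacity=2)
forecasters = {}
for tokens in windows:
    for tok in tokens:
        sketch.add(tok)
    # Track the relative frequency of each monitored word in this window.
    for word in sketch.counts:
        freq = tokens.count(word) / len(tokens)
        forecasters.setdefault(word, EnsembleForecaster()).update(freq)
    # Drop forecasters of evicted words so memory stays bounded.
    for word in list(forecasters):
        if word not in sketch.counts:
            del forecasters[word]

predicted = {w: forecasters[w].predict() for w in sketch.counts}
print(predicted)
```

In this sketch each monitored word gets its own forecaster, so words with different temporal behaviour end up weighting the base forecasters differently, while the sketch caps how many words are monitored at once. A downstream classifier would then consume the predicted per-word values; that step is omitted here.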


Published In

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. concept drifts
  2. ensemble learning
  3. feature drifts
  4. textual streams
  5. time series

Qualifiers

  • Research-article

Conference

CIKM '18

Acceptance Rates

CIKM '18 Paper Acceptance Rate 147 of 826 submissions, 18%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2024) An incremental clustering algorithm based on semantic concepts. Knowledge and Information Systems 66:6 (3303-3335). DOI: 10.1007/s10115-024-02063-0. Online publication date: 1-Jun-2024
  • (2022) Assessing Batch and Online Learning for Delivery in Full and On Time Predictions. 2022 International Joint Conference on Neural Networks (IJCNN) (1-9). DOI: 10.1109/IJCNN55064.2022.9892386. Online publication date: 18-Jul-2022
  • (2021) Concept Drift Detection for Social Media: A Survey. 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N) (12-16). DOI: 10.1109/ICAC3N53548.2021.9725548. Online publication date: 17-Dec-2021
  • (2020) Drift-Aware Multi-Memory Model for Imbalanced Data Streams. 2020 IEEE International Conference on Big Data (Big Data) (878-885). DOI: 10.1109/BigData50022.2020.9378101. Online publication date: 10-Dec-2020
  • (2020) Resource management for model learning at entity level. Annals of Telecommunications. DOI: 10.1007/s12243-020-00800-4. Online publication date: 29-Aug-2020
  • (2019) FAHT. Proceedings of the 28th International Joint Conference on Artificial Intelligence (1480-1486). DOI: 10.5555/3367032.3367242. Online publication date: 10-Aug-2019
  • (2019) Streaming Feature Selection for Multi-Label Data with Dynamic Sliding Windows and Feature Repulsion Loss. Entropy 21:12 (1151). DOI: 10.3390/e21121151. Online publication date: 25-Nov-2019
  • (2019) Exploiting entity information for stream classification over a stream of reviews. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (564-573). DOI: 10.1145/3297280.3297333. Online publication date: 8-Apr-2019
  • (2019) Sentiment analysis on big sparse data streams with limited labels. Knowledge and Information Systems. DOI: 10.1007/s10115-019-01392-9. Online publication date: 17-Aug-2019
