Computer Science > Machine Learning

arXiv:2212.14720 (cs)

[Submitted on 30 Dec 2022 (v1), last revised 3 Aug 2023 (this version, v2)]

Title: Learning from Data Streams: An Overview and Update

Abstract: The literature on machine learning in the context of data streams is vast and growing. However, many of the defining assumptions regarding data-stream learning tasks are too strong to hold in practice, or are even contradictory, such that they cannot be met in the context of supervised learning. Algorithms are chosen and designed based on criteria which are often not clearly stated, for problem settings not clearly defined, tested in unrealistic settings, and/or in isolation from related approaches in the wider literature. This calls into question the potential for real-world impact of many approaches conceived in such contexts, and risks propagating a misguided research focus. We propose to tackle these issues by reformulating the fundamental definitions and settings of supervised data-stream learning with regard to contemporary considerations of concept drift and temporal dependence; we take a fresh look at what constitutes a supervised data-stream learning task, and reconsider the algorithms that may be applied to tackle such tasks. Drawing on this formulation and overview, and helped by an informal survey of industrial players dealing with real-world data streams, we provide recommendations. Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach, or any particular learning regime, and that any constraints on memory and time are not specific to streaming. Meanwhile, established techniques for dealing with temporal dependence and concept drift exist in other areas of the literature. For the data streams community, we thus encourage a shift in research focus, from dealing with often-artificial constraints and assumptions on the learning mode, to issues such as robustness, privacy, and interpretability, which are increasingly relevant to learning in data streams in academic and industrial settings.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2212.14720 [cs.LG]
	(or arXiv:2212.14720v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2212.14720

Submission history

From: Jesse Read
[v1] Fri, 30 Dec 2022 14:01:41 UTC (608 KB)
[v2] Thu, 3 Aug 2023 08:18:36 UTC (732 KB)
