DOI: 10.1145/2623330.2623751
research-article

The setwise stream classification problem

Published: 24 August 2014

Abstract

In many applications, classification labels may not be associated with a single record, but with a set of records. The class behavior may not be possible to infer effectively from any single record, and may only be inferred from an aggregate set of records. In this problem, therefore, the class label is associated with a set of instances in both the training and the test data, so the problem may be understood as that of classifying a collection of data sets. Typically, the class can be inferred only from the overall pattern of the data distribution, and very little information useful for classification is embedded in any given record. We refer to this problem as the setwise classification problem.
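The abstract does not describe the paper's algorithm, so the following is only a hypothetical illustration of the problem setting, not the authors' method. It assumes that each set is summarized by the per-dimension mean and standard deviation of its records, and that a test set receives the label of the nearest labeled training set (a 1-nearest-neighbor rule swapped in purely for illustration). The names `set_summary`, `classify_set`, `TRAINING`, and the toy data are invented for this sketch.

```python
import math
from typing import List, Sequence, Tuple

Record = Sequence[float]
LabeledSet = Tuple[List[Record], str]


def set_summary(records: List[Record]) -> List[float]:
    """Hypothetical set-level feature vector: per-dimension means followed by
    per-dimension (population) standard deviations of the records in the set."""
    n = len(records)
    d = len(records[0])
    means = [sum(r[j] for r in records) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in records) / n)
        for j in range(d)
    ]
    return means + stds


def classify_set(test_set: List[Record], training: List[LabeledSet]) -> str:
    """Assign the label of the training set whose summary is nearest
    (Euclidean distance) to the test set's summary, i.e. 1-NN over sets."""
    target = set_summary(test_set)
    best_label, best_dist = "", math.inf
    for records, label in training:
        dist = math.dist(target, set_summary(records))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy data: the record [0.0] occurs in both classes, so no single record
# identifies the class; only the spread of the whole set does.
TRAINING: List[LabeledSet] = [
    ([[0.0], [0.1], [-0.1]], "narrow"),
    ([[-2.0], [0.0], [2.0]], "wide"),
]

print(classify_set([[0.05], [-0.05], [0.0]], TRAINING))  # narrow
print(classify_set([[-1.5], [0.0], [1.5]], TRAINING))    # wide
```

In the toy data, the record `[0.0]` belongs to sets of both classes, which mirrors the point made above: an individual record carries little class information, while the distribution of the whole set separates the classes.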
The problem can be extremely challenging in scenarios where the data is received as a stream and the records within any particular data set are not necessarily received contiguously. In this paper, we present a first approach for the real-time, streaming classification of such data. We present experimental results illustrating the effectiveness of the approach.
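To illustrate why non-contiguous arrival need not force a classifier to buffer whole sets, here is a minimal sketch, assuming each set can be represented by additive statistics (record count, plus linear sum and squared sum per dimension) and that classification uses a nearest-summary rule. `SetSummary`, `StreamingSetwiseClassifier`, the `(set_id, x, label)` record format, and the convention that every training record carries its set's label are all assumptions made for this example; the paper's actual streaming method is not described in the abstract.

```python
import math
from typing import Dict, List, Optional, Sequence


class SetSummary:
    """Additive statistics for one set: record count, per-dimension linear sum
    and per-dimension squared sum. Each statistic is a sum, so the summary does
    not depend on the order in which the set's records arrive."""

    def __init__(self, d: int) -> None:
        self.n = 0
        self.ls = [0.0] * d
        self.ss = [0.0] * d

    def add(self, x: Sequence[float]) -> None:
        self.n += 1
        for j, v in enumerate(x):
            self.ls[j] += v
            self.ss[j] += v * v

    def features(self) -> List[float]:
        """Per-dimension means followed by per-dimension standard deviations."""
        means = [s / self.n for s in self.ls]
        # Clamp at 0.0: E[x^2] - mean^2 can dip slightly below zero in floating point.
        variances = [max(q / self.n - m * m, 0.0) for q, m in zip(self.ss, means)]
        return means + [math.sqrt(v) for v in variances]


class StreamingSetwiseClassifier:
    """Hypothetical streaming classifier: records arrive as (set_id, x) pairs,
    possibly interleaved across sets; training records carry their set's label."""

    def __init__(self, d: int) -> None:
        self.d = d
        self.summaries: Dict[str, SetSummary] = {}
        self.labels: Dict[str, str] = {}

    def observe(self, set_id: str, x: Sequence[float], label: Optional[str] = None) -> None:
        # O(d) work per record; the raw record is not stored after it is absorbed.
        if set_id not in self.summaries:
            self.summaries[set_id] = SetSummary(self.d)
        self.summaries[set_id].add(x)
        if label is not None:
            self.labels[set_id] = label

    def predict(self, set_id: str) -> str:
        """Label of the nearest labeled set summary (1-NN), computed on demand."""
        target = self.summaries[set_id].features()
        best_label, best_dist = "", math.inf
        for other_id, label in self.labels.items():
            if other_id == set_id:
                continue
            dist = math.dist(target, self.summaries[other_id].features())
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label


# Records of sets A and B (labeled) and T (unlabeled) arrive interleaved.
stream = [
    ("A", [0.0], "narrow"), ("B", [-2.0], "wide"), ("T", [0.05], None),
    ("A", [0.1], "narrow"), ("B", [2.0], "wide"), ("T", [-0.05], None),
    ("A", [-0.1], "narrow"), ("B", [0.0], "wide"),
]
clf = StreamingSetwiseClassifier(d=1)
for set_id, x, label in stream:
    clf.observe(set_id, x, label)
print(clf.predict("T"))  # narrow
```

Because the statistics are sums, each record is absorbed in O(d) time no matter how records from different sets are interleaved, and a prediction can be requested at any point in the stream. The squared-sum variance formula is simple but can lose precision on large-valued data; a Welford-style running update is a common, more stable alternative.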

Supplementary Material

MP4 File (p432-sidebyside.mp4)

Cited By

  • (2023) Exploring a global interpretation mechanism for deep learning networks when predicting sepsis. Scientific Reports 13(1). DOI: 10.1038/s41598-023-30091-3. Online publication date: 21-Feb-2023
  • (2021) Importance of Research Into Big Data With Machine Learning Approach. Machine Learning in Cancer Research With Applications in Colon Cancer and Big Data Analysis, pp. 155-159. DOI: 10.4018/978-1-7998-7316-7.ch008. Online publication date: 2021
  • (2018) AnySC: Anytime Set-wise Classification of Variable Speed Data Streams. 2018 IEEE International Conference on Big Data (Big Data), pp. 967-974. DOI: 10.1109/BigData.2018.8622567. Online publication date: Dec-2018

    Information

    Published In

    ACM Conferences
    KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2014
    2028 pages
    ISBN:9781450329569
    DOI:10.1145/2623330
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data classification
    2. data streams

    Conference

    KDD '14

    Acceptance Rates

    KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 20 Nov 2024

    Citations

    • (2017) Developing a low dimensional patient class profile in accordance to their respiration-induced tumor motion. Proceedings of the VLDB Endowment 10(12), pp. 1610-1621. DOI: 10.14778/3137765.3137768. Online publication date: 1-Aug-2017
    • (2017) DeMalC. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1559-1567. DOI: 10.1145/3132847.3132848. Online publication date: 6-Nov-2017
    • (2017) Nonlinear Dynamics of Information Diffusion in Social Networks. ACM Transactions on the Web 11(2), pp. 1-40. DOI: 10.1145/3057741. Online publication date: 24-Apr-2017
    • (2017) Ecosystem on the Web. World Wide Web 20(3), pp. 439-465. DOI: 10.1007/s11280-016-0389-x. Online publication date: 1-May-2017
    • (2016) PRIME. ACM SIGARCH Computer Architecture News 44(3), pp. 27-39. DOI: 10.1145/3007787.3001140. Online publication date: 18-Jun-2016
    • (2016) Bounded distortion parametrization in the space of metrics. ACM Transactions on Graphics 35(6), pp. 1-16. DOI: 10.1145/2980179.2982426. Online publication date: 5-Dec-2016
    • (2016) On a New SDP-SOCP Method for Acoustic Source Localization Problem. ACM Transactions on Sensor Networks 12(4), pp. 1-26. DOI: 10.1145/2968449. Online publication date: 25-Oct-2016
