Query-By-Committee Framework Used for Semi-Automatic Sleep Stages Classification †
Abstract
1. Introduction
2. Active Learning
1. Learn a classifier c on the set of labeled instances.
2. Assign instances from the set of unlabeled instances to classes by using the learnt classifier c.
3. Use a query strategy to select an instance from the set of unlabeled instances.
4. Ask an “oracle” for the class to which the selected instance belongs. The “oracle” is usually an expert, i.e., a human annotator with expertise in the given field.
5. Add the newly labeled instance to the set of labeled instances (and remove it from the set of unlabeled instances).
6. Repeat steps 1–5 until a stopping condition is met (e.g., a given number of iterations is reached or the error reaches a specified threshold).
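The six steps above can be sketched in Python as a pool-based loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier is a hypothetical stand-in for classifier c, the `oracle` callable simulates the human expert with known labels, the stopping condition is a fixed query budget, and the names `fit_centroids`, `predict`, `random_sampling` and `active_learning` are hypothetical. The query strategy plugged in here is random sampling (RS), one of the strategies compared in the result tables.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Dict[int, Vector]  # class label -> centroid (hypothetical stand-in for classifier c)


def fit_centroids(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Step 1: 'learn' a nearest-centroid classifier on the labeled set."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(v / counts[c] for v in acc) for c, acc in sums.items()}


def predict(model: Model, x: Vector) -> int:
    """Step 2: assign an instance to the class with the closest centroid."""
    return min(model, key=lambda c: math.dist(x, model[c]))


def random_sampling(rng: random.Random) -> Callable[[Model, List[Vector]], int]:
    """Step 3 (RS): pick a pool index uniformly at random; the model is ignored."""
    return lambda model, pool: rng.randrange(len(pool))


def active_learning(
    X_lab: Sequence[Vector],
    y_lab: Sequence[int],
    X_pool: Sequence[Vector],
    oracle: Callable[[Vector], int],
    select: Callable[[Model, List[Vector]], int],
    n_queries: int,
) -> Tuple[Model, List[Vector], List[int], List[Vector]]:
    """Steps 1-6 with a fixed number of queries as the stopping condition."""
    lab_X, lab_y, pool = list(X_lab), list(y_lab), list(X_pool)
    for _ in range(n_queries):
        if not pool:  # nothing left to query
            break
        model = fit_centroids(lab_X, lab_y)  # step 1
        idx = select(model, pool)  # steps 2-3: a strategy may use the model's predictions
        x = pool.pop(idx)  # step 5: remove the instance from the unlabeled set ...
        lab_X.append(x)
        lab_y.append(oracle(x))  # step 4: ask the oracle; step 5: ... and add it to the labeled set
    return fit_centroids(lab_X, lab_y), lab_X, lab_y, pool


# Toy run: 1-D points, with the oracle simulated by a threshold.
model, X_lab, y_lab, pool = active_learning(
    X_lab=[(0.0,), (10.0,)], y_lab=[0, 1],
    X_pool=[(1.0,), (2.0,), (8.0,), (9.0,)],
    oracle=lambda x: int(x[0] > 5),
    select=random_sampling(random.Random(0)),
    n_queries=3,
)
print(predict(model, (1.5,)), predict(model, (8.5,)))  # → 0 1
```

Passing the query strategy as a function keeps the loop unchanged when RS is swapped for an uncertainty- or committee-based strategy.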
2.1. Query Strategies
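A query-by-committee (QBC) step can be sketched as follows, assuming that each committee member has already predicted a class for every unlabeled instance. The instance on which the members disagree most is queried; disagreement is measured here by vote entropy, one common choice (cf. Dagan and Engelson in the reference list). How the committee is built and which disagreement measure is used in this work are assumptions not fixed by this sketch, and the names `vote_entropy` and `qbc_select` are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def vote_entropy(votes: Sequence[int]) -> float:
    """Committee disagreement on one instance: -sum_c (V_c / C) * log(V_c / C),
    where V_c is the number of members voting for class c and C is the committee size."""
    size = len(votes)
    return -sum((v / size) * math.log(v / size) for v in Counter(votes).values())


def qbc_select(committee_votes: Sequence[Sequence[int]]) -> int:
    """committee_votes[m][i] is the class predicted by member m for unlabeled instance i.
    Returns the index of the instance with the highest vote entropy (first one on ties)."""
    n_instances = len(committee_votes[0])
    scores = [vote_entropy([member[i] for member in committee_votes])
              for i in range(n_instances)]
    return max(range(n_instances), key=scores.__getitem__)


# Three hypothetical committee members voting on three unlabeled instances.
votes = [[0, 0, 1],
         [0, 1, 2],
         [0, 0, 0]]
print(qbc_select(votes))  # → 2 (all three members disagree on instance 2)
```

Instance 0 gets zero entropy because the committee is unanimous, while instance 2 reaches the maximum, log 3, for a three-member committee.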
2.2. Advantages and Disadvantages of Active Learning
- Advantages of Active Learning
  - Time and cost savings: there is no need to annotate a large amount of data; it is sufficient to label only the most informative instances.
  - Online adaptation of the classifier: the classifier is automatically retrained whenever new instances become available.
- Disadvantages of Active Learning
  - Application-dependent choice of the query strategy: the query strategy has to be chosen carefully with respect to, e.g., the chosen classifier (margin uncertainty sampling is suitable when the classifier computes posterior probabilities [3]) or specific relationships among data instances in the observation space (in which case density-weighted methods are useful [11]).
  - Sensitivity to initialisation: when the process is not properly initialised, the classifier performs poorly during the first several iterations (the so-called “cold start problem” [12]), which can slow down the convergence of the learning process.
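Margin uncertainty sampling (MUS) and a density-weighted variant, both mentioned in the list above, can be sketched as follows. This is an illustrative sketch rather than the authors' code: the posterior probabilities are assumed to come from any probabilistic classifier, uncertainty is taken as 1 minus the margin, and the density term weights it by the mean cosine similarity to the unlabeled pool raised to a hypothetical exponent `beta`, in the spirit of the information-density approach [11]. The function names are hypothetical.

```python
import math
from typing import Sequence


def margin(probs: Sequence[float]) -> float:
    """Difference between the two largest posterior probabilities (needs >= 2 classes)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def margin_select(posteriors: Sequence[Sequence[float]]) -> int:
    """MUS: query the unlabeled instance whose two most probable classes are closest."""
    return min(range(len(posteriors)), key=lambda i: margin(posteriors[i]))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def density_weighted_select(posteriors, features, beta: float = 1.0) -> int:
    """Uncertainty (1 - margin) times mean similarity to the pool, raised to beta.
    Assumes non-negative features (e.g. relative band powers), so the density is >= 0."""
    n = len(features)

    def score(i: int) -> float:
        uncertainty = 1.0 - margin(posteriors[i])
        density = sum(cosine(features[i], features[j]) for j in range(n)) / n
        return uncertainty * density ** beta

    return max(range(n), key=score)


# Instance 1 is the most uncertain, but it lies far from the rest of the pool.
posteriors = [[0.55, 0.45], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
features = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.1), (1.0, 0.05)]
print(margin_select(posteriors))                      # → 1
print(density_weighted_select(posteriors, features))  # → 0
```

The toy pool shows the design trade-off: plain MUS picks the isolated outlier, whereas density weighting prefers an almost equally uncertain instance that is representative of many others.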
3. Dataset
4. Proposed Method
5. Experiments and Results
6. Conclusions and Discussion
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
AASM | American Academy of Sleep Medicine |
CWT | Continuous wavelet transform |
EEG | Electroencephalography |
EMG | Electromyography |
EOG | Electrooculography |
MUS | Margin uncertainty sampling |
PSD | Power spectral density |
PSG | Polysomnography |
QBC | Query-by-committee |
RS | Random sampling |
References
- Gerla, V. Automatic Analysis of Long-Term EEG Signals. Ph.D. Thesis, Czech Technical University, Prague, Czech Republic, 2012.
- Duce, B.; Rego, C.; Milosavljevic, J.; Hukins, C. The AASM recommended and acceptable EEG montages are comparable for the staging of sleep and scoring of EEG arousals. J. Clin. Sleep Med. 2014, 10, 803.
- Settles, B. Active Learning Literature Survey; Computer Sciences Technical Report 1648; University of Wisconsin–Madison: Madison, WI, USA, 2009.
- Scheffer, T.; Decomain, C.; Wrobel, S. Active hidden Markov models for information extraction. In Proceedings of the International Symposium on Intelligent Data Analysis, Cascais, Portugal, 13–15 September 2001; Springer: Berlin/Heidelberg, Germany, 2001; pp. 309–318.
- Seung, H.S.; Opper, M.; Sompolinsky, H. Query by Committee. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92, Pittsburgh, PA, USA, 27–29 July 1992; ACM: New York, NY, USA, 1992; pp. 287–294.
- Ramirez-Loaiza, M.E.; Sharma, M.; Kumar, G.; Bilgic, M. Active learning: An empirical study of common baselines. Data Min. Knowl. Discov. 2017, 31, 287–313.
- Schein, A.I.; Ungar, L.H. Active learning for logistic regression: An evaluation. Mach. Learn. 2007, 68, 235–265.
- Lewis, D.D.; Catlett, J. Heterogeneous Uncertainty Sampling for Supervised Learning. In Proceedings of the Eleventh International Conference on Machine Learning, New Brunswick, NJ, USA, 10–13 July 1994; pp. 148–156.
- Tomanek, K. Resource-aware Annotation Through Active Learning. Ph.D. Thesis, Technical University Dortmund, Dortmund, Germany, 2010.
- Dagan, I.; Engelson, S.P. Committee-Based Sampling For Training Probabilistic Classifiers. In Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, USA, 9–12 July 1995; pp. 150–157.
- Settles, B.; Craven, M. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, Honolulu, HI, USA, 25–27 October 2008; Association for Computational Linguistics: Stroudsburg, PA, USA, 2008; pp. 1070–1079.
- Attenberg, J.; Provost, F. Why label when you can search? Alternatives to active learning for applying human resources to build classification models under extreme class imbalance. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–28 July 2010; pp. 423–432.
- Klem, G.H.; Lüders, H.O.; Jasper, H.; Elger, C. The ten-twenty electrode system of the International Federation. Electroencephalogr. Clin. Neurophysiol. 1999, 52, 3–6.
- Grimova, N.; Macas, M.; Gerla, V. Addressing the Cold Start Problem in Active Learning Approach Used For Semi-automated Sleep Stages Classification. In Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 3–6 December 2018; pp. 2249–2253.
- Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
Characteristic | Healthy Patients | Insomniac Patients |
---|---|---|
Number of patients | 18 | 18 |
Males | 27.8% | 38.9% |
Recording duration | 7.8 ± 0.9 h | 7.4 h |
No. | Feature | Description |
---|---|---|
1 | STD | Standard deviation of the signal in the time domain |
2 | SWNS | Skewness of the signal in the time domain |
3 | KRTS | Kurtosis of the signal in the time domain |
4 | MBL | Mobility of the signal in the time domain |
5 | CMPL | Complexity of the signal in the time domain |
6 | E | Shannon entropy |
7 | SE | Spectral entropy of the CWT spectrum |
8 | SEF90 | Spectral edge frequency: frequency below which 90% of the total power of the signal is located |
9 | SEF95 | Spectral edge frequency: frequency below which 95% of the total power of the signal is located |
10 | PPF | Power peak frequency: frequency of maximum power |
11 | MDF | Mean dominant frequency |
12 | SMF | Median frequency |
13 | HFD | Higuchi fractal dimension |
14 | CWT0.5-3 | Relative PSD in the 0.5–3 Hz band |
15 | CWT3-7 | Relative PSD in the 3–7 Hz band |
16 | CWT7-12 | Relative PSD in the 7–12 Hz band |
17 | CWT11-13 | Relative PSD in the 11–13 Hz band |
18 | CWT12-22 | Relative PSD in the 12–22 Hz band |
19 | CWT13-15 | Relative PSD in the 13–15 Hz band |
20 | CWT22-30 | Relative PSD in the 22–30 Hz band |
21 | CWT30-45 | Relative PSD in the 30–45 Hz band |
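Several features in the table above can be computed in a few lines. The sketch below assumes that mobility and complexity follow Hjorth's definitions (with first differences approximating the derivative) and that a power spectrum has already been estimated and is passed in as ascending frequency bins with matching power values; the CWT-based spectrum used in this work is not reproduced here. All function names are hypothetical.

```python
import math
from statistics import pvariance
from typing import List, Sequence, Tuple


def _diff(x: Sequence[float]) -> List[float]:
    """First difference, a discrete approximation of the derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_mobility_complexity(x: Sequence[float]) -> Tuple[float, float]:
    """Mobility sqrt(var(x') / var(x)) and complexity mobility(x') / mobility(x).
    Mobility is expressed per sample; assumes x is not constant."""
    dx = _diff(x)
    ddx = _diff(dx)
    mobility = math.sqrt(pvariance(dx) / pvariance(x))
    complexity = math.sqrt(pvariance(ddx) / pvariance(dx)) / mobility
    return mobility, complexity


def spectral_edge_frequency(freqs, power, fraction=0.9):
    """First frequency bin at which the cumulative power reaches `fraction` of the total
    (fraction=0.9 for SEF90, 0.95 for SEF95); assumes freqs are ascending."""
    total = sum(power)
    cumulative = 0.0
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= fraction * total:
            return f
    return freqs[-1]


def relative_band_power(freqs, power, low, high) -> float:
    """Share of the total power in the half-open band [low, high) Hz, e.g. 0.5-3 Hz."""
    band = sum(p for f, p in zip(freqs, power) if low <= f < high)
    return band / sum(power)


# A pure sine with 5 cycles in 1000 samples: Hjorth complexity is close to 1.
signal = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
print(hjorth_mobility_complexity(signal))
print(spectral_edge_frequency([1, 2, 3, 4], [1, 1, 1, 1], fraction=0.5))  # → 2
print(relative_band_power([1, 2, 3, 4], [1, 1, 1, 1], 1, 3))              # → 0.5
```

A single sinusoid is a convenient sanity check: its Hjorth complexity is approximately 1, and its mobility approximates the angular frequency per sample.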
Dataset | RS | MUS | QBC | Dataset | RS | MUS | QBC |
---|---|---|---|---|---|---|---|
1 | 0.279 | 0.241 | 0.231 | 19 | 0.309 | 0.307 | 0.276 |
2 | 0.580 | 0.469 | 0.429 | 20 | 0.475 | 0.399 | 0.380 |
3 | 0.539 | 0.412 | 0.351 | 21 | 0.255 | 0.179 | 0.174 |
4 | 0.353 | 0.347 | 0.293 | 22 | 0.312 | 0.245 | 0.220 |
5 | 0.633 | 0.619 | 0.584 | 23 | 0.200 | 0.154 | 0.153 |
6 | 0.354 | 0.276 | 0.225 | 24 | 0.293 | 0.209 | 0.151 |
7 | 0.339 | 0.348 | 0.343 | 25 | 0.277 | 0.241 | 0.210 |
8 | 0.496 | 0.490 | 0.479 | 26 | 0.468 | 0.423 | 0.421 |
9 | 0.621 | 0.606 | 0.592 | 27 | 0.419 | 0.331 | 0.315 |
10 | 0.437 | 0.279 | 0.250 | 28 | 0.182 | 0.174 | 0.168 |
11 | 0.323 | 0.278 | 0.264 | 29 | 0.386 | 0.351 | 0.380 |
12 | 0.472 | 0.406 | 0.399 | 30 | 0.315 | 0.222 | 0.191 |
13 | 0.513 | 0.448 | 0.461 | 31 | 0.405 | 0.343 | 0.354 |
14 | 0.362 | 0.328 | 0.297 | 32 | 0.518 | 0.477 | 0.467 |
15 | 0.395 | 0.344 | 0.334 | 33 | 0.336 | 0.318 | 0.314 |
16 | 0.412 | 0.335 | 0.320 | 34 | 0.287 | 0.212 | 0.211 |
17 | 0.355 | 0.300 | 0.286 | 35 | 0.260 | 0.226 | 0.229 |
18 | 0.614 | 0.569 | 0.589 | 36 | 0.478 | 0.380 | 0.344 |
Dataset | RS | MUS | QBC | Dataset | RS | MUS | QBC |
---|---|---|---|---|---|---|---|
1 | 0.255 | 0.236 | 0.231 | 19 | 0.290 | 0.269 | 0.290 |
2 | 0.522 | 0.447 | 0.399 | 20 | 0.393 | 0.363 | 0.369 |
3 | 0.418 | 0.368 | 0.345 | 21 | 0.238 | 0.169 | 0.172 |
4 | 0.296 | 0.298 | 0.288 | 22 | 0.258 | 0.229 | 0.224 |
5 | 0.636 | 0.638 | 0.605 | 23 | 0.218 | 0.128 | 0.136 |
6 | 0.311 | 0.244 | 0.219 | 24 | 0.239 | 0.180 | 0.171 |
7 | 0.336 | 0.322 | 0.331 | 25 | 0.235 | 0.210 | 0.191 |
8 | 0.480 | 0.470 | 0.467 | 26 | 0.453 | 0.425 | 0.415 |
9 | 0.606 | 0.580 | 0.596 | 27 | 0.373 | 0.282 | 0.313 |
10 | 0.399 | 0.252 | 0.245 | 28 | 0.176 | 0.169 | 0.161 |
11 | 0.286 | 0.278 | 0.253 | 29 | 0.383 | 0.329 | 0.363 |
12 | 0.416 | 0.372 | 0.382 | 30 | 0.212 | 0.180 | 0.197 |
13 | 0.478 | 0.427 | 0.468 | 31 | 0.381 | 0.322 | 0.326 |
14 | 0.360 | 0.307 | 0.295 | 32 | 0.494 | 0.449 | 0.443 |
15 | 0.344 | 0.345 | 0.315 | 33 | 0.331 | 0.282 | 0.297 |
16 | 0.357 | 0.298 | 0.315 | 34 | 0.269 | 0.190 | 0.195 |
17 | 0.307 | 0.276 | 0.276 | 35 | 0.253 | 0.195 | 0.204 |
18 | 0.574 | 0.575 | 0.588 | 36 | 0.448 | 0.320 | 0.312 |
Dataset | RS | MUS | QBC | Dataset | RS | MUS | QBC |
---|---|---|---|---|---|---|---|
1 | 0.207 | 0.191 | 0.194 | 19 | 0.207 | 0.191 | 0.243 |
2 | 0.390 | 0.401 | 0.375 | 20 | 0.309 | 0.256 | 0.329 |
3 | 0.311 | 0.297 | 0.292 | 21 | 0.154 | 0.125 | 0.131 |
4 | 0.238 | 0.216 | 0.242 | 22 | 0.197 | 0.175 | 0.170 |
5 | 0.639 | 0.589 | 0.604 | 23 | 0.110 | 0.084 | 0.099 |
6 | 0.182 | 0.166 | 0.188 | 24 | 0.167 | 0.143 | 0.147 |
7 | 0.287 | 0.242 | 0.233 | 25 | 0.168 | 0.154 | 0.160 |
8 | 0.426 | 0.418 | 0.422 | 26 | 0.386 | 0.331 | 0.380 |
9 | 0.537 | 0.530 | 0.529 | 27 | 0.227 | 0.210 | 0.222 |
10 | 0.244 | 0.214 | 0.208 | 28 | 0.127 | 0.117 | 0.127 |
11 | 0.244 | 0.194 | 0.187 | 29 | 0.278 | 0.244 | 0.252 |
12 | 0.312 | 0.290 | 0.349 | 30 | 0.128 | 0.121 | 0.161 |
13 | 0.398 | 0.354 | 0.372 | 31 | 0.250 | 0.205 | 0.253 |
14 | 0.265 | 0.254 | 0.283 | 32 | 0.409 | 0.362 | 0.394 |
15 | 0.310 | 0.254 | 0.287 | 33 | 0.245 | 0.221 | 0.235 |
16 | 0.264 | 0.230 | 0.264 | 34 | 0.151 | 0.137 | 0.130 |
17 | 0.250 | 0.244 | 0.228 | 35 | 0.182 | 0.162 | 0.162 |
18 | 0.569 | 0.549 | 0.560 | 36 | 0.239 | 0.235 | 0.227 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Grimova, N.; Macas, M. Query-By-Committee Framework Used for Semi-Automatic Sleep Stages Classification. Proceedings 2019, 31, 80. https://doi.org/10.3390/proceedings2019031080