Abstract
In this paper, we present a perception sensor network (PSN) capable of detecting audio- and visual-based emergency situations, such as students quarreling with screams and punches, and of maintaining effective school safety. The PSN is composed of ambient sensor units, each built from a Kinect, a pan-tilt-zoom (PTZ) camera, and a control board that acquires raw audio signals together with color and depth images. Audio signals captured by the Kinect microphone array are used to recognize sound classes and to localize the sound source. Vision signals from the Kinect and the PTZ camera stream are used to detect the locations of humans, identify them by name, and recognize their gestures. Fusion methods combine multiple-person detection and tracking, face identification, and audio–visual emergency recognition. Two approaches, a matching pursuit algorithm and a dense-trajectories covariance matrix, are applied to reliably recognize abnormal activities of students. In this way, human-caused emergencies are detected automatically, together with the place of occurrence, the subject involved, and the type of emergency. A PSN consisting of four units was used in experiments to detect designated targets performing abnormal actions in multi-person scenarios. Evaluation of the individual perception capabilities and of the integrated system confirmed that the proposed system can provide meaningful information that offers substantive support to teachers and staff members in school environments.
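To make the audio–visual fusion step more concrete, the sketch below shows, in Python, one minimal way an emergency-like sound event could be associated with a tracked person by comparing bearings. The data types, labels, and thresholds here are hypothetical illustrations under simplifying assumptions, not the authors' implementation.

from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, simplified stand-ins for the outputs of the audio and vision pipelines.
@dataclass
class AudioEvent:
    label: str           # recognized sound class, e.g. "scream"
    azimuth_deg: float   # estimated direction of the sound source
    confidence: float    # classifier confidence in [0, 1]

@dataclass
class PersonTrack:
    name: str            # identity from face identification ("unknown" if unmatched)
    azimuth_deg: float   # bearing of the tracked person from the sensor unit
    action: str          # recognized action, e.g. "punch", "walk"

EMERGENCY_SOUNDS = {"scream", "shout"}
EMERGENCY_ACTIONS = {"punch", "kick", "push"}

def fuse(audio: AudioEvent, tracks: List[PersonTrack],
         max_bearing_gap_deg: float = 20.0) -> Optional[dict]:
    """Associate an alarming sound with the person closest in bearing and
    report an emergency record (subject, place, type) if they match."""
    if audio.label not in EMERGENCY_SOUNDS or audio.confidence < 0.5:
        return None
    if not tracks:
        return None
    # Pick the tracked person whose bearing is closest to the sound direction.
    nearest = min(tracks, key=lambda t: abs(t.azimuth_deg - audio.azimuth_deg))
    if abs(nearest.azimuth_deg - audio.azimuth_deg) > max_bearing_gap_deg:
        return None
    return {
        "emergency_type": audio.label,
        "subject": nearest.name,
        "bearing_deg": nearest.azimuth_deg,
        "visually_confirmed": nearest.action in EMERGENCY_ACTIONS,
    }

if __name__ == "__main__":
    audio = AudioEvent(label="scream", azimuth_deg=32.0, confidence=0.9)
    tracks = [PersonTrack("unknown", -15.0, "walk"),
              PersonTrack("student_A", 28.0, "punch")]
    print(fuse(audio, tracks))

In this sketch a scream detected at 32 degrees is matched to the person tracked at 28 degrees, whose recognized "punch" action visually confirms the emergency; any angular gating threshold or confidence cutoff would in practice have to be tuned to the sensor geometry.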
Notes
School Violence, http://www.cdc.gov/ViolencePrevention/youthviolence/schoolviolence.
Korean National Police Agency, https://www.police.go.kr.
Korean School Information, http://www.schoolinfo.go.kr.
BEHAVE dataset, http://homepages.inf.ed.ac.uk/rbf/BEHAVE.
D-CASE challenge, http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/index.
Robot Operating System, https://wiki.ros.org/indigo.
OpenNI package, https://wiki.ros.org/openni_tracker.
The boundaries of an object on an image.
Acknowledgements
We give special thanks to Dr. Seong-Whan Lee and Nam-Gyu Cho, staff members at Korea University, for their technical cooperation and assistance in the integrated experiments and for insightful discussions. This research was supported in part by the 'Implementation of Technologies for Identification, Behavior, and Location of Human based on Sensor Network Fusion' program and the 'Development of Social Robot Intelligence for Social Human-Robot Interaction of Service Robots' program of the Ministry of Trade, Industry, and Energy (Grant Numbers 10041629 [SimonPiC] and 10077468 [DeepTasK]), and in part by ICT R&D programs of IITP (2015-0-00197 [LISTEN] and 2017-0-00432 [BCI]).
About this article
Cite this article
Yun, SS., Nguyen, Q. & Choi, J. Recognition of emergency situations using audio–visual perception sensor network for ambient assistive living. J Ambient Intell Human Comput 10, 41–55 (2019). https://doi.org/10.1007/s12652-017-0597-y