Abstract
In this paper, we present a perception sensor network (PSN) capable of detecting audio- and visual-based emergency situations, such as students quarreling with screams and punches, and of maintaining effective school safety. The PSN is composed of ambient sensor units, each built from a Kinect, a pan-tilt-zoom (PTZ) camera, and a control board that acquires raw audio signals together with color and depth images. Audio signals captured by the Kinect microphone array are used to recognize sound classes and to localize the sound source. Vision signals from the Kinect and the PTZ camera stream are used to detect the locations of humans, identify them by name, and recognize their gestures. Fusion methods combine multiple-person detection and tracking, face identification, and audio–visual emergency recognition. Two approaches, a matching pursuit algorithm and a dense-trajectories covariance matrix, are applied to reliably recognize abnormal activities of students. In this way, human-caused emergencies are detected automatically, together with the place of occurrence, the subject involved, and the type of emergency. A PSN consisting of four units was used in experiments to detect designated targets performing abnormal actions in multi-person scenarios. Evaluation of the individual perception capabilities and of the integrated system confirmed that the proposed system can provide meaningful information that offers substantive support to teachers and staff members in school environments.
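To make the audio–visual fusion step more concrete, the sketch below shows, in Python, one minimal way an emergency-like sound event could be associated with a tracked person by comparing bearings. The data types, labels, and thresholds here are hypothetical illustrations under simplifying assumptions, not the authors' implementation.

from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, simplified stand-ins for the outputs of the audio and vision pipelines.
@dataclass
class AudioEvent:
    label: str           # recognized sound class, e.g. "scream"
    azimuth_deg: float   # estimated direction of the sound source
    confidence: float    # classifier confidence in [0, 1]

@dataclass
class PersonTrack:
    name: str            # identity from face identification ("unknown" if unmatched)
    azimuth_deg: float   # bearing of the tracked person from the sensor unit
    action: str          # recognized action, e.g. "punch", "walk"

EMERGENCY_SOUNDS = {"scream", "shout"}
EMERGENCY_ACTIONS = {"punch", "kick", "push"}

def fuse(audio: AudioEvent, tracks: List[PersonTrack],
         max_bearing_gap_deg: float = 20.0) -> Optional[dict]:
    """Associate an alarming sound with the person closest in bearing and
    report an emergency record (subject, place, type) if they match."""
    if audio.label not in EMERGENCY_SOUNDS or audio.confidence < 0.5:
        return None
    if not tracks:
        return None
    # Pick the tracked person whose bearing is closest to the sound direction.
    nearest = min(tracks, key=lambda t: abs(t.azimuth_deg - audio.azimuth_deg))
    if abs(nearest.azimuth_deg - audio.azimuth_deg) > max_bearing_gap_deg:
        return None
    return {
        "emergency_type": audio.label,
        "subject": nearest.name,
        "bearing_deg": nearest.azimuth_deg,
        "visually_confirmed": nearest.action in EMERGENCY_ACTIONS,
    }

if __name__ == "__main__":
    audio = AudioEvent(label="scream", azimuth_deg=32.0, confidence=0.9)
    tracks = [PersonTrack("unknown", -15.0, "walk"),
              PersonTrack("student_A", 28.0, "punch")]
    print(fuse(audio, tracks))

In this sketch a scream detected at 32 degrees is matched to the person tracked at 28 degrees, whose recognized "punch" action visually confirms the emergency; any angular gating threshold or confidence cutoff would in practice have to be tuned to the sensor geometry.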
Notes
School Violence, http://www.cdc.gov/ViolencePrevention/youthviolence/schoolviolence.
Korean National Police Agency, https://www.police.go.kr.
Korean School Information, http://www.schoolinfo.go.kr.
BEHAVE dataset, http://homepages.inf.ed.ac.uk/rbf/BEHAVE.
D-CASE challenge, http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/index.
Robot Operating System, https://wiki.ros.org/indigo.
OpenNI package, https://wiki.ros.org/openni_tracker.
The boundaries of an object on an image.
Acknowledgements
We give special thanks to Dr. Seong-Whan Lee and Nam-Gyu Cho, staff members at Korea University, for their technical cooperation and assistance in the integrated experiments and for insightful discussions. This research was supported in part by the 'Implementation of Technologies for Identification, Behavior, and Location of Human based on Sensor Network Fusion' program and the 'Development of Social Robot Intelligence for Social Human-Robot Interaction of Service Robots' program of the Ministry of Trade, Industry, and Energy (Grant Numbers 10041629 [SimonPiC] and 10077468 [DeepTasK]), and in part by ICT R&D programs of IITP (2015-0-00197 [LISTEN] and 2017-0-00432 [BCI]).
About this article
Cite this article
Yun, SS., Nguyen, Q. & Choi, J. Recognition of emergency situations using audio–visual perception sensor network for ambient assistive living. J Ambient Intell Human Comput 10, 41–55 (2019). https://doi.org/10.1007/s12652-017-0597-y