A Multiscale Residual Attention Network for Multitask Learning of Human Activity Using Radar Micro-Doppler Signatures
"> Figure 1
<p>The CNN-based framework of the proposed <span class="html-italic">MRA-Net</span>. The model is composed of two parts: feature extractor and multitask classifier. In the multitask classifier part, there are three branches: activity recognition branch <span class="html-italic">P<sub>r</sub></span>, person identification branch <span class="html-italic">P<sub>i</sub></span> and FLWL branch <span class="html-italic">P<sub>w</sub></span>. There is a FC layer and a Softmax layer in <span class="html-italic">P<sub>i</sub></span> and <span class="html-italic">P<sub>w</sub></span>, respectively, and <span class="html-italic">V<sub>i</sub></span> and <span class="html-italic">V<sub>r</sub></span> denote the corresponding output vectors that are utilized for the final classifications of the two tasks. <span class="html-italic">W</span> denotes the output vector that is utilized for the automatic loss weight learning.</p> "> Figure 2
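To make the three-branch classifier concrete, the following is a minimal PyTorch sketch of such a head; the feature dimension, the sigmoid in P_w, and all layer sizes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MultitaskHead(nn.Module):
    """Sketch of the Figure 1 multitask classifier: branches P_r, P_i, P_w."""
    def __init__(self, feat_dim=512, n_activities=5, n_subjects=6):
        super().__init__()
        self.p_r = nn.Linear(feat_dim, n_activities)      # activity recognition
        self.p_i = nn.Linear(feat_dim, n_subjects)        # person identification
        self.p_w = nn.Sequential(nn.Linear(feat_dim, 1),  # FLWL branch
                                 nn.Sigmoid())

    def forward(self, feat):
        v_r = torch.softmax(self.p_r(feat), dim=1)  # V_r: activity class scores
        v_i = torch.softmax(self.p_i(feat), dim=1)  # V_i: identity class scores
        w = self.p_w(feat)                          # W: used for loss weighting
        return v_r, v_i, w
```

In training one would typically feed the pre-softmax logits into a cross-entropy loss; the softmax outputs shown here correspond to the final classification vectors described in the caption.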
Figure 2. The feature extractor part of MRA-Net, which is composed of three blocks. Each block contains three branches: a coarse-scale learning branch, a fine-scale learning branch, and a residual attention learning branch, all of which facilitate feature learning. A convolution with a 3 × 3 kernel and a stride of 1 is denoted 3 × 3/1; pooling operations are denoted in the same way.
Figure 3. Convolution operations with different convolution kernels. The 5 × 5 kernel is suited to learning coarse-scale features, while the 3 × 3 kernel is suited to learning fine-scale features in MD signatures.
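As a concrete reading of Figures 2 and 3, the sketch below combines a 5 × 5/1 coarse-scale branch, a 3 × 3/1 fine-scale branch, and a residual attention mask. The channel split, the 1 × 1 attention convolution, and the (1 + mask) residual form follow common practice and are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MRABlock(nn.Module):
    """Sketch of one feature-extractor block with three branches."""
    def __init__(self, c_in, c_out):
        super().__init__()
        half = c_out // 2
        # coarse-scale branch: 5 x 5 / 1 convolution (Figure 3, left)
        self.coarse = nn.Sequential(
            nn.Conv2d(c_in, half, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        # fine-scale branch: 3 x 3 / 1 convolution (Figure 3, right)
        self.fine = nn.Sequential(
            nn.Conv2d(c_in, c_out - half, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(c_out - half), nn.ReLU(inplace=True))
        # residual attention branch: a soft mask in [0, 1]
        self.attention = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        feat = torch.cat([self.coarse(x), self.fine(x)], dim=1)
        mask = self.attention(x)
        # (1 + mask) * feat: the mask modulates the fused multiscale features
        # while the identity path preserves them, as in residual attention learning
        return (1.0 + mask) * feat
```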
Figure 4. The initialization process of the loss weights. The accuracy curves of the activity recognition and person identification tasks under several typical ratio values r_w show that [2/3, 1] is a proper initial range for r_w.
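A minimal sketch of how a learned per-sample ratio r_w could weight the two task losses, assuming a cross-entropy loss per task; the paper's exact FLWL formulation may differ.

```python
import torch
import torch.nn.functional as F

def multitask_loss(logits_act, logits_id, y_act, y_id, r_w):
    """logits_*: (batch, classes); r_w: (batch, 1) from the FLWL branch."""
    l_act = F.cross_entropy(logits_act, y_act, reduction="none")
    l_id = F.cross_entropy(logits_id, y_id, reduction="none")
    # weight the identification loss relative to the recognition loss;
    # initializing r_w in [2/3, 1] matches the range suggested by Figure 4
    return (l_act + r_w.squeeze(1) * l_id).mean()
```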
Figure 5. (a) Photo of the UWB radar module, a PulsON 440; and (b) experimental deployment diagram in which a tested person runs towards the radar at velocity v.
Figure 6. Radar data preprocessing: (a) raw radar data; (b) radar data after background clutter suppression; and (c) radar MD signature.
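A minimal sketch of this preprocessing chain, assuming standard mean-subtraction clutter suppression and a short-time Fourier transform over slow time (the authors' exact pipeline may use different clutter filters or STFT parameters):

```python
import numpy as np
from scipy.signal import stft

def md_signature(raw, prf=290.0, nperseg=64):
    """raw: complex radar data matrix of shape (slow_time, range_bins)."""
    # (b) background clutter suppression: remove the static per-range mean
    clutter_free = raw - raw.mean(axis=0, keepdims=True)
    # collapse the range dimension to a single slow-time signal
    slow_time = clutter_free.sum(axis=1)
    # (c) STFT over slow time yields the micro-Doppler spectrogram
    f, t, s = stft(slow_time, fs=prf, nperseg=nperseg,
                   noverlap=nperseg - 1, return_onesided=False)
    return f, t, 20 * np.log10(np.abs(s) + 1e-12)  # magnitude in dB
```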
Figure 7. Typical MD signatures of five activities performed by six subjects. From top to bottom: box, circle, jump, run, and walk. From left to right: Sub #1, Sub #2, Sub #3, Sub #4, Sub #5, and Sub #6. In each spectrogram, the radial velocity ranges from −5.14 m/s to 5.14 m/s and the activity duration is 1 s. Each MD signature is both activity-specific and individual-specific.
Figure 8. (a) Confusion matrix of the person identification task. (b) Confusion matrix of the activity recognition task. To show the classification performance more clearly, color indicates log(Recall) rather than Recall.
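A minimal matplotlib sketch of this coloring choice (the helper and the epsilon guard for empty cells are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion(cm, eps=1e-3):
    """cm: square matrix of counts, rows = true class, cols = predicted."""
    recall = cm / cm.sum(axis=1, keepdims=True)  # row-normalize to Recall
    plt.imshow(np.log(recall + eps))             # color encodes log(Recall)
    plt.colorbar(label="log(Recall)")
    plt.show()
```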
Figure 9. F1-score curves of the five activities for person identification. "Run" achieves the highest F1-score, indicating that the "run" spectrogram is the most effective of the five activities for person identification.
Figure 10. (a) Performance comparison for person identification on the test dataset. (b) Performance comparison for activity recognition on the test dataset. The proposed MRA-Net for joint activity recognition and person identification outperforms the state-of-the-art single-task approaches.
Figure 11. Performance comparison of MRA-Net and JMI-CNN for MTL under different SNRs.
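A minimal sketch of how such an SNR sweep is typically set up: Gaussian white noise is added to each spectrogram at a prescribed SNR before evaluation. This is an assumed protocol for illustration, not necessarily the authors' exact procedure.

```python
import numpy as np

def add_gwn(spectrogram, snr_db, rng=None):
    """Add Gaussian white noise to a real-valued spectrogram at snr_db."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(spectrogram ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), spectrogram.shape)
    return spectrogram + noise
```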
Figure 12. (a) Visualization of the loss weights for the person identification and activity recognition tasks. (b) Bar chart of the statistics of r_w. The automatically assigned loss weights vary across MD signatures, and most r_w values fall between 2.3/2.7 and 2.4/2.6.
Abstract
1. Introduction
- A novel multiscale residual attention network, named MRA-Net, is proposed to jointly perform person identification and activity recognition based on radar MD spectrograms. By learning the two tasks jointly, MRA-Net outperforms the state-of-the-art methods on both.
- A fine-grained loss weight learning (FLWL) mechanism is proposed to automatically search for proper loss weights rather than equalizing or manually tuning the loss weight of each task.
- Extensive experiments validate the feasibility of radar-based joint activity recognition and person identification, as well as the effectiveness of MRA-Net on this task, achieving state-of-the-art results.
2. Related Work
2.1. Person Identification
2.2. Activity Recognition
2.3. Multitask Learning
3. Multiscale Residual Attention Network
3.1. Multiscale Learning
3.2. Residual Attention Learning
3.3. Fine-Grained Loss Weight Learning
4. Dataset Description
5. Evaluation and Implementation
5.1. Performance Metrics
5.2. Implementation Details
6. Experiments and Discussion
6.1. Experimental Results
6.2. Comparison with the State-of-the-Art
6.3. Fine-Grained Loss Weight Learning
6.4. Ablation Study
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
IoT | Internet of Things |
MRA-Net | Multiscale Residual Attention Network |
FLWL | Fine-Grained Loss Weight Learning |
MD | micro-Doppler |
CNN | Convolutional Neural Network |
MTL | Multitask Learning |
FC | Fully Connected |
UWB | Ultra-Wideband |
PRF | Pulse Repetition Frequency |
CPI | Coherent Processing Interval |
TN | True Negatives |
FP | False Positives |
FN | False Negatives |
TP | True Positives |
Adam | Adaptive Moment Estimation |
RCS | Radar Cross-Section |
DCNN | Deep Convolutional Neural Network |
LSTM | Long Short-Term Memory |
CAE | Convolutional Autoencoder |
GWN | Gaussian White Noise |
SNR | Signal-to-Noise Ratio |
Parameter | Value |
---|---|
Center Frequency | 4.0 GHz |
Chirp Bandwidth | 1.8 GHz |
Pulse Repetition Frequency (PRF) | 290 Hz |
Coherent Processing Interval (CPI) | 0.2 s |
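For reference, the 1.8 GHz chirp bandwidth implies the standard UWB range resolution (a derived figure, not stated in the table):

```latex
\Delta R = \frac{c}{2B}
         = \frac{3 \times 10^{8}\,\mathrm{m/s}}{2 \times 1.8 \times 10^{9}\,\mathrm{Hz}}
         \approx 8.3\,\mathrm{cm}
```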
Subject | Sub #1 | Sub #2 | Sub #3 | Sub #4 | Sub #5 | Sub #6 |
---|---|---|---|---|---|---|
Gender | male | male | male | female | male | female |
Age | 23 | 25 | 23 | 23 | 23 | 24 |
Height (cm) | 173 | 178 | 172 | 166 | 188 | 169 |
Weight (kg) | 73 | 71 | 75 | 66 | 92 | 52 |
Sub. \ Act. | Walk | Run | Jump | Circle | Box |
---|---|---|---|---|---|
Sub #1 | 340 | 338 | 166 | 396 | 266 |
Sub #2 | 388 | 282 | 160 | 356 | 197 |
Sub #3 | 233 | 237 | 220 | 430 | 158 |
Sub #4 | 208 | 211 | 100 | 242 | 141 |
Sub #5 | 256 | 175 | 190 | 320 | 145 |
Sub #6 | 177 | 414 | 197 | 316 | 239 |

Table entries are the numbers of samples (Num.) for each subject and activity.
Method | Activity Recognition | Person Identification |
---|---|---|
MRA-Net for multitask learning | 98.29% | 95.87% |
MRA-Net for activity recognition | 97.61% | × |
MRA-Net for person identification | × | 91.34% |
Multitask Learning | Activity Recognition | Person Identification |
---|---|---|
Greedy search, r_w = … | 95.72% | 88.97% |
Greedy search, r_w = … | 97.28% | 91.37% |
Greedy search, r_w = … | 98.43% | 94.85% |
Greedy search, r_w = … | 96.13% | 93.16% |
Greedy search, r_w = 1 | 97.45% | 90.23% |
FLWL mechanism | 98.29% | 95.87% |
Config. | Multiscale (Coarse Scale) | Multiscale (Fine Scale) | Residual Attention Learning | F1-Score of Activity Recognition | F1-Score of Person Identification | Execution Time |
---|---|---|---|---|---|---|
(1) | √ | × | × | 92.96% | 89.75% | 2.31 s |
(2) | × | √ | × | 94.52% | 90.14% | −13.96% |
(3) | √ | √ | × | 96.83% | 91.30% | +89.28% |
(4) | √ | × | √ | 97.71% | 93.89% | +83.03% |
(5) | × | √ | √ | 98.18% | 94.06% | +75.41% |
(6) | √ | √ | √ | 98.29% | 95.87% | +136.03% |

Execution times for (2)–(6) are given as changes relative to configuration (1).
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).