Deep Learning with Dynamically Weighted Loss Function for Sensor-Based Prognostics and Health Management
Figure 1. The basic components of a perceptron comprise the input layer, which can take an arbitrary number of inputs *s*; the weights *w*, which map the inputs to the subsequent layer; a bias *b*; an activation function *H*, which introduces non-linearity; and the output *Z*.
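A minimal NumPy sketch of this computation, using the caption's symbols (the tanh activation and the input values are illustrative choices, not the paper's):

```python
import numpy as np

def perceptron(s, w, b, H=np.tanh):
    """Perceptron output Z = H(w . s + b): inputs s are combined with
    weights w and a bias b, then passed through the activation H."""
    return H(np.dot(w, s) + b)

# Illustrative values: three inputs mapped to a single output
s = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
Z = perceptron(s, w, b=0.05)
```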
Figure 2. An initial weight value is iteratively updated based on the partial derivative of a loss function, in order to reach the global minimum of the loss.
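A toy sketch of that update rule on a convex one-dimensional loss (the loss, learning rate, and starting weight are invented for illustration; deep-network losses are non-convex, so in practice the update only approaches a good minimum):

```python
def gradient_descent_step(w, grad_loss, lr=0.1):
    """Move the weight against the partial derivative of the loss."""
    return w - lr * grad_loss(w)

# Toy loss L(w) = (w - 2)^2 with dL/dw = 2(w - 2); minimum at w = 2
w = 10.0
for _ in range(100):
    w = gradient_descent_step(w, lambda w: 2.0 * (w - 2.0))
# w is now very close to 2.0, the (here global) minimum of the loss
```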
Figure 3. A deep feedforward neural network (DNN), like the perceptron, has an input layer and an output layer. However, the DNN additionally has a large number of hidden layers and neuron units.
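A sketch of the corresponding forward pass: the perceptron computation repeated over a stack of hidden layers (the layer sizes and ReLU activation here are placeholders, not the architectures used in the experiments):

```python
import numpy as np

def dnn_forward(x, params, act=lambda z: np.maximum(z, 0.0)):
    """Forward pass through fully connected layers; each (W, b) pair is
    one layer. ReLU is applied throughout for brevity; a real output
    layer would use a task-specific activation."""
    for W, b in params:
        x = act(W @ x + b)
    return x

# Toy network: 4 inputs -> 8 hidden units -> 1 output
rng = np.random.default_rng(0)
params = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(1, 8)), np.zeros(1))]
y = dnn_forward(rng.normal(size=4), params)
```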
Figure 4. The **top** part of the figure illustrates a standard two-dimensional convolutional neural network (CNN) with its convolutional layer and max-pooling layer; the output of the max-pooling layer is subsequently flattened to feed the data into a fully connected layer. The **bottom** part shows a one-dimensional convolutional neural network (CNN1D), in which the filter moves in only one direction to perform the convolution and max-pooling operations.
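A bare-bones sketch of the one-dimensional case (single filter, "valid" convolution, non-overlapping pooling; the signal and kernel are invented):

```python
import numpy as np

def conv1d(x, kernel):
    """Slide one filter along a single direction of the sequence."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Non-overlapping 1D max pooling."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, size)])

x = np.sin(np.linspace(0, 6, 20))                # toy sensor signal
feat = max_pool1d(conv1d(x, np.array([1.0, 0.0, -1.0])))
flat = feat.ravel()                              # flattened for a dense layer
```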
Figure 5. The long short-term memory (LSTM) unit contains a forget gate, an input gate, and an output gate. The yellow circles represent the sigmoid activation function, while the pink circles represent the tanh activation function. Additionally, the "×" and "+" symbols are the element-wise multiplication and addition operators.
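One common formulation of a single LSTM step, matching the gates in the figure (a sketch; the stacked weight layout is an arbitrary convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: rows of W, U, b hold the forget (f), input (i),
    candidate (g), and output (o) blocks; h is the hidden state and c
    the cell state."""
    f, i, g, o = np.split(W @ x + U @ h + b, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # the three gates
    c_new = f * c + i * np.tanh(g)                 # the "x" and "+" operators
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```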
Figure 6. The Bi-LSTM and Bi-GRU are structurally identical except for the LSTM and GRU units. The red arrows indicate the input values, the blue arrows the output values, and the grey arrows the flow of information between the LSTM/GRU units.
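Structurally, the bidirectional wrapper is independent of the recurrent cell it contains, which a short sketch makes explicit (the toy cell below is invented; a real model would pass in an LSTM or GRU step):

```python
import numpy as np

def run_forward(step_fn, xs, h0):
    hs, h = [], h0
    for x in xs:                      # red arrows: inputs per time step
        h = step_fn(x, h)             # grey arrows: state between units
        hs.append(h)
    return hs

def bidirectional(step_fn, xs, h0):
    """Run the same cell forwards and backwards over the sequence and
    concatenate the two hidden states per time step (blue arrows)."""
    fwd = run_forward(step_fn, xs, h0)
    bwd = run_forward(step_fn, xs[::-1], h0)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

xs = [np.array([0.1]), np.array([0.5]), np.array([-0.2])]
out = bidirectional(lambda x, h: np.tanh(x + h), xs, np.zeros(1))
```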
Figure 7. The gated recurrent unit (GRU) contains a reset gate and an update gate. The yellow circles represent the sigmoid activation function, while the pink circles represent the tanh activation function. Additionally, the "×", "+", and "1−" symbols are the element-wise multiplication, addition, and inversion operators.
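The corresponding GRU step in the same sketch style (the parameter naming is an assumption for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, p):
    """One GRU step; p maps names to weight matrices and biases.
    r is the reset gate, z the update gate, and the figure's "1-"
    inversion appears in the final interpolation."""
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_cand
```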
Figure 8. The output f(x) from the deep learning model and the ground truth *Y* are used to calculate the mean square error (MSE) for one instance. The MSE is then passed through a non-linear function to produce the weight that dynamically adjusts the loss function.
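A sketch of the idea for the regression case (the sigmoid below stands in for the paper's non-linear weighting function, which is defined in Section 3.1):

```python
import numpy as np

def dynamically_weighted_mse(y_true, y_pred):
    """Per-instance squared error re-weighted by a non-linear function
    of that same error, so poorly predicted instances contribute more.
    The sigmoid here is a placeholder, not the paper's exact function."""
    se = (y_true - y_pred) ** 2
    weight = 1.0 / (1.0 + np.exp(-se))   # non-linear function of the error
    return np.mean(weight * se)
```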
Figure 9. The output f(x) from the deep learning model and the ground truth *Y* are used to calculate the cross-entropy (CE) loss for one instance. The CE is then combined with the weighting function to produce the weight that dynamically adjusts the loss function.
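For the classification case, this weighting takes the form of the focal loss of Lin et al. (Section 3.2); a standard binary formulation is sketched below (the default alpha and gamma are common choices, not necessarily the paper's):

```python
import numpy as np

def focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Cross entropy scaled by (1 - p_t)^gamma, which down-weights
    easy, well-classified instances; alpha balances the two classes."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p, 1.0 - p)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

y = np.array([1, 0, 0, 1])
p = np.array([0.9, 0.1, 0.8, 0.3])
loss = focal_loss(y, p)
```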
Figure 10. Boxplot of the final cost for each combination of gamma value [1, 2, 3, 4, 5] and alpha value [0.25, 0.5, 0.75, 1.0]. The x-axis labels denote the combination of gamma and alpha; for instance, 'g1a100' represents a gamma of 1 and an alpha of 1.0.
Figure 11. A confusion matrix with the associated cost of each outcome. A confusion matrix tabulates the performance of a classification model. True positives and true negatives are correct classifications and therefore carry no cost, whereas each false positive and false negative incurs a cost of 10 and 500, respectively. Here *p* and *n* represent the positive and negative classes, while *P* and *N* represent the total number of positive and negative instances; the actual class is denoted by an apostrophe.
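The resulting metric reduces to a weighted count of the two error types, as in this sketch (the example counts are invented):

```python
def total_cost(fp, fn, cost_fp=10, cost_fn=500):
    """Cost from the confusion matrix: correct classifications are free;
    each false positive costs 10 and each false negative costs 500."""
    return cost_fp * fp + cost_fn * fn

# e.g. 25 false alarms and 30 missed faults
assert total_cost(25, 30) == 15250
```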
Figure 12. Boxplots of the scoring function results for the four deep learning models, (**a**) DNN, (**b**) Bi-GRU, (**c**) CNN1D, and (**d**) Bi-LSTM, with and without the dynamically weighted loss function. The asterisks above each boxplot denote the *p*-value, where "***" < 0.001, "**" < 0.01, and "*" < 0.05.
Figure 13. Boxplots of the cost results using (**a**) DNN, (**b**) Bi-GRU, (**c**) CNN1D, and (**d**) Bi-LSTM with cross entropy (CE) and focal loss (FL), respectively. The asterisks above each boxplot denote the *p*-value, where "***" < 0.001, "**" < 0.01, and "*" < 0.05.
Figure 14. PR curves for (**a**) DNN, (**b**) Bi-GRU, (**c**) CNN1D, and (**d**) Bi-LSTM using focal loss (green line) vs. cross-entropy loss (red line). The AUC of the PR curve for each loss function is shown at the top of each plot.
Abstract
1. Introduction
2. Background
2.1. How Neural Network Learning Is Performed
2.2. Deep Feedforward Neural Network
2.3. Convolutional Neural Networks
2.4. Long Short-Term Memory
2.5. Gated Recurrent Unit
2.6. Current Deep Learning Solutions
3. Dynamically Weighted Loss Function
3.1. Proposed Dynamically Weighted Loss Function
3.2. Focal Loss Function
4. Experimental Design
4.1. Case Study 1: Remaining Useful Life Prediction of Gas Turbine Engine
4.1.1. Data Description
4.1.2. Data Preprocessing
4.1.3. Deep Learning Architectures Investigated
4.1.4. Evaluation Metrics
4.2. Case Study 2: Fault Detection in Air Pressure System of Heavy Trucks
4.2.1. Data Description
4.2.2. Data Preprocessing
4.2.3. Deep Learning Architectures
4.2.4. Evaluation Metrics
5. Results and Discussion
5.1. Case Study 1: Remaining Useful Life Prediction of Gas Turbine Engine
5.2. Case Study 2: Fault Detection in Air Pressure System of Heavy Trucks
6. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
Symbol | Description | Unit |
---|---|---|
T2 | Total temperature at fan inlet | °R |
T24 | Total temperature at Low Pressure Compressor outlet | °R |
T30 | Total temperature at High Pressure Compressor (HPC) outlet | °R |
T50 | Total temperature at Low Pressure Turbine outlet | °R |
P2 | Pressure at fan inlet | psia |
P15 | Total pressure in bypass-duct | psia |
P30 | Total pressure at HPC outlet | psia |
Nf | Physical fan speed | rpm |
Nc | Physical core speed | rpm |
epr | Engine pressure ratio (P50/P2) | — |
Ps30 | Static pressure at HPC outlet | psia |
phi | Ratio of fuel flow to Ps30 | pps/psi |
NRf | Corrected fan speed | rpm |
NRc | Corrected core speed | rpm |
BPR | Bypass Ratio | — |
farB | Burner fuel-air ratio | — |
htBleed | Bleed Enthalpy | — |
Nf_dmd | Demanded fan speed | rpm |
PCNfR_dmd | Demanded corrected fan speed | rpm |
W31 | High Pressure Turbine coolant bleed | lbm/s |
W32 | Low Pressure Turbine coolant bleed | lbm/s |
Deep Learning Architecture | Hyperparameters
---|---
Bi-LSTM | Number of layers: 2; Layer 1 units: 100; Layer 2 units: 50; Activation function: Leaky ReLU
DNN | Number of layers: 6; Layer 1 units: 100; Layer 2 units: 500; Layer 3 units: 100; Layer 4 units: 250; Layer 5 units: 12; Layer 6 units: 6; Activation function: ReLU
CNN1D | Number of layers: 2; Layer 1 units: 64; Layer 2 units: 64; Activation function: ReLU; Filter size: 3 × features
Bi-GRU | Number of layers: 2; Layer 1 units: 100; Layer 2 units: 50; Activation function: Leaky ReLU
Data | Number of Positive Instances | Number of Negative Instances | Percentage of Minority Class
---|---|---|---
Training | 1000 | 59,000 | 1.67%
Testing | 375 | 15,625 | 2.34%
Deep Learning Architecture | Hyperparameters
---|---
Bi-LSTM | Number of layers: 2; Layer 1 units: 32; Layer 2 units: 16; Activation function: ReLU
DNN | Number of layers: 2; Layer 1 units: 64; Layer 2 units: 64; Activation function: Sigmoid
CNN1D | Number of layers: 1; Layer 1 units: 30; Activation function: ReLU; Filter size: 10 × 1
Bi-GRU | Number of layers: 2; Layer 1 units: 32; Layer 2 units: 16; Activation function: ReLU
Deep Learning Architecture | Scoring Function | RMSE
---|---|---
Bidirectional LSTM | 178.568 | 20.1
Bidirectional LSTM + DW | 129.089 | 13.9
Change | −27.7% | −30.6%
DNN | 93,473.3 | 23.1
DNN + DW | 13,741.3 | 23.9
Change | −85.2% | +3.4%
CNN1D | 112.858 | 22.3
CNN1D + DW | 63.002 | 21.1
Change | −44.1% | −5.7%
Bidirectional GRU | 169.550 | 11.6
Bidirectional GRU + DW | 81.899 | 12.9
Change | −51.6% | +11.8%
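The "Change" rows appear to be the relative change from the unweighted to the weighted variant of each model, as in this illustrative check against the Bi-LSTM row:

```python
def pct_change(before, after):
    """Relative change reported in the tables' 'Change' rows."""
    return 100.0 * (after - before) / before

round(pct_change(178.568, 129.089), 1)   # -> -27.7 (Bi-LSTM scoring function)
```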
Deep Learning Architecture | Cost | False Negative Rate | False Omission Rate | Recall
---|---|---|---|---
Bidirectional LSTM | 22,565 | 0.101 | 0.00248 | 0.898
Bidirectional LSTM + FL | 15,160 | 0.045 | 0.00113 | 0.954
Change | −32.8% | −55.4% | −54.4% | +6.2%
DNN | 31,505 | 0.156 | 0.00378 | 0.844
DNN + FL | 24,200 | 0.112 | 0.00273 | 0.888
Change | −28.2% | −39.3% | −27.8% | +5.0%
CNN1D | 16,855 | 0.067 | 0.00164 | 0.933
CNN1D + FL | 12,580 | 0.012 | 0.00030 | 0.988
Change | −25.4% | −82.1% | −81.7% | +5.9%
Bidirectional GRU | 35,480 | 0.177 | 0.00429 | 0.822
Bidirectional GRU + FL | 21,350 | 0.074 | 0.00187 | 0.925
Change | −39.8% | −58.1% | −56.4% | +12.5%
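The rate columns follow the usual confusion-matrix definitions, reproduced here for reference (a sketch; tp, fp, fn, tn are the four confusion-matrix counts):

```python
def rates(tp, fp, fn, tn):
    """Metrics reported above. Note that recall = 1 - FNR."""
    fnr = fn / (fn + tp)     # false negative rate
    fomr = fn / (fn + tn)    # false omission rate: FN among predicted negatives
    recall = tp / (tp + fn)
    return fnr, fomr, recall
```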
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Rengasamy, D.; Jafari, M.; Rothwell, B.; Chen, X.; Figueredo, G.P. Deep Learning with Dynamically Weighted Loss Function for Sensor-Based Prognostics and Health Management. Sensors 2020, 20, 723. https://doi.org/10.3390/s20030723