Essential Steps in Prognostic Health Management
Sreerupa Das, Richard Hall, Stefan Herzog, Gregory Harrison, Michael Bodkin
Lockheed Martin, Global Training and Logistics
100 Global Innovation Circle, Orlando, FL 32825
Abstract—Prognostic health management (PHM) systems are designed to predict impending faults and to determine remaining useful life of machinery. An efficient prognostic system can speed up fault diagnosis by providing an indication of what parts of the machinery or vehicle are most likely to fail and will need maintenance in the near future. In this paper, we discuss the essential steps involved in building an effective PHM system. We describe time and frequency domain features that can be extracted from raw sensor data. These features or condition indicators can help summarize the information in raw data and extract critical clues that reflect the health of the machinery. Analytical models can then be used to learn the essential health indicators and how they relate to fault conditions. In addition, we describe a case study of implementing a PHM system for a high speed face milling CNC cutter. We describe features that were analyzed from sensor data. For the analytical engine, we used a Neural Network model for learning the association of the extracted features and the magnitude of wear in the cutter. The neural network was able to determine remaining useful life of cutters in terms of number of remaining cuts for a given wear limit based on extracted features.
I. INTRODUCTION
remaining useful life of cutters in a high speed milling machine.
II. GENERAL APPROACH
PHM analysis involves a variety of steps including collection of raw data from sensors, data characterization, digital signal processing, extraction of condition indicators, and finally the intelligent processing engine for performing diagnosis and prognosis. Figure 1 delineates the essential steps. These steps are described in detail below.
• Data validation: basic sanity check, handle missing data, handle abnormal data values.
• Data normalization: scale data ranges between [0..1],
• Data Correlation
C. Feature Extraction
Sensor data must be processed to extract features or condition indicators that reflect the health of the machinery being monitored. Condition indicators or features embody key information that is obtained by processing the raw data. Tracking relevant condition indicators over time gives us a good indication of fault progression in machinery. This helps us to prepare for an impending fault. Some of the useful time domain features or condition indicators are listed below:
1) Mean – Average value of a time varying signal.
2) Standard Deviation – Measures how much the data points are dispersed from the average. The standard deviation σ (sigma) is the square root of the average value of (X − μ)^2.
3) Root Mean Square – The root mean square (RMS) value of a vibration signal reflects the energy content of the signal. It can be expressed as:
$$s_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} s_i^2}$$
Where,
s_rms is the root mean square value of dataset s,
s_i is the i-th member of dataset s,
N is the number of points in dataset s.
Thus kurtosis is the fourth centralized moment of the signal, normalized by the square of the variance.
Apart from the time domain features described above, features can be extracted from the frequency domain, order domain or joint time-frequency domain. Several advanced signal processing techniques have been explored in the literature. Some of them are listed below:
1) Band Stop/Pass filter – This technique is used to attenuate/accentuate a known frequency range in a signal.
2) Spectral Density, Power Spectral Density – The spectral density captures the frequency content of a signal and helps identify periodicities in the signal. The Fast Fourier Transform (FFT) forms the basis for analyzing a time domain signal in the frequency domain.
3) Cepstrum Analysis – This technique is useful in detecting changes in sideband patterns and detecting periodic structure in the spectrum. The cepstrum is defined as the inverse Fourier transform of the logarithm of the spectrum (the Fourier transform) of the time signal.
4) Time-Frequency Analysis – Time-frequency analysis of non-stationary signals is gaining popularity. The Short Time Fourier Transform with a sliding window, the Wigner-Ville Distribution (the Fourier transform of the signal's autocorrelation taken over the delay variable) and the Gabor Transform are some of the commonly used techniques for time-frequency analysis.
5) Time Synchronous Averaging & Order Tracking – Time Synchronous Averaging (TSA) is performed by averaging together a series of signal segments, each corresponding to one period of a synchronising signal. Before TSA can be performed, the signal must be order tracked to give an integer number of samples per revolution and a defined start point
Authorized licensed use limited to: PAKISTAN INST OF ENGINEERING AND APPLIED SCIENCES. Downloaded on March 14,2022 at 20:52:50 UTC from IEEE Xplore. Restrictions apply.
(with help of tachometer readings). Order tracking minimizes errors introduced by fluctuations in the sampling frequency.
6) Hilbert Transform – A real function f(t) and its Hilbert transform h(t) together form an analytic signal. An analytic signal is one that gives us a one-sided spectrum in the frequency domain [1]. One application of the Hilbert transform is that the magnitude of the analytic signal is the envelope of the original signal f(t). Also, the magnitude of the FFT of the analytic signal is doubled and can be displayed on a logarithmic scale, enabling a large display range.
7) Wavelet Analysis – Wavelets are effectively the impulse response functions of a series of filters applied to the signal to extract features of the same order (scale) as the specific wavelet. By choosing wavelets similar to sought features in the signal, wavelets can be used for compression and extraction of salient features.
D. Building a model
Once key condition indicators are extracted, the next step is to build a model to interpret the information in the features and correlate them to the behavior of the machinery. Any such analytical model will have to make simplifying assumptions about reality. Nevertheless, such models are important tools to summarize patterns from underlying data and are used to make the best possible predictions for new situations.
Machine Learning and Statistics provide numerous algorithms that allow computers to evolve behaviors based on empirical sensor data. These algorithms take advantage of examples (training data) to capture the unknown underlying probability distribution. A major focus of machine learning is to automatically learn to recognize complex patterns and make intelligent decisions based on data. However, the challenge lies in the fact that the set of all possible observations (sensor values) and corresponding behaviors is too large to be covered by the set of observed training data. Hence the model must generalize from the given examples, so as to be able to generate the best guess on new cases. Some of the commonly used machine/statistical learning approaches include:
• Decision Tree learning
• Association rule learning
• Neural Networks
• Genetic Programming
• Logic programming
• Support Vector Machines
• Clustering
• Bayesian networks
• Reinforcement learning
Other statistical models such as the Regression Model, Gaussian Mixture Model and Hidden Markov Model also have the same underlying goal – that of generating the most likely outcome for a given observation (sensor values).
E. Perform PHM Analysis
One or more models could be generated to train on the task and provide prognostics. Since most learning systems start with an unbiased and usually random configuration using Monte Carlo methods (e.g., starting with random weights in a neural network), the solutions they develop are unique, even though they are trained on the same training data. Also, since training data is usually limited, the way each model learns to generalize from the limited data is distinct and could be a valuable piece of information. Hence it is advantageous to train a group of models on a specific task and evaluate their response on new data (real situation) to generate the final outcome of the PHM system. Various techniques have been researched to pick the best model or make the best prediction given the output of a group of models, trained on a limited set of data. Some of these techniques include:
1) Ensemble Learning – Such methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models.
2) Random Forest – An ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees [2].
3) Cross Validation – Cross-validation is a way to predict the performance of a model on a validation set using computation in place of mathematical analysis. This technique is often used to determine the best performing model in a group of models.
4) Voting – Given a class of learned models, voting or majority response could be used to determine the response of the overall PHM system.
III. CASE STUDY: PHM FOR MILLING MACHINE
In this case study we present the techniques used in implementing a PHM system for a high speed face CNC (Computer Numerical Control) milling cutter. The cutting process involves discontinuous and varying loads on flutes of the cutter as they engage and disengage with the cutting surface and results in wear over time. The flute wear phenomenon is complex and is a function of the setup, the type of cutter used and the workpiece materials being processed. With flute wear progression, more force or power is required to achieve the same amount of cut, i.e., material removed from a workpiece. In addition, as flutes wear, changes in sound emitted from a cutting operation become distinct. The most undesirable effect of flute wear is that it results in growing imperfections in the cutting surface finish which are often unacceptable, especially while milling fine instruments. Degradation of the milled surface from worn cutters leads to rework or scrapping the workpiece. Usually cutters have a distinctive wear pattern. There is a break-in period with a steep wear rate in new cutters. Following the break-in period, wear significantly slows down to a small uniform rate, which is also called the steady-state wear region. Finally there is acceleration in the wear rate as the cutter approaches its end of life. Although the general trend of the wear of a cutter may be known, each cutter behaves differently, possibly due to imperfections in the composition and geometry. Hence it is important to be able to
estimate the remaining useful life of a cutter during its operation based on its current conditions. In order to determine a cutter’s health condition, sensors can be placed on the cutter to measure the vibration and force exerted by the cutter on the workpiece along the three dimensions. Also, acoustic emission data can be utilized to reflect on the cutter’s health.
In the rest of the paper, we discuss steps taken to predict the life of the cutter. A variation of a back propagation Neural Network model was used for learning the association of the features with fault conditions. The neural network was used to determine the remaining useful life of a cutter in terms of number of remaining cuts for a given wear limit based on extracted features.
IV. CASE STUDY: CNC MILLING CUTTER
A. Problem Definition
1) Background
Column 1: Force (N) in X dimension
Column 2: Force (N) in Y dimension
Column 3: Force (N) in Z dimension
Column 4: Vibration (g) in X dimension
Column 5: Vibration (g) in Y dimension
Column 6: Vibration (g) in Z dimension
Column 7: AE-RMS (V)
Each cut data file contains more than 200,000 records, corresponding to a duration of more than 4 seconds required to make one cut.
In addition, the wear patterns of cutters C1, C4 and C6 are provided. The wear data consisted of the wear on each of the three flutes for the three cutters after each cut (in 10^-3 mm) for about 300 cuts.
2) Task and Evaluation Methodology
The tool wear phenomenon is complex and varies with the setup and the materials processed. Workpiece surface finish is degraded by worn cutters, leading to rework or scrapping the workpiece. The task was to estimate the maximum number of cuts one could "safely" make for an unspecified wear limit. This implied that the maximum wear of any flute should not exceed the wear limit (not the average wear across the flutes). E.g., if the wear pattern of three flutes is as shown in Figure 3, then Figure 4 shows the maximum wear of all the three flutes.
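The rule that the worst flute, not the average, determines the number of safe cuts can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the small wear table are hypothetical, and wear is assumed to be stored as one row per cut and one column per flute (in 10^-3 mm).

```python
import numpy as np

def max_flute_wear(wear):
    """Return the wear of the worst flute after each cut.

    wear: array-like of shape (n_cuts, n_flutes), in 10^-3 mm (assumed layout).
    """
    return np.max(np.asarray(wear, dtype=float), axis=1)

def max_safe_cuts(wear, wear_limit):
    """Number of cuts that can be made before any flute exceeds wear_limit."""
    worst = max_flute_wear(wear)
    exceeded = np.nonzero(worst > wear_limit)[0]
    # The index of the first violating cut equals the count of safe cuts before it.
    return int(exceeded[0]) if exceeded.size else len(worst)

# Hypothetical wear values for three flutes over five cuts (not challenge data).
wear = np.array([
    [10, 12, 11],
    [20, 25, 22],
    [40, 38, 45],
    [70, 60, 65],
    [90, 95, 80],
])
safe = max_safe_cuts(wear, wear_limit=66)
```

In this made-up table the average wear after the fourth cut is 65, which is below a limit of 66, but one flute has already reached 70, so only three cuts count as safe.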
The task here is to make an estimate of the maximum safe cuts at integer values of wear over the range 66 to 165 (10^-3 mm) as shown in Figure 4.
A. Noise Elimination
As the cutter engages with and disengages from the workpiece at the start and end of every cut, we noticed a certain amount of noise or disparity compared to the rest of the records while cutting was in progress. This noise was apparent in the time domain data. In order to eliminate variations in the end conditions, the first few and last few records in each cut file were eliminated. Also, since this analysis was performed on face milling cutters and approximately 315 cuts were required to mill a face, the first few and the last few cut files were disregarded to avoid disparity on the edges.
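The trimming described above can be sketched as below. This is only an illustration under stated assumptions: the text says "the first few and last few" without giving counts, so `n_trim` and `n_skip` are hypothetical parameters, and each cut file is assumed to be loaded as an array with one row per record and one column per sensor channel.

```python
import numpy as np

def trim_cut_records(records, n_trim):
    """Drop the first and last n_trim records (rows) of one cut file."""
    records = np.asarray(records)
    if n_trim < 0:
        raise ValueError("n_trim must be non-negative")
    if 2 * n_trim >= len(records):
        raise ValueError("n_trim removes every record in the cut file")
    return records[n_trim:len(records) - n_trim]

def trim_cut_files(cut_files, n_skip):
    """Drop the first and last n_skip cut files of one cutter."""
    cut_files = list(cut_files)
    if n_skip < 0:
        raise ValueError("n_skip must be non-negative")
    if 2 * n_skip >= len(cut_files):
        raise ValueError("n_skip removes every cut file")
    return cut_files[n_skip:len(cut_files) - n_skip]

# Toy stand-in for one cut file: 10 records x 7 sensor columns.
cut = np.arange(70).reshape(10, 7)
steady = trim_cut_records(cut, n_trim=2)

# Toy stand-in for the sequence of cut files of one cutter.
kept = trim_cut_files(range(10), n_skip=3)
```

In practice the two counts would be chosen by inspecting the time domain data for the engage and disengage transients.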
Figure 6. Chip deformation in cutting
Hence,
D. TRAINING METHODOLOGY
All inputs presented to the network were normalized to values between -1 and +1. Each input set consisted of features extracted at a particular cut from one cutter, where each feature was represented by one input unit. There were about 300 input patterns for one cutter. For output, the wear values of the three flutes were provided (in 10^-3 mm). Only the training cutters (C1, C4 and C6) were used, as their wear patterns were provided. The values of η- and η+ were set to 0.5 and 1.5. From experimentation, we concluded that about 2000 iterations were sufficient to train the network.
One important aspect that we found helped learning was to have some concept of time and notion of history during any given cut. We chose not to make the model any more complex by using recurrent neural networks and hence used the input sets from the current and the previous cut (i.e., at cuts c and c-1) to determine the wear at cut c. This helped the learning process by providing some sense of history. In addition, in order to instill a dependence on time (i.e., which cut it is right now), we also used the cut number as part of the input pattern.
A batch of 100 neural networks was trained on all the training data. Due to the scarcity of training data, we were not able to get good results using cross validation. The evaluation function (from the challenge) was instead used to select the best model. The selected model was used to predict the wear pattern of the other three cutters (C2, C3 and C5). Finally, in order to determine wear at integer values over the range 66 to 165 (in 10^-3 mm), the maximum wear curves were interpolated for the integer values between 66 and 165.
Since the training data was limited, we had to depend on the daily leaderboard’s evaluation to help us refine our solution. We varied the combinations of selected feature sets and concluded that the standard deviation and total power at harmonics of the tooth pass frequencies were the best indicators. Other features that helped derive the final solution included the RMS of the wavelet decomposed frequency component d5 and kurtosis. The final solution is shown in Figure 9.
Figure 9. The final solution
CONCLUSION
We described the essential steps needed to implement a PHM system in general. We delineated the steps required and enumerated possible approaches that can be taken at each step. Furthermore, we discussed the application of the steps to a specific task, that of predicting the remaining useful life of CNC milling cutters. Using the above mentioned methodical approach we were able to generate the best result for the given task (generated the best solution as part of the 2010 PHM Data Challenge).
ACKNOWLEDGMENT
We would like to thank the Prognostic Health Management Society (www.phm.org) for providing the data for this analysis. The data used in this paper was part of their data released for the 2010 PHM Challenge. We would also like to thank Lockheed Martin for supporting this research.
REFERENCES
[1] Hahn Stefan L., Hilbert transforms in signal processing, Artech House, Inc., Boston, 1996.
[2] Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
[3] X. Li, B. S. Lim, J. H. Zhou, S. Huang, S. J. Phua, K. C. Shaw and M. J. Er (2009). Fuzzy Neural Network Modelling for Tool Wear Estimation in Dry Milling Operation, Annual Conference of the Prognostics and Health Management Society, San Diego, CA.
[4] I. Kandilli, M. Sönmez, H. M. Ertunc and B. Çakır (2007). Online Monitoring Of Tool Wear In Drilling and Milling By Multi-Sensor Neural Network Fusion, IEEE International Conference on Mechatronics and Automation.
[5] V. P. Astakov, S. Shvets (2004). The assessment of plastic deformation in metal cutting. Journal of Materials Processing Technology 146.
[6] P. Scanlon, A. Lyons and A. O’Loughlin (2007). Acoustic signal processing for degradation analysis of rotating machinery to determine the remaining useful life, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[7] D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1986). Learning representations by back-propagating error. Nature, pp. 533–536.
[8] M. Riedmiller and H. Braun (1993). A direct adaptive method for faster backpropagation learning: The RPROP algorithm. Proc. IEEE International Conference On Neural Network, pp. 586-591.