
4698 JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 40, NO. 14, JULY 15, 2022

A Machine Learning-Based Framework for Predictive Maintenance of Semiconductor Laser for Optical Communication
Khouloud Abdelli, Helmut Grießer, Member, IEEE, and Stephan Pachnicke, Senior Member, IEEE

Abstract: Semiconductor lasers, one of the key components for optical communication systems, have been rapidly evolving to meet the requirements of next generation optical networks with respect to high speed, low power consumption, small form factor etc. However, these demands have brought severe challenges to the semiconductor laser reliability. Therefore, a great deal of attention has been devoted to improving it and thereby ensuring reliable transmission. In this paper, a predictive maintenance framework using machine learning techniques is proposed for real-time health monitoring and prognosis of semiconductor lasers, thus enhancing their reliability. The proposed approach is composed of three stages: i) real-time performance degradation prediction, ii) degradation detection, and iii) remaining useful life (RUL) prediction. First, an attention based gated recurrent unit (GRU) model is adopted for real-time prediction of performance degradation. Then, a convolutional autoencoder is used to detect the degradation or abnormal behavior of a laser, given the predicted degradation performance values. Once an abnormal state is detected, an RUL prediction model based on attention-based deep learning is utilized. Afterwards, the estimated RUL is input for decision making and maintenance planning. The proposed framework is validated using experimental data derived from accelerated aging tests conducted for semiconductor tunable lasers. The proposed approach achieves a very good degradation performance prediction capability with a small root mean square error (RMSE) of 0.01, a good anomaly detection accuracy of 94.24% and a better RUL estimation capability compared to the existing ML-based laser RUL prediction models.

Index Terms: Anomaly detection, machine learning, predictive maintenance, remaining useful life prediction, semiconductor laser.

Manuscript received January 7, 2022; revised March 11, 2022; accepted March 27, 2022. Date of publication April 12, 2022; date of current version July 16, 2022. This work was supported in part by the CELTIC-NEXT through project AI-NET-PROTECT under Project C2019/3-4 and in part by the German Federal Ministry of Education and Research under Grant FKZ16KIS1279K. (Corresponding author: Khouloud Abdelli.)
Khouloud Abdelli is with ADVA Optical Networking SE, 82152 Munich/Martinsried, Germany, and also with Kiel University (CAU), Chair of Communications, 24143 Kiel, Germany (e-mail: kabdelli@advaoptical.com).
Helmut Grießer is with ADVA Optical Networking SE, 82152 Munich/Martinsried, Germany (e-mail: hgriesser@adva.com).
Stephan Pachnicke is with the Kiel University (CAU), Chair of Communications, 24143 Kiel, Germany (e-mail: stephan.pachnicke@tf.uni-kiel.de).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/JLT.2022.3163579.
Digital Object Identifier 10.1109/JLT.2022.3163579
0733-8724 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

SEMICONDUCTOR lasers have been widely used as optical communication light sources for high speed data transmission due to their high efficiency, low cost, and compactness. Since the invention of the semiconductor laser in 1962, its performance and productivity have been extensively improved to meet the demands of next generation high speed optical networks in terms of linewidth, power consumption, cost etc. [1], [2]. However, the performance of the laser during operation can be adversely affected by several intrinsic and external factors such as contamination [3], [4], facet oxidation [5], threading dislocations in the substrate [6], crystal defects, a high ambient temperature [3], etc. Many of these factors are hard to predict but induce laser degradation and failure, and thereby result in optical network disruption and high maintenance costs. Moreover, the lifetime of the laser device is prone to a wear-out failure mode (i.e., gradual degradation), defined as the usual failure mode of a device operating over its service life [7]. The complexity of the laser structure and the diversity of the factors inducing the degradation make the reliability assessment a challenging issue [3]. Therefore, a great deal of research has been devoted to improving the laser reliability.

The qualification of laser reliability is typically performed with laboratory data, obtained from accelerated life tests conducted under high stress conditions such as high temperatures or high drive current. This speeds up the degradation and thereby shortens the time to failure of the device; otherwise, the time required to collect field lifetime data from operational devices can be years [8]. Conventionally, the laser lifetime is estimated by extrapolating a mathematical fit of the laser current or output power over time. However, such a reliability extrapolation is inaccurate and can result in considerable overestimation or underestimation of the actual lifetime of the laser. The laser is considered degraded if the monitored value crosses a threshold, which is determined based on the laser design and specifications. However, the threshold approach is imprecise and leads to a high false alarm rate.

Recently, machine learning (ML) concepts achieving higher accuracy and prediction capability have been proposed to improve the laser reliability estimation. Abdelli et al. [9], [10] proposed a federated learning approach for semiconductor laser lifetime prediction, and developed an artificial neural network model for laser mean time to failure (MTTF) prediction given the laser characteristics. However, the degradation trend over time, which impacts the estimation of MTTF, is not taken into consideration as features for the ML model. We also presented a long short-term memory (LSTM) model for laser failure modes, trained with synthetic data modelling the different laser


degradation types [11], and we proposed a hybrid prognostic model based on convolutional neural networks (CNN) and LSTM for laser remaining useful life (RUL) prediction, trained with experimental data [12]. However, due to the limited amount of data used to train the RUL prediction model, the performance of the model was not satisfactory.
In this paper, an ML-based framework for predictive maintenance of a semiconductor laser is proposed for monitoring and performing diagnosis and prognosis of the laser device during operation once deployed in an optical network. The proposed framework is composed of three phases: real-time monitoring, degradation detection and RUL prediction. The proposed approach is trained with synthetic laser reliability data produced by a generative adversarial neural network (GAN) model and validated using experimental data of tunable lasers. Our main contributions can be summarized as follows:
• A predictive maintenance framework using different ML techniques to enhance the reliability of the semiconductor laser during operation, thereby maximizing the overall equipment effectiveness, minimizing the costs and effectively scheduling the maintenance activities.
• An ML model for real-time prediction of the performance degradation, adopting the combination of a gated recurrent unit (GRU) and an attention mechanism.
• A convolutional autoencoder model for detecting laser degradation or abnormal behavior.
• An attention-based deep learning model for RUL prediction that makes full use of the combination of the statistical features characterizing the degradation trend and the temporal sequential features.
• A validation of the proposed framework using experimental data of a tunable laser, with results that demonstrate the effectiveness of the framework by achieving high prediction capability and detection accuracy.
• A GAN model for generating realistic laser reliability data.

The rest of this paper is structured as follows: Section II gives some background information about GRU, autoencoder, attention mechanism and GAN. Section III presents the proposed framework as well as the different ML models involved in the development of the framework. Section IV describes the experimental data, the data generation using GAN and the validation of the presented framework. Conclusions are drawn in Section V.

II. BACKGROUND

In this section, we briefly describe the theoretical concepts about the machine learning models involved in the development and the validation of the proposed framework.

A. Gated Recurrent Unit (GRU)

The GRU, proposed by Cho et al. in 2014 to tackle the gradient vanishing problem [13], is an improved version of standard recurrent neural networks (RNNs), used to process sequential data and to capture long-term dependencies. The typical structure of a GRU, shown in Fig. 1, contains two gates, a reset and an update gate, controlling the flow of the information. The update gate regulates the information that flows into the memory, while the reset gate controls the information flowing out of the memory. The GRU cell is updated at each time step $t$ by applying the following equations:

$$z_t = \sigma(W_z x_t + W_z h_{t-1} + b_z) \tag{1}$$
$$r_t = \sigma(W_r x_t + W_r h_{t-1} + b_r) \tag{2}$$
$$\tilde{h}_t = \tanh(W_h x_t + W_h (r_t \circ h_{t-1}) + b_h) \tag{3}$$
$$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t \tag{4}$$

where $z$ denotes the update gate, $r$ represents the reset gate, $x$ is the input vector, $h$ is the output vector, and $W$ and $b$ represent the weight matrix and the bias vector, respectively. $\sigma(\cdot)$ is the gate activation function, $\tanh(\cdot)$ represents the output activation function, and "$\circ$" represents the element-wise product operator. Equation (1) represents the update gate, (2) the reset gate, (3) computes a candidate state for the current time step using the parts of the previous hidden state, and (4) shows how the output $h_t$ is calculated.

Fig. 1. Structure of the gated recurrent unit (GRU) cell.

Fig. 2. Structure of a standard autoencoder: the training objective is to minimize the reconstruction error between the output $\hat{x}$ and the input $x$.

B. Autoencoder

An autoencoder (AE) is a specific type of artificial neural network seeking to learn a compressed representation of an input in an unsupervised manner [14]. An AE is composed of two sub-models, namely the encoder and the decoder. Fig. 2 shows a standard architecture of the AE. The encoder is used to compress an input $x$ into a lower-dimensional encoding (i.e., latent-space


representation) $z$ through a non-linear transformation, which is expressed as follows:

$$z = f(Wx + b), \tag{5}$$

where $W$ and $b$ denote the weight matrix and bias vector of the encoder and $f$ represents the activation function of the encoder. The decoder reconstructs the output $\hat{x}$ given the representation $z$ via a non-linear transformation, which is formulated as follows:

$$\hat{x} = g(W' z + b'), \tag{6}$$

where $W'$ and $b'$ represent the weight matrix and the bias vector of the decoder and $g$ denotes the activation function of the decoder.
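As an illustration of (5) and (6), the following minimal sketch runs one encoder and one decoder pass on a toy input. The weights, the choice of tanh for $f$ and of the identity for $g$, and the helper names `matvec`, `encode` and `decode` are assumptions made for this example only; they are not taken from the paper.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def encode(x, W, b):
    """Eq. (5): z = f(W x + b), with f assumed to be tanh in this sketch."""
    return [math.tanh(a + bias) for a, bias in zip(matvec(W, x), b)]

def decode(z, W_dec, b_dec):
    """Eq. (6): x_hat = g(W' z + b'), with g assumed to be the identity."""
    return [a + bias for a, bias in zip(matvec(W_dec, z), b_dec)]

# Hypothetical toy weights: a 4-dimensional input is compressed to 2 dimensions.
W = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]
b = [0.0, 0.0]
W_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 0.0],
         [0.0, 1.0]]
b_dec = [0.0, 0.0, 0.0, 0.0]

x = [0.2, 0.4, 0.2, 0.4]
z = encode(x, W, b)              # latent representation, length 2
x_hat = decode(z, W_dec, b_dec)  # reconstruction, length 4

# Squared reconstruction error between the input and its reconstruction.
squared_error = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
```

In a trained autoencoder the weights would be learned so that this reconstruction error becomes small on the training data; here they are fixed by hand only to make the two transformations concrete.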
The AE is trained by minimizing the reconstruction error between the output $\hat{x}$ and the input $x$, which is the loss function $L(\theta)$, typically the mean square error (MSE), defined as:

$$L(\theta) = \sum \lVert x - \hat{x} \rVert^2 \tag{7}$$

where $\theta = \{W, b, W', b'\}$ denotes the set of the parameters to be optimized.

C. Attention Mechanism

Inspired by the human brain, which focuses on the distinctive parts rather than processing the entire data, attention mechanisms have been developed to give more 'attention' to the relevant parts of the input while ignoring the others, in order to boost the performance of the deep learning models. The attention mechanism has been applied successfully in many tasks such as speech recognition [15] or machine translation [16].

Let $H = \{h_1, h_2, \ldots, h_k\}$ denote the features extracted by a neural network model (i.e., the outputs of the model). The attention mechanism takes $H$ as input and computes an attention score (i.e., weight) $\alpha_i$ for each $h_i$ to decide which features should have more attention. $\alpha_i$ is calculated as follows:

$$e_i = \tanh(W_h h_i) \tag{8}$$
$$\alpha_i = \mathrm{softmax}(w^T e_i) \tag{9}$$

where $W_h$ and $w$ denote weight matrices. The softmax, a mathematical function that converts a vector of numbers into a vector of probabilities that sum to one, is used to normalize $\alpha_i$ and to ensure that $\alpha_i \geq 0$ and $\sum_i \alpha_i = 1$. The softmax function can be expressed as follows:

$$\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{k} \exp(z_j)} \quad \text{for } i = 1, \ldots, k \tag{10}$$

where $z = (z_1, \ldots, z_k) \in \mathbb{R}^k$ denotes the input vector and $z_i$ represents an element of the input $z$.

The different computed weights $\alpha_i$ are aggregated to obtain a weighted feature vector (i.e., attention context vector) $c$, which captures the relevant information to improve the performance of the neural network model. $c$ is computed as follows:

$$c = \sum_i \alpha_i h_i \tag{11}$$

D. Generative Adversarial Networks (GANs)

GANs [17] are a type of generative model able to create new content such as an image, text, or audio. The GAN model architecture consists of two sub-models, namely the generator $G$ and the discriminator $D$, which are trained and optimized simultaneously while competing with each other. $G$ is trained to produce realistic samples from a random noise input $z$, to fool $D$, which is trained to distinguish the real samples $x$ from the fake ones made by $G$. The objective function of the GAN model ($L_{GAN}$), which $G$ tries to minimize and $D$ tries to maximize, can be formulated as follows:

$$L_{GAN} = \min_G \max_D \; \mathbb{E}_{x \sim p_{data}(x)}[\log(D(x))] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{12}$$

III. PROPOSED FRAMEWORK

Fig. 3 illustrates the proposed predictive maintenance framework for a semiconductor laser. After the deployment of the laser device in an optical network, the current of the laser (i.e., degradation parameter) is monitored periodically under a constant output power. The collected laser current measurements are then stored in a database. Thereafter, the last $k$ monitored current measurements $\{I_{t-k}, \ldots, I_t\}$ are extracted, preprocessed, and fed to an ML model for real-time prediction of the performance degradation trend. The ML model predicts the next value of the laser current $I_{t+1}$, which is saved in the database. Finally, the sequence of current measurements $\{I_{t-k}, \ldots, I_{t+1}\}$ is given to an anomaly detection model to identify any degradation or abnormal behavior. If a degradation is detected, a notification is sent to the maintenance planning unit for root cause analysis and an RUL prediction model is triggered to estimate the RUL of the device, which is then used to schedule the maintenance activities.

Fig. 3. Flow chart of the proposed framework.

In the following subsections, the architecture of each model involved in the framework is described.

A. ML Based Performance Degradation Prediction Model

The proposed ML model for real-time prediction of performance degradation (i.e., increase of laser current) adopts a


combination of the GRU and the attention mechanism. The GRU is used to perform a one-step prediction, whereas the attention mechanism helps the model to concentrate more on the relevant features in order to improve the prediction accuracy and to boost the robustness of the model. As shown in Fig. 4, the attention based GRU model takes as input a sequence of length 9 of historical current measurements $[I_{t-8}, I_{t-7}, \ldots, I_t]$ and predicts the next current measurement $I_{t+1}$. The structure of the proposed model is composed of two GRU layers with 64 and 32 cells, respectively, followed by an attention layer, succeeded by a fully connected layer with no activation function and output $I_{t+1}$. The GRU layers process the sequential input to capture the temporal dependency modelling the degradation trend, and output the hidden states $[h_{t-8}, h_{t-7}, \ldots, h_t]$ (i.e., the learned or extracted features). Then, the attention layer assigns to each extracted feature $h_i$ a weight (i.e., attention score) $\alpha_i$ in order to compute a context vector $c_t$ capturing the relevant information. Afterwards, the weighted attention features are fed to a fully connected layer with output $I_{t+1}$. The model is optimized by minimizing the MSE (i.e., the cost function) between the predicted current value and the true one, by adopting the adaptive moment estimation (Adam) optimizer.

Fig. 4. Architecture of the proposed attention based GRU model for performance degradation prediction.

B. ML Based Anomaly Detection Model

The proposed model for laser anomaly detection is based on a convolutional autoencoder. The autoencoder is selected as the ML model for anomaly detection because it is capable of detecting rare or unseen abnormal behavior, such as sudden degradation, without requiring accurately labeled faulty data representing all types of faults or anomalies, which can be prohibitively expensive and cumbersome to obtain. Furthermore, the autoencoder works well for the case of highly unbalanced data (the number of normal samples is higher than the number of abnormal or faulty samples), which is predominant due to the scarcity of failures during the system operation of semiconductor lasers. The architecture of the proposed model is illustrated in Fig. 5. The model contains an encoder and a decoder sub-model with 7 layers in total. The encoder takes as input a current measurement sequence of length 10. It encodes the input into low dimensional features through a series of 3 convolutional layers containing 32, 16, 32 filters (i.e., kernels) of size 3×1 with a stride (i.e., the step of the convolution operation) of 2, 1, 1, respectively. Then the decoder attempts to reconstruct the input, given the compressed representation output of the encoder. The decoder is inversely symmetric to the encoder part. It consists of 4 transposed convolutional layers used to up-sample the feature maps. The last transposed convolutional layer, with one filter of size 3×1 and a stride of 1, is used to generate the output. The Rectified Linear Unit (ReLU) is selected as the activation function for the hidden layers of the model. The cost function is the MSE, which is minimized using the Adam optimizer.

Fig. 5. Structure of the proposed convolutional autoencoder for laser anomaly detection.

Fig. 6. Flow chart of the process of instance classification as anomalous or normal.

Note that the model is trained with normal data modelling the normal state of the laser device in order to learn the distribution characterizing the normal behavior. Once the model is trained, the classification of an instance or observation as anomalous/normal is performed by following the process illustrated in Fig. 6. First, an anomaly score quantifying the distance or the error between the input $I$ and the reconstructed input $\hat{I}$ (the output) is computed. In this study, the mean absolute error (MAE) is selected as the anomaly score. Then, if the calculated anomaly score is higher than a set threshold $\theta$, the instance is classified as "anomalous"; otherwise it is classified as "normal". $\theta$ is a hyperparameter optimized based on the number of true and false positives.


Fig. 7. Structure of the proposed model for laser RUL prediction.

Fig. 8. Methodology for the validation of the proposed framework.

Fig. 9. Recorded current measurements of different laser devices conducted at 90 °C.

C. ML Based RUL Prediction

Fig. 7 illustrates the proposed attention-based deep learning model for RUL prediction. The proposed approach makes full use of the fusion of the sequential and temporal features learned by the GRU-attention based layers and the statistical features characterizing the degradation trend, namely the root mean square (RMS), kurtosis ($\beta$) and skewness ($\delta$), in order to improve the prediction accuracy. The sequential input $[I_{t-8}, I_{t-7}, \ldots, I_{t+1}]$ is fed to 2 GRU layers composed of 64 and 32 cells to learn the representative sequential features, whereas the statistical features are given to a fully connected layer containing 32 neurons. Afterwards, the learned features are transferred to the attention layer to identify the most important features, which are then merged with the features output of the fully connected layer. The fused features are then fed to a fully connected layer with 32 neurons, followed by a dropout layer to avoid overfitting, that finally outputs the RUL. The whole network is simultaneously trained by minimizing the cost function (MSE) with an Adam optimizer.
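The three statistical features named in Section III-C (RMS, kurtosis and skewness) could be computed from a window of current measurements along the following lines. The paper does not state which estimator conventions are used, so this sketch assumes population (biased) moments and the non-excess (Pearson) kurtosis; the function name and the toy window are likewise illustrative.

```python
import math

def statistical_features(window):
    """Return (rms, kurtosis, skewness) of a sequence of measurements.

    Assumptions of this sketch: population moments (division by n) and
    Pearson kurtosis m4 / m2^2, which equals 3 for a normal distribution.
    """
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(v * v for v in window) / n)
    m2 = sum((v - mean) ** 2 for v in window) / n
    m3 = sum((v - mean) ** 3 for v in window) / n
    m4 = sum((v - mean) ** 4 for v in window) / n
    if m2 == 0.0:
        # Constant window: shape statistics are undefined; 0.0 is returned
        # here purely as a convention of this example.
        return rms, 0.0, 0.0
    kurtosis = m4 / m2 ** 2
    skewness = m3 / m2 ** 1.5
    return rms, kurtosis, skewness

# Made-up, evenly increasing window used only to exercise the function.
window = [1.0, 2.0, 3.0, 4.0, 5.0]
rms, kurtosis, skewness = statistical_features(window)
```

In the described model these three values would form the input of the fully connected branch, while the raw sequence goes to the GRU layers.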
IV. VALIDATION OF THE PROPOSED FRAMEWORK

A. Methodology

To validate the proposed framework, the methodology shown in Fig. 8 is adopted. First, accelerated aging tests under high temperature are conducted to induce laser degradation and thereby accelerate the failure of the device, while the laser current is monitored periodically under constant optical output power. Then, the collected current measurement data is segmented and normalized. Afterwards, the preprocessed data is fed to a GAN model to train it to synthetically generate realistic data that resembles the real data. Once the GAN model is trained, the generator is employed to produce synthetic data, which is then used to train the ML model. Finally, the trained model is tested with the real data to evaluate the performance of the model.

B. Experimental Data

Accelerated aging tests are performed for different tunable laser devices operating at a high temperature of 90 °C to strongly accelerate the laser degradation and thereby speed up the device failure. The current is monitored periodically under constant output power at times 2, 20, 40, 60, 80, 100, 150, 500, 1000, 1500, 2000, and 3000 h. The time to failure of the device $t_f$ is defined as the time at which the current has increased beyond 20% of its initial value. Fig. 9 shows the recorded current measurements of the tested devices. It can be observed that some devices failed before the end of the aging test, and that a few of the lasers exhibited an abnormal behavior.

In total, the dataset comprised 384 samples incorporating the sequences of monitored current measurements of the tested devices (i.e., 384 semiconductor lasers). We assign to each sample the RUL, computed as the difference between $t_f$ and the time $t$ at which the RUL is predicted, and the state of the device (normal or anomalous/degraded).

For the training of the GAN model, we consider just the samples of the normal devices, as the ML based anomaly detection model is trained just with normal data. Then, the first 10 current values from each considered sample are extracted, whereas the remaining current values are kept for testing the short-term and the long-term prediction capability of the ML based performance prediction model. In total, a dataset of 278 samples is used for GAN model training.
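The failure criterion and the RUL label described in this subsection can be sketched as below. The measurement times are those listed in the text; the current values are invented, and two details are assumptions of the example rather than statements from the paper: the time to failure is taken as the first sampled time at which the criterion is met (no interpolation between samples), and the RUL is clipped at zero after failure.

```python
# Measurement times in hours, as listed in the text.
MEASUREMENT_TIMES_H = [2, 20, 40, 60, 80, 100, 150, 500, 1000, 1500, 2000, 3000]

def time_to_failure(times, currents, increase=0.20):
    """First sampled time at which the current exceeds its initial value
    by more than `increase` (20% by default). Returns None if the device
    never meets the criterion within the test."""
    limit = currents[0] * (1.0 + increase)
    for t, current in zip(times, currents):
        if current > limit:
            return t
    return None

def remaining_useful_life(t_failure, t_now):
    """RUL = t_f - t, clipped at zero (clipping is an assumption here)."""
    if t_failure is None:
        return None
    return max(t_failure - t_now, 0)

# Hypothetical normalized current trace of one device.
currents = [1.00, 1.00, 1.01, 1.01, 1.02, 1.03, 1.05, 1.10, 1.18, 1.25, 1.40, 1.80]

t_f = time_to_failure(MEASUREMENT_TIMES_H, currents)
rul_at_500h = remaining_useful_life(t_f, 500)
```

With this made-up trace the limit is 1.20, which is first exceeded at the 1500 h measurement, so the RUL label at 500 h is 1000 h.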


C. Synthetic Data

The process of the synthetic laser reliability data generation using GAN is illustrated in Fig. 10. The GAN approach is trained with the real data. The generator attempts to produce realistic data from a random noise input, whereas the discriminator is trained to distinguish the fake data generated by the generator from the real data. The generator and the discriminator are updated simultaneously. The training process continues until the generator can generate data samples that the discriminator cannot differentiate from real data. The architecture of the generator is composed of one LSTM layer with 8 cells, followed by 4 convolutional layers containing 32, 16, 16, 1 filters of size 3×1. The discriminator contains 3 convolutional layers with 32 filters of size 3×1. Leaky ReLU, an improved version of the ReLU function having a small slope for negative values instead of a flat slope, is set as the activation function for the hidden layers of the generator and the discriminator. The cost function of the GAN model is the binary cross entropy, which the generator tries to minimize and the discriminator tries to maximize. The optimizer is Adam with a learning rate of 0.001.

Fig. 10. Synthetic data generation using a generative adversarial network (GAN) model.

Once the training of the GAN model has ended, the generator is used to generate synthetic data. To assess the quality of the synthetic data, the evaluation metrics percent root mean square difference (PRD), root mean square error (RMSE) and the Fréchet distance (FD) are adopted. PRD is used to evaluate the difference between the real data and the generated data, and it can be formulated as:

$$PRD = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{N} x_i^2}} \times 100, \tag{13}$$

where $x_i$ denotes the value of the sampling point $i$ of the real sequence, $\hat{x}_i$ represents the value of the sampling point $i$ of the generated sequence, and $N$ is the length of the sequence.

RMSE quantifies the stability between the original data and the synthetic data, and it is expressed as:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \hat{x}_i)^2}{N}} \tag{14}$$

FD measures the similarity between the real data and the generated data curves. Let $O_R = (u_1, u_2, \ldots, u_R)$ be the order of points along the segmented real curves, and $O_S = (v_1, v_2, \ldots, v_S)$ be the order of points along the segmented synthetic curves. The length $\lVert d \rVert$ of the sequence consisting of the couples of points $\{(u_{a_1}, v_{b_1}), \ldots, (u_{a_n}, v_{b_n})\}$ is computed as:

$$\lVert d \rVert = \max_{i = 1, \ldots, n} d(u_{a_i}, v_{b_i}), \tag{15}$$

where $d$ is the Euclidean distance. FD is calculated as:

$$FD(R, S) = \min \{ \lVert d \rVert \} \tag{16}$$

Note that a good synthetic data generation method should yield very low, ideally close to zero, values of PRD, RMSE, and FD.

Fig. 11 shows that the different evaluation metrics are very small, which demonstrates that the synthetic data is very close and similar to the real data.

Fig. 11. Assessment of the synthetic data using the metrics PRD, FD and RMSE.

To qualitatively assess how close the distribution of the synthetic data is to the real data's distribution, the t-distributed stochastic neighbor embedding (t-SNE) [18], a technique for visualizing high dimensional data in a two-dimensional space (tSNE1 and tSNE2), is used. Fig. 12 illustrates that the distribution of the synthetic data resembles that of the original experimental data, which indicates the effectiveness of the generator in producing realistic data.

Fig. 12. t-SNE visualization of the synthetic and real data distributions.

Fig. 13 shows the histograms of the normalized current magnitudes of the synthetic and original data after 80 h. It can be seen that both histograms are close and that the synthetic current values are within the range of the real currents.
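The three quality metrics of this subsection can be sketched in a few lines. The PRD and RMSE functions follow (13) and (14) directly. For FD, the sketch assumes that the coupling-based definition in (15) and (16) corresponds to the discrete Fréchet distance and evaluates it with the usual dynamic-programming recurrence; the function names and the toy sequences are illustrative, and the curves are represented as (index, value) points.

```python
import math

def prd(real, synthetic):
    """Percent root mean square difference, Eq. (13)."""
    numerator = sum((x - y) ** 2 for x, y in zip(real, synthetic))
    denominator = sum(x ** 2 for x in real)
    return math.sqrt(numerator / denominator) * 100.0

def rmse(real, synthetic):
    """Root mean square error, Eq. (14)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(real, synthetic)) / len(real))

def discrete_frechet(curve_r, curve_s):
    """Discrete Frechet distance between two curves given as lists of points.

    table[i][j] holds the smallest achievable maximum point distance over
    all couplings of the first i+1 points of curve_r with the first j+1
    points of curve_s.
    """
    rows, cols = len(curve_r), len(curve_s)
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            d = math.dist(curve_r[i], curve_s[j])  # Euclidean distance
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_previous = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
                table[i][j] = max(best_previous, d)
    return table[-1][-1]

# Made-up real and synthetic sequences of equal length.
real = [1.0, 2.0, 2.0]
synthetic = [1.0, 2.0, 1.0]

prd_value = prd(real, synthetic)
rmse_value = rmse(real, synthetic)
fd_value = discrete_frechet(list(enumerate(real)), list(enumerate(synthetic)))
```

All three values are zero for identical sequences and grow as the synthetic sequence departs from the real one, which is why small values are read as evidence of realistic generated data.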


Fig. 13. Normalized current histograms of the real and synthetic data after 80 h.

Fig. 14. Results of the comparison of the proposed method with MLP, CNN and RNN in terms of MAPE and CVRMSE.

TABLE I
COMPUTATIONAL TIME OF PROPOSED MODEL AND OTHER METHODS

In total, a synthetic dataset of 5600 samples is generated. The RUL and the state of the device are computed for each sample based on the defined failure criteria. The data is then normalized and fed to the ML models for training.

D. Validation Results of the Performance Prediction Model

The proposed model is compared to other ML techniques, namely multilayer perceptron (MLP), CNN, and RNN, by adopting as evaluation metrics the mean absolute percentage error (MAPE) and the coefficient of variation of the root mean squared error (CVRMSE), which are formulated as follows:

$$MAPE = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{\hat{x}_i - x_i}{x_i} \right| \tag{17}$$

$$CVRMSE = \frac{100}{\bar{x}} \sqrt{\frac{\sum_{i=1}^{N} (\hat{x}_i - x_i)^2}{N}} \tag{18}$$

where $\hat{x}_i$ and $x_i$ denote the predicted and the true current values, respectively, $N$ represents the number of test samples, and $\bar{x}$ is the average of the true current values. Note that a lower value of MAPE and CVRMSE indicates a better prediction capability.
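A direct transcription of (17) and (18) is given below as a minimal sketch. The function names and the three sample values are made up for illustration and do not come from the experimental dataset.

```python
import math

def mape(true_values, predicted_values):
    """Mean absolute percentage error, Eq. (17), in percent."""
    n = len(true_values)
    total = sum(abs((p - t) / t) for t, p in zip(true_values, predicted_values))
    return 100.0 / n * total

def cvrmse(true_values, predicted_values):
    """Coefficient of variation of the RMSE, Eq. (18), in percent."""
    n = len(true_values)
    mean_true = sum(true_values) / n
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(true_values, predicted_values)) / n)
    return 100.0 / mean_true * rmse

# Hypothetical true and predicted current values.
true_currents = [1.0, 2.0, 4.0]
predicted_currents = [1.1, 1.8, 4.0]

mape_value = mape(true_currents, predicted_currents)
cvrmse_value = cvrmse(true_currents, predicted_currents)
```

MAPE weights each error by the size of the true value it belongs to, whereas CVRMSE normalizes the overall RMSE by the mean true value, so the two can rank models differently when errors concentrate on small currents.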
The different ML models are trained with the synthetic dataset and tested with the real experimental dataset. The results of the comparison illustrated in Fig. 14 show that the proposed model achieves the smallest values of MAPE and CVRMSE, which indicates that the proposed method yields the best prediction performance among the compared models.
The computational inference times of the proposed model and the other methods are compared in Table I. As can be seen, the proposed ML model takes slightly more time than the MLP and CNN models due to its deeper architecture; however, it executes faster than the RNN method.
Fig. 15. Histograms of prediction errors: (a) for short-term prediction (one step forecasting), and (b) long-term prediction (multi-step forecasting).

We evaluate the short-term and long-term prediction capability of the proposed model. Note that the model is trained with current measurements till 1000 h, and that it is tested to forecast the current values at 1500 h, 2000 h and 3000 h. As shown in Fig. 15, the ML model accurately predicts the next value of the


current measurement (one step prediction) with small prediction errors with a mean of 0.004 and a standard deviation of 0.027, and forecasts the next three values of the current measurements for the next time frames by achieving low prediction errors with a mean of -0.029 and a standard deviation of 0.032.

Fig. 16 shows the predicted values of two random samples. It can be observed that the forecasted values are close to the actual values and that they follow the same degradation trend as the actual values, which demonstrates the effectiveness of the proposed model in predicting the current measurements.

Fig. 16. Results of laser current prediction for two random test samples: (a) short-term prediction of the current at 1500 h, (b) long-term prediction of the current up to 3000 h.

Instead of adopting the one-step prediction ML model multiple times for performing long-term prediction, whereby the prediction for the prior time step is used as an input for making a prediction on the following time step, we investigated the capability of the ML model in performing a multi-step forecasting by predicting the entire forecast sequence in a one-shot manner. Fig. 17 shows the adjusted architecture of the ML model for performing two-step prediction. The two-step ML model is also trained with the synthetic dataset and tested with the experimental dataset. The test results show that the two-step model achieves higher values of CVRMSE (4.8%) and MAPE (5.8%) compared to the one-step model, which shows that it is less accurate.

Fig. 17. Architecture of the two-step ML model for performance degradation prediction.

We investigated the impact of the input sequence length on the performance of the one-step ML model. We trained the ML model with sequences of length 5, 6, 7, and 8, respectively. Fig. 18 shows that the ML model's performance, measured by CVRMSE, degrades as the input sequence length is reduced. Reducing the input sequence length leads to a loss of the information representing the degradation trend, which impacts the capability of the ML model in capturing the relevant features for accurate prediction.

Fig. 18. Impact of the input sequence length on the performance of the ML model for performance degradation prediction.

E. Validation Results of the ML Model for Anomaly Detection

The ML model for anomaly detection is trained with synthetic data, modelling the normal behavior of laser devices. After the training, the model is tested with experimental data of normal and anomalous lasers. To assess the anomaly detection capability, the following metrics are adopted:
• Precision (P) quantifies the relevance of the predictions made by the ML model. It is expressed as:

$$P = \frac{TP}{TP + FP}, \tag{19}$$

where $TP$ denotes the number of "anomalous" sequences correctly classified, and $FP$ represents the number of "normal" sequences misclassified as "anomalous".
• Recall (R) provides the total relevant results correctly classified by the ML model. It is formulated as:

$$R = \frac{TP}{TP + FN}, \tag{20}$$

where $FN$ denotes the number of "anomalous" sequences misclassified as "normal".


• F1 score is the harmonic mean of the precision and recall, calculated as:

$$F1 = 2 \cdot \frac{P \cdot R}{P + R}. \tag{21}$$

• The accuracy (A) can be defined as the total number of correctly
classified instances divided by the total number of test instances. It is
calculated as follows:

$$A = \frac{TP + TN}{TP + TN + FP + FN}, \tag{22}$$

where TN denotes the number of “normal” sequences correctly classified.

Fig. 19. The optimal threshold selection based on the precision, recall and
F1 scores yielded by the autoencoder.

Fig. 20. Influence of the sequence length on the performance of the ML model
for anomaly detection in terms of F1 score.
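As a minimal sketch of how Eqs. (19)-(22) can be evaluated, the Python
snippet below computes the four metrics from the confusion counts and sweeps
a list of candidate thresholds to find the one that maximizes the F1 score.
The names Counts, scores_from_counts, count_at_threshold and
best_f1_threshold are illustrative and do not come from the paper. The rule
that a sequence is flagged as “anomalous” when its anomaly score exceeds θ is
an assumption, and the toy scores are made up for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Counts:
    tp: int  # "anomalous" sequences correctly classified
    fp: int  # "normal" sequences misclassified as "anomalous"
    fn: int  # "anomalous" sequences misclassified as "normal"
    tn: int  # "normal" sequences correctly classified


def scores_from_counts(c: Counts) -> Tuple[float, float, float, float]:
    """Return (precision, recall, F1, accuracy) following Eqs. (19)-(22).

    An undefined ratio (zero denominator) is reported as 0.0, which is a
    convention chosen for this sketch.
    """
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return precision, recall, f1, accuracy


def count_at_threshold(scores: Sequence[float],
                       labels: Sequence[int],
                       theta: float) -> Counts:
    """Count TP/FP/FN/TN for one threshold.

    Assumption: a sequence is flagged "anomalous" when score > theta.
    Labels: 1 = anomalous, 0 = normal.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score > theta
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def best_f1_threshold(scores: Sequence[float],
                      labels: Sequence[int],
                      candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest F1 score."""
    return max(
        candidates,
        key=lambda t: scores_from_counts(
            count_at_threshold(scores, labels, t))[2],
    )


if __name__ == "__main__":
    # Hypothetical anomaly scores and ground-truth labels (not paper data).
    toy_scores = [0.010, 0.012, 0.030, 0.040, 0.018, 0.022]
    toy_labels = [0, 0, 1, 1, 0, 1]
    theta = best_f1_threshold(toy_scores, toy_labels, [0.015, 0.020, 0.025])
    print(theta, scores_from_counts(
        count_at_threshold(toy_scores, toy_labels, theta)))
```

Under the assumed scoring rule, raising the threshold in this sketch can only
move sequences from the flagged to the unflagged side, so recall cannot
increase while precision typically does; this is the tradeoff that the F1
score balances.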
The detection capability of the model is optimized by selecting the optimal
threshold θ. Fig. 19 illustrates the precision, recall and F1 score (i.e.,
the harmonic mean of the precision and recall) curves as functions of θ. It
can be seen that there is a tradeoff between precision and recall. If the
chosen threshold is higher than 0.025, many anomalous laser devices will be
classified as normal, resulting in more false negatives and a low recall
score. Conversely, if the selected threshold is lower than 0.015, many normal
devices will be classified as anomalous, leading to more false positives and
a low precision score. Therefore, the optimal threshold, which provides the
best precision and recall tradeoff (i.e., maximizes the F1 score), is
selected. For the chosen optimal threshold of 0.019, the precision, the
recall, the F1 score and the accuracy are 96.72%, 92%, 94% and 94.24%,
respectively.

The influence of reducing the input sequence length on the performance of
the ML model for anomaly detection is investigated. Fig. 20 illustrates that
decreasing the input sequence length leads to a reduced F1 score of the
autoencoder model. Reducing the sequence length too much (below 7) can cause
the loss of the information underlying the normal behavior trend, leading to
underfitting and thereby worsening the performance.

It is to be noted that the autoencoder model takes 0.34 s to perform the
predictions for 247 test samples.

Fig. 21. Results of the comparison of the proposed model (GRU + attention +
statistical features) with the GRU model, the attention-based GRU method, and
the GRU + statistical features model using the RMSE and the MAE metrics.

F. Validation Results of the ML Model for RUL Prediction

After training with synthetic data, the proposed model for RUL prediction is
tested with experimental data, adopting the RMSE and the MAE as evaluation
metrics. We compare the prediction capability of the proposed method with
that of the GRU model without attention mechanism and the attention-based GRU
method. Fig. 21 shows that the proposed model achieves the lowest RMSE and
MAE, which indicates that adding the statistical features and the attention
mechanism helps to enhance the RUL estimation capability. Adding the
attention mechanism boosts the performance, yielding 10.3% and 8%
improvements in RMSE and MAE, respectively, whereas including both
statistical features and attention mechanism further enhances the prediction
capability, yielding 32.6% and 18.5% improvements in RMSE and MAE,
respectively.

The proposed model is compared with other ML techniques, namely random forest
(RF), support vector regression (SVR), MLP, RNN, LSTM and CNN, using as
evaluation metrics the


RMSE and the MAE. The results shown in Table II demonstrate that the proposed
method outperforms the other ML models by achieving the smallest RMSE and
MAE. The comparison of the computational time (inference time) of the
proposed model and the other ML methods in Table II shows that the proposed
method takes more time in testing (inference) due to its deep architecture,
and that the shallow ML techniques RF and SVR are much less time-consuming in
testing.

TABLE II
RESULTS OF THE COMPARISON OF THE PROPOSED METHOD WITH OTHER ML
TECHNIQUES USING RMSE, MAE, AND COMPUTATIONAL TIME

The proposed approach also achieves better prediction capability than the
recently presented CNN-LSTM model for laser RUL estimation [12] (RMSE = 385
hours, MAE = 261 hours), providing 63.14% and 75.9% improvements in RMSE and
MAE, respectively.

To further assess the RUL estimation capability of the proposed model, we
compare the predicted RUL values to the true RULs at different stages of
degradation. As shown in Fig. 22, the RUL values estimated by the model are
very close to the true RUL values, which demonstrates the effectiveness of
the proposed model in accurately predicting the RUL of the laser device.

Fig. 22. Results of predicted RULs by the proposed model vs. actual RULs.

The impact of reducing the length of the input sequence on the performance of
the ML model for RUL prediction is explored as well. Fig. 23 shows that the
prediction capability of the ML model decreases as the length of the input
sequence is reduced, due to the loss of the relevant information describing
the degradation trend.

Fig. 23. Influence of the sequence length on the performance of the ML model
for RUL prediction.

V. CONCLUSION

An ML-based predictive maintenance framework for semiconductor lasers is
proposed for real-time monitoring and prognosis of the laser device during
operation. The proposed approach contains three main steps: real-time
performance degradation prediction, degradation detection and RUL prediction.
An attention-based GRU model is used to predict the laser performance
degradation (i.e., the laser current increase). A convolutional autoencoder
is adopted to detect any degradation or abnormal behavior of the laser. An
attention-based deep learning model is used to estimate the RUL of the laser.
The different models are trained with synthetic data generated by the GAN
model, and tested with experimental data of tunable lasers. The results
demonstrate that the attention-based GRU model achieves good performance
degradation prediction (RMSE of 0.01), the convolutional autoencoder yields a
high detection accuracy of 94.2%, and the attention-based deep learning model
achieves good RUL estimation (RMSE of 142 hours), which demonstrates the
effectiveness of the proposed framework. The results also show that adding
statistical features underlying the degradation trend helps to improve the
performance of the RUL prediction model, and that adding the attention
mechanism enhances the prediction capability. The results demonstrate as well
that the GAN is able to produce laser reliability data that is close to the
real experimental data, that in case of limited in-field or experimental
data, synthetic data generated by the GAN is a good solution to train the ML
model, and that the performance of the ML model trained with the synthetic
data is good when tested with real data. The same concept of the proposed
framework is readily applicable to other optoelectronic devices such as
semiconductor optical amplifiers due to the similarity of their structures.

REFERENCES

[1] T. Katsuyama, “Development of semiconductor laser for optical
communication,” SEI Tech. Rev., vol. 69, pp. 13–20, 2009.
[2] F. Ujager, S. M. H. Zaidi, and U. Younis, “A review of semiconductor
lasers for optical communications,” in Proc. 7th Int. Symp. High-Capacity
Opt. Netw. Enabling Technol., Dec. 2010, pp. 107–111.


[3] J. Jimenez, “Laser diode reliability: Crystal defects and degradation
modes,” Comptes Rendus Physique, vol. 4, no. 6, pp. 663–673, 2003.
[4] I. Gontijo et al., “Reliability of semiconductor laser packaging in space
applications,” in Proc. 2nd Electron. System-Integration Technol. Conf.,
Sep. 2008, pp. 1127–1130.
[5] T. Yuasa et al., “Degradation of (AlGa)As DH lasers due to facet
oxidation,” Appl. Phys. Lett., vol. 32, 1978, Art. no. 119.
[6] S. Nakamura, “The roles of structural imperfections in InGaN-based blue
light-emitting diodes and laser diodes,” Sci., vol. 281, pp. 956–961, 1998.
[7] M. Fukuda et al., “Reliability and degradation mechanism of InGaAsP/InP
semiconductor lasers,” Ann. Télécommun., vol. 45, pp. 625–629, 1990.
[8] J. S. Huang, “Temperature and current dependences of reliability
degradation of buried heterostructure semiconductor lasers,” IEEE Trans.
Device Mater. Rel., vol. 5, no. 1, pp. 150–154, Mar. 2005.
[9] K. Abdelli et al., “Lifetime prediction of 1550 nm DFB laser using
machine learning techniques,” in Proc. Opt. Fiber Commun. Conf., 2020,
Paper Th2A.3.
[10] K. Abdelli et al., “Federated learning approach for lifetime prediction
of semiconductor lasers,” in Proc. Opt. Fiber Commun. Conf., 2022.
[11] K. Abdelli et al., “Machine learning based laser failure mode
detection,” in Proc. 21st Int. Conf. Transp. Opt. Netw., 2019, pp. 1–4.
[12] K. Abdelli et al., “A hybrid CNN-LSTM approach for laser remaining
useful life prediction,” in Proc. Optoelectron. Commun. Conf., 2021,
Paper S3D-3.
[13] K. Cho et al., “Learning phrase representations using RNN
encoder-decoder for statistical machine translation,” in Proc. Conf.
Empirical Methods Natural Lang. Process., 2014, pp. 1724–1734.
[14] M. A. Kramer, “Nonlinear principal component analysis using
autoassociative neural networks,” AIChE J., vol. 37, no. 2, pp. 233–243,
1991.
[15] J. Chorowski et al., “Attention-based models for speech recognition,”
in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 577–585.
[16] T. Luong et al., “Effective approaches to attention-based neural machine
translation,” in Proc. Conf. Empirical Methods Natural Lang. Process.,
2015, pp. 1412–1421.
[17] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Int. Conf.
Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[18] V. D. Maaten et al., “Visualizing data using t-SNE,” J. Mach. Learn.
Res., vol. 9, pp. 2579–2605, 2008.
