IEEE Conf 2018 TrackNet - A - Deep - Learning - Based - Fault - Detection - For - Railway - Track - Inspection

TrackNet - A Deep Learning Based Fault Detection
for Railway Track Inspection

Ashish James∗†, Wang Jie∗†, Yang Xulei†, Ye Chenghao‡, Nguyen Bao Ngan‡, Lou Yuxin⊥, Su Yi⊥, Vijay Chandrasekhar†, and Zeng Zeng§†
† Institute for Infocomm Research, A*STAR, Singapore
‡ SMRT Trains Ltd, Singapore
⊥ SMRT Corporation Ltd, Singapore
Email: {ashish james, wang jie, yang xulei, vijay, zengz}@i2r.a-star.edu.sg
{YeChenghao, NguyenBaoNgan, LouYuxin, SuYi}@smrt.com.sg
∗ Joint first author
§ Corresponding author
978-1-5386-7528-1/18/$31.00 ©2018 IEEE

Abstract: Reliable and economical inspection of rail tracks is paramount to ensure the safe and timely operation of the railway network. Automated vision based track inspection utilizing computer vision and pattern recognition techniques has been regarded recently as the most attractive technique for track surface defect detection due to its low cost, high speed, and appealing performance. However, the different modes of failure, along with the immense range of image variations that can potentially trigger false alarms, make vision based track inspection a very challenging task. In this paper, a multiphase deep learning based technique is proposed, which initially performs segmentation, then crops the segmented image on the region of interest, and finally feeds the cropped image to a binary image classifier to identify the true and false alarms. It is shown that the proposed approach results in improved detection performance by mitigating the false alarm rate.

Index Terms: Railway track inspection; track fault detection; deep learning; deep convolution neural networks

I. INTRODUCTION

Recently, feature learning using deep neural networks has proved to be successful when applied to a variety of computer vision and classification problems in diverse application domains. The accuracy of such systems on several benchmark datasets has improved over classical hand-crafted feature learning approaches, and they have achieved state-of-the-art performance on many use cases [1], [2]. Some of the advances made in these domains can be applied to the detection and identification of faults in railway tracks, which is crucial for the safety and availability of railway networks.

Around the world, rail systems are among the most preferred public transportation methods and are becoming busier, requiring them to operate with increasing levels of availability and reliability [3]. In addition, the speed and loads of trains have also been increasing greatly in recent years, and all these factors inevitably raise the risk of producing rail defects. There are different reasons for the occurrence of rail surface defects, for example as a result of fatigue, due to the repetitive passing of rolling stock over rail components such as welds, joints, and switches; or because of the impacts from damaged wheels [4]. Railway operators worldwide have been highly concerned about such defects because, if treated late, they may lead to major revenue loss and have bigger implications like loss of life due to accidents.

Traditionally, a trained person visually inspects the rail for defects, which makes the whole process slow, subjective and dangerous. This led to many advanced non-destructive testing (NDT) techniques, which acquire the condition of a rail from sensors (such as visual and ultrasonic) with the information fed to some sophisticated software to detect defects. Currently, the available NDT techniques for rail inspection utilize visual cameras, ultrasonics, eddy current, etc. [5], [6]. One of the best performing techniques for detecting internal rail cracks has been inspection utilizing ultrasonics [7], [8]. However, its inspection speed is slow (no more than 75 km/h) [5] and it cannot detect surface defects. In order to improve the inspection speed, several improved ultrasonic techniques such as electromagnetic acoustic transducers, lasers, and air-coupled ultrasonics were proposed, but they did not achieve enough progress to detect surface defects [5].

The NDT technique using eddy current uses the magnetic field generated by eddy currents to identify defects [9]. This technique has relatively high inspection speed and is able to detect surface defects, so it is widely combined with ultrasonics for rail inspection. However, the eddy current sensor is very sensitive to lift-off variation, with the probe positioned at a constant distance (no more than 2 mm) from the surface of the rail head [10]. As a result, the operation of eddy current testing is complex and sensitive; furthermore, the reported highest speed of this testing is also no more than 100 km/h [5].

With the recent advances in computer vision techniques, visual based track inspection systems (VTIS) for rail surface detection have been developed. In VTIS, a high speed camera installed under a test train captures images of the track as it moves over them, with further analysis of the images being performed by image processing software for custom applications such as bolt detection [11], corrugation inspection [12], and crack detection [13]. Visual based track inspection systems have the advantages of high speed, low cost, and
appealing performance and are regarded as the most attractive technique for track surface defect detection [5]. However, many of the commercial off-the-shelf (COTS) VTIS systems have a high false alarm rate, resulting in a considerable amount of maintenance man-hours spent screening such images to eliminate the false alarms. This paper focuses on the VTIS system and proposes a multiphase deep learning based rail surface anomaly detection and classification technique.

The proposed technique, named TrackNet, performs image segmentation in the first phase to extract the rail tracks and locate the Regions of Interest (ROI) from the raw image. This step benefits the classification performed in the final phase by discarding the noisy background and other non-relevant information. The cropped images focusing on the ROI are then fed to the final classification phase, where the rail defects are identified and classified.

The rest of the paper is organized as follows. In Section II, a review of the related works on rail defect detection is presented. Section III describes the problem addressed in this work, followed by the proposed approach in Section IV. In Section V, the experimental results are presented with a comparison of different deep learning based VTIS techniques. Finally, Section VI provides the conclusions and suggests possible future investigations.

II. RELATED WORKS

Vision based track inspection technology has been gradually adopted by the railway industry since the pioneering work by Trosino et al. [14], [15]. Classically, the common choices of features for detection from visual data have been gradient-based features such as the histogram of oriented gradients (HoG), scale-invariant feature transforms (SIFT), spatial pyramids, and basis function representations such as Gabor filters. In [11], [16], two 3-layer neural networks running in parallel are used to detect hexagonal headed bolts. In [17], a VisiRail system is proposed, which collects images on each rail side and finds cracks on joint bars using edge detection and a support vector machine (SVM) classifier that analyzes the features extracted from the edges. A system for detecting tie plates and missing spikes using an AdaBoost-based object detector is proposed in [18].

In recent years, the expansion of feature learning using neural networks has provided a better tool for extracting features that are specifically tailored for each domain. In [19], a convolutional neural network trained on a database of photometric stereo images for detecting steel defects on rail surfaces is proposed. By means of differently colored light sources illuminating the rail surfaces, the defects are made visible in a photometric dark-field setup. A max-pooling convolutional neural network is used for steel defect classification in [20]. In [21], [22], deep convolutional neural networks have been used for rail fastening condition monitoring with the focus on identification of track components such as ballast, concrete, wood, and fastener. In [4], deep convolutional neural networks with different network architectures characterized by different depth and activation functions are studied for the detection and classification of rail surface defects.

III. PROBLEM DESCRIPTION

In VTIS, the images are initially captured by the image acquisition subsystem and are then fed to the image analysis subsystem for rail surface anomaly detection and classification. The defects flagged by the image analysis subsystem are then inspected by a human reviewer, and it has been observed that there is a high false alarm rate. This is caused by various factors such as animal droppings, writings, etc. on the rail track, which are subsequently classified as rail surface defects by the VTIS. A case of true and false alarm (caused by writing on the track) flagged by a COTS VTIS is shown in Fig. 1.

[Fig. 1: Images of true and false alarm from a COTS VTIS. (a) True alarm; (b) False alarm]

The high false alarm rate results in a considerable amount of maintenance man-hours expended in just screening through thousands of images to identify the correct defects. Such a task is quite cumbersome, and the multiphase deep learning technique proposed in this paper, named TrackNet, will enhance the VTIS performance by mitigating the false alarm rate. It is explained in the next section.

IV. APPROACH

Conventionally, the images from the VTIS with additional data augmentation are used for fault classification. Deep convolutional neural networks are popular models when dealing with image classification and fault detection problems. In this work, state-of-the-art convolutional neural networks such as ResNet and DenseNet are adopted as the baseline techniques for performance comparison with the proposed TrackNet.

TrackNet

The proposed technique named TrackNet is a multiphase deep learning approach which integrates track segmentation and true/false alarm classification tasks. This is achieved through two neural networks, one performing semantic track segmentation and the other classifying the segmented images as either true or false alarms.

For semantic segmentation, a U-Net is used to extract rail tracks and locate the ROI. The U-Net is a convolutional neural network architecture for fast and precise image segmentation [23] and has up to now outperformed the prior best method (a sliding-window convolutional network). The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. It
introduces the shortcut/skip connections to preserve the pixel-level information for different image resolutions. In the present context, this is crucial to minimize the ROI as the rail tracks appear in various forms, i.e. tracks split and merge for different routes. In TrackNet, the initial segmentation is done by the U-Net with the architecture specified in [23] and trained using the Adam algorithm with an initial learning rate of 1e−4 and binary cross entropy as the loss function. The model is trained with images in mini-batches of 4 and the best model is selected from 10 epochs.

After the initial segmentation by the U-Net and extraction of the ROI, image processing tools are used to crop the portion of the potential faulty region in the images indicated by the bounding boxes. The size of the cropping window is approximately 64 × 64, and a case of true and false alarm image after the cropping is shown in Fig. 2. The cropped images are saved and then fed to the next phase for classification.

[Fig. 2: Cropped images of true and false alarm case after segmentation by U-Net. (a) True alarm; (b) False alarm]

For the last phase in TrackNet, the cropped images with ROI are fed to a neural network that classifies the images into either true or false alarms. The performance of the proposed technique is compared for different classifier architectures such as ResNet [24] and DenseNet [25], which are currently the pinnacles of neural network architectures for image classification. Among these, DenseNet has several compelling advantages such as alleviating the vanishing-gradient problem, strengthening feature propagation, encouraging feature reuse, and substantially reducing the number of parameters. This makes the optimization of very deep neural networks tractable and robust. The architecture of the proposed technique, TrackNet, is illustrated in Fig. 3.

[Fig. 3: TrackNet architecture]

For the final phase of the TrackNet, the weights of the ResNet/DenseNet model are initialized from a model trained on ImageNet [26]. Pre-trained weights are typically based on ImageNet because its training involves the classification of real-life objects into 1000 classes. The use of ImageNet weights substantially reduces the cost of fine-tuning. The network is trained end-to-end using stochastic gradient descent with standard parameters and using images in mini-batches of size 16. In this work, the final fully connected layer is replaced with a small customized convolutional neural network model that consists of two layers. The numbers of units in the two layers are 256 and 2, respectively. The final fully connected layer corresponds to the two classes representing the true/false alarm cases.

The fine-tuning process for training the TrackNet consists of two phases. In the first phase, all layers except the customized convolutional neural network layers are frozen, and training is performed so as to customize the added layers according to the present dataset. By making use of the features from the previous frozen blocks, the final customized layers can be well tuned to the dataset. In the second phase, all layers are unfrozen and a typical classification training is executed. This two phase fine-tuning can improve the average accuracy of classification as well as speed up the fine-tuning process. For each of the cases, the training is performed for 50 epochs and the one with the lowest validation loss is selected as the best model.

V. EXPERIMENTAL RESULTS

In this section, initially the experimental setup is described followed by the performance results of the proposed technique.

A. Experiment Setup & Data Description

All images used in this experiment are of actual rail tracks and were collected by a COTS VTIS. The top view of the rail track is captured by the VTIS as shown in Fig. 1. The dataset consists of 138 images with each image having potentially at least one faulty area. Of the 138 images only 14 are true alarms and the rest of the images are false alarms flagged by the COTS VTIS.

The raw images from the image acquisition subsystem contain much information, such as machine generated text-based labels and comments, other parts of the rail track, rocks and ties, etc., as shown in Fig. 1, which is irrelevant for the type of track defect classification considered in this paper, namely rail discontinuity. This motivates the use of segmentation for extracting the rail tracks in TrackNet. The images are initially resized to 512 × 512 before being fed to the U-Net for segmentation. The segmented images are then cropped to a size of 64 × 64 around the ROI. The images are then upscaled to 224 × 224 and normalized according to the mean and standard deviation of images in the ImageNet training dataset before being fed to the ResNet/DenseNet for the final classification task. Further, data augmentation is performed on the images being fed to the U-Net and DenseNet/ResNet architectures. The images in the training dataset are augmented using standard parameters such as rotation = 0.2, shift = 0.05, shear = 0.05, zoom = 0.05, and by vertical and horizontal mirroring of the images. This expands the dataset by a factor of around 30, resulting in nearly 4000 images for training.
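
As a rough illustration of the image-size chain described in this subsection (a 64 × 64 crop around the ROI of a 512 × 512 segmented image, upscaling to 224 × 224, and scaling by ImageNet statistics), a minimal pure-Python sketch is given below. It is not the authors' implementation: the helper names (`crop_box`, `resize_nearest`, `normalize_pixel`, `preprocess`), the use of an ROI centre point, the nearest-neighbour interpolation, and the specific mean and standard deviation values (those commonly used with ImageNet-pretrained models) are all assumptions made for the example.

```python
# Illustrative sketch only: names, interpolation method and constants are
# assumptions and are not taken from the paper.
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # commonly used ImageNet RGB means
IMAGENET_STD = (0.229, 0.224, 0.225)   # commonly used ImageNet RGB stds


def crop_box(cx, cy, size=64, width=512, height=512):
    """Return (left, top, right, bottom) of a size x size window centred on
    (cx, cy), shifted where needed so that it stays inside the image."""
    half = size // 2
    left = min(max(cx - half, 0), width - size)
    top = min(max(cy - half, 0), height - size)
    return left, top, left + size, top + size


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an image stored as a list of pixel rows."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return tuple(
        (v / 255.0 - m) / s
        for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def preprocess(segmented, cx, cy):
    """Crop a 64 x 64 ROI from a 512 x 512 RGB image, upscale the crop to
    224 x 224, and normalise every pixel."""
    left, top, right, bottom = crop_box(cx, cy)
    roi = [row[left:right] for row in segmented[top:bottom]]
    upscaled = resize_nearest(roi, 224, 224)
    return [[normalize_pixel(p) for p in row] for row in upscaled]
```

In practice an image library would perform the resizing with a smoother interpolation; the sketch only makes the order of operations and the window clamping at the image border explicit.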
An additional trick is used in the final phase of TrackNet when training it for classification. In the first phase of fine-tuning the ResNet/DenseNet, instead of training the entire model at runtime, a typical ResNet/DenseNet model trained on the ImageNet dataset is used to make predictions. The bottleneck features at the last layer before the final fully connected layer are extracted and saved in static data files. The customized layers of the TrackNet will use these saved features as input during training. This saves a considerable amount of runtime computational resources.

The experiments are run on an Ubuntu 16.04 machine with four Nvidia Titan X GPUs. For the classification task, the dataset is divided into training and testing sets with 75% allocated for training and 25% for testing. Further, K-fold cross validation is performed on the model with K being 4.

B. Performance Results

In the initial phase of TrackNet, segmentation is performed to extract the rail track from the random camera shot image as shown in Fig. 4.

[Fig. 4: Extracted rail track from the original image. (a) Original image; (b) Segmented image]

The performance of this phase is validated in terms of the dice coefficient, which is defined by

$$\mathrm{DICE} = \frac{2 \times |X \cap Y|}{|X| + |Y|} \qquad (1)$$

where X and Y correspond to the prediction and the target, respectively. For the initial segmentation task, a dice coefficient of 0.99 is obtained for the configuration explained in Section V. The segmented images are then sharpened by setting a threshold of 0.6. The output after the sharpening is shown in Fig. 5.

[Fig. 5: Segmented image sharpened through thresholding operation. (a) Segmented image; (b) Sharpened image]

The cropped images after the first phase of TrackNet are then fed to the final classification phase, which differentiates the true and false alarms. In this paper, two neural network architectures, namely DenseNet and ResNet, are employed and performance is compared for the classification phase. The performance metric used for comparison is the accuracy of classification and the results are illustrated in Table I. It can be observed that the performances of the TrackNet with ResNet and DenseNet as the classifier are quite close in terms of accuracy, with the DenseNet appearing to be slightly better at distinguishing true/false alarms.

TABLE I: Performance Comparison

Fold      ResNet    DenseNet    TrackNet using       TrackNet using
                                ResNet Classifier    DenseNet Classifier
1         0.6332    0.6746      0.8644               0.9036
2         0.6050    0.6988      0.9211               0.8795
3         0.7095    0.6482      0.8892               0.8915
4         0.5914    0.5549      0.8774               0.9390
Average   0.6347    0.6441      0.8880               0.9034

C. Baseline Comparison

As mentioned in Section IV, state-of-the-art deep learning models for classification such as ResNet and DenseNet, trained on the raw images from the image acquisition subsystem, are used as the baseline systems for comparison with the proposed TrackNet. The performance comparison results are shown in Table I. Recall that the difference in TrackNet is that, instead of using the raw images for classification as in the baseline systems, it uses the track-only content extracted from the raw images for classification.

Given that the faulty region in the majority of our dataset only occupies a few thousandths of the area of the whole image, as shown in Fig. 1, a large portion of the image content does not contribute directly to the classification of the track fault, namely rail discontinuity. By extracting the ROI, it can be clearly seen that TrackNet satisfactorily distinguishes the true and false alarms. When compared with the baseline systems, whose best accuracy is 71%, TrackNet brings a huge improvement with an average accuracy of 90%. Intuitively, by focusing on the ROI, TrackNet minimizes the input noise from external and environmental sources. This makes it a suitable model for detecting track faults in large scale industry level environments for railway track inspection.

D. Limitations

Three limitations of the proposed TrackNet have been identified. Firstly, this work focuses only on one type of track fault, namely rail discontinuity, for classification, which enhances the need for extracting the ROI. However, many
other types of fault detection and classification scenarios rely on the status of surrounding objects and environmental conditions. Secondly, the machine generated comment region on raw images from the image acquisition subsystem introduces unnatural noise, which has been shown to degrade the classification accuracy. Finally, the majority of images in the dataset show a clear pattern where the tracks are perfectly arranged vertically, which might not be the case when track crossings are involved. However, this can be addressed by training the TrackNet with images of track at different alignments.

VI. CONCLUSION & FUTURE WORKS

In this paper, a multiphase deep learning technique is introduced for detecting rail surface defects in vision based railway track inspection systems. The first phase in this technique extracts the track through segmentation and the extracted tracks are then used for classification. Such an approach enables the classifier to focus on the ROI and results in better performance. In the current context, binary classification is performed with emphasis on mitigating the false alarms in VTIS. However, there are several types of rail defects, and exploring a generalized deep learning approach that can automatically detect other types of defects will be the future direction of our work.

ACKNOWLEDGMENT

This work was performed as one of the initiatives under the A*STAR-SMRT Urban Mobility Innovation Centre and the labelled images used were provided by SMRT Trains Ltd.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. Intl. Conf. Neural Information Processing Systems, 2012, pp. 1097–1105.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Intl. Conf. Learning Representations (ICLR), vol. abs/1409.1556, 2014.
[3] J. Chen, C. Roberts, and P. Weston, “Fault detection and diagnosis for railway track circuits using neuro-fuzzy systems,” Control Engineering Practice, vol. 16, no. 5, pp. 585–596, 2008.
[4] S. Faghih-Roohi, S. Hajizadeh, A. Núñez, R. Babuska, and B. D. Schutter, “Deep convolutional neural networks for detection of rail surface defects,” in IEEE Intl. Joint Conf. Neural Networks (IJCNN), Jul. 2016, pp. 2584–2589.
[5] M. P. Papaelias, C. Roberts, and C. L. Davis, “A review on non-destructive evaluation of rails: State-of-the-art and future development,” Proc. Institution Mech. Eng., Part F: J. Rail Rapid Transit, vol. 222, no. 4, pp. 367–384, 2008.
[6] R. Clark, “Rail flaw detection: Overview and needs for future developments,” NDT & E Intl., vol. 37, no. 2, pp. 111–118, 2004.
[7] R. Clark, S. Singh, and C. Haist, “Ultrasonic characterisation of defects in rails,” Insight - Non-Destructive Testing and Condition Monitoring, vol. 44, no. 6, pp. 341–347, 2002.
[8] R. S. Edwards, S. Dixon, and X. Jian, “Characterisation of defects in the railhead using ultrasonic surface waves,” NDT & E Intl., vol. 39, no. 6, pp. 468–475, 2006.
[9] M. Bentoumi, P. Aknin, and G. Bloch, “On-line rail defect diagnosis with differential eddy current probes and specific detection processing,” Eur. Phys. J. Appl. Phys., vol. 23, no. 3, pp. 227–233, 2003.
[10] H.-M. Thomas, T. Heckel, and G. Hanspach, “Advantage of a combined ultrasonic and eddy current examination for railway inspection trains,” Insight - Non-Destructive Testing and Condition Monitoring, vol. 49, no. 6, pp. 341–344, 2007.
[11] F. Marino, A. Distante, P. L. Mazzeo, and E. Stella, “A real-time visual inspection system for railway maintenance: Automatic hexagonal-headed bolts detection,” IEEE Trans. Syst., Man, Cybern. C, vol. 37, no. 3, pp. 418–428, May 2007.
[12] C. Mandriota, M. Nitti, N. Ancona, E. Stella, and A. Distante, “Filter-based feature selection for rail defect detection,” Machine Vision and Applications, vol. 15, no. 4, pp. 179–185, 2004.
[13] L. Jie, L. Siwei, L. Qingyong, Z. Hanqing, and R. Shengwei, “Real-time rail head surface defect detection: A geometrical approach,” in IEEE Intl. Symp. Indust. Electron., Jul. 2009, pp. 769–774.
[14] J. J. Cunningham, A. E. Shaw, and M. Trosino, “Automated track inspection vehicle and method,” U.S. Patent 6 064 428, May, 2000.
[15] M. Trosino, J. J. Cunningham, and A. E. Shaw, “Automated track inspection vehicle and method,” U.S. Patent 6 356 299, Mar., 2002.
[16] P. D. Ruvo, A. Distante, E. Stella, and F. Marino, “A gpu-based vision system for real time detection of fastening elements in railway inspection,” in IEEE Intl. Conf. Image Processing (ICIP), Nov. 2009, pp. 2333–2336.
[17] X. Gibert, A. Berry, C. Diaz, W. Jordan, B. Nejikovsky, and A. Tajaddini, “A machine vision system for automated joint bar inspection from a moving rail vehicle,” in ASME/IEEE Joint Rail Conf. & Internal Combustion Engine Spring Technical Conf., Mar. 2007, pp. 289–296.
[18] Y. Li, H. Trinh, N. Haas, C. Otto, and S. Pankanti, “Rail component detection, optimization, and assessment for automatic rail track inspection,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 2, pp. 760–770, Apr. 2014.
[19] D. Soukup and R. Huber-Mörk, “Convolutional neural networks for steel surface defect detection from photometric stereo images,” in Advances in Visual Computing (ISVC), 2014, pp. 668–677.
[20] J. Masci, U. Meier, D. Ciresan, J. Schmidhuber, and G. Fricout, “Steel defect classification with max-pooling convolutional neural networks,” in Intl. J. Conf. Neural Networks (IJCNN), Jun. 2012, pp. 1–6.
[21] X. Gibert, V. M. Patel, and R. Chellappa, “Deep multitask learning for railway track inspection,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 1, pp. 153–164, Jan. 2017.
[22] X. Gibert, V. M. Patel, and R. Chellappa, “Material classification and semantic segmentation of railway track images with deep convolutional neural networks,” in IEEE Intl. Conf. Image Processing (ICIP), Sep. 2015, pp. 621–625.
[23] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” Medical Image Computing and Computer-Assisted Intervention (MICCAI), vol. 9351, pp. 234–241, 2015.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conf. on Computer Vision Pattern Recognition (CVPR), Jun. 2016, pp. 770–778.
[25] G. Huang, Z. Liu, L. v. d. Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jul. 2017, pp. 2261–2269.
[26] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “Imagenet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 115–211, 2015.
