DL 6

Computer Methods and Programs in Biomedicine 213 (2022) 106501
Contents lists available at ScienceDirect
Computer Methods and Programs in Biomedicine

journal homepage: www.elsevier.com/locate/cmpb
HFRU-Net: High-Level Feature Fusion and Recalibration UNet for

Automatic Liver and Tumor Segmentation in CT Images
Devidas T. Kushnure a,b,1,2,∗, Sanjay N. Talbar a
a
Department of Electronics and Telecommunication Engineering, Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, Maharashtra,
India
b
Department of Electronics and Telecommunication Engineering, Vidya Pratishthan’s Kamalnayan Bajaj Institute of Engineering and Technology, Baramati,
Maharashtra, India
a r t i c l e i n f o a b s t r a c t
Article history: Automatic liver and tumor segmentation are essential steps to take decisive action in hepatic disease de-
Received 17 May 2021 tection, deciding therapeutic planning, and post-treatment assessment. The computed tomography (CT)
Accepted 21 October 2021
scan has become the choice of medical experts to diagnose hepatic anomalies. However, due to advance-
ments in CT image acquisition protocol, CT scan data is growing and manual delineation of the liver and
Keywords: tumor from the CT volume becomes cumbersome and tedious for medical experts. Thus, the outcome
Feature Fusion becomes highly reliant on the operator’s proficiency. Further, automatic liver and tumor segmentation
Local Feature Reconstruction from CT images is challenging due to complicated parenchyma, highly variable shape, and fewer voxel
Feature Recalibration intensity variation among the liver, tumor, neighbouring organs, and discontinuity in liver boundaries.
Multiscale Features
Recently deep learning (DL) exhibited extraordinary potential in medical image interpretation. Because
Modified High-level Features
of its effectiveness in performance advancement, the DL-based convolutional neural networks (CNN)
Liver and tumor segmentation
CT Images gained significant interest in the medical realm. The proposed HFRU-Net is derived from the UNet ar-
chitecture by modifying the skip pathways using local feature reconstruction and feature fusion mech-
anism that represents the detailed contextual information in the high-level features. Further, the fused
features are adaptively recalibrated by learning the channel-wise interdependencies to acquire the promi-
nent details of the modified high-level features using the squeeze-and-Excitation network (SENet). Also,
in the bottleneck layer, we employed the atrous spatial pyramid pooling (ASPP) module to represent the
multiscale features with dissimilar receptive fields to represent the rich spatial information in the low-
level features. These amendments uplift the segmentation performance and reduce the computational
complexity of the model than outperforming methods. The efficacy of the proposed model is proved by
widespread experimentation on two datasets available publicly (LiTS and 3DIrcadb). The experimental
result analysis illustrates that the proposed model has attained a dice similarity coefficient of 0.966 and
0.972 for liver segmentation and 0.771 and 0.776 for liver tumor segmentation on LiTS and the 3DIRCADb
dataset. Further, the robustness of the HFRU-Net is confirmed on the independent LiTS challenge test
dataset. The proposed model attained the global dice of 95.0% for liver segmentation and 61.4% for tumor
segmentation which is comparable with the state-of-the-art methods.
© 2021 Elsevier B.V. All rights reserved.
1. INTRODUCTION other body parts. In addition, the liver has a double blood source
from the portal vein and the hepatic artery, which is a unique char-
The liver is one of the massive and critical internal body parts acteristic of the liver that makes the liver an extraordinary body
of the human [1]. It is involved in the detoxification process. It is organ [2].
responsible for the production and supplies the essential fluids to The physician used the most preferred types of diagnostic ra-
diology tests (radiological imaging modalities), which include CT
or computerized axial tomography (CAT), ultrasound, and magnetic
∗
Corresponding author. resonance imaging (MRI) for liver anomalies detection and thera-
E-mail address: devidas.kushnure@gmail.com (D.T. Kushnure). peutic planning. The medical experts prefer CT due to its sturdi-
1
Corresponding author is a Ph.D. research scholar at Department of Electronics
ness, immense availability, quick acquiring procedure, and higher
and Telecommunication Engineering, Shri Guru Gobind Singhji Institute of Engineer-
ing and Technology, Nanded, Maharashtra, India. spatial resolution [3]. Liver segmentation is extracting the voxels
2
ORCID(s): 0 0 0 0-0 0 03-2522-680X (Devidas T. Kushnure). associated with the liver region from the abdomen CT or MRI im-
https://doi.org/10.1016/j.cmpb.2021.106501
0169-2607/© 2021 Elsevier B.V. All rights reserved.
D.T. Kushnure and S.N. Talbar Computer Methods and Programs in Biomedicine 213 (2022) 106501
Fig. 1. CT slices from (a) to (f) indicates different challenges in liver segmentation: (a) represents the abdominal organs surrounded by the liver, (b) intensity difference with
other organs, (c) represents fuzzy boundaries and overlapping with other organs, (d), and (e) represents intensity difference between liver and tumor region and, complex
liver shape, and (f) represents a discontinuity in the liver boundaries.
ages. It has developed as the chosen approach for liver volume the liver has squashy anatomical properties. The fewer intensity
measurement, and it is a crucial stage for the diagnosis and ther- variations between the liver, tumor, and adjacent organs, overlap
apeutic decision-making for hepatic complications. According to with nearby organs, uncertain liver boundaries are the troubles for
GLOBOCAN (Global Burden of Cancer) worldwide 2018 status re- automatic liver segmentation, as shown in Fig. 1. Further, the CT
port, liver cancer has become the sixth most cancer cause and images are acquired by injecting a contrast agent for enhancement;
the second foremost cause of cancer deaths worldwide [4]. Clin- the quantity of contrast decides the noise in CT images, which is
ically liver segmentation for volumetric assessment from CT im- already noisy without a contrast agent [5]. Due to these compli-
ages aids medical experts in the accurate disease diagnosis., it is cations, liver and tumor delineation from abdominal CT volume is
an essential tool for deciding preoperative surgical planning that challenging.
helps in reducing the risks of surgical resection in major hepate-
ctomy, liver transplantation, hepatic interventions, portal vein em- 2. RELATED WORK
bolization, and accurate liver segmentation is the prerequisite for
computer-aided diagnosis (CAD) tools which is envisioned as the The automatic liver segmentation challenges were addressed
other eye for the medical specialists. The liver tumor extraction by many researchers using different image and computer vision
from the volumetric CT data aids the medical experts in treating techniques. The conventional automatic liver and tumor separa-
liver cancer and tumor resection planning, embolization therapy, tion methods based on the dissemination of the Hounsfield Units
and target drug therapy [3,5]. (HU) values in CT images utilized to represent the features, shape
In routine clinical practices, medical experts delineate the liver information, and texture of the liver portion was classified into
images manually, which is tedious and time-consuming. However, three categories such as gray level-based methods, structure-based
manual delineation may hamper the accuracy of the segmenta- methods, and texture-based methods [7]. In addition, the cus-
tion due to human limitations, and opinion relies on the expert’s tomized level-set, automatic seed point identification method, au-
knowledge and experience. On the other hand, advancements in tomated threshold methods were applied for the automatic seg-
medical imaging technologies and acquisition protocols increase mentation [8]. Furthermore, the level-set method was applied with
the resolution and volume size. Therefore, the manual delineation mixed intensity bias and position control for segmentation. Finally,
of hundreds of images per patient is becoming the main con- segmentation results were refined with a sparse shape composi-
cern for the expert [6]. Because of these grounds, automatic liver tion method and further optimized by the graph cut method [9].
and tumor segmentation has been the field of interest among re- However, the performance of these models was inadequate be-
searchers. cause model parameters were determined based on the prior un-
In the past decade, many researchers attempted liver region derstanding and experience of the expert that significantly affected
segmentation in CT volume automatically using image processing expert to expert.
and computer vision techniques to offer a subsequent opinion for Moreover, the substantial development in computer vision us-
the medical expert without human intervention to take the correct ing machine learning algorithms showed improvement in the seg-
decision in less time. However, automatic liver and tumor segmen- mentation performance. Therefore, machine learning algorithms
tation remain challenging for researchers due to their anatomical were utilized by researchers in medical imaging for the develop-
characteristics and variable parenchyma. The shape of the liver is ment of CAD systems. Several techniques applied machine learning
exceptionally irregular and reliant on surrounding organs because methodologies such as supervised support vector machine (SVM)
2
classifier, statistical hidden markov model (HMM), feedforward nections with an additional convolutional layer and residual path-
neural network-based probabilistic neural network (PNN) classifier, way [23]. The CNN-based automatic liver lesion segmentation ap-
and clustering-based alternative fuzzy C-means (AFCN) were uti- proach was proposed with short-range residual connection using
lized to develop CAD schemes [5]. ResNet and long-range concatenated skip connection of UNet. The
Since the last decade, due to the continuous efforts of the re- residual connections benefit from the smooth flow of gradient in
searchers and the availability of heavy computational power, the forwarding and backward direction through the network and en-
computer vision field expanded to the next level, and researchers hance the model convergence speed and performance. The UNet
extended its application in the medical realm. As a result, the DL derived architecture was proposed by integrating Squeeze-and -
concept evolved among the research community in radiological im- Excitation block in encoder and decoder path to uplift the gen-
age examination. The DL methods have publicized the noteworthy eralization ability segmentation performance of the network on
improvement over the traditional algorithms in numerous medi- prostates MRI datasets [24]. The attention gate (AG) mechanism
cal imaging areas, including disease classification, organ segmen- is employed in the skip connection of the baseline UNet architec-
tation, lesion detection, tumor classification, diabetic retinopathy ture to learn the region of interest with varying shape and size
screening [10,11]. DL is a subclass of artificial intelligence. It is an and eliminate irrelevant details from the features to uplift the seg-
emerging field that incorporates a wide range of neural network mentation performance [25]. Many researchers upgraded the UNet
designs devised to carry out various tasks [12]. Specifically, the performance for a variety of segmentation applications by modify-
most established DL model utilized for the images is CNN. It has ing the basic architectures. The key reason behind the modification
become a leading technique in the computer vision field due to in the basic UNet architecture is that UNet suffers from a few lim-
its robust nonlinear feature characterization potential by employ- itations discussed in the following section.
ing several non-identical filters at various layers of the network
and competence to manipulate an enormous data volume [13]. The 2.1. UNet Limitations
CNN-based architecture presented the astonishing performance in
the ILSVRC (ImageNet Large Scale Visual Recognition Competition) The UNet architecture for biomedical image segmentation has
organized in 2012 [14]. Subsequently, the researchers expanded the become the choice for medical image analysis. It has an encod-
utilization of CNN in several computer vision areas, including med- ing (contraction) and decoding (expansion) pathway to process the
ical image analysis, satellite image analysis, driverless car systems, input images. An encoding path extracts the input image features
video processing, natural image analysis. and learns details of the object in an input image at each encoding
Specifically, in medical image analysis, the researchers em- stage. The encoding path encompasses successive convolution and
ployed CNN-based architectures in the medical realm. As a result, pooling layers, which shrinks the feature maps, extracts high-level
they demonstrated the effectiveness of DL for disease classifica- features and passes the low-level information to the subsequent
tion, disease detection, organ segmentation, tumor segmentation, encoding stage. The encoding feature maps are downsampled by
CAD design, progressive disease assessment [14,15]. Moreover, the the convolution and pooling layers to obtain the low-resolution
medical field researchers continuously strive to build state-of-the- features from input images. The decoder is intended to recon-
art techniques based on CNN to support the research progress in struct the low-resolution feature maps into the input image size
the domain, transform the routine practices of the medical experts, using upsampling operation. However, the UNet performance is re-
and ensure the quality of services to the patients. Recently, deep stricted due to substantial loss of spatial information of the object
CNN became the choice of many researchers for automatic liver while consecutive downsampling by pooling and convolution oper-
and tumor delineation from abdomen CT volume. The fully convo- ations at each encoding stage. As a result, the size of the feature
lutional neural network (FCN) built on the principle of CNN with map becomes reduced, and in successive encoding steps network
encoder-decoder is exploited to segment the liver and tumor from learns high-resolution features of the object extracted at each stage
the medical images. The FCN based UNet architecture presented and passes low-level details of the object to the next stage. There-
for biomedical image segmentation [16] was a steppingstone for fore, the network acquired the specific details of an object and lost
the researchers in the domain, which became the decision for nu- localization information. The number of encoding stages increases
merous FCN based buildings. Subsequently, several FCN based ap- the network’s ability to learn more details of the object and lose
proaches were exploited the UNet derived architectures for medi- contextual information.
cal image analysis. In coordination with MICCAI and ISBI, the LiTS Moreover, an increasing number of encoding stages in the seg-
(Liver Tumor Segmentation Benchmark) challenge was conducted mentation does not ensure better performance because of contex-
in 2017. In the challenge, most of the approaches were proposed tual information loss that hampers the network’s ability to learn
on the deep CNN; especially those were UNet inspired architec- the spatial details of the object. The contextual information of the
tures. Nearly all of the methods exploited precise preprocessing on object cannot be propagated to the deeper layers as the number
datasets such as HU -value clipping, normalization, and standard- of layers increases. However, skip connections are utilized to prop-
ization. In addition, some of the methods used post-processing al- agate the high-resolution information of the object at the respec-
gorithms to enhance the segmentation quality [17]. The segmen- tive decoding stage. The high-resolution information concatenates
tation performance improved further with post-processing tech- with the upsampled feature maps to rebuild the region of interest
niques demonstrated using the level Set, graph cut, conditional from input images. However, the spatial loss cannot be entirely re-
Random Field (CRF), and random forest algorithms [18–20]. covered because of the poor representation of high-level features
Lately, various additions of UNet by altering the core construc- in the skip connection. In medical image segmentation, contextual
tion are presented for liver and tumor segmentation. The hybrid information has a critical role in the accurate analysis of the in-
densely connected UNet was presented to investigate intra-slice terested region. This loss cannot be recovered in the upsampling
and inter-slice features by adding a hybrid feature combination stages completely. Therefore, the performance of the UNet is lim-
layer employing 2D and 3D DenseUNet [21] for liver and tumor ited up to a certain level [23].
segmentation. The 3D residual attention-aware RA-UNet suggested In medical image segmentation, the semantic details play a cru-
the residual learning method to represent multiscale attention de- cial role to delineate the region of interest from the images. The
tails and fuse low-level and high-level features [22]. The modified baseline UNet model restricts the segmentation performance due
UNet model is proposed for liver and tumor separation, exploiting to its limitations. In UNet, feature size gets reduced after the pool-
object-dependent feature characterization and modifying skip con- ing operations that cause loss of contextual details. To recover this
3
loss, skip connections are introduced from the corresponding en- the region of interest in the image and suppresses the other un-
coding stage to the decoding stage. However, skip connections can wanted details. As a result, the adaptively recalibrated features
not recover the contextual loss fully and hamper the segmentation enhance the network’s learning and improve the segmentation
outcome. For uplifting the segmentation performance further, the ability of the network.
multiscale information extraction and fusion approach has proven • We also demonstrated the effect of multiscale features at the
its competence in medical image segmentation [26] [27]. The mul- bottleneck layer of the encoder-decoder architecture to enhance
tiscale feature fusion can impart rich semantic details that could be the contextual information of the object with atrous spatial
noteworthy for semantic segmentation. We reviewed several mul- pyramid pooling (ASPP) module. The ASPP module represents
tiscale feature extraction and fusion techniques. the contextual information using atrous convolution with var-
The UNet derived multiscale feature fusion architectures ious dilation rates that can extract the features at multiple
demonstrated a significant advancement in the segmentation per- scales and enhance the receptive field of the convolution op-
formance. The alterations in the baseline architecture were ac- eration to describe features with rich contextual details of the
complished by modifying the skip connections, network encoder- object.
decoder operations, and modifying both to fuse the multiscale • The proposed modifications improve the model sensitivity to
feature maps for extracting more contextual details. The multi- foreground pixels without demanding complex heuristics and
scale information extraction was achieved [28] by utilizing the less computational burden. We designed and trained the end-
dense connections from the encoder to the decoder to enhance to-end network and experimentally analyzed the liver and tu-
the high-level features and nested UNet. The nested UNet archi- mor segmentation performance on two different well-known
tecture with deep supervision allowed model pruning to capture datasets (3Dircadb and LiTS). The ablation study signifies that
output at different stages with reduced complexity and less time. the performance of the proposed model is uplifted than the
However, the model pruning benefits from obtaining the result as state-of-the-art methods.
per the requirement of the application and reduces the compu-
tational complexity. The densely connected encoder and decoder The rest of the paper is organized as follows. The methods and
UNet architecture [29] fused the feature maps of different scales materials are explained in section 3, experimental study, and the
from various network stages to extract multiscale features. The result analysis is discussed in section 4, and section 5 concludes
network proposed [30] to finely exploits the multiscale features the paper.
using the dense encoder-decoder network build with dilated con-
volution operation. Consecutive densely connected convolution fil- 3. METHOD AND MATERIALS
ters designed the MultiRes module with residual connections and
modified residual skip connections to extract the rich spatial in- The proposed methodology is based on modifying high-level
formation [31]. The modifications in the network operations using features using local feature reconstruction, feature fusion, and fea-
the dilated convolution operations with different rates and differ- ture recalibration methods. The high-level features represent the
ent pooling kernels were utilized to abstract the multiscale infor- spatial information of the object in the input image. In UNet, high-
mation [32]. level features are concatenated with the respective decoding stages
Furthermore, the methods proposed to remap the multiscale in- using skip connection. For semantic segmentation, high-level fea-
formation fused from the network using various attention mech- tures characterize the prominent semantic information about the
anisms. The multiscale attention network [27] was designed to object in the input image, enabling the corresponding decoder
combine the local and global features and the multiscale atten- stages to segment the interested region. The segmentation ability
tion module that learns global dependencies. The multiscale UNet of UNet can be uplifted by refining the high-level features in the
architecture [33] exploited the multiscale feature extraction prop- skip connections. The high-resolution information carries semantic
erty of the Res2net module with the multiscale feature recalibra- details of the object and supports the decoding stages to delineate
tion technique to improve the network learning and generalization the liver region more accurately.
ability. Recently, The multiscale context extraction module was de- The proposed HFRU-Net architecture with modified high-level
signed with a context residual attention mechanism [34] to capture information improves the liver and tumor separation quality in CT
the local and global information of the object. Thus, the multiscale images, as shown in Fig. 2. The heuristic alteration in the skip
information extraction and fusion techniques have shown substan- pathways is represented with M-block. The skip connection fea-
tial performance gain in CNN-based architecture. tures are transformed by fusing the reconstructed low-resolution
This paper proposed the novel HFRU-Net DL framework which features with high-resolution to extract rich semantic information.
extracts the rich semantic details by modifying high-level features The low-resolution features converted back to high-resolution fea-
using local feature reconstruction, feature fusion, and adaptive features using deconvolution and relu activation is referred local fea-
ture recalibration scheme and extract the multiscale feature to im- ture reconstruction. This step aims to extract the spatial informa-
prove the representation of low-level features in the bottleneck tion of the object from downsampled features. The locally recon-
layers for automatic liver and tumor segmentation from CT images. structed features after ith stage in the network is fused with the
The contributions incorporated in this paper are summarized be- high-level features of the same stage channel by channel. The fea-
low. ture fusion step enriches the high-level feature representation in
the skip pathways. The modified high-level features were further
• We proposed an UNet derived architecture with modification in refined using the SENet. The SENet is utilized to adaptively recali-
the skip pathways features. The skip connection features were brate the high-level features to describe contextual details promi-
modified by fusing the high-level features and reconstructing nently by modelling the channel-wise interdependencies of the in-
low-level features using the upsampling and nonlinear activa- put features with a simple gating mechanism that operates in two
tion function. The fused features better characterize the high- steps called squeeze and excitation. In the squeeze operation, the
level details of the object in the image. network learns the channel-wise interdependencies, and the exci-
• Further, we employed the Squeeze-and-Excitation (SE) network tation operation model the responses channel-wise by scaling the
in the skip pathways to adaptively recalibrate fused features input features with the modelled channel responses. The adap-
channel-wise by modelling the channel interdependencies. The tively recalibrated features are prominently signifying the spatial
channel-wise adaptive feature modelling explicitly focuses on description of the input features [35]. The modified skip connec-
4
Fig. 2. Proposed HFRU-Net architecture.
tions with feature recalibration confirm the significant details of activation for the reconstruction of the features locally to derive
the high-resolution features at each encoding stage of the network explicit detail of the object and fused with the ith stage features as
and improve the feature depiction capability of the network. In shown in (Eq. 2),
segmentation, semantic information has a vital role in segment-
Fi = f i A U i Di f i (2)
ing the exact shape and location of the liver and tumor in the CT
images. The modified skip connection allows the network in the Where Fi - channel-wise fused features, D i -downsampled features
decoding process to rebuild the exact shape and location of the of ith stage using max-pooling operation, U i - Upsampled features
liver and tumor in the predicted segmentation map. using deconvolution operation, A- nonlinear relu activation func-
Furthermore, during the encoding process, the feature resolution and symbol - represent the channel wise feature fusion op-
tion reduced significantly at each stage of the network. At the bot- eration at each ith stage where i = 1, 2, 3, ....., (L − 1).
tleneck layer, the network learns about the characteristics of the The modified high-level features are recalibrated channel-wise
object in the input image. However, the network lost the contex- by modelling the interdependencies between the channels using
tual information of the object in the input image. To represent the SENet to refine the features further [37]. The SENet architec-
the contextual information at multiscale with an improved field ture utilized in skip connection is shown in Fig. 3. The feature re-
of view and extracts better contextual information from the fea- calibration achieved using the squeeze and excitation operations is
tures at the bottleneck layer. We employed the ASPP module at the explained below.
bridge layer of the network that comprises many parallel convolu- In squeeze operation, the fused input features after ith stage
tional filters to process the input features with different dilation is F i = [F1i , F2i , F3i , ..., Fki ] k- number of input channels with size
rates [36]. The multiscale features with various receptive fields are H × W. Where H- is the height and W- is the width of the feature
aggregated that represent the more contextual information on the map, respectively. The feature map converted into a 1-dimensional
upsampling stages for better segmentation performance. vector of size 1 × 1 × k using the global average pooling op-
erations, where∗∗∗ Fki ∈ RH×W is kth channel from the fused in-
3.1. High-Level Feature Modification and Feature Recalibration put feature Fi of size H × W. The global pooling denotes the 1-
dimensional vector Mi of size∗∗∗ Rk . For kth - channel, the elements
The high-level feature preserves the object’s contextual details in the vector is expressed by (Eq. 3),
that play a crucial role while upsampling segmented maps during
decoding stages. The high-level features at each (L − 1) stages out 1
H
W
Mki = Ssqe (Fk ) = Fk (i, j ) (3)
of L stages are locally reconstructed using deconvolution operation H ×W
i=1 j=1
followed by nonlinear relu activation to extract the prominent de-
tails of the object. The Mi = [M1i , M2i , M3i , ..., Mki ] is the transformed fused features
The features after (convolution, batch normalization, and acti- of Fi . It is the accumulation of renovated features of the local de-
vation) from each ith stage is as shown in (Eq. 1), scriptors whose figures are sensitive to the exclusive features of an
input image.
f i = { f1 , f2 , f3 , ..., fk } (1)
In excitation operation, the network fully captures the channel-
Where i = 1, 2, 3, ....., (L − 1) and k- is the number of features in wise dependencies by employing a simple gating mechanism. It is
the respective stages and fi - features in the ith stage. The features essential to learn a nonlinear interaction among the channels and
from ith stage is upsampled and passed through the nonlinear relu learn a non-mutually exclusive association with multiple channels
5
Fig. 4. Atrous spatial pyramid pooling (ASPP) module in the bottleneck stage.
Fig. 3. Squeeze-and-Excitation network (SENet) for feature recalibration.
Fig. 4. In the ASPP module, the atrous depthwise separable convo-

is allowed to be stressed. The gating mechanism with two fully lution filters (atrous depthwise convolutions followed by pointwise
connected (FC) layers with A- nonlinear relu activation and σ - convolution (1X1)) can process the features channel by channel
sigmoid activation is specified by (Eq. 4), and pointwise to reduce the complexity of the network [38]. The
depthwise convolution is a robust operation to decrease the convo-
E i = SEx Mi , W i = σ A Mi , W i = σ W2i A W1i Mi (4) lution operation parameters and computational complexity [39,40].
k k The atrous depthwise convolution adopted in the ASPP module is
Where∗∗∗W1 ∈ R r ×k and∗∗∗W2 ∈ Rk× r , r- is the dimensionality re- represented as follows (Eq. 6),
duction factor of the FC layer that controls the computational cost
of SENet. The network generalization capability improved with two F [i ] = f [i + d.k]w[k] (6)
FC layers with dimensionality reduction layer with relu activation k
and dimensionality increasing layer with sigmoid activation that The d - decides the dilation rate of the convolution operation,
returns the transformation of the input features F. The effect of which controls the field of view of the convolutional filters. The
varying dimensionality reduction factor r illustrated experimen- different dilation rate gives various features. The ASPP module ex-
tally in the result section. The excitation operation adaptively re- tracts the multiscale features, which provide rich semantic infor-
calibrates the features by learning interdependencies between the mation to the decoding stages. In the proposed model, we em-
channels. The concluding output of SENet is obtained by rescal- ployed the ASPP module as illustrated in Fig. 4, which has atrous
ing the features with the input features and the excitation vector depthwise convolution with dilation or atrous rate (d) = 4, 8, 12.
values of each channel that emphasized the prominent features The feature size is 16 × 16 at the bottleneck stage. The multi-
from the input features. The scaling operation for kth -channel is scale features represent the fine-grained contextual details of the
expressed by (Eq. 5), object for the decoding stages. The encoded information is utilized

F˜ki = Sscale Fki , Eki = Fki Eki (5) for decoding the segmentation map by upsampling and concate-
nating with the recalibrated features in the respective location.
Where F˜ i
= [F˜1i , F˜2i , F˜3i , . . . F˜ki ] and scaling operation Sscale (Fki , Eki )
is The proposed method exploits the multiscale information to
channel-wise multiplication of the Eki ∈ [0, 1] and∗∗∗ Fki ∈ RH×W . represent the rich semantic details from the input features by uti-
The fused features are adaptively recalibrated to explicitly sig- lizing the local feature extraction with the feature recalibration ap-
nify the object’s details and improve the network segmentation proach and the ASPP module. In local feature reconstruction, the
performance by concatenating the recalibrated features with the locally reconstructed features are fused using short skip connec-
corresponding decoding stages. The modification in high-level fea- tions with high-level features, which portray the semantic infor-
tures and adaptive feature recalibration is represented in Fig. 2 by mation precisely and consecutively transformed through the atten-
M-block. tion mechanism to represent the prominent details in the features.
These modified features are concatenated with the respective de-
3.2. Multiscale Feature Extraction at Bridge Stage using ASPP Module coder stages using a skip connection. In addition, we have em-
ployed the ASPP at the bottleneck layer of the network where fea-
The bottleneck layer Lth stage of the network where feature size ture size reduces considerably, and the network loses the spatial
is reduced significantly and the network loses contextual details information of the object of interest. The ASPP module can extract
of the object. In semantic segmentation, contextual information is the multiscale features using parallel convolution operation with
crucial for the accurate segmentation of the object from an input a varying dilation rate that enhances the contextual details in the
image. The bottleneck layer features are modified using the ASPP low-level features of the bottleneck layer. Thus, the HFRU-Net ex-
module [36]. It processes the low-level features through different tracts rich semantic information by modifying both high-level and
receptive fields to extract the object’s rich contextual details us- low-level features using the simple heuristic approach to improve
ing multiple parallel filters with different dilation rates shown in the segmentation performance.
6
Table 1
Feature size and network layer configuration of the proposed network.
Encoding Path Feature Size Network layers Decoding Path Feature Size Network layers

Input 256 × 256 × 1 Conv2D_9 256 × 256 × 1 1 × 1, 1 Sigmoid
Conv2D_1 Conv2D+ BN+ Relu Conv2D_8 256 × 256 × 64 Conv2D+ BN+ Relu
256 × 256 × 64 3×3 64 BN + Relu 256 × 256 × 64 3×3 64 BN + Relu
256 × 256 × 64 3×3 64 BN + Relu 3×3 64 BN + Relu
MaxPooling_1 2×2 Concatenate_4 256 × 256 × 128 Upsampling_4

128 × 128 × 64
M1-Block
⎡ ⎤
⎢Deconv2D + Relu⎥
M1-Block ⎣
⎦ Upsampling_4 256 × 256 × 64 2×2
256 × 256 × 64
256 × 256 × 64
↓
SENet
256 × 256 × 64
Conv2D_2 Conv2D+ BN+ Relu Conv2D_7 128 × 128 × 128 Conv2D+ BN+ Relu
128 × 128 × 128 3×3 128 BN + Relu 128 × 128 × 128 3×3 128 BN + Relu
128 × 128 × 128 3×3 128 BN + Relu 3×3 128 BN + Relu
MaxPooling_2 64 × 64 × 128 2×2 Concatenate_3 128 × 128 × 256 Upsampling_3

M2-Block
⎡ ⎤
M2-Block ⎣
⎦ Upsampling_3 128 × 128 × 128 2×2
128 × 128 × 128
128 × 128 × 128
↓
SENet
128 × 128 × 128
Conv2D_3 64 × 64 × 256 Conv2D+ BN+ Relu Conv2D_6 64 × 64 × 256 Conv2D+ BN+ Relu
3×3 256 BN + Relu 3×3 256 BN + Relu
64 × 64 × 256 3×3 256 BN + Relu 64 × 64 × 256 3×3 256 BN + Relu

M3-Block
⎡ ⎤
M3-Block 64 × 64 × 256 ⎣
⎦ Upsampling_2 64 × 64 × 256 2×2
64 × 64 × 256
↓
SENet
64 × 64 × 256
Conv2D_4 32 × 32 × 512 Conv2D+ BN+ Relu Conv2D_5 32 × 32 × 512 Conv2D+ BN+ Relu
32 × 32 × 512 3×3 512 BN + Relu 32 × 32 × 512 3×3 512 BN + Relu
3×3 512 BN + Relu 3×3 512 BN + Relu

M4-Block
⎡ ⎤
M4-Block 32 × 32 × 512 ⎣
⎦ Upsampling_1 32 × 32 × 512 2×2
32 × 32 × 512
↓
SENet
32 × 32 × 512
ASPP Module 16 × 16 × 1024
The network configuration in terms of input and output fea- ternal covariate shift and allowing the network to converge faster
tures at each layer, layer input, and output features are repre- [41]. The symbol depicts channel-wise feature fusion operation
sented in Table 1. The M-block describes the modified high-level with locally reconstructed features and high-level features from
features comprise fused high-level features with low-resolution re- the that ith stage of the network. The symbol indicates the con-
constructed features and adaptively recalibrated features by SENet. catenation of the upsampled features and the recalibrated features
The convolution operations from the traditional UNet modified in the skip connection at the respective location.
with the convolution action trailed by batch normalization and relu The proposed network’s training and the testing pipeline is il-
activation. The batch normalization layer help to smooth the flow lustrated in Fig. 5. The input dataset is preprocessed first and then
of the gradients during the training process by reducing the in- trained the model using the backpropagation algorithm. Finally, the
7
Table 3
Proposed Network Training configuration.
Parameter Value
Learning rate 1e−5

Optimizer Adam
Weight regularization factor (L2 Regularization) 1e−5
Loss function (LDsc ) dice loss
Mini-batch size 8
Convolution operation 2D size 3 × 3
ASPP module with dilation rate 4, 8, 12
SENet factor (r) 4,8,16
4.2. Data Preprocessing
Preprocessing steps are essential to clear the liver area for seg-
mentation by removing unwanted particulars from the CT volume.
The Hounsfield Units (HU) measure the unit to measure the com-
parative densities of the inside body organs. Generally, the range
of HU values is from -10 0 0 to 10 0 0. Thus, the radiodensities for
liver matters vary in the range from 40 HU to 50 HU [22] in the
CT volume.
The entire CT volume is preprocessed slice by slice. The pre-
processing steps applied on the CT images, as indicated in Fig. 6.
include resizing of 512 × 512 slices into 256 × 256 for reduc-
Fig. 5. Illustration of training and testing pipeline of the proposed model. ing the computational load. In addition, the global windowing of
HU values with a window size of (-250, 200) ensures the removal
of irrelevant details from CT images. Finally, the dataset is normal-
trained model tests the unknown data and evaluates the model
ized on the scale [0,1] to streamline network learning convergence
segmentation performance using statistical measures.
speed and image enhancement to acquire the enhanced liver sec-
tion for segmentation.
4. EXPERIMENTAL STUDY AND RESULT ANALYSIS
Fig. 7 shows the effect of preprocessing on the CT images. The
preprocessing steps provide a clean liver area for segmentation by
4.1. Data Preparation
eliminating the unwanted details from the input images.
The network performance was evaluated on the two different
datasets available publicly LiTS (Liver Tumor Segmentation) chal- 4.3. Training Approach
lenge dataset and 3DIRCADb (3D Image Reconstruction for Com-
parison of Algorithm Database) dataset [42]. The LiTS dataset The deep CNN network training process involved the proper se-
[17] is collected from various hospitals worldwide. It comprises 131 lection of training parameters. The proposed network is trained
contrast-enhanced CT volumes with ground truth for liver and tu- end-to-end by using the training configuration indicated in Table 3.
mor region annotated by the experts and 70 exclusive test volumes The network trained using Adam optimizer with an initial learn-
without ground truth. Each axial slice has a 512 × 512 spatial reso- ing rate of 1 × 10−5 and the learning rate decay on the plateau
lution. Out of 131 cases, we utilized 70 cases for training, 10 cases with patience for five epochs up to a learning rate of 1 × 10−10
for validation, and 20 cases for testing. Also, we used the publicly using the decay factor of 0.1. The dice loss is utilized as a loss
available 3DIRCADb dataset for experimentation and testing [43]. It function and the dice coefficient as a metric while training. The
comprises anonymous CT volumes of various patients and various network training continues to 100 epochs. The network weight
structures of interest manually annotated by expert radiologists. A regularized using L2 regularization with a regularization factor of
dataset containing the 20 CT volumes (10 men and 10 women) was 1 × 10−5 help to regularize the network learning and avoid the
acquired in an enhanced venous phase from numerous European drastic changes in the training weight by penalizing the weight
hospitals with numerous CT scanners. With hepatic tumors in 15 factor. The input data were provided in the batches while training
cases. The spatial resolution is 512 × 512 of each axial slice. The progressed with mini-batches of size 8.
details about the dataset are briefed in Table 2. Due to the scarcity of abundant medical data and medical con-
straints, while acquiring the medical images, the available dataset
has a high imbalance in the background, liver, and tumor voxels.
Table 2
3DIrcadb and LiTS Dataset Specifications.
The dice loss has the potential to combat the class imbalance issue
and is preferred for the segmentation task [44]. The proposed net-
Dataset specifications 3DIRCADb dataset LiTS Dataset work is trained with the dice loss as a loss function and the dice
# 3D CT volumes 20 131 coefficient as a metric. The dice loss is defined as the complement
Spatial Resolution of each Slice 512 × 512 512 × 512 of the dice coefficient [18], and it is stated as (Eq. 7),
Slices in the volumes [74 – 260] [42 – 1026]
N
i=1 P Ri × GTi
[minimum–maximum] 2
X-axis Voxel spacing in (mm) [0.56 – 0.87] [0.60 - 0.98] LDsc = 1 − N N (7)
i=1 P Ri +
2 2
[minimum–maximum] i=1 GTi
Y-axis Voxel spacing in (mm) [0.56 – 0.87] [0.60 - 0.98]
[minimum–maximum] Where PRi , GTi and N- are the binary segmentation, ground truth,
Z-axis Voxel spacing in (mm) [1.60 – 4.00] [0.45 - 0.5] and the number of voxels, respectively. It computes the similar-
(Slice Thickness) ity between two images and consequently optimizes the network
[minimum–maximum]
weights and minimizes the loss.
8
Fig. 6. Preprocessing steps applied to the dataset.
Fig. 7. Sample preprocessed input CT images from 3DIRCADb and LiTS dataset: the first row indicates the resized (256 × 256) CT images, the second row denotes global
HU value windowing, and the third row shows the enhanced images.
4.3.1. Dataset Augmentation level Keras API for artificial neural networks running on the top
The deep neural networks required a huge dataset to avoid of the TensorFlow [46,47]. The high-performance computer termi-
overfitting the network during the training phase and better gen- nal employed for training and testing has configuration: Proces-
eralization in the testing phase. However, in the medical domain, sor: Intel(R) Xeon(R) CPU E5-16200, 3.60GHZ, Memory: 16GB RAM,
it is challenging and time-consuming to generate massive medical Graphics card: Nvidia GeForce GTX TITAN-X, GPU Memory: 12GB
data with an annotation due to the unavailability of medical ex- memory, and Operating System: Windows 10.
perts and data secrecy that limits DL in the medical domain.
Data augmentation is the way to enable researchers to use DL 4.4. Evaluation Metrics
on the small-size medical dataset. The small-size datasets are ex-
panded by employing a data augmentation technique to train DL The volume overlap-based, volume similarity-based, and surface
models. It allows for augmenting the dataset by applying various distance metrics were utilized to assess the segmentation perfor-
rigid and elastic transformations. We used geometric transforma- mance of the model. The Dice Similarity Coefficient (DSC), Inter-
tions and elastic deformations [45] to augment the dataset. The ef- section of Union (IoU), Volumetric Overlap Error (VOE), and Rela-
fects of various transformations on the dataset are shown in Fig. 8. tive Absolute Volume Difference (RAVD) are based on the volume
The data augmentation is instrumental in diminishing the prob- overlap and relative size between gold standard or ground truth
ability of overfitting throughout training progression and refining (GT) and the segmented result. The surface distance measures are
the model’s generalization ability on test data. Average Symmetric Surface Distance (ASSD) and Maximum Sym-
metric Surface Distance (MSSD), which decide the surface distance
4.3.2. Implementation Platform between the surface voxels of GT, and predicted maps (PR) [42,48].
The network was implemented using TensorFlow, an open- These metrics used to assess the superiority of segmentation re-
source machine learning and DL library in the backend, and high- sults are summarized below:
9
Fig. 8. Data augmentation operations: Sample images from the LiTS dataset (a) Preprocessed input image (b) Upscaled image (c) Downscaled image (d) Random rotation
with positive angle (e) Random rotation with negative angle (f) Elastic deformation.
The DSC measures overlap between GT and PR. It is expressed The MSSD denoted as (Eq. 13),
by (Eq. 8), MSSD(GT , P R )
2|GT ∩ P R|
DSC (A, B ) = (8) = max max d (SGT , S (P R ) ), max d (SPR , S (GT ) )
|GT | + |PR| SGT ∈S (GT ) SPR ∈S (PR )
(13)
The DSC value varies between 0 and 1. The DSC value 1 de-
The ASSD and MSSD were measured in millimeters (mm). The 0
notes the accurate segmentation, and 0 represents the no overlap
value of ASSD and MSSD indicates the surface voxels are perfectly
between GT and PR.
segmented, and the maximum value represents the non-perfect
The VOE is the complement of the Jaccard index (JC), referred
segmentation of surface voxels.
to as the intersection of Union (IoU) as given below (Eq. 9),
Also, we estimate the tumor burden, which plays a decisive
V OE (GT , P R ) = 1
− JCGT=∩PR1 − IoU role in the surgical resection planning of the tumor. The tumor
(9)
V OE (GT , P R ) = 1 − ||GT ∪PR|| burden analysis offers a valuable intuition in the progression of
the disease. Additionally, tumor burden investigation is significant
The VOE varies between 0 and 1. The VOE value 0 represents in measuring the efficacy of the various treatment and therapies.
the perfect overlap between GT and PR, and 1 denotes no overlap. Fully automatic segmentation of liver and tumor permits the more
Relative Absolute Volume Difference (RAVD) is defined as straightforward computing of tumor burden and streamlining in-
(Eq. 10), vasive liver resection strategies preparation [17]. The term tumor

|PR| − |GT | burden is defined as the portion of liver occupied by cancerous
RAV D = Abs (10) cells expressed as follows (Eq. 16),
|GT |
No. o f T umor voxels
The RAVD denotes the relative difference between GT and PR. It T umor Burden = (14)
No. o f l iver voxel s
varies between 0 and 1. The RAVD value 0 indicates no difference
The root mean square error (RMSE) is used to estimate tumor
between GT and PR, which refers to perfect segmentation, and 1
burden in the predicted CT volume. It is denoted as follows (Eq.
shows the worst segmentation.
17),
The surface distance metrics ASSD and MSSD measure the cor-
related surface voxels between the GT and PR. ASSD designated by 1
n
(Eq. 12), Let S(GT) specifies surface voxels of GT. The shortest dis- RMSE = (GTi − PRi )2 (15)
n
tance of a random voxel V to S(GT) is stated as below (Eq. 11), i=1
d (v, S (GT ) ) = min v − SGT (11) Where, GTi - is tumor burden of a reference volume, PRi -is the tu-
SGT ∈S (GT ) mor burden of predicted volume, and n- is the number of volumes.
Where, • Signify Euclidean distance. The RMSE value 0 represents an accurate estimation of tumor bur-
den between GT and PR.
1
ASSD(GT , P R ) =
|S(GT )| + |S(PR )| 4.5. Experimental Results and Analysis

d (SGT , S (P R ) ) + d (SPR , S (P R ) ) (12) The liver and tumor segmentation performance is verified on
SGT ∈S (GT ) SPR ∈S (PR ) the two different datasets (LiTS and 3DIRCADb) that offer sig-
10
Table 4
Quantitative result analysis for liver segmentation on LiTS dataset (20 Test volumes).
Performance Metrics
Method
DSC IoU VOE RAVD ASSD (mm) MSSD (mm)
Network with ASSP and SENet 0.968 0.949 0.051 0.004 1.058 43.358
r=4
r=8
r = 16
Network with SENet (without ASPP) 0.955 0.924 0.076 0.010 1.467 50.151
r=4
r=8
r = 16
Network with ASPP (Without SENet) 0.950 0.907 0.093 0.025 1.885 61.945
Network without ASPP and SENet 0.924 0.892 0.108 0.028 2.607 79.196
Table 5
Quantitative result analysis for liver tumor segmentation on LiTS dataset (20 Test volumes).
Performance Metrics
Method
r=4
r=8
r = 16
r=4
r=8
r = 16
Table 6
Quantitative result analysis for liver segmentation on 3DIRCADb dataset.
Performance Metrics
Method
r=4
r=8
r = 16
r=4
r=8
r = 16
nificant variability and complexity in the CT volume. The model segmentation results and removing the ASPP module. The differ-
performance was analyzed using various performance metrics ex- ent statistical metrics were utilized to observe the efficacy of the
plained in the above section. The quantitative analysis on LiTS and network. The most significant segmentation performance measure
3DIRCADb datasets are presented in Table 4,5,6,7. Extensive exper- is the dice similarity coefficient used in most medical image seg-
iments were performed to illustrate the efficacy and effect of the mentation applications for deciding the overlap between GT and
amendment of the heuristics on the segmentation performance. PR. The proposed network with SENet (r = 4) and ASPP module
Results demonstrated the effect of the SENet with various values has achieved a dice coefficient of 0.968 and 0.972 for liver seg-
of dimensionality reduction factor (r) and ASPP module on the mentation and 0.774 and 0.779 for the tumor segmentation on the
segmentation performance. For the experimental analysis, we uti- LiTS dataset 3DIRCADb dataset, respectively. The network perfor-
lized SENet with r = 4, 8, 16 with and without the ASPP module. mance degraded slightly by increasing the value of the dimension-
We also experimented by removing SENet from the network ar- ality reduction factor (r). The network without SENet could achieve
chitecture to quantify the consequence of feature recalibration on the DSC 0.924 and 0.935 for liver segmentation and 0.727 and
11
Table 7
Quantitative result analysis for liver tumor segmentation on 3DIRCADb dataset..
Performance Metrics
Method
r=4
r=8
r = 16
r=4
r=8
r = 16
Table 8 The liver and tumor segmentation is challenging due to com-

Computational complexity of the models.
plex liver parenchyma with fuzzy liver boundaries, variation in
Method # Parameters # FLOPs shape, and fewer intensity variations in the tumor and liver vox-
Network with ASSP and SENet r = 4 33,730,817 67,403,887 els. The segmentation outcomes of the model with ASPP and SENet
Network with ASSP and SENet r = 8 33,643,777 67,229,807 with r = 8 on the LiTS and 3Dircadb datasets are displayed in Fig. 9
Network with ASSP and SENet r = 16 33,600,257 67,142,767 and Fig. 10, respectively. The segmentation results demonstrated
Network with SENet (without ASPP) r = 4 32,621,825 65,210,455 the network competence to delineate the complex liver structure
Network with SENet (without ASPP) r = 8 32,534,785 65,036,375
and tumor with a minimum segmentation error.
Network with SENet (without ASPP) r = 16 32,491,265 64,949,335
Network with ASPP (Without SENet) 33,556,737 67,112,247 In addition, the liver segmentation attained fewer differences
Network without ASPP and SENet 32,447,745 64,862,287 between the GT and the predicted segmentation map. However,
the tumor segmentation ability of the network is better for the ad-
equately bulky tumor, and the network segments the single tumor
with a sufficiently small size. In contrast, the multiple tumors are
0.702 for tumor segmentation on LiTS and the 3DIRCADb dataset. not classified with the same accurateness as that of the single tu-
The network achieved a DSC of 0.955 and 0.960 for liver segmen- mor, and the network shows false-positive results. Tumor segmen-
tation and 0.765 and 0.761 for tumor segmentation without the tation is the classification of voxels in the tumor and nontumor
ASPP module on LiTS and the 3DIRCADb dataset. From the quanti- regions. The tumor present on the liver boundary or edges of the
tative analysis, it is clear that the outcome of the presented model liver region is not classified into the tumor voxels; thus, the tumor
can vary by changing the dimensionality reduction factor. The pro- is over-segmented. Hence, the proposed model offers better perfor-
posed model performs well with r = 4, and when r = 8 the model mance for the liver parenchyma segmentation and limited perfor-
performance reasonably degraded concerning DSC value. However, mance on the tumor segmentation.
it is computationally less complex than the model with r = 4. Furthermore, we demonstrated the statistical significance anal-
The model complexity in terms of the total parameters and ysis for the proposed model in terms of p-value using the sta-
floating-point operations (FLOPs) for all the amalgamations is pre- tistical Wilcoxon signed ranks test (Nonparametric paired t-test)
sented in Table 8. The computational complexity illustrates that as [49,50]. The statistical test is performed to demonstrate the sta-
r increases, the computational complexity of the model reduces, tistical difference between the models by testing the null hypoth-
whereas segmentation performance starts degrading. We also es- esis (H0 ) against the alternative hypothesis (H1 ) with the signif-
timate the tumor burden from the predicted results and ground icance level (α = 0.05) and estimated the p-value that provides
truth, which is significant for the clinical analysis of the liver tu- the evidence against the rejection of the H0 in favour of H1 . The
mor and deciding the surgical planning and therapies for can- H0 - is performance of the models are same and the H1 - is perfor-
cer treatment. The tumor burden estimation analysis is presented mance of the models are different. The test is performed on the
in Table 9. The RMSE is the measure that determines the tumor set of samples (samples = 50, 100, 150) selected arbitrarily from
burden error and the maximum error in the tested volumes. The the test dataset and analyzed the models’ performance in terms of
experimental results indicate that the performance of the model dice scores. The statistical test estimate p-value which provides ev-
varies with the dimensionality reduction factor (r). The r should be idence to rejecting H0 . If the p-value < α signifies H0 is rejected in
selected precisely to achieve better performance and reduced com- favour of H1 . Therefore H1 is accepted which indicates the perfor-
plexity. There is a tradeoff between computational complexity and mance of the models are statistically different. The statistical anal-
model performance. The experimentation with SENet shows that ysis on the liver and tumor segmentation models in terms of p-
the results are merely affecting as we increase the complexity of value with α = 0.05 is shown in Table 10. The optimal performing
the network. Therefore, the model with r = 8 achieved reasonable model (Network with ASSP and SENet r = 4) is compared with
performance with reduced complexity. the rest of the models. The analysis shows that the statistical per-
The main objective of the experimentation is to verify the capa- formance of the proposed models is statistically different as test
bility of SENet and ASPP module for the liver and tumor segmenta- sample increases.
tion from CT volume with the proposed modifications. The perfor- Moreover, apart from the ablation experimentation on LiTS and
mance of the proposed model with and without SENet and ASPP 3DIRCADb dataset by the typical approach of splitting the entire
model clearly shows the significant segmentation performance en- dataset into training, validation, and testing; we tested the model
hancement for liver and tumor using the SENet and ASPP module. performance on the independent LiTS challenge test data volumes,
12
Table 9
Tumor burden estimation on LiTS (20 Test volumes) and 3DIRCADb Dataset.
Tumor Burden
LiTS Dataset 3DIRCADb dataset

Method
RMSE Max. Error RMSE Max. Error
Network with ASSP and SENet r = 4 0.0345 0.474 0.0271 0.405

Network with SENet (without ASPP) r = 4 0.0368 0.390 0.0293 0.328
Network with ASPP (Without SENet) 0.0402 0.391 0.0382 0.358
Network without ASPP and SENet 0.0462 0.565 0.0388 0.413
Table 10
Statistical significance analysis for comparison between the models in terms of p-value and significance level α = 0.05 for different sample sets.
p-Value
Tumor Segmentation Liver Segmentation

Methods comparison Number of Samples Number of Samples
50 100 150 50 100 150
Network with ASSP and SENet r = 4 Vs Network with ASSP and SENet r = 8 0.064 0.003 3.5e-6 1.1e-5 2.0e-5 1.5e-8
Network with ASSP and SENe r = 4 Vs Network with ASSP and SENet r = 16 0.052 0.003 0.056 0.004 0.016 1.4e-23
Network with ASSP and SENet r = 4 Vs Network with SENet (without ASPP) r = 4 0.010 0.008 4.8e-7 7.7e-6 0.009 2.0e-7
Network with ASSP and SENet r = 4 Vs Network with SENet (without ASPP) r = 8 0.041 0.014 2.3e-9 1.7e-7 1.5e-11 4.1e-7
Network with ASSP and SENet r = 4 Vs Network with SENet (without ASPP) r = 16 0.030 0.073 7.1e-8 7.6e-10 5.6e-18 4.4e-24
Network with ASSP and SENet r = 4 Vs Network with ASPP (without SENet) 0.003 0.018 1.2e-6 7.2e-6 3.8e-6 1.3e-6
Network with ASSP and SENet r = 4 Vs Network without ASPP and SENet 0.008 0.026 2.2e-8 0.002 8.0e-18 4.6e-24
Table 11
Segmentation results on LiTS challenge test dataset for liver, tumor and tumor burden estimation.
Methods ASD MSD RMSD Tumor Burden

DPC GDC JC VOE RVD (mm) (mm) (mm)
RMSE Max Error
Network with ASSP and SENet Liver 0.943 0.950 0.895 0.105 0.066 3.027 102.919 10.188 0.052 0.227
r=4 Tumor 0.430 0.614 0.616 0.384 0.223 1.245 7.425 1.847
r=8 Tumor 0.329 0.582 0.586 0.414 0.213 2.246 11.192 3.048
r = 16 Tumor 0.348 0.595 0.643 0.357 0.242 1.457 9.286 2.144
Network with SENet (without ASPP) Liver 0.918 0.924 0.854 0.146 0.035 9.444 206.483 26.647 0.038 0.130
r=4 Tumor 0.354 0.606 0.644 0.356 0.145 1.314 8.814 1.998
r=8 Tumor 0.281 0.548 0.573 0.427 0.253 2.401 12.171 3.309
r = 16 Tumor 0.320 0.494 0.556 0.444 0.150 1.390 7.523 2.012
Network with ASPP (Without SENet) Liver 0.918 0.926 0.859 0.141 0.103 6.888 217.264 22.872 0.040 0.138
Tumor 0.410 0.586 0.623 0.377 0.178 1.452 8.950 2.139
Network without ASPP and SENet Liver 0.928 0.935 0.871 0.129 0.049 7.765 233.422 24.416 0.055 0.285
Tumor 0.399 0.532 0.623 0.377 0.292 1.334 7.977 1.919
which comprises the 70 test volumes provided without ground the dice per case (DPC) of 43.0%, global dice (GDC) of 61.4%, Jac-
truths to the participants. The LiTS test dataset offers significant card coefficient (JC) of 61.6%, volume overlap error (VOE) of 38.4%,
variation and complexity in the liver and tumors in the CT scans. relative volume difference (RVD) of 22.3%, average symmetric sur-
Also, many cases have low contrast, unclear liver and tumor inten- face distance (ASD) of 1.245 mm, maximum symmetric surface dis-
sities and multiple tumors with sufficiently small in size. The LiTS tance (MSD) of 7.425 mm and root mean square symmetric surface
challenge metrics on the test dataset for liver and tumor segmen- distance (RMSD) of 1.847 mm. The tumor burden is estimated by
tation using the same experimental settings are shown in Table 11. the model in terms of the root mean square error (RMSE) of 0.052
The proposed model with ASPP module and SENet with r = 4. with a maximum error of 0.227. The results are available on the
The proposed model achieved the dice per case (DPC) of 94.3%, leaderboard with the username SD. The sample segmentation re-
global dice (GDC) of 95.0%, Jaccard coefficient (JC) of 89.5%, volume sults on the LiTS challenge test dataset are indicated in Fig. 11.
overlap error (VOE) of 10.5%, relative volume difference (RVD) of
6.6%, average symmetric surface distance (ASD) of 3.027 mm, max- 4.5.1. Comparison with other Methods on LiTS and 3DIRCADb Dataset
imum symmetric surface distance (MSD) of 102.919 mm and root The segmentation result of the proposed model is compared
mean square symmetric surface distance (RMSD) of 10.188 mm for with the state-of-the-art methods derived from the baseline UNet
liver segmentation. For lesion segmentation, the model achieved structure. The methods utilized for comparison are the basic UNet,
13
Fig. 9. Sample segmentation results for liver and tumor on the LiTS dataset: row1- Preprocessed input images, row2- Overlaid liver GT (green colour) and Tumor GT (pink
colour) on input images, row3- Overlaid liver segmentation (red colour) and tumor segmentation (blue colour) results on input images, row4- overlayed GT and segmented
results on input images yellow colour represent liver overlapping with GT, and white colour represents the tumor overlapped region with GT, row5- represent the annotated
regions of the input images with GT and segmentation results (green colour-liver GT, red colour- liver segmentation, pink colour- tumor GT and blue colour-segmented
tumor) and row6- the magnified liver region of the annotated images.
and UNet derived architectures. All the models used for the com- has trivial segmentation error than other methods for both liver
parative analysis have experimented with the same dataset and and tumor. The proposed method has revealed better performance
environment. The comparative analysis shows that the proposed in other metrics except for the MSSD, which is slightly better for
method’s liver segmentation results on the LiTS dataset in terms the Attention U-net model for liver and tumor segmentation. How-
of the DSC is 0.966 that is 7.5% better than the UNet model [16], ever, VOE, RAVD, and ASSD are better for the proposed method
2% improved than USE-Net [24], and 1.5% improved than the At- than all other methods. The proposed method shows the signif-
tention U-net model [25], and on the 3Dircadb the DSC is 0.972 icant enhancement in the liver and tumor segmentation on LiTS
which is 10.1%, 3.6% and 1.3% improved than the UNet, USE-Net and 3DIrcadb dataset.
and Attention U-net, respectively. The detailed comparison analy-
sis of the proposed method with other liver segmentation meth- 4.5.2. Comparison of the recently proposed methods on the LiTS
ods is shown in Table 12. The tumor segmentation DSC result on challenge test dataset and the 3DIRCADb dataset
the LiTS dataset of the proposed model is 27.9%, 24.3%, and 13.7% The comparative results in Table 14,15,16 on the LiTS challenge
better than the UNet, USE-Net, and Attention U-net. The DSC re- test dataset and 3DIRCADb dataset reveals that the proposed model
sult on the 3DIRCADb dataset is 30%, 19.1%, and 11.8% better than attained comparable performance with state-of-the-art methods.
the UNet, USE-Net, and Attention U-net, respectively. The detailed Performance on the LiTS test dataset, for liver segmentation, DPC
comparison analysis of the proposed method with other methods and GD achieved by the model are desired whereas MSD value is
for tumor segmentation is shown in Table 13. The liver and tumor high comparatively and for tumor segmentation, DPC and GD of
segmentation differences of the proposed model and other meth- the model are slightly reduced whereas other metrics such as JC,
ods are visualized in Fig. 12. It shows that the proposed method VOE, RVD, ASD, and MSD are comparable with the other methods.
14
Table 12
Comparison with the other methods on LiTS (20 Test volumes) and 3DIRCADb dataset for liver segmentation.
Performance Metrics
Dataset Method
LiTS UNet 0.891 0.855 0.145 0.075 9.277 105.555

USE-Net 0.946 0.903 0.097 0.030 4.325 75.039
Attention U-net 0.951 0.929 0.071 0.022 2.768 32.856
Proposed 0.966 0.945 0.055 0.005 1.021 40.374
3DIRCADb UNet 0.871 0.805 0.195 0.084 10.631 112.125
USE-Net 0.936 0.873 0.127 0.040 3.766 82.039
Attention U-net 0.959 0.922 0.078 0.033 3.174 34.247
Proposed 0.972 0.943 0.057 0.011 1.146 36.244
Table 13
Comparison with the other methods on LiTS (20 Test volumes) and 3DIRCADb dataset for liver tumor seg-
mentation.
Performance Metrics
Dataset Method
LiTS UNet 0.492 0.402 0.598 0.342 2.460 15.313

USE-Net 0.528 0.498 0.502 0.280 1.325 7.1506
Attention U-net 0.634 0.608 0.392 0.124 1.115 6.4207
Proposed 0.771 0.735 0.265 0.092 1.972 6.081
3DIRCADb UNet 0.476 0.382 0.618 0.342 2.784 12.724
USE-Net 0.585 0.512 0.488 0.250 1.447 7.758
Attention U-net 0.658 0.595 0.405 0.103 1.266 7.116
Proposed 0.776 0.707 0.298 0.090 1.407 6.282
Table 14
Comparison of liver and tumor segmentation on LiTS challenge test dataset.
Methods DPC DG JC VOE RVD ASD MSD
Kaluva, Krishna Chaitanya, et al. [52] Liver 0.912 0.923 0.850 0.150 -0.008 6.465 45.928
Tumor 0.492 0.625 0.589 0.411 19.705 1.441 7.515
Bi, Lei, et al. [54] Liver 0.959 - 0.922 0.078 - - -
Tumor 0.500 - 0.388 0.612 - - -
Li, Xiaomeng, et al. [21] Liver 0.961 0.965 0.926 0.074 -0.018 1.450 27.118
Tumor 0.722 0.824 0.634 0.366 4.272 1.102 6.228
Liu, Tianyu, et al. [53] Liver 0.937 0.955 0.894 0.106 - 3.678 -
Tumor 0.592 0.746 0.584 0.416 - 1.585 -
Jin, Qiangguo, et al. [22] Liver 0.961 0.963 0.926 0.074 0.002 1.214 26.948
Tumor 0.595 0.795 0.611 0.389 -0.152 1.289 6.775
Yuan, Yading. [41] Liver 0.963 0.967 0.929 0.071 -0.010 1.104 23.847
Tumor 0.657 0.820 0.622 0.378 0.288 1.151 6.269
Isensee, Fabian, et al. [51] Liver 0.9670 0.9700 0.936 0.064 0.005 0.967 22.479
Tumor 0.7630 0.8580 0.688 0.312 -0.052 0.804 5.212
Zhang, Jianpeng, et al. [55] Liver 0.965 0.968 - - - - -
Tumor 0.730 0.820 - - - - -
Tang, Youbao, et al. [56] Liver 0.966 0.968 - - - - -
Tumor 0.724 0.829 - - - - -
Proposed Liver 0.943 0.950 0.895 0.105 0.066 3.027 102.919
Tumor 0.430 0.614 0.616 0.384 0.223 1.245 7.425
Table 15
Comparison of liver segmentation on 3DIRCADb dataset.
Liver Segmentation
Methods
DSC VOE RVD ASD MSD
Christ, P. F, et al. [20] 0.943 0.107 -0.014 1.6 24

Li, Changyang, et al. [57] 0.945 0.068 -0.112 1.6 28.2
Moghbel, Mehrdad, et al. [58] 0.912 0.060 0.075 - -
Huang, Qing, et al. [59] - 0.086 -0.007 1.6 26.9
Tran, Song-Toan et al. [60] 0.964 0.061 0.019 - -
Kavur, A. Emre, et al. [61] 0.827 - 0.283 22.306 127.884
Li, Xiaomeng, et al. [21] 0.982 0.036 0.0001 1.28 -
Jin, Qiangguo, et al. [22] 0.977 0.045 −0.001 0.587 18.617
Liu, Liangliang, et al. [62] 0.976 0.033 0.003 0.32 2.19
Proposed 0.973 0.052 0.009 1.977 40.682
15
Fig. 10. Sample segmentation results for liver and tumor on the 3DIRCADb dataset: row1- Preprocessed input images, row2- Overlaid liver GT (green colour) and Tumor
GT (pink colour) on input images, row3- Overlaid liver segmentation (red colour) and tumor segmentation (blue colour) results on input images, row4- overlayed GT and
segmented results on input images yellow colour represent liver overlapping with GT, and white colour represents the tumor overlapped region with GT, row5- represent
the annotated regions of the input images with GT and segmentation results (green colour-liver GT, red colour- liver segmentation, pink colour- tumor GT and blue colour-
segmented tumor) and row6- the magnified liver region of the annotated images.
Table 16
Comparison of tumor segmentation on 3DIRCADb dataset.
Tumor Segmentation
Methods
DSC VOE RVD ASD MSD
Christ, P. F, et al. [20] 0.56 - - - -

Tran, Song-Toan et al. [60] 0.733 0.373 -0.158 - -
Li, Xiaomeng, et al. [21] 0.937 0.117 -0.0001 0.58 -
Jin, Qiangguo, et al. [22] 0.830 0.255 0.740 2.230 53.324
Liu, Liangliang, et al. [62] 0.948 0.069 0.001 0.82 6.74
Moghbel, Mehrdad, et al. [58] 0.750 0.228 0.086 - -
Proposed 0.779 0.288 0.083 1.809 6.303
On the other hand, the proposed model is computationally efficient mance. However, the results are comparable with the methods like
than the outperforming hybrid dense UNet [21], nnUnet [51], and 2D-Densely connected CNN [52], Spatial feature fusion CNN [53],
3D RA-UNet [22]. Our model has 33.7 million parameters which nnUnet [51], which employed the post-processing techniques to re-
are significantly less than the outperforming hybrid dense model fine the segmentation results. Thus, the segmentation performance
[21], which has 80 million parameters. Also, we have not utilized of the proposed model is comparable with the recently developed
the post-processing technique to refine the segmentation perfor- methods for liver and tumor segmentation.
16
Fig. 11. Sample segmentation results for liver and tumor on the LiTS challenge test dataset: the first column indicates input CT slices, the second column indicates prepro-
cessed images, the third column indicates segmented map, and the fourth column indicates the overlapped segmentation on the input image.
4.6. Discussion Furthermore, we demonstrated the use of the multiscale feature

extraction ability of the ASPP module to extract contextual details
Automatic liver and tumor segmentation have immense im- from the low-level features at the bottleneck layer of the encoder.
plications in hepatic disease diagnosis, treatment planning, surgi- The feature maps at the bottleneck layer have a small size due to
cal procedure planning, hepatic cancer therapies, and tumor treat- consecutive downsampling and convolutional operations that cause
ment planning. The liver has complicated pathology, and the liver the network to lose the object’s contextual details. In semantic seg-
parenchyma fluctuates drastically because of the livers appearance mentation, contextual details have an essential role in recovering
and its nature in the human body. In addition, the liver shape and the segmentation maps accurately. For extracting more spatial in-
structure are highly dependent on the surrounding abdomen or- formation, the ASPP module was added, which has various dilation
gans. Due to the complex structure and pathology, automatic liver rates that enhance the receptive field of convolutional operations
segmentation became a challenging task for researchers. The pro- and represent the more spatial details at multiple scales. As a re-
posed DL-based encoder-decoder architecture shows significant result, the contextual information from different atrous convolutional
finement over the state-of-the-art methods proposed in the liter- operations aggregated has multiple features extracted with diverse
ature. The proposed approach has uncovered the effect of modi- receptive fields. The experimental outcomes exhibited the effect of
fied high-level and low-level features on the segmentation perfor- the ASPP module and SENet on the proposed network in terms of
mance and its importance in the segmentation task. The high-level performance and computational complexity. The SENet has a di-
features were modified with simple heuristics using local feature mensionality reduction factor (r) that decides the computational
reconstruction and feature fusion techniques. The local feature re- complexity of SENet, and it affects the performance of the net-
construction is achieved using deconvolution followed by a non- work that is demonstrated with different values of r = 4, 8, 16. To
linear relu activation function at each encoding stage after down- achieve satisfying performance with less computational complexity
sampling to extract the detail features of the object, and feature the r value needs to be appropriately selected. Therefore, there is
fusion provide the additional detail of the spatial contents of the a tradeoff between r and the performance of the model. From the
object. Further, we employed the channel-wise feature recalibra- experimental analysis, the r = 8 could be chosen because of the
tion technique by utilizing the attribute of SENet to describe the trivial impact on the segmentation outcome, and it is computation-
channel-wise responses adaptively by learning interdependencies ally efficient. The network performance is considerably degraded
between the channels. The SENet enhances the representation of when SENet is removed from the network, and its effect on the
fused features prominently and improves the segmentation ability segmentation result is experimentally demonstrated. It is evident
of the model by concatenating in the respective decoding stages. that the SENet has attributes to enhance the network’s high-level
17
Fig. 12. Comparative visualization of overlaid segmented results with input images and GT: row1- Preprocessed input images, row2- annotated liver GT and segmented
results, row3- annotated tumor GT and Segmented results, row4- annotated liver GT, tumor GT, Segmented liver and tumor, row5- row6- a magnified liver region of the
annotated images (green colour- liver and tumor GT, blue color-UNet, yellow color-USE-Net, pink color-Attention U-Net and red colour- Proposed method)
features by learning the interdependencies between the channels methods. Notably, the performance obtained is without employing
and recalibrate the channel responses; thus, the network’s general- post-processing methods. Most of the methods that demonstrated
ization ability is upgraded. The model efficacy was verified on the the state-of-the-art performance on the LiTS challenge test dataset
two publicly available LiTS and 3DIRCADb datasets, which offers have utilized post-processing methods for refining the segmenta-
sufficient variations and complexity in the samples. The compar- tion results [17]. The performance of the proposed method can be
ative result analysis demonstrated that the proposed method out- upgraded further by applying the post-processing methods to the
performed the other methods. The overall performance of the tech- segmentation results. On the other hand, the design of HFRU-Net is
nique on the 3DIRCADb dataset is moderately better than the LiTS simpler and computationally inexpensive as compared to the com-
dataset. Further, the model robustness was confirmed under the plex architecture with high computational costs.
same experimental environment, and different settings on the LiTS The model trained by using a supervised learning approach can
challenge test dataset. The model performance obtained for liver be upgraded further by fine-tuning the hyperparameters of the
and tumor segmentation is comparable with the recently proposed network such as learning rate, mini-batch size, optimization algo-
18
rithm. Also, supervised learning algorithm performance is highly [6] O.I. Alirr, A.A.A. Rahni, Survey on Liver Tumour Resection Planning Sys-
reliable on the volume and quality of the data utilized to train and tem: Steps, Techniques, and Parameters, J. Digit. Imaging. 33 (2020) 304–323,
doi:10.1007/s10278- 019- 00262- 8.
test the network. In the future, it is possible to extend this model [7] S. Luo, X. Li, J. Li, Review on the Methods of Automatic Liver Segmentation
for the 3D segmentation of liver and tumor to attain better perfor- from Abdominal Images, J. Comput. Commun. 02 (2014) 1–7, doi:10.4236/jcc.
mance. 2014.22001.
[8] X. Yang, J. Do Yang, H.P. Hwang, H.C. Yu, S. Ahn, B.W. Kim, H. You, Segmenta-
tion of liver and vessels from CT images and classification of liver segments for
5. CONCLUSION preoperative liver surgical planning in living donor liver transplantation, Com-
put. Methods Programs Biomed. 158 (2018) 41–52, doi:10.1016/j.cmpb.2017.12.
008.
In this paper, we presented the novel approach to segment liver [9] Y. Li, Y. qian Zhao, F. Zhang, M. Liao, L. li Yu, B. fan Chen, Y. jin Wang, Liver
and tumor from CT images. This method optimized the high-level segmentation from abdominal CT volumes based on level set and sparse shape
composition, Comput. Methods Programs Biomed. 195 (2020) 105533, doi:10.
features using local feature reconstruction and feature fusion to ex- 1016/j.cmpb.2020.105533.
tract the rich semantic details from the features, which has a cru- [10] P. Chea, J.C. Mandell, Current applications and future directions of deep learn-
cial role in semantic segmentation. We also exploited the potential ing in musculoskeletal radiology, Skeletal Radiol 49 (2020) 183–197, doi:10.
10 07/s0 0256- 019- 03284- z.
of SENet to adaptively recalibrate the fused features to character-
[11] M.I. Razzak, S. Naz, A. Zaib, Deep learning for medical image processing:
ize the prominent high-level detail of the object. The ASPP mod- Overview, challenges and the future, Lect. Notes Comput. Vis. Biomech. 26
ule is employed to extract the multiscale features at the bottle- (2018) 323–350, doi:10.1007/978- 3- 319- 65981- 7_12.
neck stage to represent rich contextual details of the object from [12] R. Yamashita, M. Nishio, R. Kinh, G. Do, K. Togashi, Convolutional neural net-
works: an overview and application in radiology, Insights Imaging. 9 (2018)
the low-level features. The modification in high-level and low-level 611–629. https://doi.org/10.1007/s13244-018-0639-9.
features uplifts the network learning capability, resulting in signif- [13] K. Suzuki, Overview of deep learning in medical imaging, Radiol. Phys. Technol.
icant performance gain in liver and tumor segmentation. We ex- 10 (2017) 257–273, doi:10.1007/s12194- 017- 0406- 5.
[14] D. Ueda, A. Shimazaki, Y. Miki, Technical and clinical overview of deep learning
perimented with the proposed model by varying the value of the in radiology, Jpn. J. Radiol. 37 (2019) 15–33, doi:10.1007/s11604- 018- 0795- 3.
dimensionality reduction factor (r) of the SENet and the effect of [15] J. Ker, L. Wang, J. Rao, T. Lim, Deep Learning Applications in Medical Image
the ASPP module on segmentation performance as well as com- Analysis, IEEE Access 6 (2017) 9375–9379, doi:10.1109/ACCESS.2017.2788044.
[16] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedi-
putational complexity. The experimental results on the LiTS and cal image segmentation, Lect. Notes Comput. Sci. (Including Subser. Lect. Notes
3DIRCADb dataset demonstrated that HFRU-Net uplifted the liver Artif. Intell. Lect. Notes Bioinformatics). 9351 (2015) 234–241, doi:10.1007/
and tumor segmentation performance and tumor burden estima- 978- 3- 319- 24574- 4_28.
[17] P. Bilic, P.F. Christa, E. Vorontsov, G. Chlebusr, H. Chenm, Q. Doum, C.W. Fum, X.
tion. Hanp, P.A. Hengm, J. Hesserq, S. Kadourye, T. Kopczyskiv, M. Leo, C. Lio, X. Lim,
In addition, we have confirmed model efficacy with the in- J. Lipkova, J. Lowengrubn, H. Meiner, J.H. Moltzr, C. Pale, M. Pirauda, X. Qim,
dependent LiTS challenge test dataset that indicates HFRU-Net J. Qil, M. Rempera, K. Rothq, A. Schenkr, A. Sekuboyinaa, P. Zhouk, C. Hulse-
meyera, M. Beetza, F. Ettlingera, F. Gruena, G. Kaissisb, F. Lohferb, R. Brarenb,
achieved comparable performance with the state-of-the-art meth-
J. Holchc, F. Hofmannc, W. Sommerc, V. Heinemannc, C. Jacobsd, G.E.H. Ma-
ods. The computational cost and design complexity are less than manid, B. Van Ginnekend, G. Chartrande, A. Tange, M. Drozdzale, S. Kadourye,
the outperforming models. Further, we employed minimum pre- A. Ben-Cohenf, E. Klangf, M.M. Amitaif, E. Konenf, H. Greenspanf, J. Moreaug,
processing steps for the data preparation in the training pipeline, A. Hostettlerg, L. Solerg, R. Vivantih, A. Szeskinh, N. Lev-Cohainh, J. Sosnah,
L. Joskowiczh, A. Kumarw, A. Korex, C. Wangy, D. Fengz, F. Liaa, G. Krishna-
and no post-processing technique is utilized to refine the results. murthix, J. Heab, J. Wuaa, J. Kimx, J. Zhouac, J. Maad, J. Liaa, K.K. Maninisae,
Therefore, HFRU-Net can be extended to other medical image seg- K.C. Kaluvax, L. Bix, M. Khenedx, M. Beliverae, Q. Linaa, X. Yangad, Y. Yuanaf,
mentation tasks with different radio imaging modalities. Y. Chenaa, Y. Liad, Y. Qius, Y. Wuad, B. Menzea, The liver tumor segmentation
benchmark (LiTS), ArXiv. (2019) 1–43.
[18] Y. Zhang, Z. He, C. Zhong, Y. Zhang, Z. Shi, Fully convolutional neural network
Declaration of Competing Interest with post-processing methods for automatic liver segmentation from CT, in:
Proc. - 2017 Chinese Autom. Congr. CAC 2017. 2017-Janua, 2017, pp. 3864–3869,
doi:10.1109/CAC.2017.8243454.
The authors have no conflicts of interest to disclose. (None De- [19] G. Chlebus, H. Meine, J.H. Moltz, A. Schenk, Neural Network-Based Automatic
clared) Liver Tumor Segmentation With Random Forest-Based Candidate Filtering,
ArXiv. (2017) 5–8. http://arxiv.org/abs/1706.00842.
[20] P.F. Christ, F. Ettlinger, F. Grün, M.E.A. Elshaer, J. Lipková, S. Schlecht, F. Ah-
Acknowledgment maddy, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, F. Hofmann, M. D’Anastasi,
S.A. Ahmadi, G. Kaissis, J. Holch, W. Sommer, R. Braren, V. Heinemann, B.
Menze, Automatic liver and tumor segmentation of CT and MRI volumes using
The authors would like to extend sincere thanks to the Faculty cascaded fully convolutional neural networks, ArXiv. (2017) 1–20.
and Management of Vidya Pratishthan’s, Kamalnayan Bajaj Institute [21] X. Li, H. Chen, X. Qi, Q. Dou, C.W. Fu, P.A. Heng, H-DenseUNet: Hybrid Densely
Connected UNet for Liver and Tumor Segmentation from CT Volumes, IEEE
of Engineering and Technology, Baramati, and Vidya Pratishthan’s,
Trans. Med. Imaging. 37 (2018) 2663–2674, doi:10.1109/TMI.2018.2845918.
Institute of Information Technology, Baramati for providing compu- [22] Q. Jin, Z. Meng, C. Sun, L. Wei, R. Su, RA-UNet: A hybrid deep attention-aware
tational resources to accomplish the research work. network to extract liver and tumor in CT scans, Front. Bioeng. Biotechnol. 8
(2018), doi:10.3389/fbioe.2020.605132.
[23] H. Seo, C. Huang, M. Bassenne, R. Xiao, L. Xing, Modified U-Net (mU-Net) with
References Incorporation of Object-Dependent High Level Features for Improved Liver and
Liver-Tumor Segmentation in CT Images, IEEE Trans. Med. Imaging. 39 (2020)
[1] E. Trefts, M. Gannon, D.H. Wasserman, The liver, Curr. Biol. 27 (2017) R1147– 1316–1325, doi:10.1109/TMI.2019.2948320.
R1151, doi:10.1016/j.cub.2017.09.019. [24] L. Rundo, C. Han, Y. Nagano, J. Zhang, R. Hataya, C. Militello, A. Tangher-
[2] J.C. Ozougwu, Physiology of the liver, Int. J. Res. Pharm. Biosci. 4 (2017) 13–24. loni, M.S. Nobile, C. Ferretti, D. Besozzi, M.C. Gilardi, S. Vitabile, G. Mauri, H.
[3] A. Gotra, L. Sivakumaran, G. Chartrand, K.N. Vu, F. Vandenbroucke-Menu, Nakayama, P. Cazzaniga, USE-Net: incorporating Squeeze-and-Excitation blocks
C. Kauffmann, S. Kadoury, B. Gallix, J.A. de Guise, A. Tang, Liver segmentation: into U-Net for prostate zonal segmentation of multi-institutional MRI datasets,
indications, techniques and future directions, Insights Imaging. 8 (2017) 377– 2019.
392, doi:10.1007/s13244- 017- 0558- 1. [25] O. Oktay, J. Schlemper, L. Le Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S.
[4] F. Bray, J. Ferlay, I. Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, Global can- Mcdonagh, N.Y. Hammerla, B. Kainz, B. Glocker, D. Rueckert, Attention U-Net:
cer statistics 2018: GLOBOCAN estimates of incidence and mortality world- Learning Where to Look for the Pancreas, n.d.
wide for 36 cancers in 185 countries, CA. Cancer J. Clin. 68 (2018) 394–424, [26] L. Teng, H. Li, S. Karim, DMCNN: A Deep Multiscale Convolutional Neural Net-
doi:10.3322/caac.21492. work Model for Medical Image Segmentation, J. Healthc. Eng. (2019) 2019,
[5] M. Moghbel, S. Mashohor, R. Mahmud, M.I. Bin Saripan, Review of doi:10.1155/2019/8597606.
liver segmentation and computer assisted detection/diagnosis methods in [27] T. Fan, G. Wang, Y. Li, H. Wang, Ma-net: A multi-scale attention network for
computed tomography, Artif. Intell. Rev. 50 (2018) 497–537, doi:10.1007/ liver and tumor segmentation, IEEE Access 8 (2020) 179656–179665, doi:10.
s10462- 017- 9550- x. 1109/ACCESS.2020.3025372.
19
[28] Z. Zhou, M.M.R. Siddiquee, N. Tajbakhsh, J. Liang, UNet++: Redesigning Skip TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Sys-
Connections to Exploit Multiscale Features in Image Segmentation, IEEE Trans. tems, n.d. www.tensorflow.org. (accessed May 16, 2021).
Med. Imaging. 39 (2020) 1856–1867, doi:10.1109/TMI.2019.2959609. [47] F. Chollet, et.al, Keras, (2015). https://github.com/fchollet/keras.
[29] J. Zhang, Y. Jin, J. Xu, X. Xu, Y. Zhang, MDU-Net: Multi-scale Densely Con- [48] T. Heimann, B. Van Ginneken, M.A. Styner, Y. Arzhaeva, V. Aurich, C. Bauer,
nected U-Net for biomedical image segmentation, (2018). http://arxiv.org/abs/ A. Beck, C. Becker, R. Beichel, G. Bekes, F. Bello, G. Binnig, H. Bischof, A. Bornik,
1812.00352. P.M.M. Cashman, Y. Chi, A. Córdova, B.M. Dawant, M. Fidrich, J.D. Furst, D. Fu-
[30] V.V.A. Steven, M. Singer, Marc Y. Fink, High-Resolution Encoder–Decoder Net- rukawa, L. Grenacher, J. Hornegger, D. Kainmüller, R.I. Kitney, H. Kobatake,
works for Low-Contrast Medical Image Segmentation, Physiol. Behav. 176 H. Lamecker, T. Lange, J. Lee, B. Lennon, R. Li, S. Li, H.P. Meinzer, G. Németh,
(2019) 139–148. D.S. Raicu, A.M. Rau, E.M. Van Rikxoort, M. Rousson, L. Ruskó, K.A. Saddi,
[31] N. Ibtehaz, M.S. Rahman, MultiResUNet: Rethinking the U-Net architecture for G. Schmidt, D. Seghers, A. Shimizu, P. Slagmolen, E. Sorantin, G. Soza, R. Su-
multimodal biomedical image segmentation, Neural Networks 121 (2020) 74– somboon, J.M. Waite, A. Wimmer, I. Wolf, Comparison and evaluation of meth-
87, doi:10.1016/j.neunet.2019.08.025. ods for liver segmentation from CT datasets, IEEE Trans. Med. Imaging. 28
[32] Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, J. Liu, CE- (2009) 1251–1265, doi:10.1109/TMI.2009.2013851.
Net: Context Encoder Network for 2D Medical Image Segmentation, IEEE Trans. [49] J. Derrac, S. Garcia, D. Molina, F. Herrera, A practical tutorial on the use
Med. Imaging. 38 (2019) 2281–2292, doi:10.1109/TMI.2019.2903562. of nonparametric statistical tests as a methodology for comparing evolution-
[33] D.T. Kushnure, S.N. Talbar, MS-UNet: A multi-scale UNet with feature recali- ary and swarm intelligence algorithms, Swarm Evol. Comput. 1 (2011) 3–18,
bration approach for automatic liver and tumor segmentation in CT images, doi:10.1016/j.swevo.2011.02.002.
Comput. Med. Imaging Graph. 89 (2021) 101885, doi:10.1016/j.compmedimag. [50] F. Zabihollahy, J.A. White, E. Ukwatta, Convolutional neural network-based ap-
2021.101885. proach for segmentation of left ventricle myocardial scar from 3D late gadolin-
[34] H. Xia, M. Ma, H. Li, S. Song, MC-Net: multi-scale context-attention net- ium enhancement MR images, Med. Phys. 46 (2019) 1740–1751, doi:10.1002/
work for medical CT image segmentation, Appl. Intell. (2021), doi:10.1007/ mp.13436.
s10489- 021- 02506- z. [51] F. Isensee, P.F. Jäger, S.A.A. Kohl, J. Petersen, K.H. Maier-Hein, Automated Design
[35] J. Hu, L. Shen, G. Sun, Squeeze-and-Excitation Networks, n.d. http://image-net. of Deep Learning Methods for Biomedical Image Segmentation, (2019) 1–55.
org/challenges/LSVRC/2017/results (accessed May 16, 2021). https://doi.org/10.1038/s41592-020-01008-z.
[36] L.-C. Chen, G. Papandreou, S. Member, I. Kokkinos, K. Murphy, A.L. Yuille, [52] K.C. Kaluva, M. Khened, A. Kori, G. Krishnamurthi, 2D-Densely Connected Con-
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, volution Neural Networks for automatic Liver and Tumor Segmentation, X
Atrous Convolution, and Fully Connected CRFs, n.d. http://liangchiehchen.com/ (2018) 1–4. http://arxiv.org/abs/1802.02182.
projects/ (accessed May 16, 2021). [53] T. Liu, J. Liu, Y. Ma, J. He, J. Han, X. Ding, C.T. Chen, Spatial feature fusion convo-
[37] J. Hu, L. Shen, S. Albanie, G. Sun, E. Wu, Squeeze-and-Excitation Networks, in: lutional network for liver and liver tumor segmentation from CT images, Med.
Proc. IEEE Conf. Comput. Vis. Pattern Recognit, 2018, pp. 7132–7141, doi:10. Phys. 48 (2021) 264–272, doi:10.1002/mp.14585.
1109/CVPR.2018.00745. [54] L. Bi, J. Kim, A. Kumar, D. Feng, Automatic Liver Lesion Detection using Cas-
[38] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-Decoder with caded Deep Residual Networks, (2017). http://arxiv.org/abs/1704.02703.
Atrous Separable Convolution for Semantic Image Segmentation, n.d. https:// [55] J. Zhang, Y. Xie, P. Zhang, H. Chen, Y. Xia, C. Shen, Light-weight hybrid convo-
github.com/tensorflow/models/tree/master/ (accessed May 16, 2021). lutional network for liver tumor segmentation, IJCAI Int. Jt. Conf. Artif. Intell.
[39] J. Zhang, Y. Xie, P. Zhang, H. Chen, Y. Xia, C. Shen, Light-Weight Hybrid Convo- (2019) 4271–4277 2019-Augus, doi:10.24963/ijcai.2019/593.
lutional Network for Liver Tumor Segmentation, 2019. [56] Y. Tang, Y. Tang, Y. Zhu, J. Xiao, R.M. Summers, E2 Net: An Edge Enhanced
[40] J. Wang, P. Lv, H. Wang, C. Shi, SAR-U-Net : squeeze-and-excitation block and Network for Accurate Liver and Tumor Segmentation on CT Scans, Lect. Notes
atrous spatial pyramid pooling based residual U-Net for automatic liver CT seg- Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinfor-
mentation, (2021) 1–20. matics) (2020) 512–522 12264 LNCS, doi:10.1007/978- 3- 030- 59719- 1_50.
[41] Y. Yuan, Hierarchical Convolutional-Deconvolutional Neural Networks for Au- [57] C. Li, X. Wang, S. Eberl, M. Fulham, Y. Yin, J. Chen, D.D. Feng, A likelihood
tomatic Liver and Tumor Segmentation, i (2017) 3–6. http://arxiv.org/abs/1710. and local constraint level set model for liver tumor segmentation from CT vol-
04540. umes, IEEE Trans. Biomed. Eng. 60 (2013) 2967–2977, doi:10.1109/TBME.2013.
[42] A. Al-Kababji, F. Bensaali, Sarada, P. Dakua, A. Al-Kababji, F. Bensaali, S.P. 2267212.
Dakua, Automated liver tissues delineation based on machine learning tech- [58] M. Moghbel, S. Mashohor, R. Mahmud, M. Iqbal Bin Saripan, Automatic liver
niques: A survey, current trends and future orientations, 2021. segmentation on computed tomography using random walkers for treatment
[43] Dataset, 3DIRCADb, (n.d.). https://www.ircad.fr/research/3dircadb/. planning, EXCLI J 15 (2016) 500–517, doi:10.17179/excli2016-473.
[44] C.H. Sudre, W. Li, T. Vercauteren, S. Ourselin, M.Jorge Cardoso, Generalised dice [59] Q. Huang, H. Ding, X. Wang, G. Wang, Fully automatic liver segmentation in CT
overlap as a deep learning loss function for highly unbalanced segmentations, images using modified graph cuts and feature detection, Comput. Biol. Med. 95
Lect. Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes (2018) 198–208, doi:10.1016/j.compbiomed.2018.02.012.
Bioinformatics) (2017) 240–248 10553 LNCS, doi:10.1007/978- 3- 319- 67558- 9_ [60] S.T. Tran, C.H. Cheng, D.G. Liu, A Multiple Layer U-Net, Un-Net, for Liver and
28. Liver Tumor Segmentation in CT, IEEE Access 9 (2021) 3752–3764, doi:10.1109/
[45] P.Y. Simard, D. Steinkraus, J.C. Platt, Best Practices for Convolutional Neural ACCESS.2020.3047861.
Networks Applied to Visual Document Analysis, Microsoft Res. 3 (2003) 1–6. [61] A.E. Kavur, L.I. Kuncheva, M.A. Selver, Basic Ensembles of Vanilla-Style Deep
[46] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, Learning Models Improve Liver Segmentation From CT Images, (2020) 1–10.
A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, http://arxiv.org/abs/2001.09647.
M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, [62] L. Liu, F.X. Wu, Y.P. Wang, J. Wang, Multi-receptive-field CNN for semantic
R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, segmentation of medical images, IEEE J. Biomed. Heal. Informatics. 24 (2020)
I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. 3215–3225, doi:10.1109/JBHI.2020.3016306.
Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, G. Research,
20

DL 6

Uploaded by

Copyright:

Available Formats

DL 6

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DL 6

Uploaded by

Copyright:

Available Formats

Computer Methods and Programs in Biomedicine 213 (2022) 106501

Contents lists available at ScienceDirect

Computer Methods and Programs in Biomedicine

HFRU-Net: High-Level Feature Fusion and Recalibration UNet for

Fig. 2. Proposed HFRU-Net architecture.

Fig. 3. Squeeze-and-Excitation network (SENet) for feature recalibration.

Fig. 4. In the ASPP module, the atrous depthwise separable convo-

MaxPooling_1 2×2 Concatenate_4 256 × 256 × 128 Upsampling_4

MaxPooling_2 64 × 64 × 128 2×2 Concatenate_3 128 × 128 × 256 Upsampling_3

MaxPooling_3 32 × 32 × 256 2×2 Concatenate_2 64 × 64 × 512 Upsampling_2

MaxPooling_4 16 × 16 × 512 2×2 Concatenate_1 32 × 32 × 1024 Upsampling_1

ASPP Module 16 × 16 × 1024

Learning rate 1e−5

4.2. Data Preprocessing

Fig. 6. Preprocessing steps applied to the dataset.

Table 8 The liver and tumor segmentation is challenging due to com-

LiTS Dataset 3DIRCADb dataset

Network with ASSP and SENet r = 4 0.0345 0.474 0.0271 0.405

Tumor Segmentation Liver Segmentation

50 100 150 50 100 150

Methods ASD MSD RMSD Tumor Burden

LiTS UNet 0.891 0.855 0.145 0.075 9.277 105.555

LiTS UNet 0.492 0.402 0.598 0.342 2.460 15.313

Methods DPC DG JC VOE RVD ASD MSD

Christ, P. F, et al. [20] 0.943 0.107 -0.014 1.6 24

Christ, P. F, et al. [20] 0.56 - - - -

4.6. Discussion Furthermore, we demonstrated the use of the multiscale feature

You might also like