1. Introduction
In recent years, there has been a notable increase in the development of technologies applied to healthcare, bringing several benefits to the professionals responsible for providing diagnoses. One such technology is Computer-Aided Diagnosis (CAD), which is applied to the detection and diagnosis of various clinical conditions using different kinds of medical imaging. These systems aim to assist with medical decisions about treatment and prognosis and to improve the patient's quality of life [1].
CAD systems often have a high computational cost, since many of them rely on machine learning and digital image processing algorithms that are computationally demanding. An alternative that provides better performance in terms of execution time and power consumption is the implementation of these algorithms on a Field Programmable Gate Array (FPGA), a reconfigurable hardware device consisting of several configurable logic blocks and programmable interconnects, which can be designed to implement a desired circuit.
According to [2], the use of FPGAs provides the system with fast execution and low power consumption, allowing the development of real-time embedded systems. The use of FPGAs in the development of CAD systems is the object of several works in the literature [3,4,5,6,7,8].
The work of [9] proposes an embedded system for real-time multimodal image registration applied to the non-invasive tracking of skin cancer. The work presents the development of a hybrid hardware/software architecture, implemented in a Xilinx Z-7010 SoC device. The hardware implements the detection and extraction of image features in the visible and infrared spectra, while the software estimates the geometric transformation that maps one image onto the other and then applies the transformation to the frames of the infrared images.
The work proposed by [10] presents the implementation of an on-chip Multilayer Perceptron (MLP) to ensure the safety of electronic devices used in the treatment of diabetes. These devices are usually insulin pumps, whose systems can be compromised and sent false insulin dosing commands. The implemented MLP architecture has two hidden layers, the first with seven neurons and the second with four, and an output layer with only one neuron. The hyperbolic tangent activation function was used in the neurons of the hidden layers, while the sigmoid function was used in the output neuron.
Only one research group was found in the literature proposing a melanoma detection system using an FPGA. This group has several works reporting the advances of its research on melanoma detection. In all of these works, the target FPGA was the Xilinx XC7Z020CLG484-1 from the Zynq-7000 ZC702 Evaluation Board. The implementations proposed by this research group were developed in C/C++ using the UltraFast High-Level Synthesis (HLS) tool available in the Xilinx Vivado Design Suite, and a Support Vector Machine (SVM) is designed as an HLS Intellectual Property (IP) core.
The first paper presented by this group, [11], proposed a hardware/software co-design, implementing in hardware only the scalar product present in the SVM classifier function, while all other necessary calculations were performed in software. In this work, the test data are transmitted using a streaming interface with a Direct Memory Access (DMA) IP. The hardware design was then extended to implement the full function of the SVM classifier, as presented in [12]. Another similar project was proposed in [13], which used embedded block RAM (BRAM) interfaces to pass the required data instead of using the stream interface with the DMA IP. The work presented in [14] sought to simplify the previously designed IPs to reduce hardware area, power, and cost for project extension, as well as to improve classification performance. It proposed a reconfigurable hardware system implementing an adaptive, flexible, and scalable embedded cascade SVM classifier.
The most recent article from this group is presented in [15]. It proposes three SVM models generated by training on the available feature set, using 356 instances with 27 features each. The first model was generated using the complete original data set, employing 346 Support Vectors (SVs). Next, scaling and normalization techniques were applied to the original dataset, which generated a second model with 248 SVs and achieved higher classification accuracy. The third, smaller-scale model was implemented as a case study for performance validation by running on a Zynq SoC, while the other two models were validated using only simulation results. This model consists of 61 SVs, generated using only part of the data set (144 instances), normalized in the training phase. Several versions of these models were implemented using different optimization techniques through the optimization directives available in the Vivado HLS tool. That article presents several results, which are later used for comparison with the architecture proposed here.
In this scenario, the objective of this work is the hardware implementation of a skin cancer detection system using digital image processing (DIP) techniques and a multilayer perceptron (MLP) artificial neural network. The hardware developed for the DIP techniques is responsible for extracting the desired descriptors from the skin nevus image, while the hardware developed for the feedforward phase (inference step) of the MLP is in charge of classifying the nevus as melanoma or non-melanoma based on the descriptors extracted by the DIP, using the weights previously obtained by training the network in a software implementation. In addition, the design aims to achieve high-speed performance and low energy consumption. Results regarding hardware resource occupation, runtime, and power consumption are detailed and presented for an Intel Cyclone V SE 5CSEBA6U23I7 FPGA, which has 41,910 ALMs, 166,036 registers, 5570 Kb of M10K memory blocks, and 224 DSP blocks.
The paper is structured as follows: Section 2 presents the skin cancer detection technique developed based on existing methods and its software implementation, used to define the parameters of the neural network through the training phase. Section 3 presents the details of the hardware architecture, describing the various modules and submodules used to implement the system. In Section 4, the results of the proposed hardware validation and the implementation synthesis are presented. Finally, Section 5 presents the final considerations about the manuscript.
2. Skin Cancer Detection Technique
Using digital image processing, nine descriptors are extracted from a nevus image. These descriptors are then forwarded to the MLP, which is trained to classify the images into two distinct classes: melanoma and non-melanoma. After the network training, its validation is performed using a public database of skin cancer images previously diagnosed by specialists.
2.1. Technical Overview
Figure 1 represents the proposed detection structure, in which the classification of skin nevi is performed using a database of dermatoscopic images.
The k-th image from the database is expressed as

$$\mathbf{V}_k = \left[ v^k_{ij} \right], \qquad (1)$$

where $v^k_{ij}$ corresponds to a b-bit pixel of the k-th image from the database. A binary image, $\mathbf{P}_k$, is used as a mask to extract the region of interest from the original image. The $\mathbf{P}_k$ image is defined as

$$\mathbf{P}_k = \left[ p^k_{ij} \right], \quad p^k_{ij} \in \{0, 1\}, \qquad (2)$$

where $p^k_{ij}$ corresponds to a pixel of the k-th binary image from the database.
With the mask $\mathbf{P}_k$, the original image passes through a block referred to as the Character Extractor Module (CEM), which extracts nine descriptors from the image, expressed as $d^k_0, d^k_1, \ldots, d^k_8$. This set of descriptors is then processed by an MLP-BP artificial neural network that contains nine inputs, three layers of neurons (two hidden layers and the output layer), and two outputs, expressed as $y_1$ and $y_2$.
A database containing 200 dermatoscopic images provided by the ADDI Project [16] was used. The images have a resolution of 768 × 560 pixels and a magnification of 20×. In this work, the resolution was halved. Masks containing the delimitation of the nevus region are also available for each image. The images were diagnosed by specialists and are divided into 160 non-melanomas and 40 melanomas.
Figure 2b shows the binary image corresponding to the sample shown in Figure 2a.
2.2. Character Extractor Module (CEM)
In the literature, there are several attempts to simplify the dermoscopic approach to diagnosing benign melanocytic lesions and melanomas, as presented in [17], such as the ABCD rule, the Menzies method, and the 7-point checklist.
The three approaches presented are for the recognition of melanoma based on dermatoscopic images, but there is also a rule for detection with the naked eye, called ABCDE. This rule is very similar to the ABCD rule; in it, each letter indicates a characteristic of the nevus to be analyzed, with A referring to asymmetry, B to border, C to color, D to diameter, and E to evolution [18,19,20].
Thus, based on these approaches, the work presented here defined a set of nine descriptors with mathematical representation: symmetry in x, symmetry in y, diameter, mean and variance in the R channel, mean and variance in the G channel, and mean and variance in the B channel. These descriptors are represented by the variables $d^k_0, d^k_1, \ldots, d^k_8$, respectively. Thus, for a given image $\mathbf{V}_k$, there is a vector of nine descriptors.
2.2.1. Symmetry Calculation
The symmetry in x and y is represented by the descriptors $d^k_0$ and $d^k_1$. The calculation of these descriptors for a given image k is expressed as

$$d^k_0 = \left|\, \sum_{i \leq y_c} \sum_{j} p^k_{ij} \;-\; \sum_{i > y_c} \sum_{j} p^k_{ij} \,\right| \qquad (3)$$

and

$$d^k_1 = \left|\, \sum_{i} \sum_{j \leq x_c} p^k_{ij} \;-\; \sum_{i} \sum_{j > x_c} p^k_{ij} \,\right|, \qquad (4)$$

where $x_c$ and $y_c$ are the center of mass coordinates of the binary image, $\mathbf{P}_k$. The center of mass is calculated by initially extracting the boundary points of the binary image using the OpenCV library function findContours() and later using the result of this function as input to another OpenCV function, moments(), which returns the center of mass.
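For reference, the centroid that this OpenCV pipeline produces can be sketched in plain Python over the filled binary mask (a simplification: cv2.moments applied to a contour computes contour moments, which agree with the region centroid for filled shapes):

```python
def center_of_mass(mask):
    """Centroid (x_c, y_c) of a binary mask: the means of the column and
    row indices of the nonzero pixels (m10/m00 and m01/m00)."""
    total = xsum = ysum = 0
    for i, row in enumerate(mask):          # i -> line, j -> column
        for j, p in enumerate(row):
            if p:
                total += 1
                xsum += j
                ysum += i
    return xsum / total, ysum / total

# A 3x3 block of ones whose columns span 1..3 and rows span 0..2:
mask = [[0, 1, 1, 1],
        [0, 1, 1, 1],
        [0, 1, 1, 1]]
print(center_of_mass(mask))  # → (2.0, 1.0)
```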
Figure 3 shows the image divided into four quadrants, with the intersection of the axes at the center of mass.
2.2.2. Diameter Calculation
The second descriptor is the diameter, represented by $d^k_2$. For the calculation of this descriptor, the OpenCV library function minEnclosingCircle() was used, which locates the circle of minimum area enclosing the set of 2D points provided, which in this case are the points that form the contour of the binarized image. This function outputs the radius of the circle.
Figure 4 shows the result of obtaining the diameter of the binarized image shown in Figure 2b.
2.2.3. Calculation of Mean and Variance
The last descriptors are related to the color variation associated with the nevus and can be found through the mean and the variance of the RGB channels of the original image. The mean and variance of the R channel are represented by the descriptors $d^k_3$ and $d^k_4$; for channel G, the descriptors are $d^k_5$ and $d^k_6$; and for channel B, the descriptors are $d^k_7$ and $d^k_8$. The calculation of the R channel descriptors can be expressed as

$$d^k_3 = \frac{1}{L} \sum_{i}\sum_{j} r^k_{ij}\, p^k_{ij} \qquad (5)$$

and

$$d^k_4 = \frac{1}{L} \sum_{i}\sum_{j} \left( r^k_{ij} - d^k_3 \right)^2 p^k_{ij}, \qquad (6)$$

in which L is the number of pixels equal to 1 in the binarized matrix, $\mathbf{P}_k$, and $r^k_{ij}$ is the pixel value in channel R. Following the same idea, the value of the descriptors for channel G can be expressed as

$$d^k_5 = \frac{1}{L} \sum_{i}\sum_{j} g^k_{ij}\, p^k_{ij} \qquad (7)$$

and

$$d^k_6 = \frac{1}{L} \sum_{i}\sum_{j} \left( g^k_{ij} - d^k_5 \right)^2 p^k_{ij}, \qquad (8)$$

and for channel B as

$$d^k_7 = \frac{1}{L} \sum_{i}\sum_{j} b^k_{ij}\, p^k_{ij} \qquad (9)$$

and

$$d^k_8 = \frac{1}{L} \sum_{i}\sum_{j} \left( b^k_{ij} - d^k_7 \right)^2 p^k_{ij}. \qquad (10)$$
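A minimal sketch of this masked mean and (population) variance for a single channel, restricted to the pixels where the binary mask is 1:

```python
def channel_stats(channel, mask):
    """Mean and variance of one color channel over the region of
    interest (pixels where the binary mask equals 1)."""
    vals = [v for row_v, row_p in zip(channel, mask)
              for v, p in zip(row_v, row_p) if p]
    L = len(vals)                      # number of region pixels
    mean = sum(vals) / L
    var = sum((v - mean) ** 2 for v in vals) / L
    return mean, var

# Only the first column lies inside the mask:
r = [[10, 200],
     [30, 200]]
p = [[1, 0],
     [1, 0]]
print(channel_stats(r, p))  # → (20.0, 100.0)
```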
2.3. MLP
In order to solve the melanoma classification problem, the work proposed here uses an MLP-BP with nine inputs (the descriptors), two hidden layers, and two outputs. The proposed network architecture contains 10 neurons in the first hidden layer and 24 neurons in the second. The activation function used in all the neurons of the network was the sigmoid function.
Figure 5 shows the described network structure.
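The feedforward (inference) pass of this 9-10-24-2 topology can be sketched as follows; the randomly generated parameters here are purely illustrative stand-ins for the weights obtained in the training phase:

```python
import math, random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(inputs, weights, biases):
    # One fully connected layer followed by the sigmoid activation.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def mlp_forward(d, params):
    # params: list of (weights, biases) pairs, one per layer.
    out = d
    for weights, biases in params:
        out = layer(out, weights, biases)
    return out

# Illustrative random parameters for the 9-10-24-2 topology:
random.seed(0)
sizes = [9, 10, 24, 2]
params = [([[random.uniform(-1, 1) for _ in range(n_in)]
            for _ in range(n_out)],
           [random.uniform(-1, 1) for _ in range(n_out)])
          for n_in, n_out in zip(sizes, sizes[1:])]

y = mlp_forward([0.5] * 9, params)   # nine normalized descriptors in
print(len(y))  # → 2
```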
The outputs of the network, $y_1$ and $y_2$, classify the image as melanoma or non-melanoma, with $y_1$ corresponding to melanoma and $y_2$ to non-melanoma. When $y_2 > y_1$, the classified nevus is non-melanoma, and when $y_1 > y_2$, the nevus is classified as melanoma.
All of the image descriptors are normalized between 0 and 1 in order to improve the convergence of the network. The normalization of each descriptor was performed by dividing all of its elements by the highest value of that descriptor across the 200 images of the database.
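This per-descriptor max normalization can be sketched as:

```python
def normalize_descriptors(dataset):
    """Divide each descriptor column by its maximum over the dataset,
    mapping every descriptor into [0, 1]."""
    n_desc = len(dataset[0])
    maxima = [max(row[i] for row in dataset) for i in range(n_desc)]
    return [[row[i] / maxima[i] for i in range(n_desc)] for row in dataset]

# Two images, two descriptors each:
data = [[2.0, 50.0],
        [4.0, 25.0]]
print(normalize_descriptors(data))  # → [[0.5, 1.0], [1.0, 0.5]]
```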
For the training and validation of the network, all the images made available by the PH² Database were used. During the network training phase, 170 randomly selected images from the database were used, divided into 138 non-melanomas and 32 melanomas. The MLP converged to a low error, as shown in Figure 6, which illustrates the network mean squared error as a function of the number of epochs. During training, the neural network weights are adjusted; the final network model employs the weights with which the lowest error was obtained.
After defining the final model weights in the training phase, the model validation is performed using 30 images, divided into 22 non-melanomas and 8 melanomas. On the validation data, only three errors were obtained: one false negative and two false positives.
Figure 7 illustrates the validation results in a confusion matrix. The actual classes are arranged in rows, while the predicted classes are arranged in columns. The correctly classified nevi are represented on the main diagonal of the matrix and the incorrect ones on the antidiagonal. The classes are represented by the acronyms M and NM, which indicate melanoma and non-melanoma, respectively.
In order to evaluate the performance of the proposed technique, this work presents three measures in common use in several works with similar classification problems [21,22,23,24]: accuracy, specificity, and sensitivity [25]. Accuracy is the model's ability to correctly classify cases of melanoma and non-melanoma, being the number of correct classifications divided by the number of all classified data. With 27 correct diagnoses in 30 cases, the technique obtained an accuracy of 90%. Specificity is the proportion of non-melanomas correctly identified by the classifier; with 20 non-melanomas correctly recognized in 22 non-melanoma cases, a specificity of 90.9% was obtained. Finally, sensitivity is the proportion of melanomas correctly identified by the classifier; with seven melanomas correctly recognized in eight cases of melanoma, the sensitivity was 87.5%.
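The three metrics follow directly from the confusion-matrix counts reported above (7 true positives, 20 true negatives, 2 false positives, 1 false negative):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, specificity, and sensitivity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    return accuracy, specificity, sensitivity

acc, spec, sens = metrics(tp=7, tn=20, fp=2, fn=1)
print(round(acc, 3), round(spec, 3), round(sens, 3))  # → 0.9 0.909 0.875
```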
Table 1 shows the results obtained by the technique proposed here together with the ones found in the literature.
The equivalent hardware implementation of the proposed project, described in the next section, targeted the design of an embedded system with the same classification results previously obtained, while optimizing performance and energy consumption.
3. Design Description
The hardware architecture was developed using fixed-point number representation, with the integer part ranging from 0 to 35 bits and the fractional part ranging from 0 to 15 bits. The system's inputs are represented in fixed point by 8 bits in the integer part and 0 in the fractional part, since the values of the RGB channels are unsigned integers ranging from 0 to 255. The image descriptors and the system outputs are represented with 0 bits in the integer part and 10 bits in the fractional part, because they are unsigned values between 0 and 1.
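A software model of this fixed-point quantization can be sketched as follows (the rounding mode and saturation behavior are assumptions, since the text does not specify them):

```python
def to_fixed(value, int_bits, frac_bits, signed=False):
    """Quantize a real value onto a fixed-point grid with the given
    integer/fractional widths, saturating at the representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits)) if signed else 0
    hi = (1 << (int_bits + frac_bits)) - 1
    q = max(lo, min(hi, round(value * scale)))
    return q / scale

# Descriptors and outputs use 0 integer and 10 fractional bits:
print(to_fixed(0.5004, 0, 10))  # → 0.5
```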
Figure 8 presents the general hardware architecture. This figure shows the two main modules of the system: the Digital Image Processing module and the Artificial Neural Network module. The inputs of this architecture are the pixels of the image to be classified, V, and the image mask, P, which is provided by the database used.
The input $r_{ij}$ refers to the intensity of the pixel in channel R of the image V at the i-th row and j-th column; the inputs $g_{ij}$ and $b_{ij}$ follow the same idea, referring respectively to the intensity of the pixel in channels G and B. The input $p_{ij}$ is the pixel value of the binarized image P at the i-th row and j-th column, in which the pixels of the region of interest of the nevus are represented by 1 and the pixels of the background by 0. The outputs of the DIP module are the nine descriptors extracted from the nevus image, represented by $d_0, d_1, \ldots, d_8$. These descriptors are the inputs of the ANN module, responsible for performing the image classification. The outputs of this module, $y_1$ and $y_2$, indicate the result of the classification.
3.1. Digital Image Processing Module (DPIM)
The Digital Image Processing Module (DPIM) aims to perform the necessary operations on the pixels of the input image, V, to obtain the descriptors $d_0, d_1, \ldots, d_8$. In this module, the technique of stream processing was used, requiring two scans of the image to extract all the descriptors, since some calculations require values obtained only at the end of the first complete scan of the image. The image is input pixel by pixel, starting with the first line, from left to right. This module has five main submodules, intended for: applying the mask, calculating the symmetry, calculating the diameter, calculating the mean, and calculating the variance.
Figure 9 shows the general architecture of the DPIM, with all of its submodules. In this figure, one constant holds a value equal to the total number of pixels of the analyzed nevus image. Some submodules take as input the number of pixels in the region of interest; this value is obtained through a counter, Counter2, which is enabled by the pixels of the binary image equal to 1, and the value of this counter is stored in a register after the first scan of the image. Two boolean variables indicate, when they assume a value of 1, the end of the first and of the second scan of the image, respectively. The counter blocks Counter1 and Counter2 have maximum values equal to the total number of pixels of the image and the number of pixels in the region of interest, respectively.
3.1.1. Mask Application Submodule
The Mask Application Submodule is responsible for applying the binary mask, P, to each channel of the original image, V. The mask used is provided by the database.
Figure 10 shows the architecture of the Mask Application Submodule. The application of the mask on each channel is performed using an AND logic gate with eight bits in the integer part. The pixels of the binarized image with a value of 1 are converted to 255 by a multiplexer (MUX). Thus, after the MUX, the binarized image inputs equal to zero are represented by the binary value 00000000 and those equal to one by 11111111. Thereby, after the AND operation, the pixels of the image within the region of interest keep the value of the original image, while the background pixels have their value set to zero. The RM, GM, and BM outputs are the pixel intensity values in each channel after the mask is applied.
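The per-pixel effect of the AND gate can be sketched as:

```python
def apply_mask(pixel, mask_bit):
    """Bitwise AND of an 8-bit channel value with the expanded mask:
    a mask bit of 1 becomes 0b11111111, a 0 stays 0b00000000."""
    return pixel & (0b11111111 if mask_bit else 0b00000000)

# Region-of-interest pixels pass through; background pixels become 0:
print(apply_mask(173, 1), apply_mask(173, 0))  # → 173 0
```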
3.1.2. Symmetry Calculation Submodule
The Symmetry Calculation Submodule is responsible for calculating the values of the descriptors $d_0$ and $d_1$. For this, it is necessary to first determine the position of the center of mass of the nevus, $x_c$ and $y_c$. In software, this calculation was performed using functions of the OpenCV library. In the hardware implementation, this calculation was performed using the mask image, P, being expressed as

$$x_c = \frac{1}{L} \sum_{i}\sum_{j} j\, p_{ij} \qquad (11)$$

and

$$y_c = \frac{1}{L} \sum_{i}\sum_{j} i\, p_{ij}, \qquad (12)$$

where $x_c$ indicates the x coordinate of the center of mass and $y_c$ the y coordinate.
Based on this mathematical representation, the calculation of the center of mass was implemented as shown in Figure 11. In this figure, one constant holds a value equal to the width of the image. The counter indicates the number of pixels of the image already read; dividing its value by the width of the image produces the position of the input pixel. The variable lin indicates the line and the variable col the column where the pixel is positioned. Thereafter, two multipliers perform the multiplication operations present in Equations (11) and (12), followed by an accumulator implemented with an adder block and a delayed feedback, which performs the double summation present in the equations. After the image is completely read, the value of each accumulator is divided by the number of pixels of the region of interest, producing $x_c$ and $y_c$. These values are stored in registers at the end.
After determining the center of mass, it is possible to calculate the symmetry. The proposed implementation for this calculation was based on Equations (3) and (4), and the designed architecture is shown in Figure 12.
Initially, there are four conditional blocks, each equivalent to one double summation. Conditional Blocks 1 and 2 implement the first and second double summations of Equation (3), respectively, with the i of the equation represented by lin. Conditional Blocks 3 and 4 implement the first and second double summations of Equation (4), respectively, with the j of the equation represented by col. When the pixel of the binarized image is equal to 1 and the logical expression of the conditional block is true, the counter placed after that conditional block is incremented by 1. At the end of the second reading of the image, the absolute value of the difference between the first and second counters is the result of Equation (3), and the absolute value of the difference between the third and fourth counters is the result of Equation (4). These values are then multiplied by specific gains, $G_0$ and $G_1$, which normalize the value of the descriptors between 0 and 1. The results obtained are the descriptors $d_0$ and $d_1$, which are stored in a register.
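The quadrant-counting scheme (before the normalization gains) can be sketched as follows; the choice of which side of the split receives the pixels exactly on the center-of-mass line is an assumption here:

```python
def symmetry_descriptors(mask, xc, yc):
    """Counts of region pixels on each side of the center of mass; the
    absolute differences model the summations of Equations (3) and (4)."""
    above = below = left = right = 0
    for lin, row in enumerate(mask):
        for col, p in enumerate(row):
            if p:
                if lin <= yc: above += 1
                else:         below += 1
                if col <= xc: left += 1
                else:         right += 1
    return abs(above - below), abs(left - right)

mask = [[1, 1, 0],
        [1, 1, 0],
        [1, 1, 1]]
print(symmetry_descriptors(mask, xc=1, yc=1))  # → (1, 5)
```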
3.1.3. Diameter Calculation Submodule
The Diameter Calculation Submodule has the function of calculating the value of the descriptor $d_2$. Initially, it is necessary to find the extremity points of the region of interest (the first and last rows and the first and last columns containing nevus pixels), based on the binarized image. After that, the distances between these extremities along each axis are obtained, and the largest distance is taken as the diameter of the nevus. Figure 13 illustrates the extremity points and the distances between them for Figure 2a.
In Figure 14, the hardware architecture used to locate the extremity pixels of the region of interest is presented. Initially, there is a divider block that receives its input from a counter and a constant equal to the image width, in order to calculate the row and column of the input pixel. The first output of the architecture indicates the first row of the detected nevus, which is the row in which the binarized pixel first assumes the value of 1. The second output indicates the last row containing a pixel from the nevus, which is the last value stored in a register enabled by the binarized pixel. The third output indicates the first column to contain a pixel from the nevus area, and this value is found by using a conditional block and a multiplexer.
The Conditional Block 5 (CB5) is responsible for comparing the column value of each pixel in the region of interest, input b, to select the lowest column value. During the image scan, the lowest value found so far is saved and applied to input c of CB5, for comparison with the subsequent column values. The output of CB5 is true when the column value of the current pixel, input b, is less than the last value found, input c. The output of this block feeds the selection input of a multiplexer: for an input of 0, the value stored in the register remains the same, and for an input of 1, this value is updated to the column of the current pixel. On top of that, another multiplexer is used so that, at the beginning of the calculation, the value of the c input is not zero but equal to the largest column value of the image.
Finally, the last output indicates the last column of the region of interest. A conditional block and a multiplexer are also used to determine this value. Its operation is similar to the one previously presented for the first-column output; however, it is unnecessary to use a second multiplexer, since there is no problem if the initial value of c is equal to zero. At the end of the first image scan, the selected values are stored in registers and then forwarded to the diameter calculation architecture.
Having found the extremity points of the nevus, it is then possible to calculate the diameter as the greatest distance among these points. The architecture implemented for this calculation is displayed in Figure 15. This module starts by calculating the difference between the last and first rows, which gives a distance on the y-axis, and the difference between the last and first columns, which gives a distance on the x-axis. The results of these differences are compared, and through a MUX the larger distance is selected, which is the approximate diameter of the nevus. At the end, the value found is multiplied by a gain, $G_2$, which normalizes the value of the descriptor between 0 and 1. The result obtained is the descriptor $d_2$, which is stored in a register.
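The hardware's diameter approximation (the larger bounding-box extent, before the normalization gain) can be sketched as:

```python
def approximate_diameter(mask):
    """Diameter approximated by the larger of the row and column
    extents of the region of interest in the binary mask."""
    rows = [i for i, row in enumerate(mask) for p in row if p]
    cols = [j for row in mask for j, p in enumerate(row) if p]
    return max(max(rows) - min(rows), max(cols) - min(cols))

# Region spans rows 1..2 (extent 1) and columns 1..4 (extent 3):
mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1],
        [0, 1, 1, 1, 0]]
print(approximate_diameter(mask))  # → 3
```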
3.1.4. Mean Calculation Submodule
The Mean Calculation Submodule is responsible for calculating the values of the descriptors $d_3$, $d_5$, and $d_7$. This submodule calculates the average of the intensities of the pixels in the region of interest, using the image obtained after applying the mask. This implementation is based on Equations (5), (7) and (9). Figure 16 presents the proposed implementation of this submodule, which applies the same processing to all channels of the image. This architecture uses as inputs the outputs of the Mask Application Submodule, RM, GM, and BM, represented in the figure by CH. Initially, there is an accumulator implemented by an addition block and a delay, equivalent to the double summation of Equations (5), (7) and (9). For each channel, this accumulator sums all the values of the pixels in the region of interest, and its final value is divided by the number of pixels of the nevus region. The result of each division is multiplied by a specific gain $G_i$, where i = [3, 5, 7]. The results obtained after this operation are the descriptors $d_i$, where i = [3, 5, 7], which are then stored in their respective registers.
3.1.5. Variance Calculation Submodule
The Variance Calculation Submodule calculates the values of the descriptors $d_4$, $d_6$, and $d_8$ during the second image scan. This submodule calculates the variance of the pixel values of the region of interest using the outputs of the Mask Application Submodule and the Mean Calculation Submodule. This implementation is based on Equations (6), (8) and (10). In Figure 17, the architecture of the Variance Calculation Submodule is presented, which is also applied to each channel of the image. Initially, there is a block that calculates the difference between the channel average value and the pixel value from the Mask Application Submodule. This subtraction block is enabled when the pixel value of the binarized image is equal to 1. Next, there is a multiplier block with its two inputs fed by the output of the previous block, which is equivalent to the squaring operation presented in Equations (6), (8) and (10). The output of this multiplier block is the input of an adder block with feedback, i.e., an accumulator. After the second image reading, the accumulator value is divided by the number of pixels of the nevus. The result of each division is multiplied by a specific gain, $G_i$, where i = [4, 6, 8]. The results of this operation are the descriptors $d_i$, where i = [4, 6, 8], which are, finally, stored in their respective registers.
3.2. Artificial Neural Network Module (ANNM)
The Artificial Neural Network Module (ANNM) aims to perform the image classification as melanoma or non-melanoma through an MLP neural network. This module was implemented with a fully parallel architecture. The general architecture of the ANNM is presented in Figure 5, and its inputs are the nine descriptors extracted by the DPIM. All the neurons of the network have the same architecture, which is shown in Figure 18. The hardware was developed only for the feedforward phase of the MLP, adopting the weights defined in the training phase of the model in the software implementation, as described in Section 2.3.
This architecture has two submodules: the transfer function and the activation function. The neuron inputs are represented by $x_1, x_2, \ldots, x_m$, where m is the number of inputs of the specific neuron. The bias, $x_0$, is set as a constant with a fixed value. The output from the transfer function to the activation function is represented by v, and the neuron output is named y.
3.2.1. Transfer Function Submodule
The Transfer Function Submodule is responsible for weighting the inputs of the neuron by their respective weights and summing the results, providing the output v at the end.
The architecture of this submodule is presented in Figure 19. The weighting of each neuron input is performed through a gain block configured with the value of the weight associated with that input. The weights are represented by $w_{k0}, w_{k1}, \ldots, w_{km}$, where the first subscript, k, indicates the neuron index and the second identifies the input to which the weight is associated. All the weights are fixed values represented as signed fixed-point numbers with 5 bits in the integer part and 10 bits in the fractional part. After the weighting of the inputs by their respective weights, all the values obtained are summed, resulting in the output v.
3.2.2. Activation Function Submodule
Finally, the Activation Function Submodule is responsible for calculating the neuron output, y, based on the value v provided by the Transfer Function Submodule.
The architecture of this submodule is presented in Figure 20. The approximation of the sigmoid function was performed using the PLAN [26] approximation method, whose mathematical representation is presented in Equation (13). Hence, the conditional blocks present in this architecture represent the conditions observed in this equation. Apart from the first conditional block of the architecture, the outputs of the other four conditional blocks are forwarded to the bus builder block, whose output selects in the MUX which equation is used to approximate the sigmoid value.
Some intervals of the approximated sigmoid are implemented by an adder, which sums a constant from the equation to the input of the sigmoid function multiplied by a gain, the approximation being expressed as

$$y(v) = \begin{cases} 1, & v \geq 5 \\ 0.03125\,v + 0.84375, & 2.375 \leq v < 5 \\ 0.125\,v + 0.625, & 1 \leq v < 2.375 \\ 0.25\,v + 0.5, & 0 \leq v < 1 \\ 1 - y(-v), & v < 0. \end{cases} \qquad (13)$$

The first conditional block, which refers to the last condition of Equation (13), analyzes whether the input value is less than zero. Its output is connected to the selection input of the last MUX block, which outputs $1 - y(-v)$ if the input is less than zero, or the directly approximated value if it is greater than or equal to zero.
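A software sketch of the PLAN approximation, using the piecewise-linear coefficients of the method as reported in the literature, together with a check of how closely it tracks the exact sigmoid:

```python
import math

def plan_sigmoid(v):
    """PLAN piecewise-linear approximation of the sigmoid; negative
    inputs use the symmetry 1 - y(-v) of Equation (13)."""
    if v < 0:
        return 1.0 - plan_sigmoid(-v)
    if v >= 5:
        return 1.0
    if v >= 2.375:
        return 0.03125 * v + 0.84375
    if v >= 1:
        return 0.125 * v + 0.625
    return 0.25 * v + 0.5

# The approximation stays within ~2% of the exact sigmoid:
err = max(abs(plan_sigmoid(x / 100) - 1 / (1 + math.exp(-x / 100)))
          for x in range(-800, 801))
print(plan_sigmoid(0.0), err < 0.02)  # → 0.5 True
```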
5. Conclusions
This work implemented a skin cancer detection system on reconfigurable hardware, using an FPGA, based on artificial neural networks and digital image processing techniques. The hardware architecture was developed using fixed-point representation. When comparing the hardware implementation to a software implementation using floating-point precision, the same values of accuracy, sensitivity, and specificity were obtained. This means that, even with reduced numerical precision, it was possible to maintain the same statistical results while leaving enough hardware resources free in the Intel Cyclone V SE 5CSEBA6U23I7 FPGA for additional logic to be implemented, if needed.
The implementation of the DIP techniques and the MLP neural network in the FPGA showed better results than the respective implementations on the ARM processor, achieving a shorter runtime and lower power consumption. The execution time of the complete hardware system was substantially shorter than that of the equivalent software implementation, and the FPGA implementation required considerably less power than the processor. Compared to other similar works in the literature, the implementation proposed here achieved a shorter runtime, low hardware resource utilization, lower power consumption, and an accuracy value better than some models and in the same range as others.
For future work, the authors plan to keep working on the classification phase in order to improve the overall classification accuracy of the system and to perform new tests with larger datasets. In addition, it is possible to design a submodule in the Digital Image Processing Module to automatically generate the mask, which would significantly increase the parallelism of the implementation.