Abstract
Breast MRI interpretation requires radiologists to examine several image sequences, whose number depends on the acquisition protocol used at each health institution. This process is subjective and time-consuming, and it shows large variability, which affects the final diagnosis and prognosis of the patient. In this paper, we present a computational method for classifying lesions detected in breast MRI studies, which aims to reduce physician subjectivity. The proposed approach takes advantage of the ability of the Multiple Kernel Learning (MKL) strategy to optimally fuse the features extracted from the different image sequences that compose a breast MRI study. These features describe the grey-level distribution of the original image and of its saliency map, computed using the Graph-based Visual Saliency (GBVS) algorithm. Breast lesions were classified as positive or negative findings with an accuracy of \(85.5\%\) and \(84.8\%\) when nine and five sequences were used, respectively.
Supported by Colciencias, Instituto Tecnológico Metropolitano, and Instituto de Alta Tecnología Médica. Project RC740-2017.
1 Introduction
Magnetic resonance imaging (MRI) has proven to be the most sensitive method for breast cancer detection. It allows smaller tumors to be located even in dense breast tissue, where they are usually not completely visible in other imaging modalities such as mammography and ultrasound [6]. However, breast MRI is costly because of the long time needed to acquire and interpret the different sequences that make up the study [14]; a typical MRI study consists of around 2000–2500 images. Additionally, high variability has been reported, especially among novice radiologists, when differentiating between positive (probably cancer) and negative or benign findings. This variability can seriously affect the patient outcome, because a false positive entails an unnecessary biopsy, whereas a false negative prevents early detection of the tumor. Thus, computer-assisted interpretation can potentially reduce the radiologist workload by automating some of the diagnostic tasks, such as lesion detection or classification.
Multimodality involves the simultaneous use of multiple information sources to solve one specific problem [10]. Lesion detection in breast MRI studies can therefore be treated as a multimodal task, in which each MRI sequence acts as an independent information source when radiologists reach a diagnosis. Recently, Multiple Kernel Learning (MKL) [12] has increased the interest in incorporating information from multiple sources into machine learning techniques. It associates a similarity measure (Kernel) with each information source before including it in the learning task; this strategy makes it possible to exploit the information provided by each source and generates models and results that are easy to interpret [4]. Although MKL has been used with many machine learning techniques, it has been shown to be especially powerful when implemented with Support Vector Machines (SVM) for classification tasks [12, 20].
In this paper, MKL is used to develop a classifier able to distinguish between positive and negative findings in regions of interest (ROI) from an MRI study with several image sequences. The overall process is composed of two main stages: feature extraction and classification. In the feature extraction stage, we explore visual saliency analysis, which has recently attracted interest in research on the automatic identification of ROI in MRI [5, 11]. In this analysis, a salient region is typically defined as one that is rare in an image and carries highly discriminative information, which could be associated with diagnostic findings in medical images [18]. In [16], three popular computational saliency models (Itti-Koch, GBVS and Spectral Residual) were tested for detecting abnormalities in chest radiography images and in color retina images; GBVS [13] presented the best performance for radiography images, and it has also proven to be one of the saliency models with the best performance in predicting eye fixations. For this reason, the GBVS model was used here to obtain a saliency image per sequence, from which first-order statistical measures were computed, yielding as many feature spaces as there are sequences. Once the feature extraction process is completed, an MKL model combines the information from the different sequences described by those feature spaces.
2 Materials and Methods
This section presents the techniques implemented for feature generation and classification. The methods used to generate descriptive features from the MRI sequences are specified in detail to make clear how they are used in the classification process, for which an optimization technique was applied in order to obtain reliable and accurate results.
The proposed method for the classification of breast lesions, depicted in Fig. 1, is composed of two main phases. The first one consists of processing the breast MRI to extract the ROI corresponding to the findings selected by the specialists (radiologists), with the aim of characterizing them by a perceptual relevance model (visual attention) and first-order statistical measures. In the second one, the MKL model is defined with an SVM, whose aim is to discriminate between positive (probably malignant) and negative (probably benign) regions using the feature spaces generated in the first phase.
2.1 Dataset
The dataset of breast MRI studies used in this work was retrospectively obtained from Instituto de Alta Tecnología Médica (IATM) of Medellín, Colombia. This dataset is composed of 189 regions containing 152 breast lesions, 2 undefined regions, and 35 normal tissues, which were extracted from 92 fully anonymized studies. Each study is made up of 16 image sequences (Axial T1, Axial T2, Stir Coronal, ADC maps, Axial Diffusion B800, 6 Axial Dynamic Contrast Enhanced images, and 5 subtracted images). The ROI were marked in the most visible sequence by two experienced radiologists, without considering any clinical data. This marking was used to triangulate the position of the same region in the other sequences with the help of Horos software (https://horosproject.org/), enclosing each region in a rectangular area. In this way, the regions were marked in all the sequences of the 92 studies. During this process, the information provided by the radiologists for each ROI was stored, i.e. the BI-RADS category, the type of finding, and the sequence in which the finding was initially pointed out, among others. Overall, there were 3024 annotated regions (189 regions in each of the 16 sequences): 560 regions of normal tissue, 32 undefined regions, and 2432 lesions that were classified as probably malignant (1424) or probably benign (1008).
2.2 Extraction of Regions
The ROI extraction process relied on a CSV (comma-separated values) file generated with Horos for each sequence, which stores the spatial information of each marked ROI, such as the coordinates of its four corners, the slice number within the volume, and the area of the region in mm, among others. The spatial coordinates in pixels and the slice number were used to locate the ROI in the other, unmarked sequences. Because the width and height of most regions differ, each rectangular area was regularized to a square by taking the larger of its height and width as the side length. A range was then established from the distribution of the widths of all the regions, assumed normal, as \(\mu \pm 2\sigma \), where \(\mu \) is the mean and \(\sigma \) is the standard deviation. This range was defined to include a little more than 80% of the total regions. For these regions, the value of the standard deviation was added on each side of the already square region, to extend the area and include part of the tissue surrounding the finding. Finally, once the size of every region was defined, the crop was performed on the image corresponding to the slice of the volume.
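The squaring and padding steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `square_roi`, the choice to keep the square centered on the original rectangle, and the clipping to the image borders are all assumptions, and `pad` stands for the standard deviation \(\sigma \) of the region widths, expressed in pixels.

```python
def square_roi(x, y, width, height, pad, image_width, image_height):
    """Return (left, top, right, bottom) of a padded square crop.

    (x, y) is the top-left corner of the marked rectangle, in pixels.
    Centering and border clipping are assumptions of this sketch.
    """
    side = max(width, height)           # larger dimension becomes the side
    center_x = x + width / 2.0
    center_y = y + height / 2.0
    half = side / 2.0 + pad             # extend by `pad` on each side
    left = max(0, int(round(center_x - half)))
    top = max(0, int(round(center_y - half)))
    right = min(image_width, int(round(center_x + half)))
    bottom = min(image_height, int(round(center_y + half)))
    return left, top, right, bottom


# Hypothetical 30 x 50 pixel rectangle inside a 512 x 512 slice, padded by 8 pixels.
box = square_roi(x=100, y=120, width=30, height=50, pad=8,
                 image_width=512, image_height=512)
```

With these hypothetical values the crop is a 66 x 66 pixel square, i.e. the 50-pixel side plus 8 pixels of surrounding tissue on each side.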
2.3 Saliency Detection
The GBVS model was used to determine the saliency level of each pixel in the ROI. The purpose of this task is to emulate an important aspect of the interpretation process, in which the hypo- or hyperintensities of the lesions with respect to the surrounding tissue are considered relevant for differentiating between positive and negative findings [22]. Thus, computing the GBVS saliency map makes it possible to objectively measure the findings made by radiologists while taking their perceptual process into account.
GBVS Model. This model initially works with feature maps extracted at multiple spatial scales, such as intensity (I), color (C), and orientation (O) or movement (M). Then, a Gaussian pyramid transform of scale space is derived from each feature, and a fully connected graph is generated over all the grid locations of each feature map, where the weight of the edge between two nodes is assigned according to the dissimilarity of their feature values and their spatial distance. The dissimilarity between two positions \( \left( i, j \right) \) and \( \left( p, q \right) \) in the feature map, with respective values \( M \left( i, j \right) \) and \( M \left( p, q \right) \), is defined as:
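In the GBVS formulation of [13], this dissimilarity is usually written as the absolute logarithm of the ratio of the two feature values; the expression below is that standard form, assumed to be the one intended here:

\[ d\left( (i,j) \,\Vert \, (p,q) \right) = \left| \log \frac{M(i,j)}{M(p,q)} \right| \]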
The directed edge from node \( \left( i, j \right) \) to node \( \left( p, q \right) \) is then assigned a weight that depends on their dissimilarity and on their distance in the lattice of M [7].
The resulting graphs are treated as Markov chains by normalizing the weights of the outbound edges of each node to sum to one and by defining an equivalence between nodes and states, as well as between edge weights and transition probabilities. The equilibrium distribution of each chain is adopted as the activation and saliency map, since in this distribution large values are assigned to the nodes that differ strongly from their surrounding nodes. Finally, the activation maps are normalized to emphasize conspicuous details and then combined into a single overall saliency map [7].
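The graph construction and the equilibrium computation can be illustrated on a single small feature map. The sketch below is a simplified, pure-Python illustration and not the GBVS implementation used in this work: it assumes the edge weight is the log-ratio dissimilarity multiplied by a Gaussian falloff of the lattice distance (as in [13]), it handles one feature map at one scale, and it uses a "lazy" power iteration (averaging the current and the propagated distribution), which is an implementation choice made here so that the iteration also converges on periodic chains. The function name `gbvs_activation` and the parameter `sigma` are hypothetical.

```python
import math


def gbvs_activation(feature_map, sigma=1.5, iterations=200):
    """Equilibrium distribution of a GBVS-style Markov chain on one feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    nodes = [(i, j) for i in range(rows) for j in range(cols)]
    n = len(nodes)
    eps = 1e-12  # avoids log(0) on empty pixels

    # Build the row-normalized transition matrix.
    transition = []
    for (i, j) in nodes:
        row = []
        for (p, q) in nodes:
            ratio = (feature_map[i][j] + eps) / (feature_map[p][q] + eps)
            dissimilarity = abs(math.log(ratio))
            falloff = math.exp(-((i - p) ** 2 + (j - q) ** 2) / (2.0 * sigma ** 2))
            row.append(dissimilarity * falloff)
        total = sum(row)
        # A node identical to every other node gets a uniform row.
        row = [w / total for w in row] if total > 0 else [1.0 / n] * n
        transition.append(row)

    # Lazy power iteration: same equilibrium, but robust to periodic chains.
    dist = [1.0 / n] * n
    for _ in range(iterations):
        stepped = [sum(dist[a] * transition[a][b] for a in range(n)) for b in range(n)]
        dist = [0.5 * old + 0.5 * new for old, new in zip(dist, stepped)]

    return [[dist[i * cols + j] for j in range(cols)] for i in range(rows)]


# Toy map: one bright pixel surrounded by uniform tissue.
activation = gbvs_activation([[1, 1, 1], [1, 5, 1], [1, 1, 1]])
```

In this toy map the probability mass concentrates on the central pixel, the only one that differs from its surroundings, which is the behavior the equilibrium distribution is meant to capture.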
After an experimental process, we found that the I, O and M feature maps gave saliency levels that best matched the regions marked by the radiologists. Since a color feature map cannot be generated from MRI images, we use the M map instead, as it helps to predict human eye fixations, as expressed in [15]. An example of the feature maps that make up the resulting saliency map (SaM) is shown in Fig. 2.
2.4 Feature Extraction
The characterization was carried out with first-order statistical measures, because these basic features describe in a simple way the visual information of the regions that are most visible both in the saliency maps generated by the GBVS model and in the original extracted regions. The first-order statistical measures are defined as:
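Using the symbols defined next and the usual histogram-based definitions found in [1], the mean and the central moments can be written as below. This is a sketch limited to the measures that the text names, under the assumption that gray levels are indexed from \(0\) to \(N_g - 1\):

\[ m_1 = \sum _{I=0}^{N_g-1} I \, P(I), \qquad \mu _k = \sum _{I=0}^{N_g-1} \left( I - m_1 \right) ^k P(I), \quad k = 2, 3, 4 \]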
where \(P(I)\) is the histogram, \(N_g\) is the number of possible gray levels, \(m_{1}\) is the mean, and \(\mu _{k}\) are the central moments. The second central moment \(\mu _{2}\) is the variance, the most common central moment, which indicates the variability of the data with respect to the mean; \(\mu _{3}\) is the asymmetry (skewness), which establishes the degree of symmetry of the histogram with respect to the mean; and the fourth moment \(\mu _{4}\) is the kurtosis, which indicates the degree of concentration of the data around the mean in the histogram [1].
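As an illustration of how these measures can be obtained from a region, the sketch below builds the normalized histogram \(P(I)\) of a list of integer gray levels and evaluates the mean and the second to fourth central moments. The function name `first_order_features` and the dictionary keys are hypothetical, and the moments are left unnormalized, which is an assumption since the text does not state whether skewness and kurtosis were standardized.

```python
def first_order_features(pixels, num_gray_levels=256):
    """Mean and central moments (k = 2, 3, 4) from a gray-level histogram."""
    counts = [0] * num_gray_levels
    for value in pixels:
        counts[value] += 1
    total = float(len(pixels))
    histogram = [count / total for count in counts]  # P(I), sums to one

    mean = sum(level * p for level, p in enumerate(histogram))

    def central_moment(k):
        return sum(((level - mean) ** k) * p for level, p in enumerate(histogram))

    return {
        "mean": mean,
        "variance": central_moment(2),
        "skewness": central_moment(3),
        "kurtosis": central_moment(4),
    }


# Toy region with two gray levels, 0 and 2, in equal proportion.
features = first_order_features([0, 0, 2, 2], num_gray_levels=4)
```

In the setting described above, such a feature vector would be computed once for the original region and once for its saliency map, for each sequence.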
2.5 Classification Method
Support Vector Machines are large-margin classifiers proposed for binary classification problems [9], which makes them well suited to breast cancer diagnosis when the only goal is to differentiate between positive and negative findings. Given a dataset with N training samples \(\{(x_i,y_i)\}_{i=1}^N\), where \(x_i\) is a D-dimensional input vector and \(y\in \{-1,+1\}^N\) is the vector of labels, the SVM finds the separating hyperplane with maximal margin M that best separates the samples in the feature space.
The classification function of the SVM is given by \(f(x) = \langle w,x \rangle + b\), where w is the weight vector that defines the hyperplane, b is the bias term of the separating hyperplane, and the \(\langle \cdot , \cdot \rangle \) operator represents the dot product between two vectors. The primal optimization problem of the SVM is given by \(\underset{w, b, \xi }{\min } \,\,\,\, \frac{1}{2} \Vert w\Vert _2^2 + C \sum _{i=1}^{N}{\xi _i}\), subject to \(y_i(\langle w,x_i \rangle + b) \ge 1-\xi _i\) and \(\xi _i \ge 0\), where C is the regularization parameter and \(\xi \) is the vector of slack variables that creates the soft-margin representation of the SVM. This constrained quadratic optimization problem is solved through Lagrange multipliers, obtaining the expression in Eq. (6), namely the dual function of the SVM.
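The standard soft-margin dual obtained from this primal problem, assumed here to be the expression referred to as Eq. (6), is:

\[ \underset{\alpha }{\max } \,\, \sum _{i=1}^{N} \alpha _i - \frac{1}{2} \sum _{i=1}^{N} \sum _{j=1}^{N} \alpha _i \alpha _j y_i y_j \langle x_i, x_j \rangle \quad \text{subject to} \quad \sum _{i=1}^{N} \alpha _i y_i = 0, \quad 0 \le \alpha _i \le C \]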
where \(\alpha \) is the vector of dual variables and the new classification function is given by \(f(x) = \sum _{i=1}^{N}{\alpha _i y_i \langle x_i, x \rangle } + b\).
The term \(\langle x_i, x_j \rangle \) that appears in the dual function of the SVM (Eq. (6)) plays the role of the Kernel function and is expressed in general as \(K(x_i, x_j)\), where \(K:\mathbb {R}^D \times \mathbb {R}^D \longrightarrow \mathbb {R}\). The Kernel function acts as a similarity measure that can be non-linear. The Radial Basis Function (RBF), better known as the Gaussian Kernel, was used in this work and is given by \(K(x_i, x_j) = \exp \left( \frac{-\Vert x_i-x_j\Vert ^2_2}{\sigma ^2}\right) , \quad \sigma > 0\).
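A direct transcription of this Gaussian Kernel, with \(\sigma ^2\) in the denominator exactly as written above, could look like the following sketch. The helper names `rbf_kernel` and `gram_matrix` are hypothetical; the second one simply evaluates the Kernel on every pair of samples of one information source.

```python
import math


def rbf_kernel(x_i, x_j, sigma):
    """Gaussian Kernel exp(-||x_i - x_j||^2 / sigma^2), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / sigma ** 2)


def gram_matrix(samples, sigma):
    """N x N matrix of Kernel values for one information source."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]


# Three toy feature vectors from a single (hypothetical) sequence.
K = gram_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], sigma=1.0)
```

Larger values of `sigma` make distant samples look more similar, which is why the bandwidth of each Kernel is worth tuning per sequence.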
The MKL methodology makes it possible to use linear or non-linear combinations of multiple Kernels instead of one single Kernel [12]. MKL proposes the use of one combination function composed of P independent feature groups that can be provided by different sources, which may differ in acquisition, in nature, or in the number of features of each representation. In this work, each MRI sequence is associated with an individual Kernel in order to exploit its potential; thus, the combination function is defined as in Eq. (7).
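Assuming the linear (weighted-sum) combination that is most common in the MKL literature [12], and writing \(x_i^m\) for the features of sample i in the m-th source, the combination function referred to as Eq. (7) takes the form:

\[ K_{\eta }(x_i, x_j) = \sum _{m=1}^{P} \eta _m K_m \left( x_i^m, x_j^m \right) \]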
where \(\eta _m\) represents the weight assigned to each Kernel function \(K_m\).
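As a small illustration of this combination, the sketch below sums precomputed Kernel matrices, one per information source, each scaled by its weight. It assumes a linear combination of Kernels; the function name `combine_kernels` and the toy matrices are hypothetical.

```python
def combine_kernels(gram_matrices, weights):
    """Weighted sum of P Kernel matrices of identical size N x N."""
    if len(gram_matrices) != len(weights):
        raise ValueError("one weight per Kernel is required")
    n = len(gram_matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, eta in zip(gram_matrices, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += eta * kernel[i][j]
    return combined


# Two toy sources (e.g. two sequences) with two samples each.
K_source_1 = [[1.0, 0.5], [0.5, 1.0]]
K_source_2 = [[1.0, 0.2], [0.2, 1.0]]
K_combined = combine_kernels([K_source_1, K_source_2], weights=[0.75, 0.25])
```

The combined matrix can then be passed to any SVM solver that accepts a precomputed Kernel, and a weight of zero removes the corresponding source from the decision function.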
A penalization can be applied to the Kernel weights. The most common penalization methods used in this type of problem are the \(\ell _1\)-norm and the \(\ell _2\)-norm [12]. An efficient optimization strategy was proposed for updating the kernel weights when arbitrary \(\ell _p\)-norms with \(p\ge 1\) are applied [17, 23]; this optimization method solves an SVM in each iteration and updates the kernel weights using Eq. (8).
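In the closed-form update described in [17, 23], which is assumed to be the expression referred to as Eq. (8), each weight is obtained from the norms of the per-kernel weight vectors:

\[ \eta _m = \frac{\Vert w_m\Vert _2^{\frac{2}{p+1}}}{\left( \sum _{h=1}^{P} \Vert w_h\Vert _2^{\frac{2p}{p+1}} \right) ^{\frac{1}{p}}} \]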
where \(\Vert w_m\Vert ^2_2 = \eta ^2_{m} \sum _{i=1}^{N} \sum _{j=1}^{N}{\alpha _i \alpha _j y_i y_j K_m \left( x_i^m,x_j^m \right) }\) from the dual function of the SVM.
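One iteration of this weight update can be sketched as follows. The code assumes the closed-form \(\ell _p\)-norm update reported in [17, 23], in which \(\eta _m\) is proportional to \(\Vert w_m\Vert _2^{2/(p+1)}\) and the weights are normalized so that \(\Vert \eta \Vert _p = 1\). The function names are hypothetical, and the dual variables `alpha` are taken as given, i.e. as the output of an SVM solved with the current combined Kernel.

```python
def w_norm_squared(eta_m, alpha, labels, kernel_m):
    """||w_m||^2 = eta_m^2 * sum_ij alpha_i alpha_j y_i y_j K_m(x_i, x_j)."""
    n = len(alpha)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += alpha[i] * alpha[j] * labels[i] * labels[j] * kernel_m[i][j]
    return eta_m ** 2 * total


def update_kernel_weights(w_norms_squared, p):
    """Closed-form l_p-norm update of the Kernel weights (p >= 1)."""
    # ||w||^(2/(p+1)) equals (||w||^2)^(1/(p+1)); same idea for the denominator.
    numerators = [v ** (1.0 / (p + 1)) for v in w_norms_squared]
    denominator = sum(v ** (p / (p + 1.0)) for v in w_norms_squared) ** (1.0 / p)
    if denominator == 0:
        # Degenerate case: no Kernel contributes, keep uniform weights.
        count = len(w_norms_squared)
        return [(1.0 / count) ** (1.0 / p)] * count
    return [value / denominator for value in numerators]


# Toy example with two Kernels whose squared norms are 4 and 1.
eta_l1 = update_kernel_weights([4.0, 1.0], p=1)
eta_l2 = update_kernel_weights([4.0, 1.0], p=2)
```

For the toy values above, the \(\ell _1\) update gives weights of 2/3 and 1/3; in a full MKL loop the SVM would be solved again with these weights and the update repeated until the weights stop changing.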
2.6 Parameters Optimization
Optimization algorithms are fundamental in machine learning because they allow the results of tasks such as classification to be improved automatically, with respect to objective evaluation measures, by selecting the parameters that determine the performance of the algorithms [21]. In this work, we optimized the parameters that determine the behavior of the Kernels associated with the information sources (MRI sequences), i.e. the parameters \(\sigma \) that set the bandwidth of each Gaussian Kernel, together with the regularization parameter C of the SVM. To do so, we used the Particle Swarm Optimization (PSO) algorithm, a metaheuristic that uses cooperative and stochastic methods to find the optimum working point of the function to be optimized, in this case the performance measure of the SVM. In addition, PSO searches the parameter space globally and is reported to be less prone than other algorithms to becoming trapped in local optima [8].
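The following is a minimal PSO sketch, not the configuration used in this work. It assumes a basic inertia-weight variant with hypothetical coefficient values, it minimizes rather than maximizes, and it replaces the real objective (the SVM performance measure) with a toy quadratic function over two parameters standing in for \(\log \sigma \) and \(\log C\).

```python
import random


def pso_minimize(objective, bounds, num_particles=40, iterations=50,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Basic particle swarm optimization over box-constrained parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(num_particles)]
    velocities = [[0.0] * dim for _ in range(num_particles)]
    best_positions = [p[:] for p in positions]
    best_values = [objective(p) for p in positions]
    leader = min(range(num_particles), key=lambda i: best_values[i])
    global_best, global_value = best_positions[leader][:], best_values[leader]

    for _ in range(iterations):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (best_positions[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < best_values[i]:
                best_values[i], best_positions[i] = value, positions[i][:]
                if value < global_value:
                    global_value, global_best = value, positions[i][:]
    return global_best, global_value


def toy_objective(params):
    # Stand-in for "1 - cross-validated accuracy"; minimum at (1, -2).
    log_sigma, log_c = params
    return (log_sigma - 1.0) ** 2 + (log_c + 2.0) ** 2


best_params, best_value = pso_minimize(toy_objective, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In the setting described above, the parameter vector would hold one \(\sigma \) per Kernel plus C, and each evaluation of the objective would require training and validating the MKL classifier.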
3 Experiments and Results
After applying the feature extraction process described in Sect. 2.4, two different dataset configurations were used to evaluate the performance of the proposed method. The first configuration used 9 MRI sequences, corresponding to T1, T2, ADC, Diffusion and the subtractions from 1 to 5; this configuration is intended to take advantage of all the available information sources. The DCE images were not used because the subtracted images are post-processed versions of them. In the second configuration, only five MRI sequences were used, corresponding to T1, T2, ADC, Diffusion and the second subtraction. This configuration was defined to reduce the number of information sources used to solve the classification problem, in a similar way to the abbreviated protocols proposed in clinical practice [22]. Additionally, for each dataset configuration, three tests were performed: one using the first-order measures from the original regions, another using the first-order measures from the saliency maps, and the last one using the concatenated measures from the original regions and the saliency maps. In all the tests, K-fold cross-validation with \(K = 10\) was applied to the classification algorithm, aiming to avoid biased or overfitted results.
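The cross-validation scheme can be illustrated with a short sketch that splits sample indices into K folds and yields one train/test partition per fold. The shuffling, the seed, and the function name `k_fold_indices` are assumptions; in particular, the text does not state whether the folds were built per region or per study, and this sketch splits plain sample indices.

```python
import random


def k_fold_indices(num_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy run with 25 samples; every sample is tested exactly once.
splits = list(k_fold_indices(25, k=10))
```

The reported performance would then be the average of the measures obtained on the ten test folds.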
The classification task was performed with an SVM. One Gaussian Kernel was assigned to each information source, and all the corresponding Kernels were incorporated into the SVM using the MKL method, with both the \(\ell _1\)-norm and the \(\ell _2\)-norm, to take advantage of the independent information sources jointly. The PSO algorithm was used to auto-tune the Kernel parameters (\(\sigma \)) and the regularization parameter of the SVM (C). Forty particles were used, and the algorithm was run until convergence in each test, which took around 50 iterations.
The results obtained when the method was applied to the dataset configuration composed of 9 information sources are presented in Table 1; five performance measures were computed to evaluate the performance of the method in detail. In this case, penalization with the \(\ell _1\)-norm gives the best performance when only the features from the saliency maps are used, and penalization with the \(\ell _2\)-norm does so when the saliency map features are concatenated with the original region features. Table 2 shows the results obtained with the dataset configuration that used five information sources. In this case, the method clearly performs best when the concatenated features from the original regions and the saliency maps are used and the Kernel weights are penalized with the \(\ell _2\)-norm. Additionally, this second configuration shows less variability than the first one, although both groups of results show a similar overall performance in terms of their best results.
It is worth highlighting that both the \(\ell _1\)-norm and the \(\ell _2\)-norm performed well in the tests, with the \(\ell _2\)-norm slightly ahead, especially in Table 2. This behavior may be due to the fact that the \(\ell _1\)-norm promotes sparsity in the kernel weights (\(\eta _m=0\)), which can be interpreted as the elimination of some kernels, while the \(\ell _2\)-norm assigns each kernel weight a value according to its relevance without strictly eliminating any kernel, thus taking advantage of every kernel even when it has little relevance to the classification task.
Compared with other works that address problems similar to the one in this paper, the results obtained show a performance equivalent to the state of the art. Most of the works that implement or propose methods based on visual attention, such as saliency analysis, focus on detecting the ROI within the whole image, which is closer to a segmentation problem than to a classification one [19]. The specific classification problem based on saliency information, on the other hand, is addressed in only a few papers. In [2], an algorithm was applied to a benign-versus-malignant image classification task in brain MRI images; the dataset was randomly divided into a training set, containing 70 benign and 70 premalignant images, and a testing set, containing 30 benign and 30 premalignant images, and an SVM model was then trained to classify the images of the testing set, obtaining a median classification accuracy of \(80\%\) over 20 runs. Another related work is presented in [3], which proposes a hybrid mass detection algorithm that combines unsupervised candidate detection with deep learning-based classification. The detection stage identifies image-salient regions, and a convolutional neural network (CNN) is used to classify the detected candidates into true-positive and false-positive masses. The dataset was composed of breast cancer MRI studies from 171 patients, with 1957 annotated slices of malignant and benign masses, and the highest classification accuracy obtained was \(0.86 \pm 0.02\).
4 Conclusions and Future Work
A computational method for classifying lesions detected in breast MRI studies was presented, which takes advantage of MKL and a regularization strategy to fuse multimodal sources of features, namely the MRI sequences. An optimization strategy was also applied for tuning the SVM and Kernel parameters. First-order measurements from both original and saliency images were evaluated as feature descriptors of the image sequences, achieving accuracies over 80% with 10-fold cross-validation in all cases. Moreover, the results showed that the use of saliency-based features improves the classification performance with respect to using only features from the original regions. Two different dataset configurations were considered, one containing all nine sequences that make up the breast MRI study and the other using only a subset of them (five sequences). The results have shown that using a subset of sequences does not significantly affect the overall performance, which could be useful for defining less expensive breast MRI studies. It is important to note that the regions were processed without requiring a prior manual or automatic segmentation stage. Furthermore, the use of the saliency distribution is an attempt to capture the relationship between the lesion and the surrounding tissue, which is also relevant in the diagnostic process.
As future work, the use of MKL will be evaluated for fusing more specific and high-level features, such as quantitative image measurements and texture-based features, among others. Additionally, the performance of the method for predicting the final malignancy of a lesion will also be evaluated in a prospective study that will include biopsy-proven lesions.
References
1. Aggarwal, N., Agrawal, R.K.: First and second order statistics features for classification of magnetic resonance brain images. J. Signal Inf. Process. 03(02), 146–153 (2012). https://doi.org/10.4236/jsip.2012.32019
2. Alpert, S., Kisilev, P.: Unsupervised detection of abnormalities in medical images using salient features. In: Medical Imaging 2014: Image Processing, vol. 9034, p. 903416. International Society for Optics and Photonics (2014)
3. Amit, G., et al.: Hybrid mass detection in breast MRI combining unsupervised saliency analysis and deep learning. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 594–602. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_68
4. Areiza-Laverde, H.J., Díaz, G.M., Castro-Ospina, A.E.: Feature group selection using MKL penalized with \(\ell _1\)-norm and SVM as base learner. In: Figueroa-García, J.C., López-Santana, E.R., Rodriguez-Molano, J.I. (eds.) WEA 2018. CCIS, vol. 915, pp. 136–147. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00350-0_12
5. Banerjee, S., Mitra, S., Shankar, B.U., Hayashi, Y.: A novel GBM saliency detection model using multi-channel MRI. PLoS ONE 11(1), 1–16 (2016). https://doi.org/10.1371/journal.pone.0146388
6. Bickelhaupt, S., et al.: Fast and noninvasive characterization of suspicious lesions detected at breast cancer X-Ray screening: capability of diffusion-weighted MR imaging with MIPs. Radiology 278(3), 689–697 (2016). https://doi.org/10.1148/radiol.2015150425
7. Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 35(1), 185–207 (2013). https://doi.org/10.1109/TPAMI.2012.89
8. Clerc, M.: Standard particle swarm optimisation (2012)
9. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)
10. Culache, O., Obadă, D.R.: Multimodality as a premise for inducing online flow on a brand website: a social semiotic approach. Procedia - Soc. Behav. Sci. 149, 261–268 (2014). https://doi.org/10.1016/j.sbspro.2014.08.227
11. Erihov, M., Alpert, S., Kisilev, P., Hashoul, S.: A cross saliency approach to asymmetry-based tumor detection. In: MICCAI 2015: 18th International Conference on Medical Image Computing and Computer Assisted Intervention, vol. 9351, pp. 636–643 (2015). https://doi.org/10.1007/978-3-319-24574-4
12. Gönen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12(8), 2211–2268 (2011)
13. Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: Advances in Neural Information Processing Systems, pp. 545–552 (2007)
14. Heller, S.L., Moy, L.: Breast MRI screening: benefits and limitations. Current Breast Cancer Rep. 8(4), 248–257 (2016). https://doi.org/10.1007/s12609-016-0230-7
15. Itti, L., Dhavale, N., Pighin, F.: Realistic avatar eye and head animation using a neurobiological model of visual attention. In: Applications and Science of Neural Networks, Fuzzy Systems, and Evolutionary Computation VI, vol. 5200, pp. 64–79. International Society for Optics and Photonics (2003)
16. Jampani, V., Sivaswamy, J., Vaidya, V.: Assessment of computational visual attention models on medical images. In: Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP 2012, pp. 1–8 (2012). https://doi.org/10.1145/2425333.2425413
17. Kloft, M., Brefeld, U., Sonnenburg, S., Zien, A.: Non-sparse regularization and efficient training with multiple kernels. arXiv preprint arXiv:1003.0079, vol. 186, pp. 189–190 (2010)
18. Mitra, S., Banerjee, S., Hayashi, Y.: Volumetric brain tumour detection from MRI using visual saliency. PLoS ONE 12(11), 1–14 (2017). https://doi.org/10.1371/journal.pone.0187209
19. Mitra, S., Banerjee, S., Hayashi, Y.: Volumetric brain tumour detection from MRI using visual saliency. PLoS ONE 12(11), e0187209 (2017)
20. Narváez, F., Díaz, G., Poveda, C., Romero, E.: An automatic BI-RADS description of mammographic masses by fusing multiresolution features. Expert Syst. Appl. 74, 82–95 (2017)
21. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, North Chelmsford (1998)
22. Strahle, D.A., Pathak, D.R., Sierra, A., Saha, S., Strahle, C., Devisetty, K.: Systematic development of an abbreviated protocol for screening breast magnetic resonance imaging. Breast Cancer Res. Treat. 162(2), 283–295 (2017). https://doi.org/10.1007/s10549-017-4112-0
23. Xu, Z., Jin, R., Yang, H., King, I., Lyu, M.R.: Simple and efficient multiple kernel learning by group lasso. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 1175–1182. Citeseer (2010)
© 2019 Springer Nature Switzerland AG
Areiza-Laverde, H.J., Duarte-Salazar, C.A., Hernández, L., Castro-Ospina, A.E., Díaz, G.M. (2019). Breast Lesion Discrimination Using Saliency Features from MRI Sequences and MKL-Based Classification. In: Nyström, I., Hernández Heredia, Y., Milián Núñez, V. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2019. Lecture Notes in Computer Science(), vol 11896. Springer, Cham. https://doi.org/10.1007/978-3-030-33904-3_27
DOI: https://doi.org/10.1007/978-3-030-33904-3_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33903-6
Online ISBN: 978-3-030-33904-3
eBook Packages: Computer Science (R0)