Software article | Open access
A novel computational method for automatic segmentation, quantification and comparative analysis of immunohistochemically labeled tissue sections
BMC Bioinformatics volume 19, Article number: 357 (2018)
Abstract
Background
In clinical practice, the objective quantification of histological results is essential not only to define objective and well-established protocols for diagnosis, treatment, and assessment, but also to improve disease understanding.
Software
The software MIAQuant_Learn presented in this work segments, quantifies and analyzes markers in histochemical and immunohistochemical images obtained by different biological procedures and imaging tools. MIAQuant_Learn employs supervised learning techniques to customize the marker segmentation process with respect to any marker color appearance. Our software expresses the location of the segmented markers with respect to regions of interest by mean-distance histograms, which are numerically compared by measuring their intersection. When contiguous tissue sections stained with different markers are available, MIAQuant_Learn aligns them and overlaps the segmented markers in a single image, enabling a visual comparative analysis of the spatial distribution of each marker (the markers’ relative location). Additionally, it computes novel measures of the markers’ co-existence in tissue volumes, based on their density.
Conclusions
Applications of MIAQuant_Learn in clinical research studies have proven its effectiveness as a fast and efficient tool for the automatic extraction, quantification and analysis of histological sections. It is robust with respect to several deficits caused by image acquisition systems and produces objective and reproducible results. Thanks to its flexibility, MIAQuant_Learn represents an important tool for basic research, where needs are constantly changing.
Background
Over the past decades, the continuous increase in computational power, together with substantial advances in the fields of digital image processing and pattern recognition, has motivated the development of computer-aided diagnostic (CAD) systems. Thanks to their effective, precise and repeatable results, validated CAD systems are nowadays exploited as a valid aid during diagnostic procedures [1,2,3,4,5]. With the advent of high-resolution digital images, the development of computerized systems helping pathologists during the analysis of images obtained by histochemical (HC) and immunohistochemical (IHC) labeling has become a main research focus in microscopy image analysis.
State-of-the-art image analysis systems automatically identify (segment) markers (stained areas), and then try to reproduce the evaluation and quantification performed by expert pathologists [6,7,8,9]. These tools have the potential to minimize the inherent subjectivity of manual analysis and to largely reduce the workload of pathologists via high-throughput analysis [10,11,12].
Generally, after color transformation, illumination normalization, color normalization and noise reduction, current methods first compute a rough marker segmentation, then refine the detected structures, and finally quantify them. Noise reduction is performed by applying median filters [13, 14], Gaussian filters [15] and morphological gray-scale reconstruction operators [16]. Particular attention is devoted to the color transformation process, which should overcome the problematic and undesirable color variations due to differences in the color responses of slide scanners, in the raw materials and manufacturing techniques of stain vendors, and in the staining protocols across different pathology labs [17]. While some systems transform the RGB color space into more perceptual color spaces, such as CIE-Lab [18,19,20,21,22], Luv [23,24,25], YCbCr [26], or 1D/2D color spaces [18, 27, 28], others perform illumination and color normalization through white shading correction methods [29, 30], background subtraction techniques, (adaptive) histogram equalization [14, 31, 32], Gamma correction methods [33], Reinhard’s method [34], (improved) color deconvolution [35, 36], Non-negative Matrix Factorization (NMF) and Independent Component Analysis (ICA) [17, 37,38,39,40], decorrelation stretching techniques [14, 32, 41], or anisotropic diffusion [22]. After these preprocessing steps, the labeled structures of interest are detected by morphological binary or gray-level operators [28, 42,43,44,45], automatic thresholding techniques [20, 28, 33, 43], clustering techniques [46, 47], the Fast Radial Symmetry Transform (FRST) [16, 48], Gaussian Mixture Models [20, 22, 49], and edge detectors such as the Canny edge detector, Laplacian of Gaussian filters [50] or Difference of Gaussian filters [51]. These algorithms are followed by techniques that refine the extracted areas, such as the Hough transform [51], Watershed algorithms [45, 52,53,54], Active Contour Models [45, 51, 55], Chan-Vese Active Contours [54, 56], region growing techniques [19], different graph-based methods [57], or graph-cut segmentation techniques [50, 58]. Extracted areas can also be refined by more complex learning techniques, such as rule-based systems [59], cascades of decision tree classifiers [60], Bayesian classifiers [42], a KNN classifier [61] trained on RGB color coordinates [59], the quadratic Gaussian classifier [43, 62], Convolutional Neural Networks (CNNs) [63] or SVMs [51]. Marker quantification methods vary widely, depending on the clinical research question and the required tasks.
The limited flexibility of state-of-the-art automatic histological image analysis systems with respect to different image characteristics, whose variability depends on the acquisition system and on the specific staining procedure used to dye the tissue, hampers their broad application. Image problems caused by tissue folds and/or cuts, unspecific colorations and unwanted background structures additionally mislead image analysis systems. Moreover, though some effective methods exist, given the high image resolutions and dimensions, their use of particularly complex image processing techniques often makes them too expensive in terms of computational time and memory storage.
However, automatic analysis is increasingly in demand for its objective, precise and repeatable numerical estimates on a statistically significant number of high-resolution images. Our open source software MIAQuant [59] effectively segments and quantifies markers with specific colorings in histological images by combining simple and efficient image processing techniques. When contiguous (serial) tissue sections are provided, MIAQuant aligns them and computes an image where the markers are overlapped with different colors, thus allowing the visual comparison of the markers’ respective locations. Its effective results in biomedicine motivated us to expand its ability to express the localization of markers stained on different, and possibly contiguous (serial), tissue sections. Similar to MIAQuant, our improved system, called MIAQuant_Learn, exploits simple, efficient, and effective image processing, pattern recognition and supervised learning techniques [64], with the aim of customizing the marker segmentation to any color appearance. MIAQuant_Learn computes mean-distance histograms to objectively express the markers’ position and relative location with respect to the resection margins and to user-selected structures of interest. In the case of serial tissue sections, MIAQuant_Learn computes objective “morphology-based” measures expressing the markers’ co-existence in areas of higher density.
Implementation
MIAQuant_Learn is an improved version of MIAQuant, developed to overcome MIAQuant’s main limits and expand its capabilities. Extensive usage of MIAQuant has evidenced its lack of robustness with respect to both imaging-system-related artifacts and specific problems arising during procedures such as tissue preparation and staining. Some examples are detailed in Fig. 1, showing sub-images containing unspecific colorings (center column), which are wrongly included in MIAQuant’s segmentation results (left column), since their color appearance is too similar to that of markers. Another drawback of MIAQuant lies in the fact that it segments only markers whose color appearance is coded into its rule-based system. However, the user might need to expand the segmentation capabilities, to extract markers whose color appearance differs from those currently recognized by MIAQuant.
MIAQuant_Learn is a prototype software developed with Matlab R2017a on a standard laptop (CPU: Intel i7, RAM 16 GB, 256 GB SSD disk). The system requirements depend on the image size and resolution; given our memory storage limits, MIAQuant_Learn can open and process images stored with lossless compression techniques (e.g. the TIFF, JPEG 2000, or PNG image formats), provided their memory size is less than 2 GB (images whose pixel dimension is about 25000 × 25000). To circumvent this limit, before processing we vertically slice oversized images (by a Linux script); MIAQuant_Learn processes each slice and recomposes the computed results after analysis. To speed up the algorithms, we developed MIAQuant_Learn by exploiting the Parallel Computing Toolbox provided by Matlab R2017a, which allows solving computationally and data-intensive problems using multicore processors without CUDA or MPI programming. In detail, the toolbox lets the programmer use the full processing power of multicore desktops by writing applications that are distributed and executed on parallel workers (MATLAB computational engines) running locally.
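For illustration, the following minimal Matlab sketch shows how an oversized image can be split into vertical slices, processed on parallel workers, and recomposed; here the slicing is done in memory rather than by the external script, and processSlice is a hypothetical placeholder for the per-slice segmentation pipeline.

```matlab
% Minimal sketch: split an oversized image into vertical slices, process
% them on parallel workers (Parallel Computing Toolbox), and recompose.
% processSlice is a hypothetical placeholder for the per-slice pipeline.
function mask = processLargeImage(img, nSlices)
    cols = round(linspace(1, size(img, 2) + 1, nSlices + 1)); % slice bounds
    slices = cell(1, nSlices);
    parfor k = 1:nSlices                   % one slice per parallel worker
        slices{k} = processSlice(img(:, cols(k):cols(k + 1) - 1, :));
    end
    mask = cat(2, slices{:});              % recompose the per-slice results
end
```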
MIAQuant_Learn has been developed and tested on digital (RGB color) HC and IHC images representing different tissue samples acquired by different imaging systems (e.g. Aperio Scanscope Cs, Olympus BX63 equipped with a DP89 camera and the cellSens software, or Nikon Eclipse E600 microscope equipped with a DS-Fi1 camera and the Nis-Elements AR3.10 software). Up to now, MIAQuant_Learn has processed 1357 RGB images belonging to 11 different datasets (sample tissue sections stained with different colors are shown in Fig. 2), each containing “biologically” similar (pathological and/or healthy) tissue samples. Each dataset is composed of images with a specific image resolution (resolution range [0.4 μm/px - 8 μm/px]). When the considered dataset contains serial section sets, each set is generally composed of 3 to 7 serial IHC-stained sections to visualize different markers, and the processed images are characterized by a high pixel dimension (ranging from 15000 × 15000 × 3 px to 35000 × 35000 × 3 px).
Importantly, MIAQuant_Learn avoids any preprocessing step for noise reduction, illumination normalization, and color normalization, since our experimental results have shown that these procedures might excessively alter, or even delete, small marker areas (which will simply be referred to as markers). In the following, the main steps of MIAQuant_Learn are described.
Segmentation of the tissue region
Firstly, the tissue region is extracted to restrict the processing region. To this aim, the image is downsampled (to avoid high computational costs) to a size less than or equal to 5000 pixels, transformed into its gray-level (gL) version [13], and filtered with a 25 × 25 px median filter followed by a Gaussian filter with standard deviation equal to 0.5. This heavy filtering process sharply reduces salt-and-pepper and Gaussian noise, creating a smoothed image where an (almost) uniform brighter background is contrasted with the darker tissue region. The tissue mask, obtained by automatically thresholding the filtered image with the Otsu algorithm [65], is then rescaled to the original image size and refined to remove false positive segmentation errors (pixels wrongly included in the tissue mask as a result of the scale reduction and filtering process). These pixels lie on the border of the tissue mask and can be recognized by their bright gL value, which is similar to that of background pixels. To detect false positive pixels, we therefore compute the mean (meanback) and the standard deviation (stdback) of the gL values of the pixels included in the background, and we remove from the tissue mask those pixels p such that gL(p) > meanback + 0.5 ∗ stdback. The obtained mask is further refined by filling small holes [13] and by removing connected areas that are spiculated, not compact, or too small. Finally, to reduce the memory storage requirements, the image is cropped to strictly contain the tissue region. Figure 3 shows the results computed by the main steps of the tissue-region segmentation procedure. Note that, though the segmentation result might be quite rough, these simple processing steps effectively restrict the processing area without requiring too much computational time.
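A minimal Matlab sketch of this step follows; the area threshold minSize and the use of a single bwareaopen call in place of the full compactness/spiculation tests are simplifying assumptions.

```matlab
% Sketch of the tissue-region segmentation step; minSize is an assumed
% area threshold replacing the compactness/spiculation criteria.
function tissueMask = segmentTissueRegion(rgbImg)
    scale = min(1, 5000 / max(size(rgbImg, 1), size(rgbImg, 2)));
    gL = rgb2gray(imresize(rgbImg, scale));        % downsampled gray level
    gL = medfilt2(gL, [25 25]);                    % heavy median filtering
    gL = imgaussfilt(gL, 0.5);                     % Gaussian, sigma = 0.5
    mask = ~imbinarize(gL, graythresh(gL));        % Otsu: tissue is darker
    mask = imresize(mask, [size(rgbImg, 1) size(rgbImg, 2)]); % full size

    % Remove bright false positives whose gray level resembles the
    % background: gL(p) > meanback + 0.5 * stdback.
    gFull = double(rgb2gray(rgbImg));
    backVals = gFull(~mask);
    mask(gFull > mean(backVals) + 0.5 * std(backVals)) = false;

    mask = imfill(mask, 'holes');                  % fill holes
    minSize = 5000;                                % assumed area threshold
    tissueMask = bwareaopen(mask, minSize);        % drop small components
end
```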
Since manual segmentations performed by experts were not available, to straightforwardly assess the tissue-region segmentation step we showed 500 images to three experts and asked them to assign the following grades: A (perfect segmentation), B (small presence of false positives and/or false negative errors), C (evident presence of false positive and/or false negative errors), D (bad segmentation). Overall, 487 images were assigned grade A (97.4%), 10 images grade B (2%), while 3 of them contained evident errors and were graded with C (0.6%). This visual analysis has demonstrated the effectiveness of the tissue-region segmentation step.
Marker segmentation via decision trees, support vector machines, and K-nearest neighbor
To let the user customize MIAQuant_Learn to segment any marker coloring, we employ a simple stacked classifier, which combines the results obtained by decision trees (DTs), support vector machines (SVMs) with radial basis function kernels, and a K-Nearest Neighbor (KNN) classifier.
For computational efficiency, MIAQuant_Learn avoids classifiers requiring high computational costs and memory storage, such as deep learners (e.g. deep neural networks, convolutional neural networks, deep belief networks, deep recurrent neural networks), nowadays widely used in the medical image analysis research field [66, 67]. Additionally, since any image transformation is time-consuming, we characterize each pixel with a small set of RGB color features computed over its 7 × 7 neighborhood, and we avoid more complex texture features (e.g. entropy, derivatives, Fourier descriptors [68]). This strategy allows splitting big images into smaller sub-images, processing them separately, and recomposing the obtained segmentations for further analysis.
In detail, given a pixel p, and being {Rp, Gp, Bp} its RGB color coordinates,Footnote 1 p is represented by the 24-dimensional feature vector \( {p}_{24}=\left\{{R}_p,{G}_p,{B}_p,\kern0.5em {\mu}_{nRGB},\kern0.5em {\sigma}_{nRGB},\kern0.5em {range}_{nRGB}\right\} \), where μnRGB = {μnR, μnG, μnB} is a three-dimensional vector containing the mean RGB color values of the pixels in the n-by-n neighborhood of p, the vector σnRGB = {σnR, σnG, σnB} contains the standard deviations of the RGB color values of the pixels in the n-by-n neighborhood of p, while rangenRGB = {rangenR, rangenG, rangenB} contains the local ranges (maximum minus minimum RGB values) of the n-by-n neighborhood of p.
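These local statistics can be computed efficiently with standard sliding-window filters. The following Matlab sketch returns, for every pixel, the raw channel value plus the local mean, standard deviation and range of each RGB channel over an n-by-n neighborhood (n = 7 in our setting); the exact composition of the 24-dimensional vector follows the description above, and reshaping the output to one row per pixel yields the feature vectors fed to the classifiers.

```matlab
% Per-pixel color features: raw value, local mean, local standard
% deviation and local range of each RGB channel over an n-by-n window.
function F = colorFeatures(rgbImg, n)
    img = im2double(rgbImg);
    h = ones(n) / n^2;                         % averaging kernel
    feats = cell(1, 3);
    for c = 1:3
        ch = img(:, :, c);
        mu = imfilter(ch, h, 'symmetric');     % local mean
        sd = stdfilt(ch, ones(n));             % local standard deviation
        rg = rangefilt(ch, ones(n));           % local max - min
        feats{c} = cat(3, ch, mu, sd, rg);
    end
    F = cat(3, feats{:});  % one feature plane per descriptor and channel
end
```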
Training data collection
To collect training data, we developed a user interface that shows sample sub-images to experts and offers the following selection possibilities:
a) “Marker-pixels” (positive training samples), that is, pixels belonging to markers.
b) Rectangular areas containing only “not-marker” pixels (obvious negative training samples); these areas generally contain the most “obvious” not-marker pixels and do not carry enough information to discard not-marker pixels whose appearance is similar to that of marker-pixels.
c) “Critical not-marker” pixels (critical negative training samples); these are the not-marker pixels whose appearance is very similar to that of marker-pixels.
Figure 4 shows some examples of marker-pixels (green arrows), critical not-marker-pixels (black arrows), and rectangular areas containing no markers (black rectangles). With the described selection system we often obtain highly unbalanced training sets, where the number of positive samples, Npos, which is generally similar to the number of critical negative samples, Ncrit, is much lower than the number of obvious negative samples, Nneg. As a result, it can occur that the ratio of positive versus negative training samples is such that: \( \frac{Npos}{Nneg+ Ncrit}\le \frac{1}{50} \).
Classifying system
The stacked classifier, whose structure is schematized in Fig. 5, is composed of two stacked cost-sensitive decision trees (first DT layer), followed by one cost-sensitive SVM with a radial basis function kernel (second SVM layer), followed by one KNN classifier (third KNN layer). Each classifier discards the pixels it recognizes as not-marker pixels and leaves to the next classifiers any further decision regarding the pixels classified as (candidate) marker-pixels.
Both the DTs and the SVM employ a misclassification cost matrix Cost(p, t), where Cost(p, t) is the cost of classifying a point into class p if its true class is t (i.e., the rows correspond to the true class and the columns correspond to the predicted class). Label 1 is assigned to positive examples and label 0 is assigned to negative examples. This cost matrix assigns a higher misclassification cost to pixels belonging to the class whose training set has the lowest cardinality.
The KNN classifier is not cost-sensitive; it employs the cost matrix: \( Cost\left(p,t\right)=\left[\begin{array}{cc}0& 1\\ {}1& 0\end{array}\right] \).
While the DTs and the SVM are trained on the training pixels coded as 24-dimensional vectors, the KNN is trained on points p coded as 3-dimensional vectors p3 = {Rp, Gp, Bp}.
The classifiers employ different training sets. The first DT is trained with an unbalanced training set composed of the training marker-pixels (positive examples) and all the training (obvious and critical) not-marker pixels (negative examples). The training points are coded as 24-dimensional vectors containing all the previously described features. 10-fold cross-validation is applied to train the first DT. Each fold is composed of \( \frac{1}{10}\ast Npos \) randomly selected positive examples and \( \min \left(\frac{1}{10}\ast \left( Nneg+ Ncrit\right),\kern0.5em 5\ast Npos\right) \) randomly selected negative examples; the remaining training pixels are used for validation. The trained DT achieving the maximum accuracy is chosen as the first DT classifier.
Once the first decision tree is trained, it is used to classify the set of obvious negative examples; after classification, only the wrongly classified samples (false positives) are kept as obvious negative training samples and added to the set of critical negative samples. The training set is therefore composed of all the positive examples, all the critical negative examples, and the wrongly classified obvious negative examples. This process enormously reduces the number of negative samples considered by the second DT, which is then trained by applying the aforementioned 10-fold cross-validation to maximize the accuracy.
The second DT is then used to classify all the negative samples (critical + obvious), and only the wrongly classified negative examples are kept to train the following SVM classifier, by applying 2-fold cross-validation (to maximize the accuracy). The last layer is composed of one KNN classifier (with neighborhood size K = 3) working on points p coded as p3 = {Rp, Gp, Bp}. It is trained on all the positive samples, all the critical negative samples, and the obvious negative samples wrongly classified by the preceding layers.
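The training cascade can be sketched in Matlab as follows; X24 holds the 24-dimensional feature vectors, X3 the raw RGB triplets, y the labels (1 = marker, 0 = not-marker), isCritical flags the critical negatives, and the cost matrix is an assumed form penalizing errors on the minority positive class (the cross-validated model selection described above is omitted for brevity).

```matlab
% Assumed cost matrix: rows = true class (0, 1), columns = predicted
% class; misclassifying the minority positive class costs more.
C = [0 1; sum(y == 0) / sum(y == 1) 0];

% Layer 1: first cost-sensitive DT on all training pixels.
dt1 = fitctree(X24, y, 'Cost', C);

% Keep positives, critical negatives, and the obvious negatives that
% dt1 wrongly classifies as markers (false positives).
fp   = (predict(dt1, X24) == 1) & (y == 0) & ~isCritical;
keep = (y == 1) | isCritical | fp;
dt2  = fitctree(X24(keep, :), y(keep), 'Cost', C);

% Layer 2: SVM trained on the positives and on all negatives (critical
% or obvious) that dt2 wrongly classifies as markers.
fp   = (predict(dt2, X24) == 1) & (y == 0);
keep = (y == 1) | fp;
svm  = fitcsvm(X24(keep, :), y(keep), 'Cost', C, 'KernelFunction', 'rbf');

% Layer 3: KNN (K = 3) on raw RGB triplets: positives, critical
% negatives, and the obvious negatives misclassified so far.
fp   = (predict(svm, X24) == 1) & (y == 0) & ~isCritical;
keep = (y == 1) | isCritical | fp;
knn  = fitcknn(X3(keep, :), y(keep), 'NumNeighbors', 3);
```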
By applying the described stacked classifier, we create a binary mask containing all the detected marker-pixels. This mask is “cleaned” by removing all connected components with fewer than 3 pixels; such areas are too small to be considered and are often due to noise or image artifacts. The remaining connected areas are the extracted markers, whose quantification and comparative description are described in the following.
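In Matlab, this cleaning step reduces to a single morphological call on the logical marker mask:

```matlab
% Remove connected components with fewer than 3 pixels from the binary
% marker mask (8-connectivity by default).
markerMask = bwareaopen(markerMask, 3);
```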
When applying the marker segmentation procedure to our database, after extracting some image samples, experts manually selected a training set of about 150 marker-pixels, 150 critical not-marker pixels and 15,000 obvious not-marker pixels (the selection of the obvious not-marker pixels, being based on rectangular selection areas, easily yields such a large number of negative examples). If some images were wrongly segmented (here this happened for 11% of the 1357 images), the experts added extra training points chosen among the wrongly segmented pixels. After retraining the classifiers and re-segmenting all the images in the dataset, we obtained remarkably good results for 98.63% of all images. Of note, when two datasets are “similarly stained”, that is, they contain images whose markers have similar color appearances, the training procedure can be applied only once, since the marker-segmentation step can be performed by employing the same classifiers. Moreover, given a novel dataset to be segmented, the training set employed for a “similarly stained” dataset can be used as a starting training set, and extra training points can be added to obtain adequate classifiers. This allows building semi-supervised segmentation machines that are easily adaptable to different image datasets.
Marker quantification and comparative measures for markers’ localization comparison
Mean-distance histograms from resection margins and structures of interest
Similar to MIAQuant, once marker segmentation has been applied to an input image, MIAQuant_Learn computes the marker density estimate as the percentage of marker-pixels with respect to the tissue area (the tissue area is a scalar, defined as the number of pixels in the tissue region). Precisely, given a section SL, and denoting with M the markers segmented in SL, the density DMT of the markers M in the tissue region of SL is computed as DMT = AM/TA, where AM is the area covered by M and TA is the tissue area of SL.
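With the marker and tissue masks stored as logical images, this is a one-line computation:

```matlab
% Marker density DMT = AM / TA: fraction of tissue pixels that are
% marker-pixels (multiply by 100 for a percentage).
DMT = nnz(markerMask) / nnz(tissueMask);
```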
Additionally, MIAQuant_Learn expresses the marker location in the tissue region by computing normalized minimum-distance histograms estimating the distribution of the minimum distancesFootnote 2 between each marker-pixel and the borders of structures of interest, such as the basement membrane, the borders of cancer nodules, or necrotic areas in plaques. When a marker is stained on a set of HC or IHC images, a mean distance histogram can be computed for that marker over the whole set. The visible similarities/dissimilarities of the mean distance histograms computed for each marker objectively confirm the expected differences in the spatial distribution of the markers under analysis [69]. Indeed, experts consider the visualization of the distance histograms effective for understanding the spatial distribution characterizing each marker. MIAQuant_Learn supports the visual comparison with a numerical measure: the difference between the normalized mean distance histograms of two markers, M1 and M2, is expressed by the average of the two histogram intersection measures (from M1 to M2, and from M2 to M1)Footnote 3 [70].
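A minimal Matlab sketch of the distance histograms and of the symmetric intersection measure follows; markerMask1, markerMask2, borderMask (the border of the structure of interest) and binEdges are assumed inputs.

```matlab
% Normalized minimum-distance histograms of two markers with respect to
% the same structure border.
h1 = distanceHistogram(markerMask1, borderMask, binEdges);
h2 = distanceHistogram(markerMask2, borderMask, binEdges);

% Histogram intersection is not symmetric (Footnote 3); averaging the
% two directed measures yields a symmetric similarity value.
similarity = (sum(min(h1, h2)) / sum(h2) + ...
              sum(min(h2, h1)) / sum(h1)) / 2;

function h = distanceHistogram(markerMask, borderMask, binEdges)
    D = bwdist(borderMask);       % distance of every pixel to the border
    h = histcounts(D(markerMask), binEdges); % distances of marker-pixels
    h = h / sum(h);               % normalize the histogram
end
```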
Markers’ neighborhoods detection from sets of serial tissue sections
Given a set of serial sections, pathologists generally stain each section to visualize the density and location of a specific structure, visually compare the labeled sections to find areas where the markers’ densities are high, and finally identify corresponding volumes where the analyzed markers (and hence the labeled structures) are mostly concentrated and neighboring.
MIAQuant_Learn provides means to help experts during this analysis.
Though contiguous, the sections we treat might have quite different shapes. Thus, when sets of marked serial tissue slices are available, MIAQuant and MIAQuant_Learn apply a multiscale hierarchical registration procedure [59] to align the tissue masks as much as possible (tissue-shape registration).
Overall, we have employed this registration procedure on more than 40 sets of contiguous tissue sections (their cardinality varies in the range [3,…, 7]). To objectively evaluate the computed results, for each set composed of n serial tissue sections {SL1, SL2,…, SLn}, denoting with T(SLi) the tissue region in SLi, we define the global tissue-region overlap (GTRO) as \( GTRO=\frac{A\left({\bigcap}_{i=1}^nT\left({SL}_i\right)\right)}{A\left({\bigcup}_{i=1}^nT\left({SL}_i\right)\right)}\ast 100 \), where A(x) is the number of pixels of a binary region x. For each set of serial tissue sections, we measured the GTRO before and after registration, and we computed \( mean(GTRO)=\frac{\sum_{j=1}^{40} GTRO(j)}{40} \) (where GTRO(j) is the GTRO computed for the j-th set of serial tissue sections). Before registration we measured a mean(GTRO) = 70.6% (−7.2%, +8.3%); after tissue-shape registration this measure increased to a mean(GTRO) = 95.7% (−3.1%, +4.0%).
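For reference, the GTRO of a set of aligned tissue masks (a cell array of equally sized logical images) can be sketched as:

```matlab
% Global tissue-region overlap: intersection over union of all tissue
% masks in the set, expressed as a percentage.
function gtro = computeGTRO(masks)
    inter = masks{1};  uni = masks{1};
    for i = 2:numel(masks)
        inter = inter & masks{i};     % intersection of all tissue regions
        uni   = uni   | masks{i};     % union of all tissue regions
    end
    gtro = 100 * nnz(inter) / nnz(uni);
end
```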
The registration step is followed by the computation of a color image where the different markers are shown with different colors, to allow an objective visual comparison of their relative locations. MIAQuant_Learn also allows analyzing the aligned images to numerically detect and express the co-existence (or absence) of the markers in (automatically identified) regions where the markers’ densities are higher. Hereafter these regions will be referred to as “concentration regions”.
Careful observation revealed that each concentration region is generally composed of a core region, where the markers’ density is higher and the pixel distance among markers is less than \( \frac{R}{2} \), and a surrounding region, where the markers’ density diminishes and the distance among markers increases until reaching the value R on the border of the concentration region. The value of R changes in each section, but all the concentration regions in the same section are well defined by a unique R value. Precisely, given a section SL, and denoting its markers with M, the concentration regions in SL are composed of pixels belonging to the tissue region that are at a distance of less than R(M) from any marker pixel. To automatically estimate the proper R(M), we compute the histogram of the minimum distances between each pixel in the tissue region and the markers segmented in SL, and we select the value RMAX(M) where the histogram reaches its maximum. If the section does not contain any concentration region, RMAX(M) results in a value that is too high. To avoid this problem, we determine the value RLIMIT(M), such that the number of pixels at a distance of less than RLIMIT(M) from any marker pixel is less than 50 ∗ AM, where AM is the number of marker pixels in SL. The value R(M) is then computed as R(M) = min(RLIMIT(M), RMAX(M)).Footnote 4
Having estimated R(M), we identify core regions by selecting the pixels at a distance of less than \( \frac{R(M)}{2} \) from any marker pixel, and delete small connected areas (areas with fewer than \( 10{\left(\frac{R(M)}{2}\right)}^2 \) pixels). The remaining core regions are then expanded to include the pixels at a distance of less than R(M) from any marker, and the small connected regions (containing fewer than \( 20{R(M)}^2 \) pixels) are discarded. The remaining connected regions represent the concentration regions in SL.
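The following Matlab sketch summarizes the concentration-region extraction described in the two preceding paragraphs; the integer binning of the distances and the use of morphological reconstruction to expand the surviving cores are implementation assumptions.

```matlab
% D(p): minimum distance of each pixel to any marker pixel.
D  = bwdist(markerMask);
d  = round(D(tissueMask));            % distances of tissue pixels
AM = nnz(markerMask);                 % number of marker pixels

% Histogram of the minimum distances over integer bins.
counts = accumarray(d(:) + 1, 1);
[~, iMax] = max(counts);
Rmax = iMax - 1;                      % distance where the histogram peaks

% Cap for sections without concentration regions: largest radius such
% that fewer than 50*AM tissue pixels lie closer than it to a marker.
Rlimit = find(cumsum(counts) < 50 * AM, 1, 'last');
R = min(Rmax, Rlimit);

% Core regions: pixels closer than R/2 to a marker; drop small cores.
core = tissueMask & (D < R / 2);
core = bwareaopen(core, round(10 * (R / 2)^2));

% Expand the surviving cores up to distance R, then drop small regions.
expanded = imreconstruct(core, tissueMask & (D < R));
concRegions = bwareaopen(expanded, round(20 * R^2));
```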
Once the concentration regions have been found in two sections SL1 and SL2, they can be exploited to derive different measures expressing the markers’ co-existence in the whole tissue region, in user-selected regions of interest (ROIs), such as rectangular areas (Fig. 6e and f), or in selected concentration regions.
As an example, denoting with M1 and M2 the markers in two sections SL1 and SL2, and with Conc1 and Conc2 the concentration regions computed from M1 and M2, we can compute:
- the density, DMC1 and DMC2, of M1 and M2 in their concentration regions: DMCi = AMi/CAi, where AMi is the area covered by Mi, and CAi is the area of Conci, that is, the number of pixels composing Conci;
- the density, DM1InC2 and DM2InC1, of M1 and M2 in the concentration regions of the other marker; precisely, \( {DM}_{1 InC2}={A}_{M1\bigcap {Conc}_2}/{CA}_2 \), \( {DM}_{2 InC1}={A}_{M2\bigcap {Conc}_1}/{CA}_1 \), where \( {A}_{M1\bigcap {Conc}_2} \) is the area of the markers M1 in Conc2 and \( {A}_{M2\bigcap {Conc}_1} \) is the area of the markers M2 in Conc1;
- the weighted mean of DM1InC2 and DM2InC1: \( {wMean}_{Dens}\left({DM}_{1 InC2},{DM}_{2 InC1}\right)=w\ \frac{DM_{2 InC1}+{DM}_{1 InC2}}{2} \), where \( w=\frac{\min \left(\ {DM}_{C1},{DM}_{C2}\right)}{\max \left(\ {DM}_{C1},{DM}_{C2}\right)} \);
- the percentage, PM1InC2 and PM2InC1, of M1 and M2 in the concentration regions of the other marker; precisely, \( {PM}_{1 InC2}={A}_{M1\bigcap {Conc}_2}/{A}_{M1} \), \( {PM}_{2 InC1}={A}_{M2\bigcap {Conc}_1}/{A}_{M2} \);
- the weighted mean of PM1InC2 and PM2InC1: \( {wMean}_{AVG}\left({PM}_{1 InC2},{PM}_{2 InC1}\right)=w\ \frac{PM_{2 InC1}+{PM}_{1 InC2}}{2} \).
Though these measures are computed on the whole section, they can be restricted to consider only the markers and concentration regions contained in user-selected ROIs (in this case DMTi = AMi/ROIA, where ROIA is the area of the user-selected region of interest).
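Given registered logical masks M1, M2 (segmented markers) and Conc1, Conc2 (concentration regions), these measures reduce to a few set operations, as in the following sketch:

```matlab
% Densities of each marker in its own concentration regions
% (DMCi = AMi / CAi, as defined above).
DMC1 = nnz(M1) / nnz(Conc1);
DMC2 = nnz(M2) / nnz(Conc2);

% Densities of each marker in the other marker's concentration regions.
DM1inC2 = nnz(M1 & Conc2) / nnz(Conc2);
DM2inC1 = nnz(M2 & Conc1) / nnz(Conc1);

% Percentages of each marker falling in the other's concentration regions.
PM1inC2 = nnz(M1 & Conc2) / nnz(M1);
PM2inC1 = nnz(M2 & Conc1) / nnz(M2);

% Weighted means, with w balancing the two markers' densities.
w = min(DMC1, DMC2) / max(DMC1, DMC2);
wMeanDens = w * (DM2inC1 + DM1inC2) / 2;
wMeanAvg  = w * (PM2inC1 + PM1inC2) / 2;
```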
Results
Marker segmentation and location analysis
MIAQuant_Learn, our open source software, stands out for its capability of being customized to any marker color appearance thanks to the usage of supervised learning techniques. Of note, its classifiers can be continuously updated by adding training points; this allows increasing their “knowledge” until satisfactory results are obtained.
In Fig. 1 (center column) we show three images containing regions whose color, being similar to that of markers, may cause false positive segmentation errors. These are: colorings due to the China ink used to identify resection margins (Fig. 1b), stain spread and imprisoned in tissue folds (Fig. 1e), and unspecific colorings in red blood cells (Fig. 1h). The segmentation results computed by MIAQuant_Learn (right column) do not contain the false positive errors produced by MIAQuant (left column). MIAQuant_Learn also processed “old” slides, often biased by color modifications (e.g. blurring effects and/or discolorations) and technical deficits. We obtained successful segmentation results for 98.67% of the 1357 images. It must be added that MIAQuant_Learn also effectively processes fluorescence microscopy images, where segmentation problems are easier to overcome.
Once the markers have been segmented, their density and (relative) position can be exploited to compute several other measures, such as mean-distance histograms from structures of interest. The histogram plots and the histogram intersection measure allow visually and numerically assessing differences and/or similarities among the markers’ positions. Figure 7 shows two human tonsil sections, belonging to a human tonsil database, stained with Ki-67 and Filagrin antibodies (red) prior to MIAQuant_Learn processing; the sections in the database have been studied to understand the distribution of these markers with respect to the basement membrane, manually marked by experts (purple lines in Fig. 7c and f). The mean-distance histograms computed over the whole dataset (plotted in Fig. 7g) confirm the expected marker distribution: Ki-67 marks proliferating cells generally tied to the basement membrane, while Filagrin is contained in differentiating cells, most of which are located far from the basement membrane. The histogram intersection computed by our software equals 0.6, confirming the difference in the markers’ distributions.
This kind of analysis can provide useful information and was applied to analyze the relative location of different cell populations with respect to manually marked borders in arteriosclerotic plaques [69].
Alignment of serial image sets
To provide visual means for the detection and analysis of markers’ co-existence, MIAQuant and MIAQuant_Learn automatically align (register) serial sections and compute images where the markers are overlapped. MIAQuant_Learn improves the visual information by producing a color image where the automatically detected concentration regions (that is, regions where each marker’s density is high) are overlapped.
Note that, since serial section images depict contiguous histological sections whose thickness is similar to, or bigger than, that of the histological structures of interest, when two or more markers overlap in the color image computed by MIAQuant_Learn after registration, they must be considered as neighboring rather than adhering. For this reason, the analysis of contiguous sections allows detecting markers co-existing in the same volumes rather than co-localizing markers. Co-localization studies [71, 72] can indeed be performed only on (more expensive) histological images produced from sections simultaneously stained for different antigens.
The detection of co-existing markers identifying biological structures in volume/areas is relevant to get insight into the complex interactions governing biological processes.
In Fig. 6 we show the result computed by the shape-based registration procedure of MIAQuant_Learn on an image set depicting three serial sections of metastatic melanoma marked for the CD3 and CD8 lymphocyte markers and the CD163 myeloid cell marker (Fig. 6a-c). The color image computed before registration (Fig. 6d) achieves a GTRO value equal to 85.9%, which increases to 93.2% after registration (Fig. 6e), enabling an objective comparative (visual) analysis of the three markers’ relative positions (a detail is shown in Fig. 6g). This confirms the effectiveness of the image registration procedure of MIAQuant_Learn. The automatically computed (overlapped) concentration regions relative to the three markers (CD3, red; CD8, green; CD163, blue) are shown in Fig. 6f. To focus on lymphocytes, we exploited the ability of MIAQuant_Learn to restrict the computation of the co-existence measures to the rectangular ROI shown in Fig. 6e and f (Fig. 6g and h respectively show the overlapped markers and the overlapped concentration regions in the ROI). In Table 1 we show the marker densities in the tissue region, in the ROI, as well as in specific concentration regions. Comparing the computed values reveals that the densities of the three markers in the ROI are higher than those in the whole tissue region, and that they further increase when computed in the automatically extracted concentration regions (Fig. 6h). Importantly, this points out that the three markers exhibit different density increases when different areas are considered.
As a further example, Fig. 8 shows three sections of metastatic melanoma tissue marked with CD3, CD8 and CD14 (a monocyte marker) antibodies (Fig. 8a-c), and the color image of the overlapped segmented markers after image registration (Fig. 8d). In Fig. 8e the automatically computed concentration regions (Fig. 8f-h) are overlapped. Visual inspection evidences that all three markers are mainly present in the peritumoral area. Table 2 shows that the density values increase when computed in the concentration regions. Considering that CD14 marks myeloid cells, while the CD3 and CD8 markers identify lymphocytes, the comparison of the density values computed in specific areas suggests a potential interaction between these cell populations and allows experts to speculate on their biological function.
A concise way to express marker co-existence is the computation of the weighted mean of the markers’ percentages. The values computed on this serial section set suggest that the co-existence relationship between markers CD3 and CD8 is stronger than those between markers CD3 and CD14, and between markers CD8 and CD14, depending at least in part on the co-expression of CD3 and CD8 by T cells. Although the serial section sets considered here are composed of only three sections, the markers’ co-existence measures can be computed on sets containing an arbitrary number of serial sections. In this case, the weighted mean of the markers’ percentages is a useful measure, since it expresses each couple of co-existing markers in a single value.
Conclusions
In this paper we have described MIAQuant_Learn, a novel system for the automatic segmentation, quantification, and analysis of histological sections acquired by differing techniques and imaging systems. The usage of simple, efficient, and effective image processing, pattern recognition and supervised learning techniques [64] allows any user to customize the marker segmentation to any color appearance. To facilitate the analysis, MIAQuant_Learn computes mean-distance histograms to objectively express the markers’ position and relative location with respect to the resection margins and to user-selected structures of interest. Furthermore, in the case of serial tissue sections, MIAQuant_Learn computes objective “morphology-based” measures expressing the markers’ co-existence in areas of higher density. The examples reported in the Results section show that the introduced system effectively segments and quantifies markers of any color and shape, provides their descriptive analysis, and supplies informative measures that support marker co-existence analysis.
Of note, most of the analysis reported in this paper (e.g. in Table 2) was performed on images of high dimension and resolution; obtaining such precision by a manual counting procedure would be exhausting and time-consuming. Moreover, the co-existence analysis provided by MIAQuant_Learn can exploit any serial section set, even those stored long-term in archives for different purposes.
In conclusion, MIAQuant_Learn is reliable, easy to handle and usable even in small laboratories, since image acquisition can be performed by cameras mounted on standard microscopes, which are commonly used in histopathological routine. As flexible, easily modifiable software, it adapts well to meet researchers’ needs and can be applied on different image formats. Due to its potential, MIAQuant_Learn is currently used in several research studies, such as the study of myeloid infiltrate and the definition of immune cell tissue scores in different types of cancer.
MIAQuant_Learn code is available online at www.consorziomia.org for clinical research studies.
Availability and requirements
Project name: MIAQuant_Learn
Project home page: www.consorziomia.org
Operating system(s): Platform independent
Programming language: Matlab R2017a
Other requirements: Matlab R2017a
License: Free
Restrictions: No restrictions to use by non-academics.
Notes
Note that to describe the pixels’ colors we employ features computed in the RGB color space, though other color spaces, e.g. HSV and HSI, might be more intuitive for human perception. However, as reported in [13, 42], this is not necessarily true for automated classification, where color image segmentation generally achieves better results using the RGB color space; this fact is further confirmed by our statistical analysis [59].
The minimum distance between a marker pixel p and a ROI is the distance between p and the pixel on the border of the ROI that is closest to p.
The histogram intersection measure is not symmetric. Computing the average yields a symmetric measure.
Note that the value R(M), estimated for the markers M, is related to the marker distribution in the tissue region; precisely, given two markers M1 and M2 with the same density in the tissue region, R(M1) > > R(M2) means that the markers M1 tend to concentrate in areas where their density is higher, while the markers M2 are more evenly distributed in the whole tissue region.
Abbreviations
CAD: Computer-Aided Diagnosis
CNN: Convolutional Neural Networks
DT: Decision tree
FRST: Fast Radial Symmetry Transform
GTRO: Global tissue-region overlap
HC: Histochemical
ICA: Independent Component Analysis
IHC: Immunohistochemical
KNN: K-Nearest Neighbor
NMF: Non-negative Matrix Factorization
ROI: Region of interest
SVM: Support vector machine
References
Jalalian A, Mashohor S, Mahmud R, Karasfi B, Saripan MIB, A.R.B. R. Foundation and methodologies in computer-aided diagnosis systems for breast cancer detection. EXCLI J. 2017;16:113–37.
Schläpfer J, Wellens HJ. Computer-interpreted electrocardiograms: benefits and limitations. J Am Coll Cardiol. 2017;70(9):1183–92.
van Ginneken B. Fifty years of computer analysis in chest imaging: rule–based, machine learning, deep learning. Radiol Phys Technol. 2017;10(1):23–32.
Casiraghi E, Campadelli P, Esposito A. Liver segmentation from computed tomography: a survey and a new algorithm. Artif Intell Med. 2009;45(2–3):185–96.
Devaraj A, van Ginneken B, Nair A, Baldwin D. Use of volumetry for lung nodule management: theory and practice. Radiology. 2017;284(3):630–44.
Tosta TAA, Neves LA, do Nascimento MZ. Segmentation methods of H&E-stained histological images of lymphoma: a review. Inf Med Unlocked. 2017;9(1):35–43.
Irshad H, Veillard A, Roux L, Racoceanu D. Methods for nuclei detection, segmentation, and classification in digital histopathology: a review–current status and future potential. IEEE Rev Biomed Eng. 2014;7:97–114.
Gurcan MN, Boucheron L, Can A, Madabhushi A, Rajpoot N, Yener B. Histopathological image analysis: a review. IEEE Rev Biomed Eng. 2009;2:147–71.
Di Cataldo S, Ficarra E, Macii E. Computer-aided techniques for chromogenic immunohistochemistry. Comput Biol Med. 2012;42(10):1012–25.
Hamilton PW, et al. Digital pathology and image analysis in tissue biomarker research. Methods. 2014;70(1):59–73.
He L, Long LR, Antani S, Thoma GR. Histology image analysis for carcinoma detection and grading. Comput Methods Prog Biomed. 2012;107(3):538–56.
Bourzac K. Software: the computer will see you now. Nature. 2013;502(7473):S92–4.
Gonzalez RC, Woods RE. Digital image processing. Upper Saddle River: PEARSON/Prentice Hall Publisher; 2008.
Belkacem-Boussaid K, Samsi S, Lozanski G, Gurcan MN. Automatic detection of follicular regions in H&E images using iterative shape index. Comput Med Imaging Graph. 2011;35(7):592–602.
Leong FJW-M, Brady M, McGee J O’D. Correction of uneven illumination (vignetting) in digital microscopy images. J Clin Pathol. 2003;56(8):619–21.
Veta M, Huisman A, Viergever MA, van Diest PJ, Pluim JPW. Marker-controlled watershed segmentation of nuclei in H&E stained breast cancer biopsy images. In: Proceedings of the 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI'11); 30 March–2 April 2011; Chicago, IL, USA. Piscataway: IEEE; 2011. p. 618–21. https://doi.org/10.1109/ISBI.2011.5872483.
Vahadane A, Peng T, Sethi A, Albarqouni S, Wang L, Baust M, et al. Structure–preserving color normalization and sparse stain separation for histological images. IEEE Trans Med Imag. 2016;35(8):1962–71.
Sertel O, Kong J, Catalyurek UV, Lozanski G, Saltz JH, Gurcan MN. Histopathological image analysis using model–based intermediate representations and color texture: follicular lymphoma grading. J Signal Process Syst. 2009;55(1–3):169–83.
Basavanhally A, Ganesan S, Agner SC, Monaco JP, Feldman MD, Tomaszewski JE, et al. Computerized image-based detection and grading of lymphocytic infiltration in HER2+ breast cancer histopathology. IEEE Trans Biomed Eng. 2010;57(3):642–53.
Dundar MM, Badve S, Bilgin G, Raykar V, Jain R, Sertel O, et al. Computerized classification of intraductal breast lesions using histopathological images. IEEE Trans Biomed Eng. 2011;58(7):1977–84.
Nguyen K, Jain A, Sabata B. Prostate cancer detection: fusion of cytological and textural features. J Pathol Inform. 2011;2(2):7–27.
Khan AM, ElDaly E, Rajpoot NM. A gamma–gaussian mixture model for detection of mitotic cells in breast cancer histopathology images. J Pathol Inform. 2013;4:11.
Yang Y, Meer P, Foran DJ. Unsupervised segmentation based on robust estimation and color active contour models. IEEE Trans Inf Technol Biomed. 2005;9(3):475–86.
Yang L, Tuzel O, Meer P, Foran DJ. Automatic image analysis of histopathology specimens using concave vertex graph. In Med Image Computing Computer-Assisted Intervention. 2008;11:833–41.
Sertel O, Catalyurek UV, Lozanski G, Shanaah A, Gurcan MN. An image analysis approach for detecting malignant cells in digitized H&E–stained histology images of follicular lymphoma. In: 20th international conference on pattern recognition. Istanbul: IEEE; 2010. p. 273–6.
Khan AM, ElDaly H, Simmons E, Rajpoot NM. HyMaP: a hybrid magnitude–phase approach to unsupervised segmentation of tumor areas in breast cancer histology images. J Pathol Inform. 2013;30(4):1.
Malon C, Cosatto E. Classification of mitotic figures with convolutional neural networks and seeded blob features. J Pathol Inform. 2013;4(1):9–13.
Irshad H. Automated mitosis detection in histopathology using morphological and multi-channel statistics features. J Pathol Inform. 2013;4(19):10–5.
Topper RJ, Dischert LR. Method and apparatus for detecting and compensating for white shading errors in a digitized video signal. Google Patents. 1992; Available from: https://www.google.com/patents/US5157497. Date Accessed 1 Dec 2017
Marty GD. Blank–field correction for achieving a uniform white background in Brightfield digital photomicrographs. Biotechniques. 2007;42(6):716–20.
Michail E, Kornaropoulos EN, Dimitropoulos K, Grammalidis N, Koletsa T, Kostopoulos I. Detection of centroblasts in h&e stained images of follicular lymphoma. In: Signal processing and communications applications conference. Trabzon: IEEE; 2014. p. 2319–22.
Oger M, Belhomme P, Gurcan MN. A general framework for the segmentation of follicular lymphoma virtual slides. Comput Med Imaging Graph. 2012;36(6):442–51.
Dalle JR, Li H, Huang CH, Leow WK, RD, Putti TC. Nuclear pleomorphism scoring by selective cell nuclei detection; 2009.
Reinhard E, Adhikhmin M, Gooch B, Shirley P. Color transfer between images. IEEE Comput Graph Appl. 2001;21(5):34–41.
Ruifrok AC, Johnston DA. Quantification of histochemical staining by color deconvolution. Anal Quant Cytol. 2001;23(4):291–9.
Khan AM, Rajpoot N, Treanor D, Magee D. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Trans Biomed Imag. 2014;61(6):1729–38.
Rabinovich A, Laris CA, Agarwal S, Price JH, Belongie S. Unsupervised color decomposition of histologically stained tissue samples. In: Advances in neural information processing systems; 2003.
Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, et al. A method for normalizing histology slides for quantitative analysis. In IEEE Int. Symposium on Biomedical Imaging: From Nano to Macro; 2009; Boston, MA, USA. p. 1107–1110. IEEE Press Piscataway Publisher, NJ, USA, ISBN: 978-1-4244-3931-7.
Gavrilovic M, Azar JC, Lindblad J, Wählby C, Bengtsson E, Busch C, et al. Blind color decomposition of histological images. IEEE Trans Med Imag. 2013;32(6):983–94.
Trahearn, Snead D, Cree I, Rajpoot N. Multi–class stain separation using independent component analysis. In: Gurcan MN, Madabhushi A, editors. SPIE 9420, medical imaging 2015: digital pathology; 2015.
Mather PM. Computer processing of remotely sensed images. Hoboken: Wiley; 2004.
Al-Kadi OS. Texture measures combination for improved meningioma classification of histopathological images. Pattern Recogn. 2010;43(6):2043–53.
Petushi S, Garcia FU, Haber MM, Katsinis C, Tozeren A. Large-scale computations on histology images reveal grade-differentiating parameters for breast cancer. BMC Med Imaging. 2006;6(14):14–24.
Serra J. Image analysis and mathematical morphology. Orlando: Academic Press, Inc.; 1983.
Huang PW, Lai YH. Effective segmentation and classification for HCC biopsy images. Pattern Recogn. 2010;43(4):1550–63.
Anari V, Mahzouni P, Amirfattahi R. Computer-aided detection of proliferative cells and mitosis index in immunohistichemically images of meningioma. In 6th Iranian conference on Machine Vision and Image Processing; 2010; Isfahan, Iran. p. 1-5. IEEE Press Piscataway Publisher, NJ, USA.
Di Cataldo S, Ficarra E, Acquaviva A, Macii E. Automated segmentation of tissue images for computerized IHC analysis. Comput Methods Prog Biomed. 2010;100(1):1–15.
Loy G, Zelinsky A. Fast radial symmetry for detecting points of interest. IEEE Trans Pattern Anal Mach Intell. 2003;25(8):959–73.
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B (Methodol). 1977;39(1):1–38.
Al-Kofahi J, Lassoued W, Lee W, Roysam B. Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Trans Biomed Eng. 2010;57(4):841–52.
Cosatto E, Miller M, Graf HP, Meyer JS. Grading nuclear pleomorphism on histological micrographs. In 19th International Conference on Pattern Recognition.; 2008; Tampa, FL, USA. p. 1-4. IEEE Press Piscataway Publisher, NJ, USA.
Jung C, Kim C. Segmenting clustered nuclei using h-minima transformbased marker extraction and contour parameterization. IEEE Trans Biomed Eng. 2010;57(10):2600–4.
Wählby C, Sintorn IM, Erlandsson F, Borgefors G, Bengtsson E. Combining intensity, edge and shape information for 2D and 3D segmentation of cell nuclei in tissue sections. J Microsc. 2004;215(Pt 1):67–76.
Mouelhi A, Sayadi M, Fnaiech F. Automatic segmentation of clustered breast cancer cells using watershed and concave vertex graph. In Int. Conf. on Communications Computing and Control Applications; 2011; Hammamet, Tunisia. p. 1-6. IEEE Press Piscataway Publisher, NJ, USA.
Kass M, Witkin A, Terzopoulos D. Snakes: active contour models. Int J Comput Vis. 1988;1(4):321–31.
Chan TF, Vese LA. Active contours without edges. IEEE Trans Image Proc. 2001;10(2):266–77.
Ta VT, Lézoray O, Elmoataz A, Schüpp S. Graph-based tools for microscopic cellular image segmentation. Pattern Recogn. 2009;42(6):1113–25.
Chang H, Loss LA, Parvin B. Nuclear segmentation in H&E sections via multi-reference graph cut (MRGC). In: Proceedings of the 9th IEEE International Symposium on Biomedical Imaging: Nano to Macro; 2012; Barcelona, Spain. p. 614–7. Piscataway: IEEE Press.
Casiraghi E, Cossa M, Huber V, Tozzi M, Rivoltini L, Villa A, et al. MIAQuant, a novel system for automatic segmentation, measurement, and localization comparison of different biomarkers from serialized histological slices. Eur J Histochem. 2017;61(4):2838.
Vink J, Leeuwen MV, Deurzen CV, Haan G. Efficient nucleus detector in histopathology images. J Microsc. 2013;249(Pt 2):124–35.
Cover TM, Hart PE. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967;13(1):21–7.
Duda RO, Hart PE, Stork D. Pattern classification. 2nd ed. New York: Wiley; 2000.
Cireşan DC, Giusti A, Gambardella LM, Schmidhuber J. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical image computing and computer–assisted intervention. Berlin Heidelberg: Springer; 2013. p. 411–8.
Bishop CM. Pattern recognition and machine learning. Berlin: Springer-Verlag; 2006.
Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9(1):62–6.
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.
Shen D, Wu G, Suk HI. Deep learning in medical image analysis. Annu Rev Biomed Eng. 2017;19:221–48.
Pratt WK. Digital image processing: PIKS inside. Hoboken: Wiley; 2001.
Casiraghi E., Ferraro S., Franchin M., Villa A., Vergani B., Tozzi M. Analisi semi automatica nella valutazione della neo-vascolarizzazione della placca carotidea. Italian Journal of vascular and endovascular surgery (Minerva Medica Publishing), vol. 23 (4), pp. 55-56, Suppl.I, ISSN 1824-4777, Online ISSN 1827-1847.
Swain MJ, Ballard DH. Color indexing. Int J Comput Vis. 1991;7(1):11–32.
Bolognesi MM, et al. Multiplex staining by sequential immunostaining and antibody removal on routine tissue sections. J Histochem Cytochem. 2017;65(8):431–44.
Feng Z, et al. Multispectral imaging of formalin-fixed tissue predicts ability to generate tumor-infiltrating lymphocytes from melanoma. J Immunother Cancer. 2015;3:47.
Acknowledgements
The authors would like to thank Prof. Paola Campadelli, Dr. Paolo Pedaletti, and Prof. Giorgio Valentini for their invaluable support.
Funding
The project was supported by funding from ‘Bando Sostegno alla Ricerca, LINEA A’ (Università degli Studi di Milano), AIRC 5 × 1000 (Project ID 12162) and H2020-NMP-2015 (Project PRECIOUS, ID 686089). The publication fees of this article were paid by funding from ‘Bando Sostegno alla Ricerca, LINEA A’ (Università degli Studi di Milano).
About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 10, 2018: Italian Society of Bioinformatics (BITS): Annual Meeting 2017. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-10.
Author information
Contributions
EC, statistical analysis, system design, computational method development and testing; AV, BV, image database creation; EC, MF, AV, BV, experiments conception and design; VH, MC, MT, LR, BEL, AV, BV, clinical evaluation and assessment of achieved results; EC, VH, AV, BV, manuscript drafting. All the authors read this paper and gave consent for publication. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Tissues biopsies were obtained in accordance with Informed Consent procedures approved by the Central Ethics Committee of the Fondazione IRCCS Istituto nazionale dei tumori: INT40/11 and INT39/11.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Casiraghi, E., Huber, V., Frasca, M. et al. A novel computational method for automatic segmentation, quantification and comparative analysis of immunohistochemically labeled tissue sections. BMC Bioinformatics 19 (Suppl 10), 357 (2018). https://doi.org/10.1186/s12859-018-2302-3