
1 Introduction

Lung cancer led to approximately 159,260 deaths in the US in 2014 and is the most common cancer worldwide. The increasing relevance of pulmonary CT data has triggered dramatic growth in the computer-aided diagnosis (CAD) field. Specifically, the CAD task for interpreting chest CT scans can be broken down into separate steps: delineating the lungs, detecting and segmenting nodules, and using the image observations to infer clinical judgments. Multiple techniques have been proposed and studied for each step. This work focuses on characterizing the segmented nodules.

Clinical protocols for identifying and assessing nodules, most notably the Fleischner Society Guidelines, involve monitoring the size of the nodule with repeated scans over a period of three months to two years. Ratings on several image-based features may also be considered, including growth rate, spiculation, sphericity, and texture. Features like size can be quantitatively estimated via image segmentation, while other markers are mostly judged qualitatively and subjectively. For nodule classification, existing CAD approaches are often based on a sub-optimal stratification of nodules based solely on their morphology. Malignancy is then roughly correlated with broad morphological categories. For instance, one study found malignancy in 82 % of lobulated nodules, 97 % of densely spiculated nodules, 93 % of ragged nodules, 100 % of halo nodules, and 34 % of round nodules [1]. Subsequent approaches incorporated automatic or manual definitions of similar shape features, along with various other contextual or appearance features, into linear discriminant classifiers. However, these features are mostly subjective and arbitrarily defined [2]. These limitations reflect the challenges in achieving a complete and quantitative description of malignant nodule appearances. Similarly, it is difficult to model the 3D shape of a nodule, which is not directly comprehensible with the routine slice-wise inspection performed by human observers. Therefore, the extraction of proper appearance features, as well as shape descriptions, is of great value for the development of CAD systems.

For 3D shape modeling, spherical harmonic (SH) parameterizations offer an effective model of 3D shapes. As shape descriptors, they have been used successfully in many applications such as protein structure [3], cardiac surface matching [4], and brain mapping [5]. While SHs have been shown to successfully discriminate between malignant and benign nodules (with 93 % accuracy for binary separation) [6], using the SH coefficients to uniquely describe a nodule’s “fingerprint” remains largely unexplored [2]. Also, as a scale- and rotation-invariant descriptor of a mesh surface, SHs do not have the capability of describing a nodule’s size or other critical appearance features, e.g., solid, sub-solid, part-solid, or peri-fissural. Hence, SHs alone may not be sufficient for nodule characterization.

Recently, deep convolutional neural networks (DCNNs) have been shown to be effective at extracting image features for successful classification across a variety of situations [7, 8]. More importantly, studies on “transfer learning” and on using DCNNs as generic image representations [9–11] have shown that successful appearance feature extraction can be achieved without significant modifications to DCNN structures, or even training on the specific dataset [10]. While simpler neural networks have been used for nodule appearance [2], and a DCNN has recently been used to classify peri-fissural nodules [12], to our knowledge, DCNNs such as the ImageNet DCNN introduced by Krizhevsky et al. [7] have not been applied to the nodule malignancy problem, nor have they been combined with 3D shape descriptors such as the SH method.

In this paper, we present a classification approach for malignancy evaluation of lung nodules that combines shape and appearance features, using SHs and DCNNs respectively, on a large annotated dataset from the Lung Image Database Consortium (LIDC) [13]. First, a surface parameterization scheme based on SH conformal mapping is used to model the variations of 3D nodule shape. Then, a trained DCNN is used to extract texture and intensity features from local image patches. Finally, the sets of DCNN and SH coefficients are combined and used to train a random forest (RF) classifier that evaluates the corresponding malignancy score on a scale of 1 to 5. The proposed algorithm aims to achieve a more complete description of local nodules from both the shape (SH) and appearance (DCNN) perspectives. In the following sections, we discuss the proposed method in more detail.

2 Methods

Our method works from two inputs: radiologists’ binary nodule segmentations and the local CT image patches. First, we produce a mesh representation of each nodule from the binary segmentation using the method from [5]. These meshes are then mapped to the canonical parameter domain of SH functions via conformal mapping, giving us a vector of function coefficients as a representation of the nodule shape. Second, using the local CT images, three orthogonal patches containing each nodule are combined into one image input for the DCNN, and appearance features are extracted from the first fully-connected layer of the network. This approach to appearance feature extraction is based on recent work in “transfer learning” [9, 10]. Finally, we combine the shape and appearance features and use an RF classifier to assess the nodule malignancy rating.

2.1 Spherical Harmonics Computation

SHs form a basis for representing functions defined over the unit sphere \(S^2\). The basic idea of SH parameterization is to transform a 3D shape defined in Euclidean space into the space of SHs. To do this, a shape must first be mapped onto a unit sphere. Conformal mapping is used for this task. It performs a one-to-one surface transformation preserving local angles, and is especially useful for surfaces with significant variations, such as brain cortical surfaces [5]. Specifically, let M and N be two Riemannian manifolds; then a mapping \(\phi : M \rightarrow N\) is conformal if local angles between curves remain invariant. Following the Riemann mapping theorem, a simply connected surface can always be mapped to the unit sphere \(S^2\), producing a spherical parameterization of the surface.

For genus-zero closed surfaces, conformal mapping is equivalent to a harmonic mapping satisfying the Laplace equation, \(\Delta f = 0\). For our application, nodules have an approximately spherical shape with bounded local variations, so spherical conformal mapping is an ideal choice to normalize and parameterize the nodule surface to a unit sphere. We first convert the binary segmentations to meshes, and then perform conformal spherical mapping via harmonic energy minimization. Further technical details can be found in [5].
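To make this step concrete, below is a minimal Python sketch that approximates the spherical mapping by iterative harmonic (Laplacian) smoothing with re-projection onto the sphere. It is a simplified stand-in for the full conformal mapping of [5]; the uniform neighbor weights, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def spherical_harmonic_map(verts, neighbors, n_iters=500, step=0.5):
    """Map a genus-zero mesh onto the unit sphere by iteratively reducing
    a discrete harmonic energy: move each vertex toward the average of its
    neighbors, then re-project onto the sphere. Uniform neighbor weights
    (instead of cotangent weights) are a simplification."""
    s = verts - verts.mean(axis=0)                   # center the mesh
    s /= np.linalg.norm(s, axis=1, keepdims=True)    # radial projection onto S^2
    for _ in range(n_iters):
        avg = np.array([s[nb].mean(axis=0) for nb in neighbors])
        s = (1 - step) * s + step * avg              # Laplacian smoothing step
        s /= np.linalg.norm(s, axis=1, keepdims=True)  # re-project onto S^2
    return s
```

Here `verts` is an (N, 3) vertex array and `neighbors` a list of per-vertex neighbor index lists derived from the mesh connectivity.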

With spherical conformal mapping, we are able to model the variations of different nodule shapes on a unit sphere. However, it is still challenging to judge and quantify differences within the \(S^2\) space. Therefore, SHs are used to map functions on \(S^2\) to vectors of scalar coefficients.

Analogous to the Fourier series as a basis for functions on the circle, SHs decompose a given function f defined on \(S^2\) into a direct sum of irreducible sub-representations:

$$\begin{aligned} f = \sum _{l\ge 0}\sum _{|m|\le l}{\hat{f}(l,m)Y_l^m}, \end{aligned}$$

where \(Y_l^m\) is the m-th harmonic basis function of degree l, and \(\hat{f}(l,m)\) is the corresponding SH coefficient. Compared to directly using the surface in \(S^2\), this gives us two major benefits: first, the extracted representation features are rotation-, scale-, and translation-invariant [5]; second, it is much easier to compute the correlation between two vectors than between two surfaces. Therefore, SHs are a powerful representation for further shape analysis.
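For illustration, the coefficients \(\hat{f}(l,m)\) can be estimated from samples of f on the sphere by discretizing the inner product with each basis function. The sketch below uses scipy.special.sph_harm; the simple \(\sin\phi\) quadrature weights assume an approximately uniform sampling grid in the two angles.

```python
import numpy as np
from scipy.special import sph_harm

def sh_coefficients(f_samples, theta, phi, l_max):
    """Estimate SH coefficients f_hat(l, m) of a function sampled on the
    unit sphere by discretizing <f, Y_l^m>. In scipy's convention, theta
    is the azimuthal angle in [0, 2*pi] and phi the polar angle in [0, pi]."""
    w = np.sin(phi)
    w *= 4 * np.pi / w.sum()              # crude quadrature: sphere area is 4*pi
    coeffs = {}
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            Y = sph_harm(m, l, theta, phi)
            coeffs[(l, m)] = np.sum(f_samples * np.conj(Y) * w)
    return coeffs
```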

Fig. 1. Example of SH coefficient differences for different nodules and segmentations. The top two rows show a comparison of a high-malignancy and a low-malignancy nodule, and the difference of their SH coefficient values. The bottom two rows show that two different segmentations of the same nodule have very similar SH coefficients. Nonetheless, differences still remain, motivating supplementing shape-based descriptors with appearance-based ones.

Fig. 1 illustrates the process of computing SH representations. It also compares the SH coefficients of four nodule segmentation cases: nodules with high and low malignancy, and two segmentations of the same nodule by different radiologists. From the manual segmentations, we first generate the corresponding 3D meshes. Each mesh is then conformally mapped to the unit sphere and subsequently decomposed into a series of SH coefficients. Here, we briefly compare two resulting SH representations using their direct difference. For comparison, the last two rows show the SH computation for the same nodule, but with different segmentations from two annotators. As illustrated, the SH coefficients differ far more between the malignant and benign nodules than between two segmentations of the same nodule, showing that it is possible to use SH coefficients to estimate the malignancy rating of a specific nodule. Even so, as the figure demonstrates, for nodules consisting of only a limited number of voxels, a change in segmentation can lead to some discrepancy in SH coefficients. For such cases, SH coefficients may not serve as a reliable marker for malignancy, and we need to assist the classification with further information, i.e., appearance.

2.2 DCNN Appearance Feature Extraction

The goal of the DCNN appearance feature extraction is to obtain a representation of local nodule patches and relate it to malignancy. Here, we use the same DCNN structure as Krizhevsky et al. [7], which has demonstrated success in multiple applications. This network balances discriminative power with computational efficiency by using five convolutional layers followed by three fully-connected layers. Each layer of the trained DCNN provides a different level of image representation.
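As a sketch of this feature extraction, the snippet below pulls first fully-connected-layer activations from torchvision's AlexNet, a close analogue of the network of [7]; using this particular pre-trained model and input size is an assumption made for illustration, not necessarily the exact setup used here.

```python
import torch
from torchvision import models

# torchvision's AlexNet approximates the architecture of [7];
# classifier[1] is its first fully-connected layer (9216 -> 4096).
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
net.eval()

def fc6_features(rgb_batch):
    """Return first fully-connected-layer activations for a batch of
    (N, 3, 224, 224) images."""
    with torch.no_grad():
        x = net.features(rgb_batch)   # five convolutional stages
        x = net.avgpool(x)
        x = torch.flatten(x, 1)       # 256 * 6 * 6 = 9216 features
        return net.classifier[1](x)   # first fully-connected layer

feats = fc6_features(torch.randn(1, 3, 224, 224))  # placeholder input -> (1, 4096)
```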

Fig. 2. Process of appearance feature extraction. Local patches centered at each nodule are first extracted on three orthogonal planes. Then, an RGB image is generated with the three patches fed to the three channels. This image is further resampled and used as input to a trained DCNN. The resulting coefficients in the first fully-connected layer (yellow) are then used as the feature vector for nodule appearance.

Fig. 2 shows how each candidate is quantitatively encoded. We first convert a local 3D CT image volume to an RGB image, which is the required input to the DCNN structure we use [7]. Here, we use a fixed-size cubic ROI centered at each segmentation’s center of mass, sized to the largest nodule. Since voxels in the LIDC dataset are mostly anisotropic, we use interpolation to achieve isotropic resampling, avoiding distortion effects in the resulting patches. To best preserve the appearance information, we perform principal component analysis (PCA) on the binary segmentation data to identify the three orthogonal axes \(x', y', z'\) of the local nodule within the regular xyz space of the axial, coronal, and sagittal planes. Then, we resample the local space within the \(x'y', x'z'\) and \(y'z'\) planes to obtain local patches containing the nodule. The three orthogonal patch samples form an “RGB” image matching the DCNN’s three expected input channels. We use Krizhevsky et al.’s pre-trained model for natural images and extract the coefficients of the last few layers of the DCNN as a high-order representation of the input image. This “transfer learning” approach from natural images has proven successful within medical-imaging domains [9, 10]. As an added benefit, no training of the DCNN is required, avoiding this time-consuming and computationally expensive step. For our application, we use the first fully-connected layer as the appearance descriptor.
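The following sketch illustrates the patch-extraction step just described: PCA on the segmentation voxels defines the local axes, and three orthogonal planes through the nodule center are resampled and stacked as an “RGB” image. The function name, the fixed patch size, and the use of SVD for PCA are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def nodule_rgb_patch(volume, mask, size=64):
    """Build a (size, size, 3) "RGB" patch image from an isotropically
    resampled CT volume and its binary nodule segmentation."""
    coords = np.argwhere(mask).astype(float)      # voxel coordinates of the nodule
    center = coords.mean(axis=0)
    # rows of vt are the PCA axes x', y', z' of the nodule
    _, _, vt = np.linalg.svd(coords - center, full_matrices=False)
    u = np.arange(size) - size / 2.0
    channels = []
    for a, b in [(0, 1), (0, 2), (1, 2)]:         # x'y', x'z', and y'z' planes
        grid = (center[:, None, None]
                + vt[a][:, None, None] * u[None, :, None]
                + vt[b][:, None, None] * u[None, None, :])
        channels.append(ndimage.map_coordinates(volume, grid, order=1))
    return np.stack(channels, axis=-1)            # three patches as RGB channels
```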

2.3 RF Classification

By using SHs and DCNNs, both shape and appearance features of nodules can be extracted as vectors of scalars, which in turn can be used together to distinguish nodules with different malignancy ratings. Combining these two very different feature types is not trivial. Yet, recent work [14] has demonstrated that non-image information can be successfully combined with CNN features using classifiers. This success motivates our use of an RF classifier to synthesize the SH and DCNN features. The RF method offers high accuracy and efficiency, and is well-suited to problems of this form [15]. It works by “bagging” the data to generate new training subsets with limited features, which are in turn used to create a set of decision trees. A sample is then passed through all trees, and its classification is determined by majority vote. While the RF is generally insensitive to parameter changes, we found that a set of 200 trees delivered accurate and timely performance.
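A minimal scikit-learn sketch of this stage is shown below; the feature matrices are random placeholders standing in for the real SH and DCNN features, and their dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500                                   # toy stand-in for the real segmentations
sh_feats = rng.normal(size=(n, 64))       # placeholder SH coefficient vectors
dcnn_feats = rng.normal(size=(n, 4096))   # placeholder first-FC-layer activations
y = rng.integers(1, 6, size=n)            # malignancy ratings on a 1-5 scale

X = np.hstack([sh_feats, dcnn_feats])     # concatenate shape and appearance features
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold CV, as in Sect. 3
```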

3 Experiments and Results

We trained and tested our method on the Lung Image Database Consortium (LIDC) image collection [13], which consists of 1018 helical thoracic CT scans. Each scan was processed by four blinded radiologists, who provided segmentations, shape and texture characteristic descriptors, and malignancy ratings. Inclusion criteria consisted of scans with a collimation and reconstruction interval less than or equal to 3 mm, containing between approximately 1 and 6 lung nodules with longest dimensions between 3 and 30 mm. The LIDC dataset was chosen for its high quality and its numerous multi-radiologist assessments.

In total, 2054 nodules with 5155 segmentations were extracted, and 1432 nodules were marked by at least 2 annotators. Different segmentations and malignancy ratings were treated individually. To avoid training and testing against different segmentations of the same nodule, the dataset was split at the nodule level. Different segmentations of the same nodule were grouped into sets based on the mean Euclidean distance between their ROI centers, using a threshold of 5 mm. To account for mis-meshing and artifacts from interpolating slices, meshes were processed by filters to remove holes and fill islands. We also applied 1-step Laplacian smoothing.
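A simplified sketch of the nodule-level grouping is given below, assuming ROI centers in millimeters; the greedy running-mean linkage is an illustrative choice and not necessarily the exact rule used.

```python
import numpy as np

def group_segmentations(centers, threshold_mm=5.0):
    """Group segmentation ROI centers (an (N, 3) array, in mm) so that
    segmentations within threshold_mm of a group's mean center are treated
    as the same nodule."""
    centers = np.asarray(centers, dtype=float)
    groups = []                                    # list of [mean_center, indices]
    for i, c in enumerate(centers):
        for g in groups:
            if np.linalg.norm(c - g[0]) <= threshold_mm:
                g[1].append(i)
                g[0] = centers[g[1]].mean(axis=0)  # update the group's mean center
                break
        else:
            groups.append([c.copy(), [i]])
    return [g[1] for g in groups]
```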

Judging from the distribution of malignancy ratings across all annotating radiologists, and based on Welch’s t-test, inter-observer differences are significant among annotators. Moreover, according to the range of malignancy rating differences for any specific nodule, most nodules have a rating discrepancy of 2 or 3 among different annotators, indicating that inter-observer variability is highly significant. Therefore, to evaluate the performance of the proposed framework, we used “off-by-one” accuracy, meaning that we regard a malignancy rating within \(\pm 1\) of the reference as a reasonable and acceptable evaluation.
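Concretely, the metric reduces to the following (names are illustrative):

```python
import numpy as np

def off_by_one_accuracy(pred, ref):
    """Fraction of predicted ratings within +/-1 of the reference ratings."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    return float(np.mean(np.abs(pred - ref) <= 1))
```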

Accuracy results for 10-fold cross-validation are shown in Table 1 for a range of nodule sets and SH coefficients. Three sets of models were used: one using DCNN features only, one using SH coefficients only, and one using both SH and DCNN features. Models were tested with a range of input parameters, including the maximum number of coefficients included and the minimum number of annotators marking the nodule. In all cases, the hybrid model achieved better results than both individual models using the same input parameters. The hybrid model results are even more impressive when compared against the inter-observer variability of the LIDC dataset. These results indicate that DCNNs and SHs provide complementary appearance and shape information that can help provide reference malignancy ratings of lung nodules.

Table 1. Off-by-one accuracy for the SH-only, DCNN-only, and hybrid models, varying the minimum number of annotators marking the nodule and the maximum number of SH coefficients included.

4 Discussion and Conclusion

In this study, we presented an approach for generating a reference opinion on lung nodule malignancy based on the knowledge of experts’ characterizations. Our method is based on hybrid feature sets that include shape features, from SH decomposition, and appearance features, from a DCNN trained on natural images. Both feature sets are subsequently used for malignancy classification with an RF classifier.

There are many promising avenues for future work. For instance, the method would benefit from a larger and more accurate testing pool, as well as the inclusion of more reliable and precise ground-truth data beyond experts’ subjective evaluations. In addition, using additional complementary information, such as volume and scale-based features, may further improve scores. In this study, we represented a nodule’s appearance within orthogonal planes along three PCA axes. Including more 2D views, or even a 3D DCNN, could improve upon the promising results of the current setting. The rating task could also be formulated as regression, although in our current experiments the regression results were not statistically significant.

The variation of SH computation due to nodule size and segmentation remains an open question, with limited discussion in the existing literature [6]. In this study, our experiments partially addressed this robustness by testing segmentations of the same nodules from different human observers. We also observed that including more SH coefficients did not necessarily lead to higher accuracy. We postulate that coefficients help define shape up to a certain point, beyond which they may introduce more noise than useful information; further investigation would be needed to test this hypothesis.

Based on the inter-observer variability, experimental results using the LIDC dataset demonstrate that the proposed scheme can perform comparably to an independent expert annotator, while being fully automatic beyond the segmentation step. As a result, this work serves as an important demonstration of how both shape and appearance information can be harnessed for the important task of lung nodule classification.