Probabilistic Point Cloud Reconstructions for Vertebral Shape Analysis

Anjany Sekuboyina^16,17,
Markus Rempfler¹⁸,
Alexander Valentinitsch¹⁷,
Maximilian Loeffler¹⁷,
Jan S. Kirschke¹⁷ &
…
Bjoern H. Menze¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11769))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

10k Accesses
8 Citations

Abstract

We propose an auto-encoding network architecture for point clouds (PC) capable of extracting shape signatures without supervision. Building on this, we (i) design a loss function capable of modelling data variance on PCs which are unstructured, and (ii) regularise the latent space as in a variational auto-encoder, both of which increase the auto-encoders’ descriptive capacity while making them probabilistic. Evaluating the reconstruction quality of our architectures, we employ them for detecting vertebral fractures without any supervision. By learning to efficiently reconstruct only healthy vertebrae, fractures are detected as anomalous reconstructions. Evaluating on a dataset containing $\sim $1500 vertebrae, we achieve area-under-ROC curve of >75%, without using intensity-based features.

J. S. Kirschke and B. H. Menze—Joint supervising authors.

You have full access to this open access chapter, Download conference paper PDF

Can Point Cloud Networks Learn Statistical Shape Models of Anatomies?

LatentPCN: latent space-constrained point cloud network for reconstruction of 3D patient-specific bone surface models from calibrated biplanar X-ray images

Article 07 April 2023

Reconstructing Human Body Mesh from Point Clouds by Adversarial GP Network

1 Introduction

One of the consequences of the numerous algorithms proposed for segmenting organs, tissues, the spine etc. involves analysing their anatomical shapes, eventually contributing towards population studies [1], disease characterisation [2], survival analysis [3], etc. Employing convolutional neural networks (CNN) for this task involves processing voxelised data due to its Euclidean nature. Such voluminous representation, however, is inefficient, especially when the masks are binary and the shape information corresponds to its surface profile. Alternatively, surface meshes (a collection of vertices, edges, and faces) or active contours could be used. Since the data is no longer Euclidean, a conventional CNN is unusable. Graph convolutional networks (GCN) [4] were thus developed by redefining the notion of ‘neighbourhood’ and ‘convolution’ for meshes and graphs. However, if the number of nodes is high, GCNs (esp. spectral) become bulky. Moreover, each mesh is treated as a domain, making mesh registration a requisite.

An alternative surface representation is a set of 3D points in space, referred to as the point clouds (PC). A PC represents the surface just with a set of N vertices, thus avoiding both the cubic-complexity of voxel-based representations and the $N \times N$ dimensional, sparse, adjacency matrix of meshes. However, despite their representational effectiveness, PCs are permutation invariant and do not describe data on a structured grid, preventing the usage of standard convolution. To this end, we work with an architecture capable of processing PCs (point-net, [5]), and design a network capable of reconstructing PCs thereby extracting shape signatures in an unsupervised manner.

Uncertainty and Latent Space Modelling. Unlike supervised learning on PCs [6], we set out to obtain shape signatures from PCs without supervision, building towards a relatively less explored topic of auto-encoding point clouds. This involves mapping the PC to a latent vector and reconstructing it back. Since the PCs are unordered, PC-specific reconstruction losses replace traditional ones [7, 8]. Extending auto-encoders (AE) based on such a loss, we propose to improve its representational capacity by regularising the latent space to make it compact and by modelling the variance that exists in a PC population. We claim that this results in learning improved shape signatures, validating the claim by employing the extracted features for unsupervised vertebral fracture detection.

Vertebral Fracture Detection. There exists an inherent shape variation in vertebral shapes within the spine of a single patient (e.g. cervical–thoracic–lumbar) along with a natural variation in a vertebra’s shape in a population (e.g. L1 across patients, cf. Fig. 1). Additionally, osteoporotic fractures start without significant shape change and progress into a vertebral collapse. Hence, fracture detection in vertebrae is non-trivial. Added to this, limited availability of fractured vertebrae makes the learning of supervised classifiers non-trivial. In literature, several classification systems exist mainly based on vertebral height measurement [9] or analysing sub-regions of the spine in sagittal slices [10]. However, an explicit shape-based approach seems absent. Evaluating the representational ability of the proposed AE architectures, we seek to analyse vertebral shapes and eventually detect vertebral fractures using the extracted latent shape features.

Our Contribution. Summarising the contributions of this work: (1) We build on existing point-net-based architectures to propose a point-cloud auto encoder (pAE). (2) Reinforcing this architecture, we incorporate latent space modelling and a more challenging uncertainty quantification. (3) We present a comprehensive analysis of the reconstruction capabilities of our pAEs by investigating their utility in detecting vertebral fractures. We work with an in-house, clinical dataset ($\sim $1500 vertebrae) achieving an area-under-curve (AUC) of >75% in detecting fractures, even without employing texture or intensity-based features.

2 Methodology

We present this section in two stages: First, we introduce the notation used in this work and describe a point-net-based architecture capable of efficiently auto-encoding point clouds. Second, we build on this architecture to model the natural variance in vertebrae while regularising the latent space.

2.1 Auto-Encoding Point Clouds

Given accurate voxel-wise segmentation of a vertebra, a point cloud (PC) can be extracted as a set of N points denoted by $X = \{p_i\}_{i=0}^N$, where $p_i$ represents a point by its 3D coordinate $(x_i, y_i, z_i)$. Additionally, $p_i$ could also represent other point specific features such as normal, radius of curvature etc. So, each vertebra is represented by a PC of dimension $N \times m$ (in this work, $N=2048$ vertices and $m=3$ coordinates, with the vertices randomly subsampled from a higher resolution mesh). Recall the lack of a regular coordinate space associated with the PC and that any permutation of these N points represents the same PC. Thus, a unique variant of deep networks is incorporated for processing PCs.

Architecture. An AE consists of an encoder mapping the PC to the latent vector and a decoder reconstructing the PC back from this latent vector, i.e $X \mapsto z \mapsto X$. As the encoder, we employ a variant of the point-net architecture [5]. The latent vector, z, respects the permutation invariance of the PC and represents its shape signature. As a decoder, taking cues from [7], we construct a combination of an up-convolutional and dense branches taking z as input and predicting $\hat{X}$, the reconstructed X. The convolutional path, owing to its neighbourhood processing, models the ‘average’ regions, while the dense path reconstructs the finer structures. This combination of the point-net and the decoder forms our point cloud auto-encoding (pAE, or interchangeably AE) architecture as illustrated in Fig. 2.

Loss. Reconstructing point clouds requires comparing the predicted PC with the actual PC to back-propagate the loss during training. However, owing to the unordered nature of PCs, usual regression losses cannot be employed. Two prominent candidates for such a task are the Chamfer distance and the Earth Mover (EM) distance [7]. We observed that minimising EM distance ignores the natural variation in shapes (e.g. the processes of the vertebrae) and reconstructs only a mean representation (e.g. the vertebral body), as validated in [7]. Since we intend to model the natural variance in the data, using EM distance is undesirable in our case. We thus employ the Chamfer distance computed as:

$$\begin{aligned} d_{ch}(X, \hat{X}) = \mathcal {L}_{ae} = \sum _{p \in X} \min _{\hat{p} \in \hat{X}} || p - \hat{p} ||_2^2 + \sum _{\hat{p} \in \hat{X}} \min _{p \in X} || p - \hat{p} ||_2^2. \end{aligned}$$

(1)

In essence, $d_{ch}$ is the distance between a point in X and its nearest neighbour in $\hat{X}$ and vice versa.

2.2 Probabilistic Reconstruction

From a generative modelling perspective, an AE can be seen to predict the parameters of Gaussian distribution imposed on X, i.e. $p_{\varTheta }(X) = \mathcal {N}(X | \hat{X}, \hat{\mathrm {\Sigma }})$, parameterised by the weights of the AE denoted by $\varTheta $. Determining the distribution parameters, viz. optimising for the AE weights, now involves maximising the log-likelihood of X, resulting in:

$$\begin{aligned} \varTheta ^* = \mathop {{{\,\mathrm{arg\,max}\,}}}\limits _\varTheta \log p_\varTheta (X) = \mathop {{{\,\mathrm{arg\,min}\,}}}\limits _\varTheta \frac{1}{2}(X - \hat{X})^T\hat{\mathrm {\Sigma }}^{-1}(X - \hat{X}) + \frac{1}{2}\log |\hat{\mathrm {\Sigma }}|. \end{aligned}$$

(2)

This perspective towards auto-encoding enables us to extend the pAE to encompass the data variance ($\hat{\mathrm {\Sigma }}$) while modelling the latent space, as described in following sections. It is important to note that the difference $X-\hat{X}$ is not well defined for point clouds, requiring us to opt for alternatives.

Assuming $\mathrm {\Sigma } = \mathbb {I}$, implying an independence among the elements of X and an element-wise unit variance, results in the familiar mean squared error (MSE), $\mathcal {L} = ||X - \hat{X}||^2$. Based on the parallels between MSE and the Chamfer distance (Eq. 1), we design $\sigma $-AE and $\sigma $-VAE, as illustrated in Fig. 3.

$\varvec{\sigma }$-AE. The assumption of unit covariance, as in AE, is inherently restrictive. However, modelling an unconstrained covariance matrix is infeasible due to quadratic complexity. A practical compromise is the independence assumption. Thus, representing covariance as, $\mathrm {\Sigma } = diag\{\hat{\sigma }_{p_1}^2, \dots , \hat{\sigma }_{p_i}^2, \dots , \hat{\sigma }_{p_N}^2\}$, where $\hat{\sigma }_{p_i}^2$ denotes the variance corresponding to $p_i$, Eq. (2) morphs to a loss function as:

$$\begin{aligned} \mathcal {L} = \sum _{\hat{p} \in \hat{X}} \sigma _{\hat{p}}^{-2} ||p_i - \hat{p_i}||^2 + \log \sigma _{\hat{p}}^2 \end{aligned}$$

(3)

This optimisation models the aleoteric uncertainty [11]. Equation 3 is an attenuated MSE, where a high variance associated to a point down-weighs its contribution to the loss. However, due to the lack of a reference grid in the point cloud space, the notion of uncertainty being associated to a data point (eg. pixel, spatial location etc.) is absent. We propose to associate the notion of variance to every point, $\hat{p_i}$. This results in the variance-modelling Chamfer distance:

$$\begin{aligned} \mathcal {L}_{\sigma ae} = \sum _{p \in X} \min _{\hat{p} \in \hat{X}} \sigma _{\hat{p}}^{-2}|| p - \hat{p} ||_2^2 + \sum _{\hat{p} \in \hat{X}} \sigma _{\hat{p}}^{-2} \min _{p \in X} || p - \hat{p} ||_2^2 + \log \sigma _{\hat{p}}^2 \end{aligned}$$

(4)

Observe the slight abuse of notation in Eq. 4, wherein the variance at a predicted point, $\sigma _{\hat{p}}$, actually represents the variance of the coordinate elements of p, i.e $\{\sigma _{\hat{x}}, \sigma _{\hat{y}}, \sigma _{\hat{z}}\}$. Current notation is chosen to avoid clutter.

Variational and $\varvec{\sigma }$-Variational AE. An alternative approach for modelling p(X) involves modelling its dependency over a latent variable z, which is distributed according to a known prior p(z). A variational auto-encoder (VAE) operates on these principles and involves maximising a lower bound on the log-evidence (referred to as ELBO) of the data described as below:

$$\begin{aligned} \log p(X) \ge \mathbb {E}_{z\sim q_\phi (z|X)}\big [\log p_\theta (X|z)\big ] - \text {KL}\big [q_\phi (z|X)~||~p_\theta (z)\big ], \end{aligned}$$

(5)

where $q_\phi (z|x)$ is the approximate posterior of z learnt by the encoder and parameterised by $\phi $. $p_\theta (X|z)$ is the data likelihood modelled by the decoder and parameterised by $\theta $. $p_\theta (z)$ is the prior on z.

Maximising ELBO is equivalent to maximising the log-likelihood of X while minimising the Kullback-Leibler divergence between the approximate and true prior. Representing the combination as $\mathcal {L}_{rec} + \beta \mathcal {L}_{KL}$, where $\mathcal {L}_{rec}$ is the reconstruction loss seen is earlier sections. $\beta $ is a scaling factor weighing the contribution of the two losses appropriately. Standard practice assigns Gaussian distributions for $q_\phi (z|x) \sim \mathcal {N}(z | \varvec{\mu }_z, \varvec{\sigma }_z)$ and $p(z) \sim \mathcal {N}(z | \mathbf {0}, \mathbf {1})$ (cf. Fig. 3). Thus, $\mathcal {L}_{KL}$ models the latent space to follow a Gaussian distribution inline with the prior. Incorporating this into the point cloud domain, results in an objective function for a PC-based VAE (or $\sigma $-VAE) as $\mathcal {L}_{vae} = \mathcal {L}_{ae/\sigma ae} + \beta \mathcal {L}_{KL}$. Thus, $\sigma $-VAE acts as a AE capable of modelling the data variance while regularising the latent space. The prior on the latent space also imparts point cloud generation capabilities to $\sigma $-VAE.

2.3 Detecting Fractures as Anomalies

Examining the descriptive ability of our pAE architectures in auto-encoding PCs, we utilise them for detecting vertebral fractures. Assuming the AE is trained only on ‘normal’ patterns, a fracture can be detected as an ‘anomaly’ based on its ‘position’ in latent space. We inspect two measures for this purpose:

1.
Reconstruction error or Chamfer distance: AEs trained on healthy samples fail to accurately reconstruct anomalous ones, resulting in a high $d_{ch}$.
2.
Reconstruction probability or likelihood [12]: Expected likelihood $\mathbb {E}\big [p_\varTheta (X)\big ]$ of an input can be computed for $\sigma $ – architectures (cf. Eq. 2). For any input PC, $X_{in}$, it is computed by $\mathcal {N}(X_{in} | \mu _\varTheta ,\varSigma _\varTheta )$ with the predicted mean and variances. We expect fractured vertebrae to be less likely than healthy ones.

Intuitively, relying on the reconstruction error or likelihood for detecting anomalies requires the learnt ‘healthy’ latent space to be representative. Both $\sigma $-AE and the VAE work towards this objective. In $\sigma $-AE, predictive variance down-weighs the loss due to highly uncertain points in the PC. This suppresses the interference due to natural variation in the vertebral PCs. On the other hand, VAE acts directly on the latent space by modelling the encoding uncertainty ($X\mapsto z$). The $\sigma $-VAE encompasses both these features.

Inference. A given vertebral PC is reconstructed and the reconstruction error and (or) likelihood are computed. This vertebra is said to be fractured if the reconstruction error is greater than a threshold, $T_{rec}$, or its likelihood is lesser than a threshold, $T_{l}$. $T_{rec}$ and $T_{l}$ and determined on the validation set.

3 Experiments and Discussion

We present this section in two parts: first, we explore the auto-encoding, variance modelling, and generative capabilities of our AE networks. Second, we deploy these architecture to detect vertebral fractures without supervision.

Data preparation: We evaluate our architecture on an in-house dataset with accurate voxel-level segmentations converted into PCs. The dataset consists of 1525 healthy and 155 fractured vertebrae, denoted as ($1525H + 155F$) vertebrae. Since we intend to learn the distribution of healthy vertebrae, we do not use any fractured vertebrae during training. The validation and test sets consists of ($50H + 55F$) and ($100H+100F$) vertebrae, respectively. For the supervised baselines, the train set needs to contain fractured vertebrae. Thus, validation and test sets were altered to contain ($50H + 55F$) and ($55H + 55F$) vertebrae.

Training: The architecture of the encoder and the decoder is similar across all architectures (cf. Fig. 3) except for the layers predicting variance. PCs are augmented online by perturbing the points with Gaussian noise and random rotations ($\pm 15^\circ $). Finally, the PCs are median-centred to origin and normalised to have the same surface area. The networks are trained until convergence using an Adam optimiser with an initial learning rate of $5 \times 10^{-4}$. Specific to the VAE, we use KL-annealing by increasing $\beta $ from 0 to 0.1.

Qualitative Evaluation of AE Architectures. We investigate if meaningful shape features can be learnt without supervision. Validating this, in Fig. 4a, we plot a TSNE embedding of the test set latent vectors learnt by a naive pAE and $\sigma $-VAE trained only on healthy vertebrae. Observe the clusters formed based on the vertebral index and the transition between the indices. This corresponds to the natural variation of vertebral shapes in a human spine. Indicating the fractured vertebrae in the embedding, we highlight their degree of similarity with the healthy counterparts. Also, observe that embedding is more regularised representing a Gaussian in case of $\sigma $-VAE, indicating the continuity of the learnt latent space. Figure 4b shows the predictive variance modelled by the $\sigma $-VAE. Posterior elements of a vertebrae are the most varying among population. Observe this being captured by the variance in the vertebral process regions. Lastly, illustrating $\sigma $-VAE’s generative capabilities, Fig. 4c shows vertebral PC samples generated by sampling the latent vector, $z\sim \mathcal {N}(\mathbf {0}, \mathbf {1})$.

Vertebral Fracture Detection. Evaluating the reconstruction quality of our pAE architectures, we employ them to detect fractures as anomalies. As baselines, we choose two supervised approaches: (1) point-net (PN), the encoding part in our pAE architectures, cast as a binary classifier and (2) the same point-net trained with median frequency balancing the classes (ref. as PN$_{bal}$) to accentuate the loss from minority fractured class. We report their performance in Table 1, over 3-fold cross-validation while retaining the ratio of healthy to fractured vertebrae in the data splits. Frequency balancing improves the F1 score significantly, albeit not at the level of the proposed anomaly detection schemes.

Table 1. Performance comparison of unsupervised and supervised fracture detection approaches. Measures: Precision (P), Recall (R), F1-score, and area-under-ROC curve (AUC) computed by varying thresholds on recon. error and recon. log-probabilities. Since supervised models have no threshold selection, AUC is not reported.

Full size table

Reconstruction for fracture detection: When detecting fractures based on reconstruction error ($d_{ch}$), we observe that a naive pAE already out-performs the supervised classifiers (cf. Table 1). On top of this, we see that latent space modelling and variance modelling individually offer an improvement in F1-scores while increasing the AUC, indicating a stable detection of fractures. The performance of both $\sigma $-AE and $\sigma $-VAE is similar indicating the role of loss attenuation. However, the advantage of explicitly regularising the latent space for $\sigma $-VAE can be seen in likelihood-based anomaly detection, where $\sigma $-VAE outperforms $\sigma $-AE. Figure 5 compares a reconstruction of a healthy and fractured vertebrae of the same vertebral level. Note the high reconstruction error and a low log-likelihood spatially corresponding to the deformity due to fracture.

4 Conclusions

We presented point-cloud-based auto-encoding architectures for extracting descriptive shape features. Improving their description, we incorporated variance and latent space-modelling capability using specially defined PC specific losses. The former captures the natural variance in the data while the latter regularises the latent space to be continuous. Deploying these networks for the task of unsupervised fracture detection, we achieved an AUC of 76% without using any intensity or textural features. Future work will combine the extracted shape signatures with textural features e.g. bone density and trabecular texture of vertebrae to perform fracture-grade classification.

References

Ingalhalikar, M., et al.: Sex differences in the structural connectome of the human brain. Proc. Natl. Acad. Sci. 111(2), 823–828 (2014)
Article Google Scholar
Shakeri, M., Lombaert, H., Tripathi, S., Kadoury, S.: Deep spectral-based shape features for Alzheimer’s disease classification. In: Reuter, M., Wachinger, C., Lombaert, H. (eds.) SeSAMI 2016. LNCS, vol. 10126, pp. 15–24. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-51237-2_2
Chapter Google Scholar
Isensee, F., et al.: Brain tumor segmentation and radiomics survival prediction: contribution to the brats 2017 challenge. arXiv e-prints (2018)
Google Scholar
Bronstein, M.M., et al.: Geometric deep learning: going beyond euclidean data. IEEE Sig. Process. Mag. 34(4), 18–42 (2017)
Article Google Scholar
Qi, C.R., et al.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)
Google Scholar
Gutiérrez-Becker, B., Wachinger, C.: Deep multi-structural shape analysis: application to neuroanatomy. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11072, pp. 523–531. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00931-1_60
Chapter Google Scholar
Fan, H., et al.: A point set generation network for 3D object reconstruction from a single image. In: CVPR (2017)
Google Scholar
Yang, Y., et al.: FoldingNet: point cloud auto-encoder via deep grid deformation. In: CVPR (2018)
Google Scholar
Baum, T., et al.: Automatic detection of osteoporotic vertebral fractures in routine thoracic and abdominal MDCT. Eur. Radiol. 24(4), 872–880 (2014)
Article Google Scholar
Tomita, N., et al.: Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans. Comput. Biol. Med. 98, 8–15 (2018)
Article Google Scholar
Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: NIPS (2017)
Google Scholar
An, J., Cho, S.: Variational autoencoder based anomaly detection using reconstruction probability. Technical report, SNU Data Mining Center (2015)
Google Scholar

Download references

Acknowledgements

This work is supported by the European Research Council (ERC) under the European Union’s ‘Horizon 2020’ research & innovation programme (GA637164–iBack–ERC–2014–STG). The Quadro P5000 used for this work was donated by NVIDIA Corporation.

Author information

Authors and Affiliations

Department of Informatics, Technical University of Munich, Munich, Germany
Anjany Sekuboyina & Bjoern H. Menze
Department of Neuroradiology, Klinikum rechts der Isar, Munich, Germany
Anjany Sekuboyina, Alexander Valentinitsch, Maximilian Loeffler & Jan S. Kirschke
Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
Markus Rempfler

Authors

Anjany Sekuboyina
View author publications
You can also search for this author in PubMed Google Scholar
Markus Rempfler
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Valentinitsch
View author publications
You can also search for this author in PubMed Google Scholar
Maximilian Loeffler
View author publications
You can also search for this author in PubMed Google Scholar
Jan S. Kirschke
View author publications
You can also search for this author in PubMed Google Scholar
Bjoern H. Menze
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anjany Sekuboyina .

Editor information

Editors and Affiliations

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Dinggang Shen
University of Georgia, Athens, GA, USA
Tianming Liu
Western University, London, ON, Canada
Terry M. Peters
Yale University, New Haven, CT, USA
Lawrence H. Staib
University of Strasbourg, Illkirch, France
Caroline Essert
United Imaging Intelligence, Shanghai, China
Sean Zhou
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Pew-Thian Yap
Western University, London, ON, Canada
Ali Khan

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2932 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sekuboyina, A., Rempfler, M., Valentinitsch, A., Loeffler, M., Kirschke, J.S., Menze, B.H. (2019). Probabilistic Point Cloud Reconstructions for Vertebral Shape Analysis. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science(), vol 11769. Springer, Cham. https://doi.org/10.1007/978-3-030-32226-7_42

Download citation

DOI: https://doi.org/10.1007/978-3-030-32226-7_42
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32225-0
Online ISBN: 978-3-030-32226-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)