Keywords

1 Introduction

In computer-assisted interventions, correctly aligning preoperative data to real-time acquired intraoperative images remains a challenging topic, especially when large deformations are involved and only sparse input data is available. This is typically the case when trying to provide an augmented view of an organ during surgery. In this context, just about 30% of the surface of the organ is visible due to the limited field of view of the laparoscopic camera or size of the incision [2]. Several works have demonstrated the benefits of physics-based models, particularly patient-specific biomechanical models, for accurate registration between different preoperative 3D anatomical model and intraoperative data [3,4,5,6]. These models are usually derived from continuum mechanics theory, where finite element (FE) methods are preferred due to their ability to numerically solve the complex partial differential equations associated with the constitutive models.

When considering augmented reality or, more generally, real-time tracking of the organ’s deformation to provide up-to-date guidance, the computational efficiency of the FE method is essential. To this end, various solutions have been proposed, with different trade-offs regarding the ratio between computation time and model accuracy [5, 7, 8]. The choice of the model, and its parameterization, are obviously key to an accurate registration. It is usually acknowledged that a registration of internal structures below 5 mm is needed for best clinical impact, such as targeting relatively small tumors.

Without lack of generality, let us consider the scenario where an augmented view of the liver has to be provided during hepatic surgery. The intraoperative data is provided as a point cloud, reconstructed from the laparoscopic images [13, 14] or direct camera view [9] of the operative field. Images are usually acquired at 20 Hz or more, and latency between data acquisition and display of the augmented image needs to be minimized. This leads to update times of less than 50 ms, during which image acquisition, image processing and model update need to take place. As a result, FE computation times should require less than 30 ms. This is usually achieved using efficient implementation of specific elastic models.

In surgery where only small deformations take place, achieving such computation times is feasible, as illustrated in [10]. However, when considering large, non-linear deformations, involving typical soft tissue hyper-elastic behavior, computation times become incompatible with augmented reality constraints. A solution towards this objective has been to use a co-rotational technique, allowing to cope with large displacements (but small strain) behavior [4, 9]. Yet, if more advanced laws from biomechanics are to be used, these optimizations (and simplifications) no longer hold. Alternative solutions have been proposed to avoid these restrictions. For instance, Marchesseau et al. [11] introduced the Multiplicative Jacobian Energy Decomposition (MJED) which is an alternative to the Galerkin FEM formulation. This method for discretizing non-linear hyperelastic materials on linear tetrahedral meshes can lead to fast and realistic liver deformations including hyperelasticity, porosity and viscosity. Also, Miller et al. [12] introduced TLED, an efficient numerical algorithm for computing deformations of very soft tissues. By using a total Lagrangian formulation, most spatial derivatives can be pre-computed, leading to fast simulations of non-linear material.

In this paper, we propose a method allowing for extremely fast and accurate simulations by using an artificial neural network that partially encodes the stress-strain relation in a low-dimensional space. Such a network can learn the desired constitutive model, and predict deformations under new, complex, conditions. As a result it can be used for augmented reality applications, where both complex tissue deformation and boundary conditions are involved. Our method consists in two main components: a deep network architecture, described in Sect. 2.1, which is derived from model order reduction techniques, and an immersed boundary method, used for generating patient-specific simulations, using hexahedral finite elements Sect. 2.2. Results presented in Sect. 3 show the efficiency of the method when applied to both synthetic and ex vivo scenarios.

2 Method

A typical pipeline for providing an augmented view of an organ during surgery is described by Fig. 1. The only assumption here is that we have a method for extracting a partial point cloud of the organ during surgery, either from a laparoscopic image or other system, such as RGB-D camera.

Fig. 1.
figure 1

Augmented reality pipeline: preoperative internal structures are mapped in real-time onto the live image of the organ using a FEM model.

In such a pipeline we propose to accelerate the FE computation step by replacing it with a non-linear dimensionality reduction technique based on a U-Net architecture. Dimensionality reduction techniques have shown real benefits in speeding-up FE simulations. Among them, POD is a very popular one since it leads to very realistic real-time simulations. Although authors in [17] have shown good results using this method for large deformation, POD is better suited for linear or weakly non-linear processes. Our method uses a deep neural network to learn the relationship between sparse surface data and volumetric deformation of a given mesh.

2.1 U-Net Architecture

The 3D point cloud provided by the RGB-D sensor is a sparse and partial view of the surface deformation of the organ. Our problem consists of finding the function f that produces the best estimation of the internal deformation of the organ from such point cloud. The function f is found by minimizing the expected error over a training set \(\{(\mathbf {u_s}^n, \mathbf {u_v}^n)\}_{n=1}^N\) of N samples:

$$\begin{aligned} \min _{\theta } \dfrac{1}{N}\sum _{n=1}^N \Vert f(\mathbf {u_s}^n) - \mathbf {u_v}^n\Vert ^2_2. \end{aligned}$$
(1)

where \(\theta \) is the set of parameters of the network f, and \(\mathbf {u_s}\) and \(\mathbf {u_v}\) are the surface input and the volumetric output displacement fields of each sample. In order to characterize f, we propose to use the U-Net [1], a modified fully convolutional network initially built for precise medical image segmentations. As depicted in Fig. 2, the network is similar to an auto-encoder, with an encoding path to transform the input space into a low-dimensional representation, and a decoding path to expand it back to the original size. Additional skip connections transfer detailed information along matching levels from the encoding path to the decoding path. The encoding path consists of k sequences of two padded \(3\times {}3\times {}3\) convolutions (\(k=4\) in [1]) and a \(2\times {}2\times {}2\) max pooling operation (see Fig. 2). At each step, each feature map doubles the number of channels and halves the spatial dimensions. In the bottom part there are two extra \(3\times {}3\times {}3\) convolutional layers leading to a 1024-dimensional array. In a symmetric manner, the decoding path consists of k sequences of an up-sampling \(2\times {}2\times {}2\) transposed convolution followed by two padded \(3\times {}3\times {}3\) convolutions. The features from the encoding path at the same stage are cropped and concatenated to the up-sampled feature maps. At each step of the decoding path, each feature map halves the number of channels and doubles the spatial dimensions. There is a final \(1\times {}1\times {}1\) convolutional layer to transform the last feature map to the desired number of channels of the output (3 channels in our case). The up- and down-sampling process implies to use a grid-like structure for storing displacement information. Although this could be interpolated from a non-structured FEM mesh, we prefer here to rely on hexahedral mesh, as explained below.

Fig. 2.
figure 2

U-Net architecture with 3 steps and 128 channels in the first layer for a padded input grid of size \(32\times 40 \times 16 \times 3\).

2.2 Soft Tissue Deformation on Hexahedral Grids

To train our network, we need to generate many samples consisting of a volumetric output displacement field \(\mathbf {u_v}\), given a nearly random input displacement \(\mathbf {u_s}\). The simulation that generates this data needs to be as accurate as possible, yet computationally efficient to make it possible to generate data and train the network in less than a day (i.e. in the shortest amount of time available between preoperative data acquisition and surgery). While several choices of constitutive models can be made to describe soft tissue biomechanics, we want here to emphasize the choice of elements used in our FE implementation. Instead of using 4-node tetrahedral elements, as it is very often the case in the literature, we choose here to rely on 8-node hexahedral elements. This choice is governed by their better convergence, and lock-free behavior, especially with close to imcompressible materials (such as the liver) and strong shear stresses [19]. In addition, we combine this choice of elements with an immersed-boundary method, which allows us to create a hexahedral mesh directly from the segmented image and produces a sparse, regular grid of elements which matches the structure of the first layer of our U-net. Since most soft tissue exhibit an hyperelastic behavior, we chose to implement a Saint Venant-Kirchhoff model, for its relative simplicity and computational efficiency. This hyperelastic material model is just an extension of the linear elastic material model to the nonlinear regime. It is derived from Cauchy’s first law of motion, which states the conservation of linear momentum in a continuum. The second Piola-Kirchhoff stress tensor \(\varvec{S} = 2\mu \varvec{\epsilon } + \lambda tr(\varvec{\epsilon }) \varvec{I}\) conveys the amount of stress (\(N/m^2\)) sustained by the material undergoing a certain deformation, described by the strain tensor \(\varvec{\epsilon }(\varvec{u}) = \frac{1}{2} (\varvec{\nabla } \varvec{u}^T + \varvec{\nabla } \varvec{u} + \varvec{\nabla } \varvec{u}^T \cdot \varvec{\nabla } \varvec{u})\). Here, \(\lambda \) and \(\mu \) are the Lamé elasticity parameters and \(\varvec{u} = [u\, v\, w]^T\) is a displacement vector from position \(\varvec{x}^0\) in the undeformed state of the body to its deformed position \(\varvec{x} = \varvec{x}^0 + \varvec{u}\). Cauchy’s first law of motion can finally be translated into the system of partial differential equations \(- \varvec{\nabla } \cdot \varvec{\sigma } = \varvec{f}_{ext}\) which pose our set of rules for the simulation process. The weak form of those rules, obtained from the principle of virtual work, brings forward the boundary terms and allows to solve the nonlinear system of equations of the Saint Venant-Kirchhoff model through a discretization of the domain into a finite set of hexahedral elements (Fig. 3).

Fig. 3.
figure 3

Example of sparse grid discretization generated from a preoperative CT. Combined with an immersed boundary method, it allows the use of regular hexahedral meshes to simulate the deformation of the organ, and can be used with our CNN.

The choice of discretizing the initial domain with a sparse and regular grid brings several benefits. The usual tri-linear interpolation functions of an 8-nodes hexahedral element are reduced to a linear mapping, and, similar to 4-nodes tetrahedral elements, its jacobian remains constant inside the element. However, using such discretization requires particular care of the boundary elements. Volume integration of the displacement field inside these partially filled cells is carried on by recursively subdividing the cell into 8 sub-cells. The stiffness matrix inside a boundary cell is then accumulated from its sub-cells using the linear mapping, as in [16]. Since we are using a fine mesh of the domain, only one level of subdivision is enough to obtain an accurate approximation of the volume integral. Finally, the non-linearity of the strain tensor requires an iterative Newton-Raphson method to solve the non-linear system of equations approximating the unknown displacement. Since the convergence of the Newton-Raphson method is only valid for a displacement near the solution, large external loads must be applied by small increments, which can in turn require a large number of iterations to converge. This is an important characteristic, as one would need to restrict those iterations during a simulation if such nonlinear model were to be used directly in the registration process. In our case, these computations are done in the training stage and do not need to meet real-time constraints.

2.3 Data Generation for U-Net Training

To train the U-Net, a data set of pairs \((\mathbf {u_s},\mathbf {u_v})\) is generated, where the surface displacement corresponds to the mapped point cloud. We assume that, at most, half of the organ surface is visible from the camera. Hence, we uniformly sample 100 points on the visible surface mesh. Hundred simultaneous forces are then applied to this virtual point cloud with random amplitudes and directions. In order to generate a series of surface deformations (see Fig. 4(b)), these random forces are applied to their enclosing grid nodes using their barycentric coordinates. Patient-specific parameterization of the biomechanical model is not required since for homogeneous materials, the relation between the surface and the volumetric displacements is independent of the stiffness of the object [21], and only depends on the Poisson’s ratio. As a result, we set the Poisson’s ratio to 0.49 (soft tissues are generally incompressible).

3 Results

Results below are obtained using an ex vivo human liver data set, for which surface data was obtained with an RGB-D camera and ground truth data acquired at different stages of deformation using a CT scan. Markers were embedded in the liver to compute Target Registration Errors (TRE).

Predictions on Controlled Synthetic Deformations. To validate the performance of our network, we compare the predictions of the displacement field with the corresponding FE simulation, using the model described in Sect. 2.2. We generated 1, 000 samples and used \(N=800\) samples to train the network by minimizing Eq. (1). The minimization is performed using the Adam optimizer: a stochastic gradient descent procedure with parameter-wise adjusted learning rates. The remaining 200 samples are then used for validation. For each sample, the average target registration error (TRE) is computed over 10 virtual markers. The obtained average TRE over the validation set is \(\overline{\text {TRE}}\) = 1.96 ± 3.46 mm.

Fig. 4.
figure 4

(a) U-Mesh prediction in green, co-rotational FEM based registration in blue and ground truth in red. A mean TRE of 2.92 mm is achieved over the 10 markers in about 3ms. (b) Samples of the generated deformations using our FE method. The rest shape is shown in gray. (Color figure online)

Application to Augmented Reality. With a PyTorch implementation of the U-Net running on a GeForce 1080 Ti, the network can predict the volumetric deformation of the liver in only 3 ms. We can then apply this prediction each time the RGB-D camera generates a point cloud. Before computing the displacement field, the point cloud needs to be cropped to the portion of the surgical image that contains the liver. This is done by segmenting the associated color image, similarly as in [15]. The RGB-D point cloud is then interpolated onto the grid to obtain per-node displacements on the surface (i.e. \(\mathbf {u_s}\)). Given this input, the network predicts the volumetric deformation, and the next point cloud can be processed. When compared to our ex vivo ground truth, the average TRE at the 10 markers is of only \(2.92\,mm\) with a maximal value of 5.3 mm. The same scenario, but this time using a co-rotational FE method, leads to an average TRE of 3.79 mm and is computed in about 25 ms. The solution of the Saint Venant-Kirchhoff model, used to train our model, gives nearly the same error (which was expected) but for a computation time of 1550 ms, even when using a very efficient linear solver (PardisoFootnote 1).

4 Conclusion

We have presented a deep neural network approach that can learn complex elastic deformations of a liver and generate its deformed state about 500\(\times \) faster than a reference FE solution. Since the network takes as input a regular sparse grid where displacements are imposed, we have shown an efficient FE immersed-boundary method based on the same hexahedral discretization from which thousands of deformed configurations are generated to train the network. Based on a U-net architecture, which emulates a model order reduction technique, complex and accurate deformation of a preoperative organ model are computed in only a few milliseconds. Driven by surface displacement data, it makes this approach an ideal candidate for providing augmented reality during surgery.