Abstract
Powerful non-invasive imaging technologies like computed tomography (CT), ultrasound, and magnetic resonance imaging (MRI) are used daily by medical professionals to diagnose and treat patients. While 2D slice viewers have long been the standard, many tools allowing 3D representations of digital medical data are now available. Functional MRI (fMRI), the newest of these imaging advancements, has shifted medical imaging from viewing static anatomy to viewing dynamic physiology (4D) over time, particularly for studying brain activity. Combined with the rapid adoption of mobile devices for everyday work, this creates a need to visualize fMRI data on tablets and smartphones. However, few mobile tools are available to visualize 3D MRI data, let alone 4D fMRI data. Building volume rendering tools on mobile devices to visualize 3D and 4D medical data is challenging given the limited computational power of the devices. This paper describes research that explored the feasibility of performing real-time 3D and 4D volume raycasting on a tablet device. The prototype application was tested on a 9.7” iPad Pro using two different fMRI datasets of brain activity. The results show that mobile raycasting can achieve between 20 and 40 frames per second for traditional 3D datasets, depending on the sampling interval, and up to 9 frames per second for 4D data. While the prototype application did not always achieve true real-time interaction, these results demonstrate that visualizing 3D and 4D digital medical data is feasible with a properly constructed software framework.
Keywords: Computer graphics, Volume rendering, Functional imaging
Introduction
The last 20 years have seen medical imaging technology expand beyond the traditional viewing of anatomical features inside the body to an array of visualization technologies commonly used to view physiological processes in the body. These include positron emission tomography (PET) for viewing metabolic processes, diffusion tensor imaging (DTI), high angular resolution diffusion imaging, and functional magnetic resonance imaging (fMRI), among others. The versatility of MR techniques makes them one of the most popular modalities for both research and everyday clinical use. fMRI, one form of MR imaging, has become especially popular for identifying brain activity [1].
The impact of fMRI on the areas of science, clinical practice, cognitive neuroscience, mental illness, and society has been significant [2]. In 1992, there were zero publications with the word fMRI in the title, abstract, or a keyword. In 2005, there were close to 2500 research publications meeting those criteria, showing the growth in fMRI research being performed [3].
As the use of functional imaging technology increases, it is important to provide visualization tools that allow researchers and medical professionals to harness the power of fMRI. The research presented in this paper focuses on visualizing fMRI data obtained from brain activity studies and stored in the NIfTI file format [4]. However, any functional medical data stored in the NIfTI format can be visualized, and the general methods apply to other types of functional medical data as well.
Magnetic Resonance Imaging
Medical imaging technology can obtain data from different directions or angles depending on the underlying technology. Whereas a CT scan typically captures along one axis of the body (although any axis can be visualized), MRI technology can effectively sample from any angle. Each method generates a series of 2D “slices” orthogonal to the sampled axis. This series of 2D slices can then be combined to create a single 3D block of data representing the entire scan of the patient. This 3D block of data is known as a volume representation.
3D Volume Rendering
The volume representation must then be visualized to enable medical professionals to gain insight about the patient. Volumetric data is different from the surface data traditionally used in computer graphics programming, such as computer games, because it does not contain any defined surfaces or edges. Therefore, surface rendering techniques are limited in their use for visualizing volumetric data. Volume rendering is a different type of graphics rendering technique, which allows a more complete visualization of volumetric data. Unlike surface rendering that represents an object as a series of vertices and polygons making up an outer shell to show the object’s shape, volume rendering sees an object as a three-dimensional lattice of vertices similar to a Rubik’s cube. This allows volume rendering to visualize not only the shell of the data but the values inside the volume as well. Figure 1 shows the difference between simply mapping the 3D texture from the data onto a cube (i.e., volume) with no compositing and data prepared through a full volume rendering pipeline. In the left image (mapped 3D texture), only the faces of the data can be seen, whereas in the right image, the internal structure of the volume can be viewed.
Volume rendering can generally be broken down into four main techniques: (1) image splatting [5–8], (2) shear warp [9, 10], (3) texture slicing [11, 12], and (4) raycasting [13, 14]. Of all these methods, raycasting produces the most accurate visual representation, but at a high computational cost.
Volume Raycasting Pipeline
Raycasting is a direct volume rendering technique that involves casting linear segments (i.e., “rays”) from each pixel in the frame buffer through the volume in the view direction [13]. Points along the path of the ray that intersect with the volume are sampled and composited together to generate the final pixel color.
The volume rendering process begins with acquiring a volumetric dataset. The volumetric data is comprised of a set of samples, known as voxels, in three dimensions (i.e., x, y, and z). Each voxel contains a measured value, v. In medical imaging, the voxel values typically represent a measured property of interest. In fMRI brain scans, the voxel value is a scalar value representing the blood oxygenation level-dependent (BOLD) signal change [1]. This is used because neural activity has been linked with local changes in brain oxygen content [15]. The change of oxygen levels over time also requires the addition of a time component, t, to the volume data sample, resulting in five values per voxel (x, y, z, t, v).
The volume rendering pipeline defines the steps taken by the computer to translate volumetric data into a computer-generated 3D representation. It should be noted that although the representation is computed and drawn in 3D, it is drawn on a flat 2D screen. To truly view it in 3D, a stereo representation should be used, but that is beyond the scope of this work. Rendering pipelines differ depending on the application or the type of data being used. The pipeline used in this research consists of resampling, classification, coloring, and compositing steps [16]. These four steps make up the core of the volume rendering pipeline. Additional steps for gradient computation and shading were not needed as medical imaging is typically not concerned with photorealistic rendering. Instead, medical imaging is concerned with providing contrast in the data to help identify physiological objects. Thus, the steps chosen for this research are common to many volume renderers available as open source or commercially.
Resampling
Resampling is the first step in the render pipeline. The goal of resampling is to measure the volume data, comprised of voxels, at different positions in three-dimensional space. Each voxel is commonly represented as a cube in medical imaging where each vertex represents a value. When sampling the volume, it is rare that a ray can sample a voxel’s vertex directly. More commonly, the sampling point resides within the voxel. Trilinear interpolation can be used to approximate the value given its position within a voxel. Trilinear interpolation was chosen for this research due to its reasonable visual quality and minimal computational requirements [17].
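To make the interpolation step concrete, the following Swift sketch shows trilinear interpolation over a voxel grid. It is an illustration only, not the prototype's shader code: the Volume container, its x-fastest memory layout, and the clamping at the grid edges are assumptions. In the actual Metal implementation, the GPU's built-in linear sampler performs this same arithmetic when a 3D texture is sampled.

    import simd

    /// A minimal volume container: `data` holds one intensity per voxel,
    /// laid out x-fastest (index = x + y*nx + z*nx*ny). The layout is an assumption.
    struct Volume {
        let nx: Int, ny: Int, nz: Int
        let data: [Float]

        func voxel(_ x: Int, _ y: Int, _ z: Int) -> Float {
            data[x + y * nx + z * nx * ny]
        }

        /// Trilinear interpolation at a continuous voxel-space position `p`.
        func sample(at p: SIMD3<Float>) -> Float {
            // Integer corner of the cell containing p, clamped so x0+1 stays on the grid.
            let x0 = min(max(Int(p.x.rounded(.down)), 0), nx - 2)
            let y0 = min(max(Int(p.y.rounded(.down)), 0), ny - 2)
            let z0 = min(max(Int(p.z.rounded(.down)), 0), nz - 2)
            // Fractional offsets of the sample point inside the cell.
            let fx = p.x - Float(x0), fy = p.y - Float(y0), fz = p.z - Float(z0)

            // Interpolate along x on the four cell edges...
            let c00 = voxel(x0, y0, z0)         * (1 - fx) + voxel(x0 + 1, y0, z0)         * fx
            let c10 = voxel(x0, y0 + 1, z0)     * (1 - fx) + voxel(x0 + 1, y0 + 1, z0)     * fx
            let c01 = voxel(x0, y0, z0 + 1)     * (1 - fx) + voxel(x0 + 1, y0, z0 + 1)     * fx
            let c11 = voxel(x0, y0 + 1, z0 + 1) * (1 - fx) + voxel(x0 + 1, y0 + 1, z0 + 1) * fx
            // ...then along y, then along z.
            let c0 = c00 * (1 - fy) + c10 * fy
            let c1 = c01 * (1 - fy) + c11 * fy
            return c0 * (1 - fz) + c1 * fz
        }
    }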
The sampling techniques used in resampling are one of the primary differences between different volume renderers. For raycasting, an imaginary ray is cast into the volume from each screen pixel and samples are taken along the ray at evenly spaced intervals. Figure 2 shows a 2D representation of raycasting where the person’s eye is on the left looking at the computer screen represented by the black line, as rays are cast from each pixel into the gray volume.
For speed considerations, it was important to first determine the bounds of the volume before casting the rays. This cuts down on the total number of samples taken by each ray and thus speeds up rendering. The bounds of the volume were found by sending imaginary rays, r_i, from each pixel through the scene in the viewing direction. Each ray looks for the first intersection with the volume, the blue boundary f_i, and the last intersection with the volume, the green boundary l_i. Rays that did not intersect the volume were assigned the background color.
Rays that intersect the volume take samples at specified intervals, shown by the dashed lines in Fig. 2, where each dash represents a single sample. The sampling interval can be changed to accommodate different implementation goals; the tradeoff is that smaller sampling intervals produce higher-quality images at the expense of more computation. After ad hoc testing, the sampling interval used in this implementation was set to the width of a single voxel. This value can be changed in future implementations if desired. Raycasting does not require the voxel data to be isotropic.
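For reference, the bounds test described above can be written as a standard ray–box (slab) intersection followed by fixed-interval sampling. The Swift sketch below is an illustration under the assumption of an axis-aligned bounding box; the function and parameter names are not taken from the prototype.

    import simd

    /// Slab test of a ray against the volume's axis-aligned bounding box.
    /// Returns the entry and exit distances (tNear, tFar) along the ray,
    /// or nil when the ray misses the volume entirely.
    func intersectVolume(origin: SIMD3<Float>, direction: SIMD3<Float>,
                         boxMin: SIMD3<Float>, boxMax: SIMD3<Float>) -> (Float, Float)? {
        let invDir = SIMD3<Float>(repeating: 1) / direction   // per-component reciprocal
        let t0 = (boxMin - origin) * invDir                   // distances to the three "min" planes
        let t1 = (boxMax - origin) * invDir                   // distances to the three "max" planes
        let tNear = max(max(min(t0.x, t1.x), min(t0.y, t1.y)), min(t0.z, t1.z))
        let tFar  = min(min(max(t0.x, t1.x), max(t0.y, t1.y)), max(t0.z, t1.z))
        guard tNear <= tFar, tFar >= 0 else { return nil }    // ray misses: use the background color
        return (max(tNear, 0), tFar)
    }

    // Sampling then walks from the entry point to the exit point at a fixed
    // interval (one voxel width in this implementation), e.g.:
    // for t in stride(from: entry, through: exit, by: voxelWidth) { ... }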
Classification
Classification determines the subset of samples that make up the final image by mapping the sample’s intensity to an opacity value between zero and one. This range indicates how much of that voxel’s data should be included in the final image with zero being completely transparent and one being completely opaque. The mapping between voxel intensity and opacity is known as an opacity transfer function [18, 19]. Creating an opacity transfer function can be very complex depending on the type of data being viewed. The opacity transfer function used in this implementation is a normal distribution with the high and low values equating to completely transparent and the median equating to completely opaque [20].
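A minimal sketch of such an opacity transfer function is given below, assuming a Gaussian centered on the midpoint of the normalized intensity range; the table length and standard deviation are illustrative choices rather than the prototype's actual values.

    import Foundation

    /// Build a 256-entry opacity transfer function shaped like a normal distribution:
    /// transparent toward the low and high ends of the intensity range and
    /// fully opaque at the median.
    func makeGaussianOpacityTable(entries: Int = 256, sigma: Double = 0.18) -> [Float] {
        let mean = 0.5                                  // median of the normalized intensity range
        return (0..<entries).map { i in
            let x = Double(i) / Double(entries - 1)     // normalized intensity in [0, 1]
            let gaussian = exp(-(x - mean) * (x - mean) / (2 * sigma * sigma))
            return Float(gaussian)                      // peaks at 1.0 (fully opaque) at the mean
        }
    }

    // The table is uploaded once as a 1D texture at initialization and sampled in
    // the shader with each voxel's normalized intensity.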
Coloring
Coloring is used to map a sample's intensity to a color using color transfer functions. The purpose of coloring is not always to provide photorealism but to provide contrast within the data to help identify desired features. Coloring is challenging because volume data typically assigns a single value (intensity in medical imaging) to each voxel and not a traditional three-component color, like red, green, and blue. Therefore, coloring methods must intelligently convert a single value into three different values in a way that makes desired structures more visible. This conversion is typically accomplished by using different color transfer functions (a.k.a. color lookup tables [21]) for each color channel. Different organizations and institutions, such as the National Institutes of Health (NIH), create their own color transfer functions for different types of data. This research used some of these institutional color transfer functions.
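The per-channel lookup itself is straightforward; the sketch below illustrates it in Swift, with the caveat that the table contents (the institutional color maps) and the nearest-entry lookup are assumptions made for illustration.

    import simd

    /// Map a normalized voxel intensity to an RGB color by sampling three 1D lookup
    /// tables, one per color channel.
    struct ColorTransferFunction {
        let red: [Float]
        let green: [Float]
        let blue: [Float]          // all three tables have the same length, values in [0, 1]

        func color(for intensity: Float) -> SIMD3<Float> {
            let clamped = min(max(intensity, 0), 1)
            let index = Int(clamped * Float(red.count - 1))   // nearest-entry lookup
            return SIMD3<Float>(red[index], green[index], blue[index])
        }
    }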
Compositing
Compositing is the final step in the rendering pipeline and is the process of taking all the values sampled by the ray and combining them into a single color to be used for that pixel. The front-to-back compositing method used Eq. 1 to calculate the composite intensity value, I(x, y), for each ray. The composite intensity value is a sum of the sample point intensities, I_i, each multiplied by all the transparencies (1 − α_j) encountered previously along the ray. Put another way, each voxel sample can be thought of as a pane of colored glass that has some opacity, α. If the first pane of glass is completely opaque, the following panes of glass cannot be seen. If the first pane of glass is 75% opaque, the second pane of glass can be seen but its color will only be 25% visible at best. Compositing works the same way: the higher the opacity of a sample, the less the following sample's intensity can contribute to the final intensity value.
I(x, y) = \sum_{i=0}^{n} I_i \prod_{j=0}^{i-1} (1 - \alpha_j)    (1)
Each voxel sample's intensity value, I_i, is a combination of the color, C_i, from the color transfer functions and the opacity, α_i, from the opacity transfer function. Equation 2 shows how these two values are multiplied together to compute the voxel sample's color. The higher the opacity, the more intense the resulting color contribution is:
I_i = C_i \alpha_i    (2)
Front-to-back compositing continuously evaluates the current voxel sample's intensity and blends it with the previously accumulated samples. This constant evaluation is what allows front-to-back compositing to achieve a performance benefit from early ray termination. When the cumulative opacity reaches 1.0, there is no need to continue compositing along the ray because the contributions of all subsequent samples would be zero. Thus, the render loop can be exited for that ray with no impact on the final pixel color.
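The accumulation described by Eqs. 1 and 2, together with early ray termination, reduces to a few operations per sample. The following Swift sketch illustrates the front-to-back loop for one ray; the 0.99 termination threshold and the sample representation are assumptions, not the prototype's exact values.

    import simd

    /// Front-to-back compositing of the samples along one ray.
    /// `samples` are (color, opacity) pairs already produced by the
    /// classification and coloring steps, ordered front to back.
    func composite(samples: [(color: SIMD3<Float>, alpha: Float)]) -> SIMD3<Float> {
        var accumulatedColor = SIMD3<Float>(repeating: 0)
        var accumulatedAlpha: Float = 0

        for s in samples {
            // Eq. 2: the sample's contribution is its color weighted by its opacity.
            let contribution = s.color * s.alpha
            // Eq. 1: attenuate by the transparency accumulated so far.
            accumulatedColor += (1 - accumulatedAlpha) * contribution
            accumulatedAlpha += (1 - accumulatedAlpha) * s.alpha

            // Early ray termination: once the pixel is essentially opaque,
            // later samples cannot change the result.
            if accumulatedAlpha >= 0.99 { break }
        }
        return accumulatedColor
    }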
Raycasting Optimizations
Much research has been performed to handle the enormous computational resources required by raycasting and improve its speed. One of the most common methods is early ray termination (also known as adaptive termination) [22]. Early ray termination was originally proposed by Whitted [23] as an adaptively terminating raytracing algorithm. Levoy [24] took this idea and developed two front-to-back volume raycasting algorithms. The first was a case where the ray traversal should be terminated when an opaque voxel is reached. The second case terminates the ray traversal when the accumulated opacity reaches a user-specified level where additional samples will not dramatically change the final pixel color.
Another significant challenge of raycasting is the limited texture memory available in commodity computers. Traditional 3D MRI datasets usually range from 128 × 128 to 1024 × 1024 pixels per slice. For example, consider an MR dataset that is 512 × 512 pixels per slice. This equates to 0.75 megabytes (MB) per slice if each pixel is stored uncompressed as 3 bytes. It is common to have datasets on the order of 300 to 1000 slices, resulting in 225 to 750 MB per dataset, respectively. It is also common to store the slice data as a 32-bit-per-pixel texture for the graphics card, raising the required memory to 300 to 1000 MB. A dedicated graphics card is typically necessary to handle this amount of texture memory effectively, and most commodity computers do not come with dedicated graphics cards.
4D Volume Raycasting
Moving from 3D to 4D to accommodate functional data presents its own set of problems and limitations. All of the issues relevant to 3D volume rendering remain present in 4D rendering, but new ones are added to the list. In 3D volume rendering, there is one set of static volumetric data: organs are caught in a single position and represented in that “frozen” state. In 4D rendering, there is volumetric data over time (i.e., one volume per defined time step), so the amount of data increases linearly with the number of time steps. Using the previously discussed 3D dataset as an example, if the same 225 to 750 MB 3D volume were recorded for 100 time steps, the amount of memory would increase to 22.0 to 73.2 GB. Thus, methods are required to handle the increase in data, along with other issues discussed later, to allow rendering at interactive speeds. The most common method for dealing with the increase in data when moving from 3D to 4D involves decomposing the original data into two individual datasets. The first is a high-resolution scan holding all the structural data that does not change over time; for an fMRI of the brain, this would equate to the structure of the brain itself. The second is a set of lower-resolution scans capturing the changes over time; in the same example, this would be the brain activity.
Dividing the fMRI data into two scans reduces the total amount of volumetric data that changes from one frame to the next but introduces the challenge of rendering multiple volumes at the same time. Some researchers approached this problem with the assumption that multiple volumes were correctly aligned in 3D space and never moved in relation to each other [25–27]. This assumption works well for datasets like fMRI brain activity where the functional and structural data never move independently of each other. Once the volumes are aligned in 3D space, the ray cast into the volume samples each volume once at every sampling point along the ray. The process of scaling and aligning two volumes of different resolutions is discussed in more depth in the “Combining 3D and 4D Data into One Representation” section.
Multiple volume rendering uses the same volume rendering pipeline described earlier, with the exception of sampling all volumes at each point in space during resampling. There are two different methods of combining those samples together. The first is one property per point (OPP), where a single volume’s sample is selected to be the final value at that point [27]. The ability to use only a single volume’s value was found to be useful when certain characteristics were more important than others (e.g., EEG brain activity would be more important to visualize than the skull).
The second option is multiple properties per point (MPP), where all samples are combined together to achieve a single value for that point [28, 29]. The biggest difference among MPP approaches is typically where in the volume rendering pipeline the samples are combined. Data mixing can occur in the sampling (resampling), coloring (classification), or image (compositing) stages depending on the desired output [25]. Manssour et al. proposed a method where the two volume values were combined using a weighting function after each sample's color was determined for each volume [26]. Wilson et al. experimented with three different methods of data mixing: (1) choose one volume's value as the only value; (2) use a weighted function to combine all the values; or (3) use a single volume value each for the red, green, and blue color channels [30]. The advantage of MPP is the ability to mix the samples from both volumes into the final color, versus the OPP method that must choose one volume per sample. This provides more control over the final sample color and allows more complex visualization strategies to be used. However, the visualizations can become too complex to decipher if too many volumes are mixed or if the data mixing is not done in a consistent and intuitive way.
Mobile Raycasting
Advances in computational and graphics processing speeds allow for real-time volume raycasting on desktop and immersive VR hardware platforms [31, 32]. However, there has been a dramatic shift in the USA and worldwide toward mobile devices in recent years [33]. Expanding the reach of medical imaging visualization technologies toward supporting these types of devices as they become more common in the medical field will increase adoption and use by doctors and nurses working with patients.
Volume raycasting for mobile devices has been a challenge due to the limited hardware computational power and a lack of support for 3D textures. Mobile devices are generally underpowered compared to desktop computers because they intentionally sacrifice computational power for size, keeping devices thin and light while providing ample battery life. However, the largest limitation came from the lack of 3D texture support, which required 3D volume data to be stored as a series of 2D textures. Storing volume data in 2D textures required a greater number of graphics operations to map a 3D voxel location to a pixel in a set of 2D textures. Additionally, raycasting samples generally fall between voxels, requiring some method of interpolation to obtain a correct value. Interpolating 3D positions within a set of 2D textures required a prohibitively large number of graphics operations. It was simply not possible to perform all of these calculations quickly enough for real-time applications on a mobile device [34].
The computing power provided by mobile devices, and specifically their graphics processing units (GPUs), has changed in recent years, with mobile GPU technology approaching the performance seen in some laptop computers. There has also been an increase in support for 3D textures, which has proven critical for volume raycasting. New GPU languages like Metal [35] and Vulkan [36] reduce computational overhead while increasing draw rates over previous software libraries like OpenGL ES [37].
To date, there has not been a successful real-time volume raycasting implementation for mobile devices. Achieving real-time volume rendering has required using a less computationally expensive method, such as orthogonal texture slicing [38]. The only available application for visualizing fMRI data in 3D on a mobile device is NeuroPub [39], an application for Apple’s iOS platform. NeuroPub does not use volume raycasting and limits the functional data to only 32-bit NIfTI format with 64 × 64 pixel slices. The lack of available fMRI volume raycasting applications indicates the difficulty inherent in building a mobile raycaster that can handle generic fMRI data robustly with real-time interactivity for a user.
The goal of the research presented in this paper was to explore the feasibility of real-time visualization of fMRI data on a mobile device using volume raycasting. The application was developed to support any 3D static anatomical and 4D fMRI brain activity data that can be generically stored in the NIfTI file format.
Materials and Methods
A prototype application was built for iOS tablets using the new Metal graphics programming language to test the feasibility of implementing 3D and 4D volume raycasting methods on a mobile device. The goal of the prototype was to develop an efficient raycasting algorithm that could run on mobile devices and to observe whether real-time interaction could be achieved when viewing both 3D static and 4D dynamic volumetric data. Currently, there is no standard for what frame rates qualify as real time, so it is difficult to quantify whether an application is real time or not. Miller identified 0.1 s as an acceptable graphical response time after input from a light pen [40]. Therefore, a frame rate of 10 frames per second or higher was considered acceptable as real time for this research. Volumetric data load times were also considered given they are critical to determining the usability of an application in everyday medical situations. Medical professionals are not likely to wait significant periods of time (e.g., 5–10 min or more) to load a single dataset in their everyday workflow. The following sections describe the implementation of the raycasting algorithm used in this research, along with the challenges and contributions to the field.
Raycasting with the iOS Metal Shading Language
As mentioned in the “Mobile Raycasting” section, 3D volume raycasting on mobile devices has historically been a challenge due to the limited hardware computational power and a lack of support for 3D textures. Moving to 4D volume rendering of fMRI data increases the amount of data and the number of volumes required to be stored in textures.
The Metal shading language was introduced for iOS devices with support for 3D textures and a focus on low-level control of the graphics pipeline. Metal provides the capabilities to develop a more advanced volume rendering method for mobile devices than was previously possible. Implementing the volume raycasting algorithm using Metal required development and optimization of both the application initialization and the render loop for best performance. The initialization of the application required the setup of the Metal graphics pipeline, including steps such as initializing the vertex buffers and textures and compiling the shaders for use. Shaders are small programs written to run on the GPU and are where the volume rendering logic resides. The render loop consisted of passing the vertex buffers, textures, and uniforms to the shader for processing. A shader uniform is simply a value passed into the shader from the application. In general, it is desirable to move as many operations as possible into the initialization step to allow faster drawing in the render loop. However, doing so sometimes requires increasing the memory footprint on a limited device. The volume raycasting pipeline, as implemented using Metal, can be seen in Fig. 3. The pipeline is broken up into application compilation, application initialization, and render loop steps. The entire raycasting pipeline implementation using Metal is discussed in more depth in the following sections.
Initializing the Volume Raycaster
Optimization of the volume renderer began by moving as many operations out of the render loop as possible. For example, Metal provides the option of precompiling the shaders during application compilation. This is different from the previous standard, OpenGL ES, which required compiling the shader code on application launch or dynamically as the application ran. Moving the shader compilation into application compilation improved both application launch speed and frame rates when changing between different shader programs at runtime. Three different shader programs were written to render the 3D and 4D data.
Traditionally, a shader accepts the volumetric data as an input in the form of a texture or a set of textures. In this research, each volume is passed into the shader as a 3D texture. Each texel in the texture represents a single voxel’s intensity value. The high-resolution structural data is stored in a single 3D texture, and the low-resolution functional data is stored in a series of 3D textures where each texture represents a single snapshot in time.
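As an illustration of this setup, the sketch below creates and fills one such 3D texture using the Metal API from Swift. The pixel format, usage flags, and loader structure are assumptions; the prototype's actual data-loading code is not reproduced here.

    import Metal

    /// Wrap one volume (16-bit intensities, as in the NIfTI datasets used here)
    /// in a 3D texture so the fragment shader can sample it directly.
    func makeVolumeTexture(device: MTLDevice, voxels: [UInt16],
                           width: Int, height: Int, depth: Int) -> MTLTexture? {
        let descriptor = MTLTextureDescriptor()
        descriptor.textureType = .type3D
        descriptor.pixelFormat = .r16Unorm        // one normalized intensity per texel
        descriptor.width = width
        descriptor.height = height
        descriptor.depth = depth
        descriptor.usage = .shaderRead

        guard let texture = device.makeTexture(descriptor: descriptor) else { return nil }

        let bytesPerRow = width * MemoryLayout<UInt16>.size
        let bytesPerImage = bytesPerRow * height
        voxels.withUnsafeBytes { buffer in
            // Copy the voxel array into the texture in one call (assumes voxels is non-empty).
            texture.replace(region: MTLRegionMake3D(0, 0, 0, width, height, depth),
                            mipmapLevel: 0,
                            slice: 0,
                            withBytes: buffer.baseAddress!,
                            bytesPerRow: bytesPerRow,
                            bytesPerImage: bytesPerImage)
        }
        return texture
    }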
Each shader program was included within a unique MTLRenderPipelineState, which is a Metal object that encapsulates the rendering configuration used during a graphics rendering pass. Three MTLRenderPipelineState objects were created upon initialization with the first for only 3D structural data, the second for only 4D functional data, and the third for a combination of the two. Creating individual MTLRenderPipelineState objects allowed the rendering process to be optimized for each type of data. MTLRenderPipelineState objects are expensive to create but efficient to swap; therefore, it made sense to create the three different state objects at initialization and swap them as needed at run time.
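A sketch of that initialization is shown below; the shader function names and pixel format are placeholders, not the names used in the prototype.

    import Metal

    /// Build the three render pipeline states once at initialization so they can be
    /// swapped cheaply at run time. Shader function names are illustrative placeholders.
    func makePipelineStates(device: MTLDevice) throws -> [MTLRenderPipelineState] {
        guard let library = device.makeDefaultLibrary() else {
            fatalError("Precompiled Metal shader library not found")
        }
        let fragmentShaders = ["raycastStructural",   // 3D structural data only
                               "raycastFunctional",   // 4D functional data only
                               "raycastCombined"]     // structural + functional

        return try fragmentShaders.map { name in
            let descriptor = MTLRenderPipelineDescriptor()
            descriptor.vertexFunction = library.makeFunction(name: "raycastVertex")
            descriptor.fragmentFunction = library.makeFunction(name: name)
            descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
            return try device.makeRenderPipelineState(descriptor: descriptor)
        }
    }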
Loading data is another time-consuming process that is best performed upon initialization. When loading the volume data, there were two options for storing it. The first was to store each volume as a 3D texture and the second was to store the volume as an array of values. The second method used less memory and was faster at initialization but required converting the array of values for a volume into a 3D texture anytime that volume was to be rendered. Both of these methods were implemented and tested to determine if the rendering speed gains were significant enough to overcome the increased memory footprint and increased application initialization time.
Another operation that was moved from the render loop to initialization was the generation of opacity and color transfer functions. A 1D texture was used to store the opacity transfer function, and a set of three 1D textures was used to store each color transfer function. Opacity transfer function textures are traditionally created/updated every time the desired range of voxel values to visualize changes, which is an expensive process. Instead, a single texture was created at initialization and a user-defined minimum and maximum intensity value were passed to the shader through a uniform. Using the minimum and maximum intensity values, the shader can determine whether a voxel value falls within the range and then use the opacity transfer functions to determine an opacity; otherwise, the opacity is zero. This method is more efficient than rebuilding the texture after every change.
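The per-sample classification performed with those uniforms can be summarized as follows. This is a CPU-side Swift illustration of the shader logic; the remapping of the windowed intensity into the table is an assumption about a detail the text does not specify.

    /// Classification with a user-defined intensity window.
    /// The opacity table is built once at initialization; only `minIntensity`
    /// and `maxIntensity` change as the user adjusts the visible range.
    func opacity(for intensity: Float, minIntensity: Float, maxIntensity: Float,
                 opacityTable: [Float]) -> Float {
        // Values outside the user-selected window contribute nothing.
        guard intensity >= minIntensity && intensity <= maxIntensity else { return 0 }
        // Remap the windowed intensity into the table and look up its opacity.
        let normalized = (intensity - minIntensity) / (maxIntensity - minIntensity)
        let index = Int(normalized * Float(opacityTable.count - 1))
        return opacityTable[index]
    }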
The Raycasting Render Loop
Once the graphics pipeline is initialized, the rendering loop can begin drawing. The shader starts by determining the start and end points of the ray for the current pixel. It is important to calculate the start and end points of the ray to determine the number of times the ray must sample. The number of samples the ray takes is directly proportional to rendering time. The more samples the ray takes, the longer it takes to render a frame and the lower the frame rates.
The shader then starts sampling at the start point and continues along the ray until it reaches the end point. The ray is sampling in 3D world space, but the volume data is defined as a 3D texture with its own coordinate system. Sampling the volume requires both the ray and the volume texture to be in the same coordinate space. In computer graphics, the model-view matrix is used to map a 3D model, the volume, from its local coordinate system to a position and orientation relative to the camera, the screen. To map the ray’s world location relative to our 3D texture, the inverse of the model-view matrix was applied to the ray.
The converted ray can then be used in the raycasting loop to step along the ray and sample the volume at different points. Each sample along the ray checks if the intensity value was within the minimum and maximum intensity value bounds defined by the user. Anything outside those bounds was set to zero. The intensity value can then be used to sample the opacity transfer function texture for the correct opacity. If the opacity is not greater than zero, the ray moves to the next sample. If the opacity is greater than zero, the intensity value is used to determine a color using the color transfer function texture. The opacity and color are then combined to form the sample’s contribution to the final pixel color. Equation 2 defines how the individual components are combined together to achieve a final pixel color. This sampling process continues until the end of the ray is reached or the pixel color becomes completely opaque.
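Putting these steps together, the per-pixel work of the fragment shader can be sketched as follows. This is a CPU-side Swift illustration rather than the Metal shader source; the coordinate convention (shifting from [-0.5, 0.5] model space to [0, 1] texture space) and the helper closures are assumptions.

    import simd

    /// March one ray through the volume, transforming each world-space sample
    /// into the volume's texture space with the inverse model-view matrix.
    func castRay(startWorld: SIMD3<Float>, endWorld: SIMD3<Float>,
                 inverseModelView: simd_float4x4, stepSize: Float,
                 sampleVolume: (SIMD3<Float>) -> Float,           // intensity at a texture coordinate
                 classifyAndColor: (Float) -> (SIMD3<Float>, Float)) -> SIMD3<Float> {
        var color = SIMD3<Float>(repeating: 0)
        var alpha: Float = 0
        let direction = simd_normalize(endWorld - startWorld)
        let rayLength = simd_length(endWorld - startWorld)

        var t: Float = 0
        while t <= rayLength && alpha < 0.99 {                    // early ray termination
            let worldPos = startWorld + t * direction
            // World space -> model space via the inverse model-view matrix, then shift
            // into [0, 1] texture coordinates (an assumed convention).
            let local = inverseModelView * SIMD4<Float>(worldPos.x, worldPos.y, worldPos.z, 1)
            let texCoord = SIMD3<Float>(local.x, local.y, local.z) + SIMD3<Float>(repeating: 0.5)

            let (sampleColor, sampleAlpha) = classifyAndColor(sampleVolume(texCoord))
            color += (1 - alpha) * sampleColor * sampleAlpha      // front-to-back compositing (Eqs. 1 and 2)
            alpha += (1 - alpha) * sampleAlpha
            t += stepSize
        }
        return color
    }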
4D Volume Rendering with Metal
Moving from 3D rendering to 4D rendering required creating a different shader to handle changing volume data. The interest in 4D data comes from the change in activity from the baseline at different moments in time. Therefore, it was necessary to create a fragment shader that accepted two 3D textures, one representing the baseline activity and the second representing the time step of interest. The 4D fragment shader samples both the baseline and the current time step textures. The two intensity values are differenced to visualize the current activity at that time step.
The amount of noise associated with the functional data necessitated an activity threshold to be implemented to only show changes in activity that reached a certain magnitude. The threshold value was set with a slider in the user interface, and the value was passed into the shader as a uniform. The threshold method was chosen due to its simplicity and computational efficiency compared to other filtering methods.
The functional data did not go through the same coloring process as the structural data. Instead, the color was determined by multiplying the color red by the magnitude of the activity. This results in higher levels of activity being displayed as more red in the final image. Any color could be used for this specific representation.
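That per-sample activity computation can be sketched as follows; the use of an absolute difference and the specific names are assumptions made for illustration.

    import simd

    /// Per-sample contribution of the functional data: difference the current time
    /// step against the baseline, suppress noise below a user-set threshold, and
    /// modulate a single color (red here) by the activity magnitude.
    func functionalColor(baseline: Float, currentTimeStep: Float,
                         activityThreshold: Float) -> SIMD3<Float> {
        let activity = abs(currentTimeStep - baseline)
        guard activity >= activityThreshold else { return SIMD3<Float>(repeating: 0) }
        return SIMD3<Float>(1, 0, 0) * activity      // more activity -> more red
    }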
Combining 3D and 4D Data into One Representation
The third fragment shader program combined both the 3D structural data and the 4D functional data into one representation. This required passing in three 3D textures: one for the structural volume, one for the functional baseline volume, and one for the functional current time step volume. The implemented method for combining multiple volumes was built on the assumption that none of the volumes move independently of each other. This assumption allowed the efficient casting of a single ray and sampling of all three textures at the same location. While this assumption is a gross simplification of the general multi-volume rendering problem, it holds for fMRI data, where the structural and functional volumes never move independently.
The challenge with this multi-volume rendering method comes from mapping the sample point in 3D world space to a point on each texture in their own space. This becomes especially tricky when each volume contains textures with different resolutions and scalings. Figure 4 shows a 2D example of the mapping issue. The left side of the figure shows the structural (blue) and the functional (red) textures that represent the voxel data of their respective volumes. Both textures use a normalized coordinate system with (0, 0) being the upper left corner and (1, 1) being the lower right corner. Each box in the image represents a voxel. The right side of the image shows the world space where both volumes are aligned and scaled with the functional volume being scaled up equally in both directions. The world space uses a coordinate system where the center of the volumes is at (0, 0), and the height and width of the structural volume is 10. If the renderer samples the volume at world space location (0, 0), the mapped texture position would be (0.5, 0.5) for both the structural and the functional texture. However, if the renderer samples the volumes at (5, −4), the sample falls outside the bounds of the functional volume but not the structural volume. Therefore, the structural texture is sampled at (1, 0.9) and the value for the functional volume is set to zero as it does not exist at that point.
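The 2D mapping illustrated in Fig. 4 generalizes to a simple per-volume conversion. The Swift sketch below reproduces the figure's example under assumed coordinate conventions (world y up, texture origin in the upper-left corner); it is not the prototype's shader code.

    import simd

    /// Map a world-space sample position to a volume's normalized texture coordinate.
    /// `volumeCenter` and `volumeSize` describe where that volume sits in world space.
    /// Returns nil when the sample falls outside the volume (its contribution is zero).
    func textureCoordinate(worldPos: SIMD2<Float>,
                           volumeCenter: SIMD2<Float>,
                           volumeSize: SIMD2<Float>) -> SIMD2<Float>? {
        // Shift so the volume's center is the origin, scale to [-0.5, 0.5],
        // then move to the [0, 1] texture range.
        let normalized = (worldPos - volumeCenter) / volumeSize + SIMD2<Float>(repeating: 0.5)
        let texCoord = SIMD2<Float>(normalized.x, 1 - normalized.y)   // flip y into texture space

        // Outside [0, 1] in either axis means the sample misses this volume.
        guard (0...1).contains(texCoord.x), (0...1).contains(texCoord.y) else { return nil }
        return texCoord
    }

    // Example from Fig. 4, with a 10 x 10 structural volume centered at the origin:
    // world (0, 0) maps to (0.5, 0.5); world (5, -4) maps to (1.0, 0.9) for the structural
    // volume but falls outside the smaller functional volume, whose value is then zero.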
Once both volumes are correctly sampled, the intensity values must be classified and colored. The structural volume’s intensity value went through the same opacity and coloring operations as previously described for 3D rendering. The functional data went through the same coloring process as described above for 4D rendering where the activity level is directly proportional to the intensity of a single color, in this case red. The structural and functional color components were then added together to obtain a final color value for that sample. This process was then repeated for every sample along the ray. The result can be seen in Fig. 5 where the structural data is colored using a blue coloring scheme and the functional data is represented in red.
Results
Evaluating the feasibility of this real-time volume raycasting implementation relied on measuring data initialization time, frame rates, and the memory footprint. All timings were measured using tools compiled into the application code. The prototype raycasting application was tested on a 2016 9.7” iPad Pro running iOS 9.3.4. The iPad had a 64-bit Apple A9X dual-core processor with 2 GB of built-in memory to power a 2048 × 1536 pixel screen. The iPad uses a unified memory architecture, which forces the CPU and GPU to share the same system memory; there is no dedicated texture memory for the GPU, unlike desktop and laptop computers with discrete graphics cards.
Two representative brain activity fMRI datasets were used for testing. Each dataset consisted of a high-resolution structural volume and a set of low-resolution functional volumes. For both datasets, the structural component was 256 by 256 pixels per slice with 128 slices. The functional component of dataset 1 was 64 by 64 pixels per slice, with 24 slices per time step and 126 time steps total. The functional component of dataset 2 was 80 by 80 pixels per slice, with 41 slices per time step and 114 time steps total. All fMRI data was stored in the NIfTI file format with 16 bits per voxel. The functional data was captured with a 3-s time interval for dataset 1 and 2.5-s time interval for dataset 2.
The prototype application was evaluated for data initialization times using these representative fMRI datasets. Data initialization included the setup of the rendering pipelines and creation of all necessary Metal objects. Table 1 shows the measured results of the data initialization time compared to the resulting memory footprint. The first condition looked at the implementation where only a single time step is converted into a 3D texture upon initialization. The second condition stores all time steps as 3D textures. For each condition, the application was launched and data was loaded ten times to obtain an average.
Table 1. Data initialization time and resulting memory footprint (average of ten runs)

|                        | Dataset 1, single functional texture | Dataset 1, all functional textures | Dataset 2, single functional texture | Dataset 2, all functional textures |
|------------------------|-------|-------|-------|-------|
| Average (s)            | 16.4  | 29.8  | 18.1  | 47.8  |
| Standard deviation (s) | 0.272 | 0.268 | 0.317 | 0.337 |
| Variance (s²)          | 0.074 | 0.072 | 0.101 | 0.113 |
| Memory (MB)            | 362   | 408   | 670   | 958   |
Frame rates were tested between the three different shader configurations for structural, functional, and combined data (both structural and functional). Within the functional and combined conditions, the frame rates were measured as a single time step (single time step) and as animating through time steps (all time steps). The all time steps condition was used to determine the performance impact when “playing” back the data as it would have been captured. The frame rates were also compared between the implementation where textures were created upon demand (single texture) and when they are initialized upon startup (preload textures). Each condition consisted of five runs measuring 100 consecutive frames each with no user interaction. The mobile device was rebooted between runs. Those frames were then averaged for the final frame rates. Table 2 shows the resulting frame rates for the different conditions.
Table 2. Average frame rates (frames per second) for the three shader configurations under the two texture-storage conditions

|                             | Structural | Functional, single time step | Functional, all time steps | Combined, single time step | Combined, all time steps |
|-----------------------------|--------|--------|-------|--------|-------|
| Dataset 1, single texture   | 17.908 | 45.096 | 4.723 | 11.845 | 4.699 |
| Dataset 1, preload textures | 17.920 | 51.787 | 9.316 | 11.972 | 9.515 |
| Dataset 2, single texture   | 17.952 | 31.528 | 1.983 | 9.846  | 1.979 |
| Dataset 2, preload textures | 17.984 | 37.345 | 3.873 | 11.705 | 3.864 |
The results shown in Table 2 were obtained with no user input, and therefore the volume did not move. A second condition was tested, where the volume was rotated around the vertical axis, with one full rotation every 10 s. The results comparing the frame rates of the rotating condition with the non-rotating condition can be seen in Table 3. The difference in the results was considered negligible, as it is less than 10% for the conditions tested. This indicates that user interaction has little, if any, effect on the frame rates and that the size of the dataset is the most influential factor. This is discussed further in the “Discussion” section. Therefore, further results used only one condition (no user input), and the results were assumed to hold for the rotation condition as well.
Table 3. Average frame rates (frames per second) with and without volume rotation (dataset 1, preloaded textures)

|                         | Structural | Functional, single time step | Functional, all time steps | Combined, single time step | Combined, all time steps |
|-------------------------|--------|--------|-------|--------|-------|
| Dataset 1, no rotation  | 17.920 | 51.787 | 9.316 | 11.972 | 9.515 |
| Dataset 1, rotation     | 18.866 | 47.854 | 9.704 | 11.016 | 8.217 |
Frame rates were again tested when the sampling step size of the raycasting algorithm was increased from the width of one voxel to the width of two voxels. Increasing the sampling step size will reduce the image quality and result in missing small anatomical features. However, the tradeoff between speed and image quality may be desirable in some situations. Table 4 shows the results of the frame rates when increasing the sampling step size to the width of two voxels for the preloaded texture condition. The change in image quality between the two sampling rates can be seen in Fig. 6.
Table 4. Average frame rates (frames per second) with a sampling step size of two voxel widths (preloaded textures)

|                             | Structural | Functional, single time step | Functional, all time steps | Combined, single time step | Combined, all time steps |
|-----------------------------|-------|-------|------|-------|------|
| Dataset 1, preload textures | 41.20 | 60.03 | 9.64 | 22.60 | 9.45 |
| Dataset 2, preload textures | 41.04 | 60.02 | 3.81 | 21.55 | 3.87 |
Discussion
This research built a prototype volume raycasting application for 3D and 4D medical imaging to determine the feasibility of visualizing fMRI data on mobile devices. Feasibility was determined by data initialization times, frame rates, and memory consumption. It was impossible to compare this method directly with others because no other real-time mobile volume raycaster that supports 3D and 4D data could be found after an exhaustive literature search. Therefore, the results of this research were compared, when possible, with a similar fMRI desktop implementation [41] as well as a previous iPad volume renderer that used orthogonal texture slicing instead of raycasting [38].
The data initialization times were 16.4 and 29.8 s for dataset 1 and 18.1 and 47.8 s for dataset 2 using the two different implementations for storing the volume data (Table 1). The difference in load times is a result of loading all the functional volume data into a set of 3D textures, which is computationally expensive. For comparison, the data loading time of the desktop application using the same fMRI dataset was 8.55 s [41]. In general, lower times indicate better usability in the real world. However, initialization happens once upon loading data and does not impact the real-time nature of the application once loading is complete. Since the longest initialization time was under 50 s, well below the multi-minute waits that would hinder everyday use, it was considered acceptable.
Storing all volume data in 3D textures instead of generating those textures on demand increases the random access memory (RAM) footprint of the application from 362 to 408 MB for dataset 1 and from 670 to 958 MB for dataset 2 (Table 1). This is not an insignificant increase given that many mobile devices are limited to 1 to 2 GB of memory, and the operating system will shut down an application that uses too much of the available memory. Whether this memory increase is worthwhile can only be judged by its impact on frame rates.
This research defined real-time interaction as 10 frames per second or higher [40]. The single time step frame rates were between 11 and 51 frames per second for dataset 1 and between 9 and 37 frames per second for dataset 2 (Table 2). The all time steps conditions were between 4.7 and 9.5 frames per second for dataset 1 and between 1.9 and 3.8 frames per second for dataset 2. The single time step conditions therefore meet, or come very close to meeting, the 10 frames per second goal across both datasets. The all time steps condition was not able to consistently meet 10 fps, but dataset 1 came close at roughly 9 fps using the pre-initialized textures.
For comparison, raycasting dataset 1 in the desktop application achieved 54.9 fps for the structural condition, 60.93 fps for the functional single time step condition, and 60.22 fps for the functional all time steps condition [41]. The improved frame rates are expected given its 2.6 GHz Intel Core i7 processor, 16 GB of RAM, and NVIDIA GeForce GT 750M graphics card. The comparison iPad application used orthogonal texture slicing on a similar dataset of 256 × 256 pixels per slice and 128 slices and saw frame rates between 45 and 50 fps for the structural static condition [38]. Those higher frame rates can largely be attributed to the use of the less computationally expensive orthogonal texture slicing algorithm, even running on an iPad 2. Raycasting is widely considered to produce the superior visual result compared with orthogonal texture slicing, due to the more realistic lighting model used for intensity calculation.
When the sampling step size was increased to twice the width of a single voxel, the frame rates increased as expected. The single time step conditions saw roughly double the frame rates (Table 4). However, the all time steps conditions saw essentially no increase in frame rates. This indicates that switching the 3D textures representing the functional volumes, rather than the sheer number of computations per time step, is the bottleneck in the rendering process. This is not consistent with traditional views on 3D raycasting, which identify the number of rays and the number of samples as the bottleneck. This observation cannot be claimed as an outcome of this work until sufficient testing is done to verify it is actually occurring.
The image quality differences between the single voxel width sampling step size and the double voxel width step size are minimal (Fig. 6). It is possible to observe a lighter coloring of the volume because half as many samples are composited into the final color. The double step size is not recommended for most medical visualization applications, as small anatomical objects could be missed. However, for general viewing, the tradeoff between speed and visual quality may be acceptable.
One previous attempt at volume raycasting on a mobile device used an iPad 2 to render a volume consisting of 64 slices at 512 × 512 resolution. That implementation used 2D textures, as 3D textures were unavailable, and achieved frame rates under 1 frame per second. It would roughly equate to the structural condition in this research, which saw 17 frames per second. This jump in frame rates occurred over a 5-year span and indicates that, at the rate computing power is increasing in mobile devices, higher frame rates are not far away.
The tradeoff between memory footprint and frame rates can be seen in the results: a 46 MB memory increase for dataset 1 and a 288 MB increase for dataset 2 roughly doubled the frame rates for the all time steps conditions. While memory is at a premium on mobile devices, the doubling of frame rates appears to outweigh the memory increase. Therefore, it is recommended to store all volume data as 3D textures when 4D functional data is used.
Conclusions
This research looked at the feasibility of implementing a real-time 3D and 4D fMRI volume rendering method on a mobile device, given recent gains in graphics computational performance and new graphics languages. The application initialization, frame rate, and memory footprint results indicate that it is feasible to implement a raycasting method on a mobile device, but the frame rates are not yet consistently high enough to be considered real time. This is especially true for the 4D functional data, which saw low frame rates. However, performance will improve as mobile hardware continues to advance.
Improvements in frame rates can be gained by implementing other performance techniques like empty space skipping through the use of octree data structures. Another possible improvement for brain activity visualization would be to calculate the difference in activity between the baseline and the current time step upon initialization and store the difference in a 3D texture. This would take more time for data initialization but would reduce the number of textures passed into the fragment shader as well as remove the fragment operations currently used to perform this at run time. This method would also allow more sophisticated filtering of the noisy functional data.
References
1. Di Salle F, Formisano E, Linden DE, Goebel R, Bonavita S, Pepino A, et al. Exploring brain function with magnetic resonance imaging. Eur J Radiol. 1999;30:84–94. doi: 10.1016/S0720-048X(99)00047-9.
2. Rosen BR, Savoy RL. fMRI at 20: Has it changed the world? Neuroimage. 2012;62:1316–1324. doi: 10.1016/j.neuroimage.2012.03.004.
3. Bandettini P. Functional MRI today. Int J Psychophysiol. 2007;63:138–145. doi: 10.1016/j.ijpsycho.2006.03.016.
4. Neuroimaging Informatics Technology Initiative (NIfTI), 2011. http://nifti.nimh.nih.gov
5. Westover L. Footprint evaluation for volume rendering. ACM SIGGRAPH Comput Graph. 1990;24:367–376. doi: 10.1145/97880.97919.
6. Botsch M, Kobbelt L: High-quality point-based rendering on modern GPUs, in: 11th Pacific Conf. on Computer Graph. Appl., Proceedings, 2003. doi: 10.1109/PCCGA.2003.1238275.
7. Neophytou N, Mueller K. GPU accelerated image aligned splatting. Fourth Int. Work. Vol. Graph. 2005;2005:197–242.
8. Moller T, Machiraju R, Mueller K, Yagel R: Classification and local error estimation of interpolation and derivative filters for volume rendering, in: Proc. 1996 Symp. Vol. Vis., San Francisco, CA, 1996: pp. 71–78. doi: 10.1109/SVV.1996.558045.
9. Pfister H, Hardenbergh J, Knittel J, Lauer H, Seiler L: The VolumePro real-time ray-casting system, in: SIGGRAPH 1999, Comput. Graph. Proceedings, 1999: pp. 251–260. doi: 10.1145/311535.311563.
10. Wu Y, Bhatia V, Lauer H, Seiler L: Shear-image order ray casting volume rendering, in: Proc. 2003 Symp. Interact. 3D Graph. (SI3D '03), 2003: pp. 152–162. doi: 10.1145/641508.641510.
11. Rezk-Salama C, Engel K, Bauer M, Greiner G, Ertl T: Interactive volume rendering on standard PC graphics hardware using multi-textures and multi-stage rasterization, in: Proc. ACM SIGGRAPH/EUROGRAPHICS Work. Graph. Hardw., 2000: pp. 109–118. doi: 10.1145/346876.348238.
12. Engel K, Kraus M, Ertl T: High-quality pre-integrated volume rendering using hardware-accelerated pixel shading, in: Proc. ACM SIGGRAPH/EUROGRAPHICS Work. Graph. Hardw., 2001: pp. 9–16. doi: 10.1145/383507.383515.
13. Pfister H: Hardware-accelerated volume rendering, in: Vis. Handb., Elsevier, 2005: pp. 229–258. doi: 10.1016/B978-012387582-2/50013-7.
14. Weiskopf D. GPU-Based Interactive Visualization Techniques. Berlin Heidelberg: Springer; 2006.
15. Fox PT, Raichle ME. Focal physiological uncoupling of cerebral blood flow and oxidative metabolism during somatosensory stimulation in human subjects. Proc Natl Acad Sci U S A. 1986;83:1140–1144. doi: 10.1073/pnas.83.4.1140.
16. Kaufman A, Mueller K: Overview of volume rendering, in: Vis. Handb., 1st ed., Elsevier Inc., Amsterdam, 2005: pp. 127–174.
17. Lichtenbelt B, Crane R, Naqvi S. Introduction to Volume Rendering. Upper Saddle River, NJ: Prentice-Hall, Inc.; 1998.
18. Levoy M. Display of surfaces from volume data. IEEE Comput Graph Appl. 1988;8:29–37. doi: 10.1109/38.511.
19. Drebin RA, Carpenter L, Hanrahan P. Volume rendering, Proc. SIGGRAPH '88. Comput Graph. 1988;22:65–74. doi: 10.1145/378456.378484.
20. He T, Hong L, Kaufman A, Pfister H: Generation of transfer functions with stochastic search techniques, in: Proc. 7th Conf. Vis. '96, San Francisco, CA, 1996: pp. 227–234. doi: 10.1109/VISUAL.1996.568113.
21. INMOS. Graphics Databook, 2nd ed. Melksham, Wiltshire: Redwood Press Limited; 1990.
22. Kruger J, Westermann R: Acceleration techniques for GPU-based volume rendering, in: IEEE Visualization, Seattle, WA, USA, 2003: pp. 38–44. doi: 10.1109/VIS.2003.10001.
23. Whitted T. An improved illumination model for shaded display. ACM SIGGRAPH Comput Graph. 1979;13:14. doi: 10.1145/965103.807419.
24. Levoy M. Efficient ray tracing of volume data. ACM Trans Graph. 1990;9:245–261. doi: 10.1145/78964.78965.
25. Cai W, Sakas G. Data intermixing and multi-volume rendering. Comput Graph Forum. 1999;18:359–368. doi: 10.1111/1467-8659.00356.
26. Manssour IH, Furuie SS, Nedel LP, Freitas CMDS: A framework to visualize and interact with multimodal medical images. Vol Graph 2001: pp. 385–398. doi: 10.1007/978-3-7091-6756-4_27.
27. Beyer J, Hadwiger M, Wolfsberger S, Bühler K. High-quality multimodal volume rendering for preoperative planning of neurosurgical interventions. IEEE Trans Vis Comput Graph. 2007;13:1696–1703. doi: 10.1109/TVCG.2007.70560.
28. Ferre M, Puig A, Tost D. A framework for fusion methods and rendering techniques of multimodal volume data. Comput Animat Virtual Worlds. 2004;15:63–77. doi: 10.1002/cav.1.
29. Ferré M, Puig A, Tost D: Rendering techniques for multimodal data, in: Proc. SIACG 2002, 1st Ibero-American Symp. Comput. Graph., 2002: pp. 305–313.
30. Wilson B, Lum EB, Ma K: Interactive multi-volume visualization, in: Comput. Sci. — ICCS 2002, Springer Berlin Heidelberg, 2002: pp. 102–110. doi: 10.1007/3-540-46080-2_11.
31. Noon C, Foo E, Winer E. Interactive GPU volume raycasting in a clustered graphics environment. Med Imaging 2012: Image-Guided Proced Robot Interv Model. 2012;8316:1–9.
32. Van Gelder A, Kim K: Direct volume rendering with shading via three-dimensional textures, in: Proc. 1996 Symp. Vol. Vis., ACM, San Francisco, CA, 1996: pp. 23–30. doi: 10.1109/SVV.1996.558039.
33. DeGusta M: Are Smart Phones Spreading Faster than Any Technology in Human History? MIT Technol. Rev., 2012.
34. Noon CJ: A Volume Rendering Engine for Desktops, Laptops, Mobile Devices and Immersive Virtual Reality Systems using GPU-Based Volume Raycasting. Iowa State University, 2012.
35. Metal - Apple Developer: https://developer.apple.com/metal/, accessed May 2017.
36. Vulkan - Industry Forged: https://www.khronos.org/vulkan/, accessed May 2017.
37. Khronos Group, OpenGL ES: https://www.khronos.org/opengles/, accessed May 2017.
38. Noon C, Holub J, Winer E: Real-time volume rendering of digital medical images on an iOS device, in: C.G.M. Snoek, L.S. Kennedy, R. Creutzburg, D. Akopian, D. Wüller, K.J. Matherson, et al. (Eds.), IS&T/SPIE Electron. Imaging, Int. Soc. Opt. Photonics, 2013. doi: 10.1117/12.2005335.
39. Forsberg L: NeuroPub, 2013.
40. Miller RB: Response time in man-computer conversational transactions, in: Proc. Fall Jt. Comput. Conf., Part I, 1968: pp. 267–277. doi: 10.1145/1476589.1476628.
41. Holub J, Winer E: Visualizing fMRI data using volume rendering in virtual reality, in: Interservice/Industry Training, Simulation, Educ. Conf., Orlando, FL, 2015: pp. 1–11.