Abstract
Computer generated graphics are superimposed onto live video from an endoscope, offering the surgeon visual information that is hidden in the native scene—this is the classical scenario of augmented reality in minimally invasive surgery. Over the past few decades, research efforts have pressed considerably against the challenges of infusing a priori knowledge into endoscopic streams. As framed, these contributions emulate perception at the level of the expert surgeon, perpetuating debates on the technical, clinical, and societal viability of the proposition.
We herein introduce interactive endoscopy, transforming passive visualization into an interface that allows the surgeon to label noteworthy anatomical features found in the endoscopic video, and to have the virtual annotations remember their tissue locations during surgical manipulation. The streamlined interface combines vision-based tool tracking and speech recognition for interactive selection and labeling, followed by tissue tracking and optical flow for label persistence. These discrete capabilities have matured rapidly in recent years, promising technical viability; the system supports clinical viability by helping clinicians offload the cognitive demands of visually deciphering soft tissues; and it supports societal viability by engaging, rather than emulating, surgeon expertise. Through a video-assisted thoracoscopic surgery use case, we develop a proof-of-concept that improves workflow by tracking surgical tools and visualizing tissue, while serving as a bridge to the classical promise of augmented reality in surgery.
1 Introduction and Motivation
Lung cancer is the deadliest form of cancer worldwide, with 1.6 million new diagnoses and 1.4 million deaths each year, more than cancers of the breast, prostate, and colon—the three next most prevalent cancers—combined. In response to this epidemic, major screening trials have been conducted, including the Dutch/Belgian NELSON trial, the US NLST, and Danish trials. These studies found that proactive screening using low dose computed tomography (CT) can detect lung cancer at an earlier, treatable stage at a rate of 71%, leading to a 20% reduction in mortality [2]. This prompted Medicare to reimburse lung cancer screening in 2015, and with that, the number of patients presenting with smaller, treatable tumors was expected to rise dramatically. The projected increase has been observed within the Veterans Health Administration [17], and while this population bears a heightened incidence of lung cancer due to occupational hazards, the broader need to optimize patient care was foretold.
Surgical resection is the preferred curative therapy due to the ability to remove the units of anatomy that sustain the tumor, as well as lymph nodes for staging. Most of the 100,000 surgical resections performed in the US annually are minimally invasive, with 57% performed as video-assisted thoracoscopic surgery (VATS) and 7% as robotic surgery [1]. Anatomically, the lung follows a tree structure with airways that root at the trachea and narrow as they branch towards the ribcage; blood vessels hug the airways and join them at the alveoli, or air sacs, where oxygen and carbon dioxide are exchanged. Removing a tumor naturally detaches downstream airways, vessels, and connective lung tissue, so tumor location and size prescribe the type of resection performed. Large or central tumors are removed via pneumonectomy (full lung) or lobectomy (full lobe), while small or peripheral tumors may be “wedged” out. Segmentectomy, or removal of a sub-lobar segment, is gaining currency because it balances disease removal with tissue preservation, and because the trend towards smaller, peripheral tumors supports it.
2 Background
In an archetypal VATS procedure, the surgeon examines a preoperative CT; here the lung is inflated. They note the location of the tumor relative to adjacent structures. Now under the thoracoscope, the lung is collapsed, the tumor invisible. The surgeon roughly estimates the tumor location. They look for known structures; move the scope; manipulate the tissue; reveal a structure; remember it. They carefully dissect around critical structures [14]; discover another; remember it. A few iterations and their familiarity grows. They mentally align new visuals with their knowledge, experience, and the CT. Thus they converge on an inference of the true tumor location.
The foregoing exercise is cognitively strenuous and time-consuming, yet merely a precursor to the primary task of tumor resection. It is emblematic of endoscopic surgery in general and of segmentectomy in particular, as the surgeon must continue to mind critical structures under a limited visual field [18]. Consequently, endoscopic scenes are difficult to contextualize in isolation, turning the lung into a jigsaw puzzle whose pieces may deform and must be memorized. Indeed, surgeons routinely retract the scope or zoom out to construct associations and context. Moreover, the lung appearance may be neither visually distinctive nor instantly informative, further intensifying the challenges, and thus the inefficiencies, of minimally invasive lung surgery.
The research community has responded vigorously, imbuing greater context into endoscopy using augmented reality [5, 24] by registering coherent anatomical models onto disjoint endoscopic snapshots. For example, Puerto-Souza et al. [25] maintain registration of preoperative images to endoscopy by managing tissue anchors amidst surgical activity. Du et al. [11] combine features with lighting, and Collins et al. [9], texture with boundaries, to track surfaces in 3D. Lin et al. [19] achieve surface reconstruction using hyperspectral imaging and structured light. Simultaneous localization and mapping (SLAM) approaches have been extended to handle tissue motion [23] and imaging conditions [21] found in endoscopy. Stereo endoscopes are used to reconstruct tissue surfaces with high fidelity, and have the potential to render overlays in 3D or guide surgical robots [29, 32]. Ref. [22] reviews the optical techniques that have been developed for tissue surface reconstruction.
In recent clinical experiments, Chauvet et al. [8] project tumor margins onto the surfaces of ex vivo porcine kidneys for partial nephrectomy. Liu et al. [20] develop augmented reality for robotic VATS and robotic transoral surgery, performing preclinical evaluations on ovine and porcine models, respectively, thereby elevating the state of the art. These studies uncovered insights on registering models to endoscopy and portraying these models faithfully. However, whether pursuing a clinical application or a technical specialty, researchers have faced a timeless obstacle: tissue deformation. Modeling deformation is an ill-posed problem, and this coinciding domain has likewise undergone extensive investigation [28]. In the next section, we introduce an alternative technology for endoscopy that circumvents the challenges of deformable modeling.
3 Interactive Endoscopy
3.1 Contributions
The surgeon begins a VATS procedure as usual, examining the CT, placing surgical ports, and estimating the tumor location under thoracoscopy. They adjust both scope and lung in search of known landmarks, the pulmonary artery for instance. Upon discovery, now under the proposed interactive endoscopy system (Fig. 1, from a full ex vivo demo), they point their forceps at the target and verbally instruct, “Mark the pulmonary artery”. An audible chime acknowledges the command, and a miniature yet distinctive icon appears in the live video at the forceps tip, accompanied by the semantic label. The surgeon continues to label the anatomy, and as they move the scope or tissue, the virtual annotations follow.
The combination of a limited visual field and amorphous free-form tissue induces the surgeon to perform motions that are, in technical parlance, computationally intractable—it is the practice of surgery itself that postpones surgical augmented reality ever farther into the future. In that future, critical debates on validation and psychophysical effects await, yet the ongoing challenges of endoscopic surgery have persisted for decades. The present contribution transforms endoscopy from a passive visualization tool to an interactive interface for labeling live video. We recast static preoperative interactivity from Kim et al. [16] into an intraoperative scheme; and repurpose OCT review interactivity from Balicki et al. [4] to provide live spatial storage for the expert’s knowledge. We show how the system can help surgeons through a cognitively strenuous and irreproducible exploration routine, examine circumstances that would enable clinical viability, and discuss how the approach both complements and enables augmented reality in surgery, as envisioned a generation ago (Fuchs et al. [15]).
3.2 Key Components
For the proposed interactive endoscopy system, the experimental setup and usage scenario are pictured in Fig. 2. Its key components are (1) vision-based tool tracking, (2) a speech interface, and (3) persistent tissue tracking. While these discrete capabilities have long been available, they have undergone marked improvement in recent years owing to the emergence of graphics processing units (GPUs), online storage infrastructure, and machine learning. A system capitalizing on these developments has the potential to reach clinical reliability in the near future. While these technologies continue to evolve rapidly, we construct a proof-of-concept integration of agnostic building blocks as a means of assessing baseline performance.
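To make the integration concrete, the sketch below shows one way the three components could be composed around a single video loop. It is a minimal Python/OpenCV illustration; the component classes (ToolTracker, SpeechInterface, LabelStore) are hypothetical stubs used only to convey the data flow, not our implementation.

import cv2

class ToolTracker:                          # component 1: vision-based tool tracking
    def tip(self, frame):                   # returns the (x, y) forceps tip, or None
        return None                         # placeholder; see the hue-based sketch below

class SpeechInterface:                      # component 2: speech recognition
    def poll(self):                         # returns a spoken label, e.g. "pulmonary artery", or None
        return None                         # placeholder; see the Cloud Speech sketch below

class LabelStore:                           # component 3: persistent tissue tracking
    def __init__(self):
        self.labels = []                    # list of (text, (x, y)) pairs
    def add(self, text, xy):
        self.labels.append((text, xy))
    def update(self, prev_frame, frame):    # placeholder; see the SURF / optical flow sketches below
        pass
    def draw(self, frame):
        for text, (x, y) in self.labels:
            cv2.drawMarker(frame, (int(x), int(y)), (0, 255, 0))
            cv2.putText(frame, text, (int(x) + 8, int(y)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        return frame

def main(source=0):
    cap = cv2.VideoCapture(source)          # live endoscope feed, or a recorded video
    tool, speech, store = ToolTracker(), SpeechInterface(), LabelStore()
    prev = None
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        tip = tool.tip(frame)
        label = speech.poll()
        if label and tip is not None:       # a "Mark the ..." utterance while pointing
            store.add(label, tip)
        if prev is not None:
            store.update(prev, frame)       # keep existing labels adhered to the tissue
        cv2.imshow("interactive endoscopy", store.draw(frame))
        prev = frame
        if cv2.waitKey(1) == 27:            # Esc exits
            break
    cap.release()

if __name__ == "__main__":
    main()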
Tool Tracking. Upon discovering each landmark, the surgeon points out its location to the system. A workflow-compatible pointer can be repurposed from a tool already in use, such as a forceps, by tracking it in the endoscope. We use a hierarchical heuristic method (Fig. 3) with assumptions similar to those in [10], namely that thoracic tools are rigid, straight, and of a certain hue. The low-risk demands of the pointing task motivate our simple approach: 2D tool tracking can reliably map 3D surface anatomy due to the projective nature of endoscopic views. Our ex vivo tests indicate 2D tip localization to within 1.0 mm in 92% of the instances in which the tool points to a new location, and more advanced methods [6, 27] suggest that clinical-grade tool tracking is well within reach.
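The following OpenCV sketch illustrates the flavor of this hue-based approach; the hue range and the farthest-from-border tip heuristic are illustrative assumptions rather than the parameters of our tracker.

import cv2
import numpy as np

def locate_tool_tip(frame_bgr, hue_lo=90, hue_hi=130):
    """Segment a tool of a known hue and return its 2D tip, or None.
    The hue band (a bluish shaft) is an assumption for illustration."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (hue_lo, 60, 40), (hue_hi, 255, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    tool = max(contours, key=cv2.contourArea).reshape(-1, 2)
    # A rigid, straight tool enters from an image border, so take the
    # segmented point farthest from its nearest border as the tip.
    h, w = mask.shape
    border_dist = np.minimum.reduce([tool[:, 0], w - 1 - tool[:, 0],
                                     tool[:, 1], h - 1 - tool[:, 1]])
    x, y = tool[np.argmax(border_dist)]
    return int(x), int(y)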
Speech Interface. Pointing the forceps at a landmark, the surgeon uses speech to generate a corresponding label. This natural, hands-free interface is conducive to workflow and sterility, as previously recognized in the voice-controlled AESOP endoscope robot [3]. Recognition latency and accuracy were prohibitive at the time, but modern advances have driven widespread use. The proliferation of voice-controlled virtual assistants (e.g., Alexa) obliges us to revisit speech as a surgical interface.
We use Google Cloud Speech-to-Text in our experiments. The online service allows the surgeon to apply arbitrary semantic labels; offline tools or preset vocabularies may be preferred in resource-constrained settings. Qualitatively, we observed satisfactory recognition during demos using a commodity wireless speaker-microphone, and this performance was retained in a noisy exhibition room by switching to a wireless headset, suggesting that the clinical benefits of speech interfaces may soon be realizable.
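A minimal sketch of this speech path is shown below, assuming the standard google-cloud-speech Python client and a simple "mark ..." command grammar; it is an illustration rather than our deployed configuration, and requires the package and application credentials to run.

import re
from google.cloud import speech

def transcribe(pcm_bytes, sample_rate=16000):
    """Send a short linear-PCM utterance to Cloud Speech-to-Text; return the transcript."""
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=pcm_bytes)
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript if response.results else ""

def parse_mark_command(transcript):
    """'Mark the pulmonary artery' -> 'pulmonary artery'; None if not a mark command."""
    m = re.match(r"\s*mark(?:\s+the)?\s+(.+)", transcript, flags=re.IGNORECASE)
    return m.group(1).strip() if m else None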
Persistent Tissue Tracking. After the surgeon creates a label, the system maintains its adherence by encoding the underlying tissue and tracking it as the lung moves, deforms, or reappears in view following an exit or occlusion. This provides an intuition of the lung state, with similar issues faced in liver surgery. The labeling task asks that arbitrary patches of tissue be persistently identified whenever they appear—a combination of online detection and tracking which, for endoscopy, is well served by SURF and optical flow [12]. SURF can identify tissue through motion and stretching 83% and 99% of the time, respectively [30]. In ex vivo experiments, uniquely identified features could be recovered successfully upon returning into view so long as imaging conditions remained reasonably stable, as illustrated in Fig. 4.
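The sketch below shows one plausible realization of this encode-and-recover behavior with SURF in OpenCV. The patch radius, Hessian threshold, and matching heuristics are assumptions made for illustration; SURF requires opencv-contrib built with the non-free modules, and ORB is a drop-in alternative.

import cv2
import numpy as np

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
matcher = cv2.BFMatcher(cv2.NORM_L2)

def describe_patch(gray, xy, radius=60):
    """Encode the tissue around a newly created label as SURF keypoints and descriptors."""
    mask = np.zeros_like(gray)
    cv2.circle(mask, (int(xy[0]), int(xy[1])), radius, 255, -1)
    return surf.detectAndCompute(gray, mask)

def relocate_label(gray, ref_desc, ratio=0.7, min_matches=5):
    """Re-identify the labeled patch in a new frame; returns an (x, y) estimate or None."""
    kps, desc = surf.detectAndCompute(gray, None)
    if desc is None or ref_desc is None:
        return None
    good = []
    for pair in matcher.knnMatch(ref_desc, desc, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_matches:
        return None
    pts = np.float32([kps[m.trainIdx].pt for m in good])
    return tuple(np.median(pts, axis=0))    # approximate label position from matched keypoints

In such a scheme, describe_patch would run once at labeling time, and relocate_label on later frames in which the labeled tissue must be rediscovered.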
Labels should be displayed, at minimum, when the tissue is at rest, and modern techniques in matching sub-image elements [13, 33] show promise in overcoming the challenges of biomedical images [26]. Approaches such as FlowNet can then be used to track moving tissue and enhance the realism of virtual label adherence. In short, there is a new set of tools to address traditional computer vision problems in endoscopy.
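For frame-to-frame adherence while the tissue moves, a classical pyramidal Lucas-Kanade tracker with a forward-backward consistency check is a lightweight stand-in for learned flow; the window size and error threshold in the sketch below are illustrative assumptions.

import cv2
import numpy as np

LK_PARAMS = dict(winSize=(31, 31), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

def track_labels(prev_gray, next_gray, label_points, fb_thresh=2.0):
    """Propagate label positions from prev_gray to next_gray.
    label_points: Nx2 array; returns (new Nx2 positions, per-label validity mask)."""
    if len(label_points) == 0:
        return np.empty((0, 2), np.float32), np.array([], dtype=bool)
    p0 = np.asarray(label_points, dtype=np.float32).reshape(-1, 1, 2)
    p1, st_fwd, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None, **LK_PARAMS)
    # Forward-backward check: trust a label only if flowing it back lands near its start.
    p0_back, st_bwd, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, p1, None, **LK_PARAMS)
    fb_err = np.linalg.norm((p0_back - p0).reshape(-1, 2), axis=1)
    ok = (st_fwd.ravel() == 1) & (st_bwd.ravel() == 1) & (fb_err < fb_thresh)
    return p1.reshape(-1, 2), ok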
3.3 Capturing Surgeon Expertise
Interactive endoscopy ties maturing technologies together into a novel application with forgiving performance requirements, paving the way to the level of robustness needed for clinical use. The simplicity of the concept belies its potential to alleviate cognitive load, which can impact both judgment and motor skills [7]. When the surgeon exerts mental energy in parsing what they see, the system lets them translate that expertise directly onto the virtual surgical field. This mitigates redundant exploration, and the visibility of labels can help them infer context more readily.
In fact, many surgeons already aid localization prior to or during surgery with physical markers, whether radiopaque, tethered (as in iVATS), dye-based [31], or improvised with electrocautery. These varied practices introduce risk and overhead, whereas virtual markers are easy to use and give surgeons a reason to engage with the technology, potentially bridging gaps between clinical practice and supporting technology. Moreover, surgeon engagement with technology has a broader implication: the digitization of the innards of surgery, which has long been a black box. Digital labels offer a chance to capture semantic, positional, temporal, visual, and procedural elements of surgery, forming a statistical basis for understanding—and anticipating—surgical acts at multiple scales. This, in turn, can help make augmented reality a clinical reality.
4 Conclusions
The promise of augmented reality in surgery has been tempered by challenges such as soft tissue deformation, and efforts to overcome this timeless adversary have inadvertently suspended critical debates on the role of augmented perception in medicine altogether. We present, as a technological bridge, a streamlined user interface that allows surgeons to tag the disjoint views that comprise endoscopic surgery. These virtual labels persist as the organ moves, so surgeons can potentially manage unfamiliar tissue more deterministically. This approach embraces the finiteness of human cognition and alleviates reliance on cognitive state, capturing expert perception and judgment without attempting to emulate it. We design a minimal feature set and a choice architecture with symmetric freedom to use or not use the system, respecting differences between surgeons. Our baseline system demonstrates promising performance in a lab setting, while rapid ongoing developments in the constituent technologies offer a path towards clinical robustness. These circumstances present the opportunity for surgeons to change surgery, without being compelled to change.
References
Healthcare Cost and Utilization Project. https://hcupnet.ahrq.gov/#setup
Reduced lung-cancer mortality with low-dose computed tomographic screening. New Engl. J. Med. 365(5), 395–409 (2011)
Allaf, M.E., et al.: Laparoscopic visual field. Surg. Endosc. 12(12), 1415–1418 (1998)
Balicki, M., et al.: Interactive OCT annotation and visualization for vitreoretinal surgery. In: Linte, C.A., Chen, E.C.S., Berger, M.-O., Moore, J.T., Holmes, D.R. (eds.) AE-CAI 2012. LNCS, vol. 7815, pp. 142–152. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38085-3_14
Bernhardt, S., Nicolau, S.A., Soler, L., Doignon, C.: The status of augmented reality in laparoscopic surgery as of 2016. Med. Image Anal. 37, 66–90 (2017)
Bodenstedt, S., et al.: Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery (2018)
Carswell, C.M., Clarke, D., Seales, W.B.: Assessing mental workload during laparoscopic surgery. Surg. Innov. 12(1), 80–90 (2005)
Chauvet, P., et al.: Augmented reality in a tumor resection model. Surg. Endosc. 32(3), 1192–1201 (2018)
Collins, T., Bartoli, A., Bourdel, N., Canis, M.: Robust, real-time, dense and deformable 3D organ tracking in laparoscopic videos. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9900, pp. 404–412. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46720-7_47
Doignon, C., Nageotte, F., de Mathelin, M.: Segmentation and guidance of multiple rigid objects for intra-operative endoscopic vision. In: Vidal, R., Heyden, A., Ma, Y. (eds.) WDV 2005-2006. LNCS, vol. 4358, pp. 314–327. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-70932-9_24
Du, X., et al.: Robust surface tracking combining features, intensity and illumination compensation. Int. J. Comput. Assist. Radiol. Surg. 10(12), 1915–1926 (2015)
Elhawary, H., Popovic, A.: Robust feature tracking on the beating heart for a robotic-guided endoscope. Int. J. Med. Robot. Comput. Assist. Surg. 7(4), 459–468 (2011)
Fischer, P., Dosovitskiy, A., Brox, T.: Descriptor matching with convolutional neural networks: a comparison to SIFT (2014)
Flores, R.M., et al.: Video-assisted thoracoscopic surgery (VATS) lobectomy: catastrophic intraoperative complications. J. Thorac. Cardiovasc. Surg. 142(6), 1412–1417 (2011)
Fuchs, H., et al.: Augmented reality visualization for laparoscopic surgery. In: Wells, W.M., Colchester, A., Delp, S. (eds.) MICCAI 1998. LNCS, vol. 1496, pp. 934–943. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0056282
Kim, J.-H., Bartoli, A., Collins, T., Hartley, R.: Tracking by detection for interactive image augmentation in laparoscopy. In: Dawant, B.M., Christensen, G.E., Fitzpatrick, J.M., Rueckert, D. (eds.) WBIR 2012. LNCS, vol. 7359, pp. 246–255. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31340-0_26
Kinsinger, L.S., et al.: Implementation of lung cancer screening in the Veterans Health Administration. JAMA Intern. Med. 177(3), 399–406 (2017)
Lee, C.Y., et al.: Novel thoracoscopic navigation system with augmented real-time image guidance for chest wall tumors. Ann. Thorac. Surg. 106(5), 1468–1475 (2018)
Lin, J., et al.: Dual-modality endoscopic probe for tissue surface shape reconstruction and hyperspectral imaging enabled by deep neural networks. Med. Image Anal. 48, 162–176 (2018)
Liu, W.P., Richmon, J.D., Sorger, J.M., Azizian, M., Taylor, R.H.: Augmented reality and CBCT guidance for transoral robotic surgery. J. Robot. Surg. 9(3), 223–233 (2015)
Mahmoud, N., Collins, T., Hostettler, A., Soler, L., Doignon, C., Montiel, J.M.M.: Live tracking and dense reconstruction for handheld monocular endoscopy. IEEE Trans. Med. Imaging 38(1), 79–89 (2019)
Maier-Hein, L., et al.: Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery. Med. Image Anal. 17(8), 974–996 (2013)
Mountney, P., Yang, G.-Z.: Motion compensated SLAM for image guided surgery. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI 2010. LNCS, vol. 6362, pp. 496–504. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15745-5_61
Nicolau, S., Soler, L., Mutter, D., Marescaux, J.: Augmented reality in laparoscopic surgical oncology. Surg. Oncol. 20(3), 189–201 (2011)
Puerto-Souza, G.A., Cadeddu, J.A., Mariottini, G.L.: Toward long-term and accurate augmented-reality for monocular endoscopic videos. IEEE Trans. Biomed. Eng. 61(10), 2609–2620 (2014)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Shvets, A.A., Rakhlin, A., Kalinin, A.A., Iglovikov, V.I.: Automatic instrument segmentation in robot-assisted surgery using deep learning. In: IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 624–628 (2018)
Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: a survey. IEEE Trans. Med. Imaging 32(7), 1153–1190 (2013)
Stoyanov, D., Scarzanella, M.V., Pratt, P., Yang, G.-Z.: Real-time stereo reconstruction in robotically assisted minimally invasive surgery. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI 2010. LNCS, vol. 6361, pp. 275–282. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15705-9_34
Thienphrapa, P., Bydlon, T., Chen, A., Popovic, A.: Evaluation of surface feature persistence during lung surgery. In: BMES Annual Meeting, Atlanta, GA (2018)
Willekes, L., Boutros, C., Goldfarb, M.A.: VATS intraoperative tattooing to facilitate solitary pulmonary nodule resection. J. Cardiothorac. Surg. 3(1), 13 (2008)
Yip, M.C., Lowe, D.G., Salcudean, S.E., Rohling, R.N., Nguan, C.Y.: Tissue tracking and registration for image-guided surgery. IEEE Trans. Med. Imaging 31(11), 2169–2182 (2012)
Zagoruyko, S., Komodakis, N.: Learning to compare image patches via CNNs. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4353–4361 (2015)