WO2024123888A1 - Systems and methods for anatomy segmentation and anatomical structure tracking - Google Patents
- Publication number
- WO2024123888A1 (PCT application PCT/US2023/082697)
- Authority: WO (WIPO PCT)
- Prior art keywords: representation, robotic, assembly, pose, anatomical
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/25—User interfaces for surgical systems
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B90/00—Instruments, implements or accessories specially adapted for surgery or diagnosis and not covered by any of the groups A61B1/00 - A61B50/00, e.g. for luxation treatment or for protecting wound edges
- A61B90/36—Image-producing devices or illumination devices not otherwise provided for
- A61B90/361—Image-producing devices, e.g. surgical cameras
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B17/00—Surgical instruments, devices or methods, e.g. tourniquets
- A61B2017/00973—Surgical instruments, devices or methods, e.g. tourniquets pedal-operated
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/10—Computer-aided planning, simulation or modelling of surgical operations
- A61B2034/101—Computer-aided simulation of surgical operations
- A61B2034/102—Modelling of surgical devices, implants or prosthesis
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/10—Computer-aided planning, simulation or modelling of surgical operations
- A61B2034/101—Computer-aided simulation of surgical operations
- A61B2034/105—Modelling of the patient, e.g. for ligaments or bones
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/20—Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
- A61B2034/2046—Tracking techniques
- A61B2034/2048—Tracking techniques using an accelerometer or inertia sensor
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/25—User interfaces for surgical systems
- A61B2034/254—User interfaces for surgical systems being adapted depending on the stage of the surgical procedure
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B2560/00—Constructional details of operational features of apparatus; Accessories for medical measuring apparatus
- A61B2560/04—Constructional details of apparatus
- A61B2560/0437—Trolley or cart-type apparatus
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/30—Surgical robots
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/30—Surgical robots
- A61B34/37—Master-slave robots
Definitions
- Surgical robotic systems permit a user (also described herein as an “operator”) to perform tasks and functions during a procedure using robotically-controlled instruments.
- For users (e.g., surgeons), limited visual feedback of a patient’s interior can restrict their ability to navigate and orient themselves in the confined and often confusing surgical theater.
- the processor is further configured to determine a three-dimensional (3D) reconstruction of the anatomical space in which the robotic assembly is being operated.
- the processor is further configured to determine a location of each of one or more identified anatomical structures in the anatomical space based at least in part on the 3D reconstruction.
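- a minimal sketch (not the patent's implementation) of how such a location could be computed, assuming a pinhole camera model and a per-pixel depth map derived from the 3D reconstruction; the function name and intrinsics below are illustrative assumptions:

```python
# Minimal sketch: locate a segmented structure in the anatomical space by
# back-projecting its mask pixels through a depth map (pinhole camera model).
# The intrinsics and function are illustrative assumptions.
import numpy as np

def locate_structure_3d(mask: np.ndarray, depth_m: np.ndarray,
                        fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Return the 3D centroid (x, y, z) in the camera frame of the pixels
    labeled True in `mask`, using per-pixel depth in meters."""
    v, u = np.nonzero(mask)                  # pixel rows/cols inside the structure
    z = depth_m[v, u]
    valid = z > 0                            # ignore pixels without depth
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx                    # back-project to camera coordinates
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1).mean(axis=0)

# Example with synthetic data:
mask = np.zeros((480, 640), dtype=bool); mask[200:240, 300:340] = True
depth = np.full((480, 640), 0.12)            # 12 cm working distance
print(locate_structure_3d(mask, depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0))
```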
- FIG. 3B is a diagram illustrating an example top view of the surgical robotic system performing the surgery within the internal cavity of the subject of FIG. 3A in accordance with some embodiments.
- FIG. 4A is an example perspective view of a single robotic arm subsystem in accordance with some embodiments.
- FIG. 4B is an example perspective side view of a single robotic arm of the single robotic arm subsystem of FIG. 4A in accordance with some embodiments.
- FIG. 6 is an example graphical user interface including a frustum view of a cavity of a patient and a pair of robotic arms of the surgical robotic system, a robot pose view, and a camera view in accordance with some embodiments.
- FIG. 7 is a diagram illustrating an example anatomy segmentation and tracking module for anatomy segmentation and anatomical structure tracking in accordance with some embodiments.
- FIG. 8 is a flowchart illustrating steps for anatomical structure segmentation and anatomical landmark identification carried out by a surgical robotic system in accordance with some embodiments.
- FIG. 9 is a flowchart illustrating steps for anatomical structure tracking carried out by a surgical robotic system in accordance with some embodiments.
- FIG. 10 is a flowchart illustrating steps for training a surgical robotic system to automatically identify anatomical structures in accordance with some embodiments.
- FIG. 11A is a diagram of an example computing module that can be used to perform one or more steps of the methods provided by example embodiments.
- FIG. 11B is a diagram illustrating an example system code that can be executable by the computing module of FIG. 11A in accordance with some embodiments.
- FIG. 12 is a diagram illustrating computer hardware and network components on which a system can be implemented.
- Embodiments taught and described herein provide systems and methods for anatomy segmentation and anatomical structure tracking.
- a surgical robotic system taught and described herein can provide an augmented view of a complex surgical environment by displaying anatomically-relevant structures currently in a field of view of a camera assembly of the surgical robotic system, such as by highlighting and tracking anatomical landmarks, organs, organ ducts, blood vessels (arteries and veins), nerves, other sensitive structures, and obstruction structures (e.g., structures that obstruct the robotic arm assembly) in a video stream in real time (e.g., live video footage).
- the surgical robotic system can provide anatomy segmentation and tracking including anatomical landmark identification, anatomical structure segmentation, and anatomical structure tracking, which can reduce intra-operative time spent by users to orient themselves and increase the users’ awareness of a surgical stage.
- because the anatomy segmentation and tracking taught and described herein can assist users during a surgical operation, the training time and skill required to use the surgical robotic system are reduced, thereby increasing accessibility of the surgical robotic system to a broader range of users.
- the anatomical landmark identification taught and described herein can also enable an advanced user interface that provides users with tools such as tissue and anatomical structure identification, topographical navigation, and user-definable safety keep-out zones to prevent unwanted tissue collisions.
- the anatomical structure segmentation taught and described herein can further allow the surgical robotic system to automatically define safety keep-out zones that protect a subject from on and off-camera injury and can also enable the surgical robotic system to be a semi-autonomous surgical robotic system that can interact with human tissue while executing navigational and surgical procedures inside a subject, such as suturing and tissue dissection, safely and efficiently under users’ supervision.
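- a minimal sketch of the keep-out-zone idea, assuming spherical zones placed around segmented structure centroids; the zone shape, margins, and helper function are illustrative assumptions rather than the system's actual implementation:

```python
# Minimal sketch of an automatically defined safety keep-out zone check:
# each segmented sensitive structure is wrapped in a spherical zone, and a
# commanded end-effector target is rejected if it would enter a zone.
# The margins and data layout are assumptions for illustration.
import numpy as np

def violates_keepout(target_xyz, zones):
    """zones: iterable of (center_xyz, radius_m). Returns the first violated zone, or None."""
    t = np.asarray(target_xyz, dtype=float)
    for center, radius in zones:
        if np.linalg.norm(t - np.asarray(center, dtype=float)) < radius:
            return center, radius
    return None

zones = [((0.02, -0.01, 0.10), 0.015)]       # e.g., a 15 mm sphere around a nerve centroid
cmd = (0.025, -0.012, 0.105)
if violates_keepout(cmd, zones):
    print("command rejected: inside a keep-out zone")   # hold position / alert the user
```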
- the anatomy segmentation and tracking taught and described herein can use an artificial intelligence (AI)-based framework (e.g., machine learning/deep learning models) to segment various types of anatomical structures (e.g., blood vessels, nerves, other sensitive structures, obstruction structures, or the like) and provide information about multiple anatomical structures simultaneously.
- the anatomy segmentation and tracking taught and described herein can also perform the anatomical structure segmentation irrespective of locations of the anatomical structures within the body by segmenting the anatomical structures at various anatomical locations.
- the anatomy segmentation and tracking taught and described herein can track anatomical structures over time and simultaneously localize them in a reconstructed three-dimensional (3D) map of an internal body cavity. Further, the anatomy segmentation and tracking taught and described herein can leverage multiple data modalities (e.g., pose information of a robotic arm assembly, visual information, dyes and fluorescence data, or other suitable modality data) to perform anatomical landmark identification, anatomical structure segmentation, and anatomical structure tracking.
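- a minimal sketch of per-frame multi-class segmentation and overlay on live video; the `run_model` placeholder, class list, and colors are illustrative assumptions standing in for whatever trained model and rendering the system actually uses:

```python
# Minimal sketch of per-frame multi-class anatomy segmentation and overlay on a
# live video frame. `run_model` is a placeholder, not the patent's model; it
# returns per-pixel class IDs for classes such as background/vessel/nerve.
import numpy as np

CLASS_COLORS = {1: (255, 0, 0), 2: (0, 255, 0)}    # 1: blood vessel, 2: nerve (illustrative)

def run_model(frame_rgb: np.ndarray) -> np.ndarray:
    """Placeholder inference: returns an HxW array of class IDs."""
    return np.zeros(frame_rgb.shape[:2], dtype=np.uint8)

def overlay_segmentation(frame_rgb: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Blend each class's color into the frame where the model predicts that class."""
    labels = run_model(frame_rgb)
    out = frame_rgb.astype(float)
    for cls, color in CLASS_COLORS.items():
        m = labels == cls
        out[m] = (1 - alpha) * out[m] + alpha * np.array(color, dtype=float)
    return out.astype(np.uint8)

frame = np.zeros((480, 640, 3), dtype=np.uint8)     # stand-in for one video frame
highlighted = overlay_segmentation(frame)
```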
- the term “controller” can refer to a hardware device that includes a memory and a processor and is specifically programmed to execute the processes described herein in accordance with some embodiments.
- the memory is configured to store the modules and the processor is specifically configured to execute said modules to perform one or more processes which are described further below.
- multiple different controllers, or multiple different types of controllers, can be employed in performing one or more processes.
- different controllers can be implemented in different portions of a surgical robotic system.
- a system for robotic surgery can include a robotic subsystem.
- the robotic subsystem includes at least a portion, also referred to herein as a robotic assembly, that can be inserted into a patient via a trocar through a single incision point or site.
- the portion inserted into the patient via a trocar is small enough to be deployed in vivo at the surgical site and is sufficiently maneuverable when inserted to be able to move within the body to perform various surgical procedures at multiple different points or sites.
- the portion inserted into the body that performs functional tasks can be referred to herein as a surgical robotic module or a robotic assembly.
- the surgical robotic module can include multiple different submodules or parts that can be inserted into the trocar separately.
- the surgical robotic module or robotic assembly can include multiple separate robotic arms that are deployable within the patient along different or separate axes. These multiple separate robotic arms can be collectively referred to as a robotic arm assembly herein.
- a surgical camera assembly can also be deployed along a separate axis.
- the surgical robotic module or robotic assembly can also include the surgical camera assembly.
- the surgical robotic module or robotic assembly employs multiple different components, such as a pair of robotic arms and a surgical or robotic camera assembly, each of which is deployable along a different axis and is separately manipulatable, maneuverable, and movable.
- the arrangement in which the robotic arms and the camera assembly are disposable along separate and manipulatable axes is referred to herein as the Split Arm (SA) architecture.
- the SA architecture is designed to simplify and increase the efficiency of inserting robotic surgical instruments through a single trocar at a single insertion site, while concomitantly assisting with deployment of the surgical instruments into a surgical-ready state as well as the subsequent removal of the surgical instruments through the trocar.
- a surgical instrument can be inserted through the trocar to access and perform an operation in vivo in the abdominal cavity of a patient.
- various surgical instruments can be used or employed, including but not limited to robotic surgical instruments, as well as other surgical instruments known in the art.
- the surgical robotic module that forms part of the present invention can, in some embodiments, form part of a surgical robotic system that includes a user workstation with appropriate sensors and displays, and a robot support system (RSS) for interacting with and supporting the robotic subsystem of the present invention.
- the robotic subsystem includes a motor and a surgical robotic module that includes one or more robotic arms and one or more camera assemblies in some embodiments.
- the robotic arms and camera assembly can form part of a single support axis robotic system, can form part of the split arm (SA) architecture robotic system, or can have another arrangement.
- the robot support system can provide multiple degrees of freedom such that the robotic module can be maneuvered within the patient into a single position or multiple different positions.
- the robot support system can be directly mounted to a surgical table or to the floor or ceiling within an operating room. In another embodiment, the mounting is achieved by various fastening means, including but not limited to, clamps, screws, or a combination thereof. In other embodiments, the structure can be free standing.
- the robot support system can mount a motor assembly that is coupled to the surgical robotic module, which includes the robotic arm assembly and the camera assembly.
- the motor assembly can include gears, motors, drivetrains, electronics, and the like, for powering the components of the surgical robotic module.
- the robotic arm assembly and the camera assembly are capable of multiple degrees of freedom of movement. According to some embodiments, when the robotic arm assembly and the camera assembly are inserted into a patient through the trocar, they are capable of movement in at least the axial, yaw, pitch, and roll directions.
- the robotic arms of the robotic arm assembly are designed to incorporate and employ a multi-degree of freedom of movement robotic arm with an end effector mounted at a distal end thereof that corresponds to a wrist area or joint of the user.
- the working end (e.g., the end effector end) of the robotic arm is designed to incorporate and use or employ other robotic surgical instruments, such as for example the surgical instruments set forth in U.S. Pub. No. 2018/0221102, the entire contents of which are herein incorporated by reference.
- FIG. 1 is a schematic illustration of an example surgical robotic system 10 in which aspects of the present disclosure can be employed in accordance with some embodiments of the present disclosure.
- the surgical robotic system 10 includes an operator console 11 and a robotic subsystem 20 in accordance with some embodiments.
- the operator console 11 includes a display 12, an image computing module 14, which can be a three-dimensional (3D) computing module, hand controllers 17 having a sensing and tracking module 16, and a computing module 18. Additionally, the operator console 11 can include a foot pedal array 19 including a plurality of pedals.
- the image computing module 14 can include a graphical user interface 39.
- the graphical user interface 39, the controller 26 or the image renderer 30, or both, can render one or more images or one or more graphical user interface elements on the graphical user interface 39.
- a pillar box associated with a mode of operating the surgical robotic system 10, or any of the various components of the surgical robotic system 10 can be rendered on the graphical user interface 39.
- live video footage captured by a camera assembly 44 can also be rendered by the controller 26 or the image renderer 30 on the graphical user interface 39.
- the operator console 11 can include a visualization system 9 that includes a display 12 which can be any selected type of display for displaying information, images or video generated by the image computing module 14, the computing module 18, and/or the robotic subsystem 20.
- the display 12 can include or form part of, for example, a head-mounted display (HMD), an augmented reality (AR) display (e.g., an AR display, or AR glasses in combination with a screen or display), a screen or a display, a two-dimensional (2D) screen or display, a three-dimensional (3D) screen or display, and the like.
- the display 12 can also include an optional sensing and tracking module 16A.
- the display 12 can include an image display for outputting an image from a camera assembly 44 of the robotic subsystem 20.
- the hand controllers 17 are configured to sense a movement of the operator’s hands and/or arms to manipulate the surgical robotic system 10.
- the hand controllers 17 can include the sensing and tracking module 16, circuitry, and/or other hardware.
- the sensing and tracking module 16 can include one or more sensors or detectors that sense movements of the operator’s hands.
- the one or more sensors or detectors that sense movements of the operator’s hands are disposed in the hand controllers 17 that are grasped by or engaged by hands of the operator.
- the one or more sensors or detectors that sense movements of the operator’s hands are coupled to the hands and/or arms of the operator.
- the sensors of the sensing and tracking module 16 can be coupled to a region of the hand and/or the arm, such as the fingers, the wrist region, the elbow region, and/or the shoulder region. Additional sensors can also be coupled to a head and/or neck region of the operator in some embodiments.
- the sensing and tracking module 16 can be external and coupled to the hand controllers 17 via electrical components and/or mounting hardware.
- the optional sensing and tracking module 16A can sense and track movement of one or more of an operator’s head (or at least a portion of the head), eyes, or neck based, at least in part, on imaging of the operator, in addition to or instead of sensing by a sensor or sensors attached to the operator’s body.
- the sensing and tracking module 16 can employ sensors coupled to the torso of the operator or any other body part.
- the sensing and tracking module 16 can employ, in addition to the sensors, an Inertial Momentum Unit (IMU) having, for example, an accelerometer, gyroscope, magnetometer, and a motion processor.
- the sensing and tracking module 16 can also include sensors placed in surgical material such as gloves, surgical scrubs, or a surgical gown.
- the sensors can be reusable or disposable.
- sensors can be disposed external of the operator, such as at fixed locations in a room, such as an operating room.
- the external sensors 37 can generate external data 36 that can be processed by the computing module 18 and hence employed by the surgical robotic system 10.
- the sensors generate position and/or orientation data indicative of the position and/or orientation of the operator’s hands and/or arms.
- the sensing and tracking modules 16 and/or 16A can be utilized to control movement (e.g., changing a position and/or an orientation) of the camera assembly 44 and robotic arm assembly 42 of the robotic subsystem 20.
- the tracking and position data 34 generated by the sensing and tracking module 16 can be conveyed to the computing module 18 for processing by at least one processor 22.
- the computing module 18 can determine or calculate, from the tracking and position data 34 and 34A, the position and/or orientation of the operator’s hands or arms, and in some embodiments of the operator’s head as well, and convey the tracking and position data 34 and 34A to the robotic subsystem 20.
- the tracking and position data 34, 34A can be processed by the processor 22 and can be stored for example in the storage 24.
- the tracking and position data 34 and 34A can also be used by the controller 26, which in response can generate control signals for controlling movement of the robotic arm assembly 42 and/or the camera assembly 44.
- the controller 26 can change a position and/or an orientation of at least a portion of the camera assembly 44, of at least a portion of the robotic arm assembly 42, or both.
- the controller 26 can also adjust the pan and tilt of the camera assembly 44 to follow the movement of the operator’s head.
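- a minimal sketch of mapping tracked head motion to camera pan and tilt setpoints; the scale factor and limits are illustrative assumptions rather than the actual logic of the controller 26:

```python
# Minimal sketch: map tracked head yaw/pitch to camera pan/tilt commands.
# The scaling factor and angular limits are illustrative assumptions.
def head_to_camera_command(head_yaw_deg, head_pitch_deg,
                           scale=1.0, pan_limit=60.0, tilt_limit=45.0):
    """Return (pan, tilt) setpoints in degrees, clamped to the camera's range."""
    pan = max(-pan_limit, min(pan_limit, scale * head_yaw_deg))
    tilt = max(-tilt_limit, min(tilt_limit, scale * head_pitch_deg))
    return pan, tilt

print(head_to_camera_command(12.0, -5.0))   # -> (12.0, -5.0)
```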
- the robotic subsystem 20 can include a robot support system (RSS) 46 having a motor 40 and a trocar 50 or trocar mount, the robotic arm assembly 42, and the camera assembly 44.
- the robotic arm assembly 42 and the camera assembly 44 can form part of a single support axis robot system, such as that disclosed and described in U.S. Patent No. 10,285,765, or can form part of a split arm (SA) architecture robot system, such as that disclosed and described in PCT Patent Application No. PCT/US2020/039203, both of which are incorporated herein by reference in their entirety.
- the robotic subsystem 20 can employ multiple different robotic arms that are deployable along different or separate axes.
- the camera assembly 44, which can employ multiple different camera elements, can also be deployed along a common separate axis.
- the surgical robotic system 10 can employ multiple different components, such as a pair of separate robotic arms and the camera assembly 44, which are deployable along different axes.
- the robotic arm assembly 42 and the camera assembly 44 are separately manipulatable, maneuverable, and movable.
- the robotic subsystem 20, which includes the robotic arm assembly 42 and the camera assembly 44 disposable along separate manipulatable axes, is referred to herein as an SA architecture.
- the SA architecture is designed to simplify and increase efficiency of the insertion of robotic surgical instruments through a single trocar at a single insertion point or site, while concomitantly assisting with deployment of the surgical instruments into a surgical ready state, as well as the subsequent removal of the surgical instruments through the trocar 50 as further described below.
- the RSS 46 can include the motor 40 and the trocar 50 or a trocar mount.
- the RSS 46 can further include a support member that supports the motor 40 coupled to a distal end thereof.
- the motor 40 in turn can be coupled to the camera assembly 44 and to each of the robotic arm assembly 42.
- the support member can be configured and controlled to move linearly, or in any other selected direction or orientation, one or more components of the robotic subsystem 20.
- the RSS 46 can be free standing.
- the RSS 46 can include the motor 40 that is coupled to the robotic subsystem 20 at one end and to an adjustable support member or element at an opposed end.
- the motor 40 can receive the control signals generated by the controller 26.
- the motor 40 can include gears, one or more motors, drivetrains, electronics, and the like, for powering and driving the robotic arm assembly 42 and the camera assembly 44 separately or together.
- the motor 40 can also provide mechanical power, electrical power, mechanical communication, and electrical communication to the robotic arm assembly 42, the camera assembly 44, and/or other components of the RSS 46 and robotic subsystem 20.
- the motor 40 can be controlled by the computing module 18.
- the motor 40 can thus generate signals for controlling one or more motors that in turn can control and drive the robotic arm assembly 42, including for example the position and orientation of each robot joint of each robotic arm, as well as the camera assembly 44.
- the motor 40 can further provide for a translational or linear degree of freedom that is first utilized to insert and remove each component of the robotic subsystem 20 through the trocar 50.
- the motor 40 can also be employed to adjust the inserted depth of each robotic arm of the robotic arm assembly 42 when inserted into the patient 100 through the trocar 50.
- the trocar 50 is a medical device that can be made up of an awl (which can be a metal or plastic sharpened or non-bladed tip), a cannula (essentially a hollow tube), and a seal in some embodiments.
- the trocar 50 can be used to place at least a portion of the robotic subsystem 20 in an interior cavity of a subject (e.g., a patient) and can withdraw gas and/or fluid from a body cavity.
- the robotic subsystem 20 can be inserted through the trocar 50 to access and perform an operation in vivo in a body cavity of a patient.
- the robotic subsystem 20 can be supported, at least in part, by the trocar 50 or a trocar mount with multiple degrees of freedom such that the robotic arm assembly 42 and the camera assembly 44 can be maneuvered within the patient into a single position or multiple different positions.
- the robotic arm assembly 42 and camera assembly 44 can be moved with respect to the trocar 50 or a trocar mount with multiple different degrees of freedom such that the robotic arm assembly 42 and the camera assembly 44 can be maneuvered within the patient into a single position or multiple different positions.
- the RSS 46 can further include an optional controller for processing input data from one or more of the system components (e.g., the display 12, the sensing and tracking module 16, the robotic arm assembly 42, the camera assembly 44, and the like), and for generating control signals in response thereto.
- the motor 40 can also include a storage element for storing data in some embodiments.
- the robotic arm assembly 42 can be controlled to follow the scaled-down movement or motion of the operator’s arms and/or hands as sensed by the associated sensors in some embodiments and in some modes of operation.
- the robotic arm assembly 42 includes a first robotic arm including a first end effector at a distal end of the first robotic arm, and a second robotic arm including a second end effector disposed at a distal end of the second robotic arm.
- the robotic arm assembly 42 can have portions or regions that can be associated with movements associated with the shoulder, elbow, and wrist joints as well as the fingers of the operator.
- the robotic elbow joint can follow the position and orientation of the human elbow, and the robotic wrist joint can follow the position and orientation of the human wrist.
- the robotic arm assembly 42 can also have associated therewith end regions that can terminate in end-effectors that follow the movement of one or more fingers of the operator in some embodiments, such as for example the index finger as the user pinches together the index finger and thumb.
- the robotic arm assembly 42 can follow movement of the arms of the operator in some modes of control while a virtual chest of the robotic assembly can remain stationary (e.g., in an instrument control mode).
- the position and orientation of the torso of the operator are subtracted from the position and orientation of the operator’s arms and/or hands. This subtraction allows the operator to move his or her torso without the robotic arms moving. Further disclosure regarding control of movement of individual arms of a robotic assembly is provided in International Patent Application Publications WO 2022/094000 A1 and WO 2021/231402 A1, each of which is incorporated by reference herein in its entirety.
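- a minimal sketch of this subtraction, assuming poses are represented as 4x4 homogeneous transforms (an assumption about representation, not the patent's data format); expressing the hand pose in the torso frame leaves the commanded pose unchanged when only the torso moves:

```python
# Minimal sketch: "subtract" the torso pose by commanding the robot from the
# hand pose expressed in the operator's torso frame, so moving the torso alone
# leaves the relative (commanded) pose unchanged.
import numpy as np

def pose(rotation: np.ndarray, translation) -> np.ndarray:
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def hand_in_torso_frame(T_world_torso: np.ndarray, T_world_hand: np.ndarray) -> np.ndarray:
    """Relative pose = inv(torso) @ hand."""
    return np.linalg.inv(T_world_torso) @ T_world_hand

I3 = np.eye(3)
torso = pose(I3, [0.0, 0.0, 0.0])
hand = pose(I3, [0.3, 0.1, 0.2])
rel_before = hand_in_torso_frame(torso, hand)

shift = np.array([0.05, 0.0, 0.0])           # operator leans 5 cm to the side
rel_after = hand_in_torso_frame(pose(I3, shift), pose(I3, hand[:3, 3] + shift))
assert np.allclose(rel_before, rel_after)     # commanded pose unchanged
```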
- the camera assembly 44 is configured to provide the operator with image data 48, such as for example a live video feed of an operation or surgical site, as well as enable the operator to actuate and control the cameras forming part of the camera assembly 44.
- the camera assembly 44 can include one or more cameras (e.g., a pair of cameras), the optical axes of which are axially spaced apart by a selected distance, known as the inter-camera distance, to provide a stereoscopic view or image of the surgical site.
- the operator can control the movement of the cameras via movement of the hands via sensors coupled to the hands of the operator or via hand controllers 17 grasped or held by hands of the operator, thus enabling the operator to obtain a desired view of an operation site in an intuitive and natural manner.
- the operator can additionally control the movement of the camera via movement of the operator’s head.
- the camera assembly 44 is movable in multiple directions, including for example in yaw, pitch and roll directions relative to a direction of view.
- the components of the stereoscopic cameras can be configured to provide a user experience that feels natural and comfortable.
- the interaxial distance between the cameras can be modified to adjust the depth of the operation site perceived by the operator.
- the image or video data 48 generated by the camera assembly 44 can be displayed on the display 12.
- when the display 12 includes an HMD, the display can include the built-in sensing and tracking module 16A that obtains raw orientation data for the yaw, pitch and roll directions of the HMD as well as positional data in Cartesian space (x, y, z) of the HMD.
- positional and orientation data regarding an operator’s head can be provided via a separate head-tracking module.
- the sensing and tracking module 16A can be used to provide supplementary position and orientation tracking data of the display in lieu of or in addition to the built-in tracking system of the HMD.
- no head tracking of the operator is used or employed.
- images of the operator can be used by the sensing and tracking module 16A for tracking at least a portion of the operator’s head.
- FIG. 2A depicts an example robotic assembly 20, which is also referred to herein as a robotic subsystem, of a surgical robotic system 10 incorporated into or mounted onto a mobile patient cart in accordance with some embodiments.
- the robotic subsystem 20 includes the RSS 46, which, in turn includes the motor 40, the robotic arm assembly 42 having end-effectors 45, the camera assembly 44 having one or more cameras 47, and can also include the trocar 50 or a trocar mount.
- FIG. 2B depicts an example of an operator console 11 of the surgical robotic system 10 of the present disclosure in accordance with some embodiments.
- the operator console 11 includes a display 12, hand controllers 17, and also includes one or more additional controllers, such as a foot pedal array 19 for control of the robotic arm assembly 42, for control of the camera assembly 44, and for control of other aspects of the system.
- FIG. 2B also depicts the left hand controller subsystem 23A and the right hand controller subsystem 23B of the operator console.
- the left hand controller subsystem 23A includes and supports the left hand controller 17A and the right hand controller subsystem 23B includes and supports the right hand controller 17B.
- the left hand controller subsystem 23A can releasably connect to or engage the left hand controller 17A
- the right hand controller subsystem 23B can releasably connect to or engage the right hand controller 17B.
- connections can be both physical and electronic so that the left hand controller subsystem 23A and the right hand controller subsystem 23B can receive signals from the left hand controller 17A and the right hand controller 17B, respectively, including signals that convey inputs received from a user selection on a button or touch input device of the left hand controller 17A or the right hand controller 17B.
- Each of the left hand controller subsystem 23A and the right hand controller subsystem 23B can include components that enable a range of motion of the respective left hand controller 17A and right hand controller 17B, so that the left hand controller 17A and right hand controller 17B can be translated or displaced in three dimensions and can additionally move in the roll, pitch, and yaw directions. Additionally, each of the left hand controller subsystem 23A and the right hand controller subsystem 23B can register movement of the respective left hand controller 17A and right hand controller 17B in each of the foregoing directions and can send a signal providing such movement information to the processor 22 (as shown in FIG. 1) of the surgical robotic system 10.
- each of the left hand controller subsystem 23A and the right hand controller subsystem 23B can be configured to receive and connect to or engage different hand controllers (not shown).
- hand controllers with different configurations of buttons and touch input devices can be provided.
- hand controllers with a different shape can be provided. The hand controllers can be selected for compatibility with a particular surgical robotic system or a particular surgical robotic procedure or selected based upon preference of an operator with respect to the buttons and input devices or with respect to the shape of the hand controller in order to provide greater comfort and ease for the operator.
- FIG. 3A schematically depicts a side view of the surgical robotic system 10 performing a surgery within an internal cavity 104 of a subject 100 in accordance with some embodiments and for some surgical procedures.
- FIG. 3B schematically depicts a top view of the surgical robotic system 10 performing the surgery within the internal cavity 104 of the subject 100.
- the subject 100 (e.g., a patient) is positioned on an operation table 102 (e.g., a surgical table).
- an incision is made in the patient 100 to gain access to the internal cavity 104.
- the trocar 50 is then inserted into the patient 100 at a selected location to provide access to the internal cavity 104 or operation site.
- the RSS 46 can then be maneuvered into position over the patient 100 and the trocar 50.
- the RSS 46 includes a trocar mount that attaches to the trocar 50.
- the camera assembly 44 and the robotic arm assembly 42 can be coupled to the motor 40 and inserted individually and/or sequentially into the patient 100 through the trocar 50 and hence into the internal cavity 104 of the patient 100.
- references to insertion of the robotic arm assembly 42 and/or the camera assembly 44 into an internal cavity of a subject and disposing the robotic arm assembly 42 and/or the camera assembly 44 in the internal cavity of the subject are referring to the portions of the robotic arm assembly 42 and the camera assembly 44 that are intended to be in the internal cavity of the subject during use.
- the sequential insertion method has the advantage of supporting smaller trocars and thus smaller incisions can be made in the patient 100, thus reducing the trauma experienced by the patient 100.
- the camera assembly 44 and the robotic arm assembly 42 can be inserted in any order or in a specific order.
- the camera assembly 44 can be followed by a first robotic arm 42A of the robotic arm assembly 42 and then followed by a second robotic arm 42B of the robotic arm assembly 42, all of which can be inserted into the trocar 50 and hence into the internal cavity 104.
- the RSS 46 can move the robotic arm assembly 42 and the camera assembly 44 to an operation site manually or automatically controlled by the operator console 11.
- FIG. 4A is a perspective view of a robotic arm subassembly 21 in accordance with some embodiments.
- the robotic arm subassembly 21 includes a robotic arm 42A, the end-effector 45 having an instrument tip 120 (e.g., monopolar scissors, needle driver/holder, bipolar grasper, or any other appropriate tool), and a shaft 122 supporting the robotic arm 42A.
- a distal end of the shaft 122 is coupled to the robotic arm 42A, and a proximal end of the shaft 122 is coupled to a housing 124 of the motor 40 (as shown in FIG. 2A). At least a portion of the shaft 122 can be external to the internal cavity 104 (as shown in FIGS. 3A and 3B). At least a portion of the shaft 122 can be inserted into the internal cavity 104 (as shown in FIGS. 3A and 3B).
- FIG. 4B is a side view of the robotic arm assembly 42.
- the robotic arm assembly 42 includes a shoulder joint 126 forming a virtual shoulder, an elbow joint 128 having position sensors 132 (e.g., capacitive proximity sensors) and forming a virtual elbow, a wrist joint 130 forming a virtual wrist, and the end-effector 45 in accordance with some embodiments.
- the shoulder joint 126, the elbow joint 128, and the wrist joint 130 can include a series of hinge and rotary joints to provide each arm with seven positionable degrees of freedom, along with one additional grasping degree of freedom for the end-effector 45 in some embodiments.
- FIG. 5 illustrates a perspective front view of a portion of the robotic assembly 20 configured for insertion into an internal body cavity of a patient.
- the robotic assembly 20 includes a robotic arm 42 A and a robotic arm 42B.
- the two robotic arms 42 A and 42B can define, or at least partially define, a virtual chest 140 of the robotic assembly 20 in some embodiments.
- the virtual chest 140 (depicted as a triangle with dotted lines) can be defined by a chest plane extending between a first pivot point 142A of a most proximal joint of the robotic arm 42A (e.g., a shoulder joint 126), a second pivot point 142B of a most proximal joint of the robotic arm 42B, and a camera imaging center point 144 of the camera(s) 47.
- a pivot center 146 of the virtual chest 140 lies in the middle of the virtual chest 140.
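- a minimal sketch of how a chest plane and pivot center could be derived from the two pivot points and the camera imaging center point; the coordinates and helper function are illustrative assumptions:

```python
# Minimal sketch: derive a virtual chest plane and its pivot center from the
# two most-proximal joint pivot points and the camera imaging center point.
import numpy as np

def virtual_chest(p_shoulder_a, p_shoulder_b, p_camera):
    """Return (centroid, unit normal) of the triangle defined by the three points."""
    a, b, c = (np.asarray(p, dtype=float) for p in (p_shoulder_a, p_shoulder_b, p_camera))
    centroid = (a + b + c) / 3.0             # "pivot center" in the middle of the chest
    normal = np.cross(b - a, c - a)          # plane normal from two triangle edges
    return centroid, normal / np.linalg.norm(normal)

center, n = virtual_chest([0.03, 0.0, 0.0], [-0.03, 0.0, 0.0], [0.0, 0.02, 0.01])
```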
- sensors in one or both of the robotic arm 42A and the robotic arm 42B can be used by the surgical robotic system 10 to determine a change in location in three-dimensional space of at least a portion of each or both of the robotic arms 42A and 42B.
- sensors in one or both of the first robotic arm 42A and second robotic arm 42B can be used by the surgical robotic system 10 to determine a location in three-dimensional space of at least a portion of one robotic arm relative to a location in three-dimensional space of at least a portion of the other robotic arm.
- the camera assembly 44 is configured to obtain images from which the surgical robotic system 10 can determine relative locations in three-dimensional space.
- the camera assembly 44 can include multiple cameras, at least two of which are laterally displaced from each other relative to an imaging axis, and the system can be configured to determine a distance to features within the internal body cavity.
- a surgical robotic system including camera assembly and associated system for determining a distance to features can be found in International Patent Application Publication No. WO 2021/159409, entitled “System and Method for Determining Depth Perception In Vivo in a Surgical Robotic System,” and published August 12, 2021, which is incorporated by reference herein in its entirety.
- Information about the distance to features and information regarding optical properties of the cameras can be used by a system to determine relative locations in three-dimensional space.
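- a minimal sketch of the underlying pinhole stereo relationship, assuming a known focal length and inter-camera (baseline) distance; the numbers are illustrative, not the system's calibration:

```python
# Minimal sketch: with a known inter-camera (baseline) distance and focal
# length, the disparity of a feature between the two laterally displaced
# cameras gives its distance (pinhole stereo model: Z = f * B / d).
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

z = depth_from_disparity(focal_px=700.0, baseline_m=0.004, disparity_px=35.0)
print(f"feature distance: {z * 100:.1f} cm")   # ~8.0 cm
```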
- FIG. 6 is a graphical user interface 150 that is formatted to include a left pillar box 198 and a right pillar box 199 to the left and right, respectively, of live video footage 168 of a cavity of a patient.
- the graphical user interface 150 can be overlaid over the live video footage 168.
- the live video footage 168 is formatted by the controller 26 to accommodate the left pillar box 198 and the right pillar box 199.
- the live video footage 168 can be displayed on display 12, with a predetermined size and location on the display 12, and the left pillar box 198 and the right pillar box 199 can be displayed on either side of the live video footage 168 with a certain size based on the remaining area on the display 12 that is not occupied by the live video footage 168.
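- a minimal sketch of that sizing rule, splitting whatever display width remains after the live video region between the two pillar boxes; the dimensions are illustrative assumptions:

```python
# Minimal sketch: size the left/right pillar boxes from the display width left
# over after the live video region is placed. Numbers are illustrative.
def pillar_box_widths(display_w: int, video_w: int) -> tuple[int, int]:
    remaining = max(0, display_w - video_w)
    left = remaining // 2
    return left, remaining - left             # split leftover width between the two boxes

print(pillar_box_widths(2560, 1920))          # -> (320, 320)
```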
- the graphical user interface 150 includes multiple different graphical user interface elements, which are described below in more detail.
- the left pillar box 198 can include a status identifier 173, for example, an engaged or disengaged status identifier associated with an instrument tip 120 of the robotic arm 42B.
- the “engaged” status identifier 173 indicates that the user’s left hand and arm are engaged with the left hand controller 201 and therefore the instrument tip 120 is also engaged.
- the “disengaged” status identifier 173 indicates that the user’s left hand and arm are not engaged with the hand controller 201 and therefore the instrument tip 120 is also disengaged.
- the surgical robotic system 10 can be completely disengaged.
- the instrument tip 120 can be represented by iconographic symbol 179 that includes a name of the instrument tip 120 to provide confirmation to the user of what type of end effector or instrument tip is currently in use.
- the instrument tip 120 represented by the iconographic symbol 179 is a bipolar grasper.
- the present disclosure is not limited to the bipolar grasper or scissor shown in FIG. 6.
- the right pillar box 199 can include a status identifier 175 associated with an instrument tip 120 of the robotic arm 42A for example, engaged or disengaged status identifier.
- the graphical user interface can also provide a visual representation of the status in addition to text. For example, the end effector iconography can be “grayed out” or made less prominent if it is not engaged.
- the status identifier 175 can be “engaged” thereby indicating that the user’s right hand and arm are engaged with the right hand controller 202 and therefore the instrument tip 120 is also engaged.
- the status identifier 175 can be “disengaged” thereby indicating that the user’s right hand and arm are not engaged with the right hand controller 202 and therefore the instrument tip 120 is also disengaged.
- the instrument tip 120 can be represented by iconographic symbol 176 that includes a name of the instrument tip 120 to provide confirmation to the user of what type of end effector or instrument tip is currently in use.
- the instrument tip 120 represented by iconographic symbol 176 is a monopolar scissors. Notably, the present disclosure is not limited to the monopolar scissors shown in FIG. 6.
- the left pillar box 198 can also include a robot pose view 171.
- the robot pose view 171 includes a simulated view of the robotic arms 42B and 42A, the camera assembly 44, and the support arm, thereby allowing the user to get a third person view of the robotic arm assembly 42, the camera assembly 44, and the robot support system 46.
- the simulated view of the robotic arms 42B and 42A is represented by a pair of simulated robotic arms 191 and 192.
- the simulated view of the camera assembly 44 is represented by a simulated camera.
- the robot pose view 171 also includes a simulated camera view associated with a cavity, or a portion of the cavity, of a patient, which is representative of the placement, or location, of the pair of simulated robotic arms 191 and 192 relative to a frustum 151. More specifically, the camera view can be the field of view of the camera assembly 44 and is equivalent to the frustum 151.
- the right pillar box 199 can also include a robot pose view 172 that includes a simulated view of the robotic arms 42B and 42A, the camera assembly 44, and the support arm, thereby allowing the user to get a third person view of the robotic arm assembly 42, the camera assembly 44, and the support arm.
- the simulated view of the robotic arms 42B and 42A are a pair of simulated robotic arms 165 and 166.
- the simulated view of the camera assembly 44 is represented by a simulated camera 193.
- the robot pose view 172 also includes a simulated camera view associated with a cavity, or a portion of the cavity, of the patient, which represents the placement, or location, of the pair of simulated robotic arms 165 and 166 relative to a frustum 167. More specifically, the camera view can be the camera’s field of view, which is the frustum 167.
- the robot pose view 172 provides elbow height awareness and situational awareness, especially when driving in up-facing/flip-facing configurations.
- the robot pose view 171 and the robot pose view 172 can be two separate views from two different viewpoints, one on each side of the graphical user interface 150 that is rendered on display 12.
- the robot pose view 171 and the robot pose view 172 automatically update to stay centered on the trocar 50 while maintaining the robotic arms 42A and 42B in view.
- the robot pose views 171 and 172 also provide the user with spatial awareness.
- Spatial awareness can be characterized as the placement or position of the robotic arms 42A and 42B as viewed in robot pose views 171 and 172 relative to other objects in the cavity and the cavity itself.
- the robot pose views 171 and 172 provide the user with the ability to determine where the actual robotic arms 42A and 42B are located within the cavity by viewing the simulated robotic arms 191 and 192 in the robot pose view 171 and simulated robotic arms 165 and 166 in robot pose view 172.
- the robot pose view 171 illustrates the position and location of the simulated robotic arms 191 and 192 relative to the frustum 151.
- the robot pose view 171 depicts the simulated robotic arms 191 and 192 with respect to the frustum 151, from a side view of the support arm and the simulated robotic arms 191 and 192 that are attached to the support arm. This particular robot pose provides the user with the ability to better ascertain proximity to anatomical features within the cavity.
- the robot pose view 172 can also provide the user with the ability to better ascertain how close the actual robotic arms 42A and 42B are relative to one another, or how far apart they are from one another.
- the robot pose view 172 can also illustrate where the actual robotic arms 42A and 42B might be positioned or located relative to structures inside the cavity of the patient that are to the left and right of the robotic arms 42A and 42B, thereby providing the user with a spatial awareness of where the robotic arms 42A and 42B are within the cavity, and where they are relative to anatomical features within the cavity.
- the simulated robotic arms 165 and 166 can provide the user with the spatial awareness to know how close or far apart the actual robotic arms 42A and 42B are from one another.
- the view provided by the robot pose view 172 is a view as if the user were looking at a field of the inside of the cavity.
- the robot pose view 172 provides the user with the spatial awareness to know how close the virtual elbows 128 are relative to one another if the user manipulates the right hand controller 202 and the left hand controller 201 in such a way that the virtual elbows 128 are brought closer together, as well as how close the actual robotic arms 42A and 42B are to one another. For example, as the user manipulates the left hand controller 201 and the right hand controller 202 to straighten the robotic arms 42A and 42B, the simulated robotic arms 166 and 165 will become parallel to one another and the distance between the elbow of simulated robotic arm 165 and the elbow of the simulated robotic arm 166 decreases.
- conversely, as the robotic arms 42A and 42B are bent, the simulated robotic arms 166 and 165 will not be parallel to one another and the distance between the elbow of simulated robotic arm 165 and the elbow of the simulated robotic arm 166 will increase.
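- a minimal sketch of the elbow-separation intuition, modeling each upper arm as a segment angled outward from a fixed virtual shoulder (a simplified geometric assumption, not the arms' actual kinematics):

```python
# Minimal sketch: with the two virtual shoulders a fixed distance apart and
# each upper-arm segment angled outward by theta, the elbow-to-elbow distance
# shrinks toward the shoulder spacing as the arms straighten (theta -> 0).
import math

def elbow_separation(shoulder_gap_m: float, upper_arm_m: float, theta_rad: float) -> float:
    return shoulder_gap_m + 2.0 * upper_arm_m * math.sin(theta_rad)

for theta_deg in (40, 20, 0):                 # progressively straighter arms
    print(theta_deg, round(elbow_separation(0.06, 0.08, math.radians(theta_deg)), 4))
```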
- the robot pose views 171 and 172 provide the user with spatial awareness during a surgical procedure, because the live video footage 168 does not provide visualization of the entire length of the robotic arms 42A and 42B.
- the simulated robotic arms 191 and 192 are shown as being within the camera assembly 14 field of view associated with frustum 151, which provides the user with a situational awareness and spatial awareness of where the robotic arm 42B and the robotic arm 42A are located or positioned within the portion of the actual cavity of the patient that is captured.
- the camera view associated with the robot pose view 171 is a simulated view of the robotic arm 42B and the robotic arm 42A as if the user were viewing the actual view of the robotic arm 42B and the robotic arm 42A from a side view within the cavity of the patient.
- the camera view can be the camera assembly 44 field of view which is the frustum 167. That is, the robot pose view 171 provides the user with a side view of simulated robotic arms 191 and 192, which are simulated views corresponding to the robotic arm 42B and the robotic arm 42A respectively.
- the graphical user interface 150 can display live video footage 168 from a single vantage point including a field of view of the cavity and the robotic arm 42B and the robotic arm 42A relative to different areas within the cavity as shown in FIG. 6.
- the user might not always be able to determine how the virtual elbows 128 of the robotic arm 42B and the robotic arm 42A are positioned.
- the camera assembly 44 might not always include video footage of the virtual elbows 128 of the robotic arm 42B and video footage of the elbow of the robotic arm 42A, and therefore the user may not be able to determine how to adjust the right hand controller 202 and left hand controller 201 if they wish to maneuver within the cavity of the patient.
- the simulated view of the robotic arm 42B (robotic arm 191) and the simulated view of the robotic arm 42A (robotic arm 192) provides the user with a viewpoint that allows the user to determine the positioning of the virtual elbows 128 of the robotic arm 42A and the robotic arm 42B because the left situational awareness camera view panel includes a simulated field of view of the entire length of the robotic arms 191 and 192.
- the user can adjust the positioning of the robotic arm 42B and the robotic arm 42A by manipulating the left hand controller 201 and the right hand controller 202, and watching how the robotic arms 191 and 192 move in accordance with the manipulation of the left hand controller 201 and the right hand controller 202.
- the graphical user interface 150 can include the robot pose view 172, within which there is a frustum 167 that is the field of view of the camera assembly 44 associated with a portion of the cavity of the patient, and the simulated robotic arms 165 and 166 together with a simulated camera 158 and a simulated robotic support arm supporting the robotic arms 165 and 166.
- the simulated robotic arms 165 and 166 are shown as being within the frustum 167, which is representative of location and positioning of the robotic arm 42B and the robotic arm 42A within the actual cavity of the patient.
- the view shown in the robot pose view 172 is a simulated view of the robotic arm 42B and the robotic arm 42A as if the user were viewing the robotic arm 42B and the robotic arm 42A from a top down view within the cavity of the patient. That is, the robot pose view 172 provides the user with a top down view of the simulated robotic arms 165 and 166, which are simulated views corresponding to the robotic arm 42B and the robotic arm 42A, respectively.
- the top down view provides the user with the ability to maintain a certain level of situational awareness of the robotic arm 42B and the robotic arm 42A as the user is performing a procedure within the cavity.
- the view of the simulated robotic arm 165 corresponding to the robotic arm 42B and the view of simulated robotic arm 166 corresponding to the robotic arm 42A provides the user with a top view perspective that allows them to determine the positioning of the robotic arm 42B and the robotic arm 42A, because the robot pose view 172 includes a simulated top-down field of view of the robotic arms 165 and 166, the camera 158, as well as the support arm of the robotic assembly.
- the simulated field of view of the camera assembly 44 as outlined by the frustum 167 includes a top-down view of the simulated robotic arms 165 and 166.
- the user can adjust the positioning of the robotic arm 42B and the robotic arm 42A by manipulating the left hand controller 201 and the right hand controller 202, and watching how the simulated robotic arms 165 and 166 move forward or move backward within a portion of the cavity within the frustum 167 in accordance with the manipulation of the left hand controller 201 and the right hand controller 202.
- the simulated view of the robotic arms 42B and 42A in the robot pose views 171 and 172 is automatically updated to stay centered on the trocar 50 while maintaining the robotic arms 42B and 42A in view. In some embodiments, this can be accomplished based on one or more sensors, from the sensing and tracking module 16 that are on the robotic arms 42B and 42A, providing information to the right hand controller 202 and the left hand controller 201.
- the sensors can be an encoder, a Hall effect sensor, or another suitable sensor.
- Anatomy segmentation and tracking as described herein can be employed with any of the surgical robotic systems described above or any other suitable surgical robotic system. Further, some embodiments described herein can be employed with semi-robotic endoscopic surgical systems that are only robotic in part.
- An example of anatomy segmentation and tracking as taught herein can be understood with reference to the embodiments depicted in FIGS. 7-12 described below. For convenience, like reference numbers are used to reference similar features of the various embodiments shown in the figures, unless otherwise noted.
- FIG. 7 is a diagram illustrating an example anatomy segmentation and tracking module 300 for anatomy segmentation and anatomical structure tracking in accordance with some embodiments.
- the anatomy segmentation and tracking module 300 can be executed by the processor 22 of the computing module 18 of the surgical robotic system 10.
- the anatomy segmentation and tracking module 300 can be part of system code 2000 as described with respect to FIGS. 11A and 11B.
- the anatomy segmentation and tracking module 300 can include a machine learning model 320 that is fed by an input 310 and generates an output 340, as well as a confidence module 346 and a tracking module 360.
- the anatomy segmentation and tracking module 300 can include components different from those shown in FIG. 7.
- the anatomy segmentation and tracking module 300 can further include a training module to train a machine learning model.
- Some embodiments may include the confidence module 346 and some embodiments may not include the confidence module 346.
- the input 310 includes position and orientation data 312 indicative of a pose of the robotic assembly 20 having the robotic arm assembly 42 and the camera assembly 44.
- the surgical robotic system 10 can determine the position and orientation data 312 using the sensors coupled to the robotic arm assembly 42 and the camera assembly 44.
- the robotic arm assembly 42 can include sensors (e.g., sensors 132 in FIG. 4B) that detect a position and an orientation (e.g., roll, pitch, yaw) of one or more portions (e.g., various joints, end-effectors, or other part) of each of the robotic arms in 3D space (e.g., within a patient’s abdominal space).
- Sensors from the sensing and tracking module 16 that are on the robotic arms 42B and 42A and the camera assembly 44 can detect position and orientation data 312 of the robotic arms 42B and 42A and the camera assembly 44.
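As a non-limiting illustration of how such per-joint sensor readings could be flattened into the position and orientation data 312, the following minimal Python sketch builds a single pose vector from hypothetical joint readings; the joint names, ordering, units, and dictionary layout are assumptions of the sketch rather than details taken from this disclosure.

```python
import numpy as np

def build_pose_vector(joint_states):
    """Flatten per-joint position (x, y, z) and orientation (roll, pitch, yaw)
    readings into one pose vector.

    `joint_states` is a list of dicts, one per tracked joint, end-effector, or
    camera head, e.g. {"position": (x, y, z), "orientation": (roll, pitch, yaw)}.
    The dict layout and joint ordering are illustrative assumptions.
    """
    components = []
    for joint in joint_states:
        components.extend(joint["position"])      # 3 values: x, y, z
        components.extend(joint["orientation"])   # 3 values: roll, pitch, yaw
    return np.asarray(components, dtype=np.float32)

# Example: two arm joints plus the camera head -> an 18-dimensional pose vector
# standing in for the position and orientation data 312 (values are made up).
pose_312 = build_pose_vector([
    {"position": (0.01, 0.10, 0.25), "orientation": (0.0, 0.1, 1.5)},
    {"position": (0.04, 0.12, 0.22), "orientation": (0.2, 0.0, 1.4)},
    {"position": (0.00, 0.08, 0.30), "orientation": (0.0, 0.0, 0.0)},
])
```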
- the input 310 can also include an image 314 including a representation of one or more anatomical structures of a subject.
- the image 168 (e.g., a video frame of a live video stream) captured by the camera assembly 44 in a field of view of the camera assembly 44 can include an incarcerated hernia 200 and blood vessels 210 in an anatomical space 220 (e.g., abdominal space) of a subject (e.g., a patient).
- the anatomy segmentation and tracking module 300 can access a video stream in real time from the camera assembly 44 to retrieve a video that depicts the content in a surgical field of view (e.g., a field of view of the camera assembly 44), which can include organs, vasculature, and other tissue.
- the anatomy segmentation and tracking module 300 can break down the retrieved video into a sequence of video frames that are recorded at a particular rate (e.g., 30Hz).
- the anatomy segmentation and tracking module 300 can input each video frame into the machine learning model 320 as described below.
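A minimal sketch of how the retrieved video stream could be broken into frames and fed to the model one frame at a time is shown below; the use of OpenCV and the placeholder `run_segmentation_and_tracking` call are assumptions of this sketch, not part of the disclosure.

```python
import cv2

def frames_from_stream(source):
    """Yield frames from a live video stream one at a time.

    `source` can be a device index or a stream URL; the OpenCV capture backend is
    an assumption of this sketch. At a 30 Hz recording rate (the example rate
    mentioned above), each yielded frame becomes one model input (image 314).
    """
    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break          # stream ended or a frame was dropped
            yield frame        # one video frame = one input to the model
    finally:
        capture.release()

# Example usage (hypothetical device index and placeholder model call):
# for frame in frames_from_stream(0):
#     output = run_segmentation_and_tracking(frame)
```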
- the image 314 can be a single photo, a photo from a series of photos captured in a burst mode, or another suitable image that depicts anatomical structures of the subject.
- the input 310 can also include data from other systems, such as fluorescence data (e.g., from indocyanine green (ICG) imaging, and/or from a laparoscopic system), 3D data or map of the environment being captured by the camera assembly 44, and/or other suitable data associated with an internal body cavity where the surgical robotic system 10 is being operated.
- the present disclosure is not limited to the input data described herein.
- the machine learning model 320 can include an encoder-decoder architecture and a classifier architecture.
- the encoder-decoder architecture can include a pose encoder 322, a visual encoder 326, and decoders 332.
- the classifier architecture can include a classifier 334 (e.g., a multi-class classifier).
- the machine learning model 320 can segment multiple types of anatomical structures (e.g., blood vessels, nerves, other tube-like structures, organs, other sensitive structures, and/or obstruction structures) in a surgical field of view (e.g., a field of view of the camera assembly 44) across a diverse range of anatomical locations and can also identify anatomical landmarks (e.g., in real time) as described below.
- the machine learning model 320 can include one or more neural networks (e.g., convolutional neural network or other types of neural network).
- the pose encoder 322, the visual encoder 326, the decoders 332, and the classifier 334 can be part of a single neural network or can be different neural networks.
- the machine learning model 320 can be generated using a training process based on supervised learning, semi-supervised learning, unsupervised learning, reinforcement learning, and/or other suitable learning methods.
- the pose encoder 322 can be fed by the position and orientation data 312 of the robotic assembly 20 and extract, from the position and orientation data 312, a compact representation of a pose of the robotic assembly 20.
- the extracted compact representation can be referred to as a pose representation 324.
- the extracted compact representation can be a vector that represents the pose of the robotic assembly 20 by reducing the dimensionality of the position and orientation data 312 or can be any other suitable representation that represents the position and orientation data 312 in a compact manner (e.g., reduced data size or the like).
- the visual encoder 326 can be fed by the image 314 and extract, from the image 314, a compact representation of the image 314.
- the extracted compact representation can be referred to as a visual representation 328.
- the compact representation can be a vector that represents the image by reducing the dimensionality of the image 314 or can be any other suitable representation that represents the image in a compact manner (e.g., reduced data size or the like).
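For illustration only, the pose encoder 322 and the visual encoder 326 could be realized as small neural networks that map their inputs to fixed-length vectors, as in the hedged PyTorch sketch below; the layer sizes, the 128-dimensional embedding, and the specific architectures are assumptions, not details taken from this disclosure.

```python
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Maps the flattened position/orientation vector to a compact pose representation."""
    def __init__(self, pose_dim=18, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, pose):            # pose: (batch, pose_dim)
        return self.net(pose)           # (batch, embed_dim) compact pose representation

class VisualEncoder(nn.Module):
    """Maps an RGB video frame to a compact visual representation."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),    # global pooling removes the spatial dimensions
        )
        self.project = nn.Linear(64, embed_dim)

    def forward(self, image):           # image: (batch, 3, H, W)
        features = self.backbone(image).flatten(1)
        return self.project(features)   # (batch, embed_dim) compact visual representation
```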
- the machine learning model 320 can aggregate the pose representation 324 and the visual representation 328 into a single state representation 330 that reflects a current state of the robotic assembly 20 and can be used to achieve multiple tasks of interest, such as identifying anatomical landmarks and delineating sensitive structures in a surgical field of view.
- the machine learning model 320 can aggregate the pose representation 324 and the visual representation 328 by averaging the representations and weighing each representation equally.
- the machine learning model 320 can aggregate the pose representation 324 and the visual representation 328 using a weighted average of the representations. The weight for each representation is determined according to a relative importance of each data modality.
- the pose representation 324 can have a greater weight than the visual representation 328 if, for example, the robotic assembly 20 moves but the content of the image 314 changes only slightly.
- the machine learning model 320 can concatenate the representations such that the state representation 330 grows in size with the addition of each representation.
- the state representation 330 is extensible in that it can account for any number of modality-specific representations and is not limited to the pose representation 324 and the visual representation 328.
- the machine learning model 320 can include more encoders to extract a respective compact representation from additional inputs, such as fluorescence data, 3D map data, and/or other suitable data as described with respect to the input 310.
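A minimal sketch of the aggregation strategies described above (equal-weight averaging, weighted averaging, and concatenation) might look as follows; the function name and the assumption that the two representations share the same dimensionality for the averaging modes are illustrative.

```python
import torch

def aggregate_state(pose_repr, visual_repr, mode="average", weights=(0.5, 0.5)):
    """Combine modality-specific representations into a single state representation.

    Extra modality representations (e.g., fluorescence or 3D-map embeddings) could
    be combined in the same way; only two modalities are shown for brevity.
    """
    if mode == "average":
        # Equal weighting of the two representations.
        return 0.5 * (pose_repr + visual_repr)
    if mode == "weighted":
        # Weighted average, e.g., favoring pose when the image barely changes.
        w_pose, w_visual = weights
        return w_pose * pose_repr + w_visual * visual_repr
    if mode == "concat":
        # Concatenation: the state representation grows with each added modality.
        return torch.cat([pose_repr, visual_repr], dim=-1)
    raise ValueError(f"unknown aggregation mode: {mode}")
```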
- the decoders 332 can be fed by the state representation 330 and generate multiple segmentation maps 342. Each segmentation map 342 can identify which of the one or more anatomical structures to avoid contact with the robotic arm assembly 42. In some embodiments, the decoders 332 can identify multiple anatomical structures. For example, there can be multiple sensitive structures that operating surgeons would like to be aware of: they might want to avoid damaging nerves (anatomical structure 1) while being guided along arteries (anatomical structure 2). To enable the segmentation of multiple anatomical structures, each decoder 332 can be trained to identify a particular anatomical structure (e.g., nerves, arteries, or the like).
- Each decoder 332 can output a probability that each pixel in the image 314 depicts the particular anatomical structure. Pixels whose values exceed a value threshold can be considered to depict that anatomical structure in a field of view.
- the machine learning model 320 can have N segmentation decoders.
- a single decoder can generate multiple segmentation maps 342.
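One plausible, simplified realization of per-structure decoders that map the state representation to per-pixel probability maps and threshold them is sketched below; the fixed output resolution, the upsampling scheme, the structure names, and the 0.5 threshold are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class StructureDecoder(nn.Module):
    """One decoder per anatomical structure class (e.g., nerves, arteries)."""
    def __init__(self, state_dim=128):
        super().__init__()
        self.to_grid = nn.Linear(state_dim, 32 * 8 * 8)   # seed an 8x8 feature map
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16x16
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),   # 32x32
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1),               # 64x64 logits
        )

    def forward(self, state):                       # state: (batch, state_dim)
        grid = self.to_grid(state).view(-1, 32, 8, 8)
        return torch.sigmoid(self.upsample(grid))   # per-pixel probabilities in [0, 1]

# N independent decoders, one per structure of interest (names are illustrative).
decoders = nn.ModuleDict({
    "nerves": StructureDecoder(),
    "arteries": StructureDecoder(),
})

def segmentation_maps(state, threshold=0.5):
    """Pixels whose probability exceeds the threshold are treated as that structure."""
    return {name: (decoder(state) > threshold) for name, decoder in decoders.items()}
```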
- the classifier 334 can be fed by the state representation 330 and identify one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated.
- anatomical landmarks can include the inguinal triangle that refers to a region of an abdominal wall, the triangle of doom that refers to an anatomical triangle defined by the vas deferens medially, spermatic vessels laterally and peritoneal fold inferiorly, and the triangle of pain that refers to a region bound by the iliopubic tract, testicular vessels, and the peritoneal fold.
- the classifier 334 can output, for each of a plurality of anatomical landmarks, a probability that the area the camera assembly 44 is viewing belongs to that landmark.
- the classifier 334 can be a multi-class classifier to classify an area that the camera assembly 44 is viewing into different anatomical landmarks.
- the multi-class classifier can be trained by a predefined list of anatomical landmarks in an internal body cavity (e.g., abdominal space or other internal space).
- the predefined list can be based on existing surgical practice and is modeled after how surgeons currently navigate the abdominal space. For example, in the context of a surgery to repair an inguinal hernia, a condition whereby organs protrude through the abdominal muscle, key anatomical landmarks include the inguinal triangle, the triangle of doom, and the triangle of pain.
- the classifier 334 can determine to which of the inguinal triangle, the triangle of doom, and the triangle of pain the area that the camera assembly 44 is viewing belongs.
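For illustration, a multi-class landmark classifier over a predefined landmark list could be sketched as follows; the linear head and the three-entry landmark list (taken from the inguinal-hernia example above) are assumptions of this sketch.

```python
import torch
import torch.nn as nn

# Predefined landmark list; entries follow the inguinal-hernia example above and
# would be extended for other procedures.
LANDMARKS = ["inguinal_triangle", "triangle_of_doom", "triangle_of_pain"]

class LandmarkClassifier(nn.Module):
    """Multi-class classifier over the predefined anatomical landmarks."""
    def __init__(self, state_dim=128, num_landmarks=len(LANDMARKS)):
        super().__init__()
        self.head = nn.Linear(state_dim, num_landmarks)

    def forward(self, state):                           # state: (batch, state_dim)
        return torch.softmax(self.head(state), dim=-1)  # probability per landmark

# Example: pick the landmark the viewed area most likely belongs to.
# probs = LandmarkClassifier()(state)
# predicted = LANDMARKS[int(probs.argmax(dim=-1)[0])]
```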
- a machine learning model can be trained by a training module (e.g., a training module 2400 in FIG. 11B) to generate the machine learning model 320.
- the training module 2400 can include training sets having images with labeled anatomical structures and labeled anatomical landmarks and known position and orientation data.
- the training module 2400 can access training sets stored in a remote server.
- the training module 2400 can feed the training sets into a machine learning model to be trained.
- the training module 2400 can adjust the weights and other parameters in the machine learning model during the training process to reduce the difference between an output of the machine learning model and an expected output.
- the trained machine learning model 320 can be stored in a database or the anatomy segmentation and tracking module 300.
- the training module 2400 can select a group of training sets as validation sets and apply the trained machine learning model 320 to the validation sets to evaluate the trained machine learning model 320.
- the confidence module 346 can create an interdependence between various outputs 340 having the segmentation maps 342 and the identified anatomical landmarks 344.
- the confidence module 346 can be bidirectional, whereby the identified anatomical landmarks 344 can affect a relative confidence one may have in the segmentation maps 342 generated for the different anatomical structures. For example, the presence of a particular anatomical landmark (e.g., one that does not include nerves or has only limited nerves) in a region in a surgical field of view can automatically eliminate the likelihood of that region being a nerve.
- the anatomy segmentation and tracking module 300 can down-weight the segmentation map 342 associated with nerves.
- the segmentation maps 342 can also affect a relative confidence of the identified anatomical landmarks 344.
- For example, when the anatomy segmentation and tracking module 300 segments multiple structures (e.g., blood vessels and nerves), the anatomy segmentation and tracking module 300 can have a higher confidence that the robotic assembly 20 is looking at a neurovascular bundle, an entity which coincides with a subset of the identified anatomical landmarks.
- the anatomy segmentation and tracking module 300 can down-weight the identified anatomical landmarks which are unlikely to have such neurovascular bundles (e.g., based on surgical and anatomical domain knowledge).
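A hedged sketch of such bidirectional re-weighting is shown below; the compatibility prior table, its numeric values, and the presence threshold are illustrative assumptions standing in for the surgical and anatomical domain knowledge mentioned above.

```python
import torch

# Hypothetical compatibility prior between landmark regions and structure classes;
# a value near 0 down-weights a structure that is implausible in that region.
STRUCTURE_PRIOR = {
    "inguinal_triangle": {"nerves": 1.0, "arteries": 1.0},
    "triangle_of_doom":  {"nerves": 0.2, "arteries": 1.0},
    "triangle_of_pain":  {"nerves": 1.0, "arteries": 0.3},
}

def adjust_confidence(landmark_probs, seg_maps, landmarks):
    """Bidirectional re-weighting for a single frame.

    landmark_probs: 1-D tensor of per-landmark probabilities.
    seg_maps: dict mapping structure name -> per-pixel probability map.
    landmarks: list of landmark names aligned with landmark_probs.
    """
    top_landmark = landmarks[int(landmark_probs.argmax())]

    # Landmarks -> segmentation: scale each structure's map by its plausibility
    # in the most likely landmark region.
    adjusted_maps = {
        name: prob_map * STRUCTURE_PRIOR[top_landmark].get(name, 1.0)
        for name, prob_map in seg_maps.items()
    }

    # Segmentation -> landmarks: boost landmarks consistent with structures that
    # occupy a substantial fraction of the view, down-weight the rest.
    present = [name for name, m in seg_maps.items() if m.mean().item() > 0.05]
    scores = torch.stack([
        landmark_probs[i]
        * float(sum(STRUCTURE_PRIOR[lm].get(s, 1.0) for s in present) or 1.0)
        for i, lm in enumerate(landmarks)
    ])
    return adjusted_maps, scores / scores.sum()
```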
- the tracking module 360 can update the segmentation maps 342 and identified anatomical landmarks 344 to reflect a change in a field of view of the camera assembly 44. For example, video frames are recorded over time at a particular sampling rate (e.g., 30Hz). During the recording time period, the camera assembly 44 is moved by the user and/or objects in the field of view change positions. Accordingly, the field of view of the camera assembly 44 can change over time.
- the tracking module 360 can update the segmentation maps 342 and identified anatomical landmarks 344 to reflect a change such that the anatomy segmentation and tracking module 300 can identify anatomical landmarks and segment anatomical structures over time (e.g., in real time).
- the tracking module 360 can obtain a series of inputs 310 at a predetermined time interval (e.g., a frame rate of a video, a sampling rate, or user defined time interval) over a time period and apply the machine learning model 320 to each input 310 to generate the output 340.
- the tracking module 360 can perform data processing 362A by applying the machine learning model 320 to the input 310 acquired at T1 and storing the corresponding output 340 and intermediate outputs (e.g., the state representation 330, the pose representation 324, and/or the visual representation 328).
- the tracking module 360 can perform data processing 362B by applying the machine learning model 320 to the input 310 acquired at T2 and storing the corresponding output 340 and intermediate outputs.
- the tracking module 360 can perform data processing 362N by applying the machine learning model 320 to the input 310 acquired at Tn and storing the corresponding output 340 and intermediate outputs.
- the tracking module 360 can determine a similarity between the state representation at the current time slot (e.g., Tn) and a previous state representation from a previous time slot (T2 or T1).
- the previous state representation can be generated using a previous pose representation extracted from previous position and orientation data acquired at the previous time slot and a previous visual representation extracted from a previous image acquired at the previous time slot.
- the current time slot is contiguous to the previous time slot. For example, video frames are recorded over time at a particular sampling rate or a frame rate (e.g., 30Hz).
- the current time slot and previous time slot are neighboring time slots for acquiring neighboring frames.
- if the tracking module 360 determines that the similarity is equal to or greater than a similarity threshold, the tracking module 360 can average the state representation and the previous state representation to generate an averaged state representation.
- the tracking module 360 can feed the averaged state representation into the machine learning model 320 to identify one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is currently being operated and to generate a plurality of segmentation maps.
- the tracking module 360 can determine whether neighboring video frames over a time period depict a similar field of view (i.e., with minimal changes).
- the tracking module 360 can quantify a similarity of state representations corresponding to the neighboring frames. If the similarity exceeds a threshold value, the tracking module 360 determines that the frames have minimal changes in the field of view and propagates the previous state representation of the previous frame forward in time.
- the tracking module 360 can average the state representations before feeding them into the machine learning model 320. Accordingly, the tracking module 360 capability can leverage historical information to more reliably segment anatomical structures.
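A minimal sketch of this propagation step, assuming cosine similarity as the similarity measure and a single-frame batch, might look as follows; the 0.95 threshold and the helper names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def track_state(current_state, previous_state, similarity_threshold=0.95):
    """Decide whether the field of view changed little between neighboring frames.

    When the current and previous state representations are similar enough, they
    are averaged so the model sees a representation that carries historical
    information forward; otherwise only the current state is used.
    """
    if previous_state is None:
        return current_state
    similarity = F.cosine_similarity(current_state, previous_state, dim=-1)
    if similarity.item() >= similarity_threshold:
        return 0.5 * (current_state + previous_state)   # averaged state representation
    return current_state                                 # field of view changed noticeably

# Per-frame loop using the (hypothetical) components sketched earlier:
# previous = None
# for pose, frame in inputs_over_time:
#     state = aggregate_state(pose_encoder(pose), visual_encoder(frame))
#     state = track_state(state, previous)
#     landmarks, maps = classifier(state), segmentation_maps(state)
#     previous = state
```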
- the tracking module 360 can enable the localization of the segmented anatomical structure within an anatomy of a subject.
- the tracking module 360 can provide the locations of the segmented anatomical structure relative to other structures to the users.
- the tracking module 360 can determine a 3D reconstruction of the anatomical space in which the robotic assembly 20 is being operated.
- the camera assembly 44 can include a light detection and ranging (LIDAR) sensor or a dot matrix projector to obtain a 3D representation or map of at least a portion of a body cavity.
- the tracking module 360 can use the 3D reconstruction to determine locations of the robot assembly 20 within the anatomy and the locations of the segmented anatomical structure such that the users can easily control the robot assembly 20 to avoid the anatomical structures and/or manipulate the anatomical structures.
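One way such localization could be sketched, assuming a per-pixel depth map aligned with the image (e.g., derived from the LIDAR or dot-matrix data mentioned above) and pinhole camera intrinsics, is shown below; both assumptions, and the choice to report the structure centroid, are illustrative.

```python
import numpy as np

def locate_structure_3d(mask, depth_map, fx, fy, cx, cy):
    """Back-project segmented pixels into camera-frame 3D coordinates.

    mask: boolean array marking pixels of one segmented structure.
    depth_map: per-pixel depth values aligned with the image (assumption).
    fx, fy, cx, cy: pinhole intrinsics of the camera assembly (assumption).
    Returns the centroid of the structure in the camera frame.
    """
    ys, xs = np.nonzero(mask)                 # pixel coordinates of the segmented structure
    z = depth_map[ys, xs]
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    points = np.stack([x, y, z], axis=1)      # (N, 3) camera-frame points
    return points.mean(axis=0)                # structure centroid for display/avoidance
```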
- the confidence module 346 and/or tracking module 360 can be included in the machine learning model 320.
- the tracking module 360 can include the machine learning model 320.
- FIG. 8 is a flowchart illustrating steps 400 for anatomical structure segmentation and anatomical landmark identification carried out by the surgical robotic system 10 in accordance with some embodiments.
- the surgical robotic system 10 receives an image from the camera assembly 44 of the surgical robotic system 10.
- the image can include a representation of one or more anatomical structures of a subject. Examples are described with respect to the image 314 in FIG. 7.
- At step 404, the surgical robotic system 10 extracts from the image a visual representation thereof.
- the visual representation can be a compact representation for the image. Examples are described with respect to the visual encoder 326 in FIG. 7.
- the surgical robotic system 10 determines position and orientation data associated with the robotic assembly 20.
- the robotic assembly 20 includes the robotic arm assembly 42 and the camera assembly 44.
- the position and orientation data indicates a pose of the robotic assembly 20. Examples are described with respect to the position and orientation data 312 in FIG. 7.
- At step 408, the surgical robotic system 10 generates a pose representation of the robotic assembly 20 based at least in part on the position and orientation data. Examples are described with respect to the pose encoder 322 in FIG. 7.
- At step 410, the surgical robotic system 10 generates a state representation based at least in part on the visual representation and the pose representation. Examples are described with respect to the state representation 330 in FIG. 7.
- the surgical robotic system 10 identifies, based at least in part on the state representation, one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated. Examples are described with respect to the classifier 334 and the identified anatomical landmarks 344 in FIG. 7.
- At step 414, the surgical robotic system 10 generates a plurality of segmentation maps. Each segmentation map identifies which of the one or more anatomical structures to avoid contact with the robotic assembly 20. Examples are described with respect to the decoders 332 and the segmentation maps 342 in FIG. 7.
- FIG. 9 is a flowchart illustrating steps 500 for anatomical structure tracking carried out by the surgical robotic system 10 in accordance with some embodiments.
- the surgical robotic system 10 receives an image of a plurality of images captured by the camera assembly 44. The image includes a representation of one or more anatomical structures of a subject. Examples are described with respect to the image 314 and the tracking module 360 in FIG. 7.
- the surgical robotic system 10 extracts from the image a visual representation thereof.
- the visual representation can be a compact representation for the image. Examples are described with respect to the visual encoder 326 and the tracking module 360 in FIG. 7.
- At step 506, the surgical robotic system 10 determines position and orientation data associated with the robotic assembly 20. Examples are described with respect to the position and orientation data 312 and tracking module 360 in FIG. 7.
- At step 508, the surgical robotic system 10 generates a pose representation of the robotic assembly 20 based at least in part on the position and orientation data. Examples are described with respect to the pose encoder 322 and tracking module 360 in FIG. 7.
- At step 510, the surgical robotic system 10 generates a state representation based at least in part on the visual representation and the pose representation. Examples are described with respect to the state representation 330 and tracking module 360 in FIG. 7.
- the surgical robotic system 10 determines a similarity between the state representation and a previous state representation.
- the previous state representation is generated based at least in part on a previous pose representation extracted from previous position and orientation data and a previous visual representation extracted from a previous image that is one image of the plurality of images and is prior to the image in time. Examples are described with respect to the tracking module 360 in FIG. 7.
- At step 514, in response to determining that the similarity is equal to or greater than a similarity threshold, the surgical robotic system 10 averages the state representation and the previous state representation to generate an averaged state representation. Examples are described with respect to the tracking module 360 in FIG. 7.
- At step 516, the surgical robotic system 10 identifies, based at least in part on the averaged state representation, one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated. Examples are described with respect to the classifier 334, the identified anatomical landmarks 344, and the tracking module 360 in FIG. 7.
- At step 518, the surgical robotic system 10 generates a plurality of segmentation maps.
- Each segmentation map identifies which of the one or more anatomical structures to avoid contact with the robotic assembly 20. Examples are described with respect to the decoders 332, the segmentation maps 342, and the tracking module 360 in FIG. 7.
- FIG. 10 is a flowchart illustrating steps of a method 600 for training the surgical robotic system 10 to automatically identify anatomical structures in accordance with some embodiments.
- a computational device (for example, the computing module 18) trains a machine learning model based at least in part on a training set having a plurality of labeled images and known position and orientation data associated with the robotic assembly 20 of the surgical robotic system 10.
- the training module 2400 of FIG. 11B can include training sets having labeled images with labeled anatomical structures and labeled anatomical landmarks and known position and orientation data.
- the training module 2400 can access training sets stored in a non-transitory computer readable medium, for example, a server.
- Each labeled image can have one or more labeled anatomical structures and in some embodiments one or more labeled anatomical landmarks.
- the known position and orientation data is indicative of a pose of the robotic assembly 20, such as a pose of the robotic arm assembly 42, a pose of the camera assembly 44, or both.
- the position and orientation data is one of the inputs to a machine learning model in addition to an image, as described with FIG. 7.
- the position and orientation data can provide additional information to an image captured by the camera assembly 44, such as direction, position, orientation, or any other suitable information regarding how the robotic arm assembly 42 and the camera assembly 44 are positioned while the image is being captured.
- the training module 2400 can feed the training sets into a machine learning model to be trained to generate the trained machine learning model 320.
- the training module 2400 can adjust the weights and other parameters in the machine learning model during the training process to reduce the difference between an output of the machine learning model and an expected output.
- the training module 2400 can adjust one or more parameters in the machine learning model to reduce a difference between one or more anatomical landmarks identified by the machine learning model and corresponding labeled anatomical landmarks and a difference between the one or more anatomical structures in each segmentation map generated by the machine learning model and the corresponding labeled anatomical structures.
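A hedged sketch of a combined training objective that reduces both differences at once is shown below; the specific loss functions (cross-entropy for landmarks, binary cross-entropy for segmentation) and the equal task weighting are assumptions of this sketch, not details taken from this disclosure.

```python
import torch.nn.functional as F

def training_loss(pred_landmark_logits, true_landmark,
                  pred_seg_maps, true_seg_maps, seg_weight=1.0):
    """Combined objective: landmark classification error plus per-structure
    segmentation error, summed over all structure decoders.

    pred_landmark_logits: (batch, num_landmarks) raw classifier scores.
    true_landmark: (batch,) integer labels of the labeled landmarks.
    pred_seg_maps / true_seg_maps: dicts of per-structure probability maps / labels.
    """
    landmark_loss = F.cross_entropy(pred_landmark_logits, true_landmark)
    seg_loss = sum(
        F.binary_cross_entropy(pred_seg_maps[name], true_seg_maps[name])
        for name in pred_seg_maps
    )
    return landmark_loss + seg_weight * seg_loss

# One optimization step (the model and optimizer objects are hypothetical placeholders):
# loss = training_loss(model.classify(state), y_landmark, model.segment(state), y_masks)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```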
- the trained machine learning model 320 can be stored on a non-transitory storage medium or as a component of the anatomy segmentation and tracking module 300.
- the training module 2400 can select a group of training sets as validation sets and apply the trained machine learning model 320 to the validation sets to evaluate the trained machine learning model 320.
- the surgical robotic system 10 deploys the trained machine learning model 320.
- the trained machine learning model 320 can identify one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated. Examples are described with respect to the classifier 334 and the identified anatomical landmarks 344 in FIG. 7.
- the trained machine learning model 320 can generate a plurality of segmentation maps, each segmentation map identifying which of the one or more anatomical structures to avoid contact with the robotic assembly 20. Examples are described with respect to the decoders 332 and the segmentation maps 342 in FIG. 7.
- FIG. 11 A is a diagram of an example computing module 18 that can be used to perform one or more steps of the methods provided by example embodiments.
- the computing module 18 includes one or more non-transitory computer-readable media for storing one or more computer-executable instructions or software for implementing example embodiments.
- the non-transitory computer-readable media can include, but are not limited to, one or more types of hardware memory, non-transitory tangible media (for example, one or more magnetic storage disks, one or more optical disks, one or more USB flashdrives), and the like.
- memory 1006 included in the computing module 18 can store computer-readable and computer-executable instructions or software for implementing example embodiments (e.g., the system code 2000).
- the computing module 18 also includes the processor 22 and associated core 1004, for executing computer-readable and computer-executable instructions or software stored in the memory 1006 and other programs for controlling system hardware.
- the processor 22 can be a single core processor or multiple core (1004) processor.
- Memory 1006 can include a computer system memory or random access memory, such as DRAM, SRAM, EDO RAM, and the like.
- the memory 1006 can include other types of memory as well, or combinations thereof.
- a user can interact with the computing module 18 through the display 12, such as a touch screen display or computer monitor, which can display the graphical user interface (GUI) 39.
- the display 12 can also display other aspects, transducers and/or information or data associated with example embodiments.
- the computing module 18 can include other I/O devices for receiving input from a user, for example, a keyboard or any suitable multi-point touch interface 1008, a pointing device 1010 (e.g., a pen, stylus, mouse, or trackpad).
- the keyboard 1008 and the pointing device 1010 can be coupled to the visual display device 12.
- the computing module 18 can include other suitable conventional I/O peripherals.
- the computing module 18 can also include one or more storage devices 24, such as a hard-drive, CD-ROM, or other computer readable media, for storing data and computer-readable instructions, applications, and/or software that implements example operations/steps of the surgical robotic system 10 as described herein, or portions thereof, which can be executed to generate GUI 39 on display 12.
- Example storage devices 24 can also store one or more databases for storing any suitable information required to implement example embodiments. The databases can be updated by a user or automatically at any suitable time to add, delete or update one or more items in the databases.
- Example storage device 24 can store one or more databases 1026 for storing provisioned data, and other data/information used to implement example embodiments of the systems and methods described herein.
- the computing module 18 can include a network interface 1012 configured to interface via one or more network devices 1020 with one or more networks, for example, Local Area Network (LAN), Wide Area Network (WAN) or the Internet through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (for example, 802.11, T1, T3, 56kb, X.25), broadband connections (for example, ISDN, Frame Relay, ATM), wireless connections, controller area network (CAN), or some combination of any or all of the above.
- the network interface 1012 can include a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing module 18 to any type of network capable of communication and performing the operations described herein.
- the computing module 18 can be any computer system, such as a workstation, desktop computer, server, laptop, handheld computer, tablet computer (e.g., the iPad® tablet computer), mobile computing or communication device (e.g., the iPhone® communication device), or other form of computing or telecommunications device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described herein.
- the computing module 18 can run any operating system 1016, such as any of the versions of the Microsoft® Windows® operating systems, the different releases of the Unix and Linux operating systems, any version of the MacOS® for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein.
- the operating system 1016 can be run in native mode or emulated mode.
- the operating system 1016 can be run on one or more cloud machine instances.
- the computing module 18 can also include an antenna 1030, where the antenna 1030 can transmit wireless transmissions to a radio frequency (RF) front end and receive wireless transmissions from the RF front end.
- FIG. 11B is a diagram illustrating an example system code 2000 that can be executable by the computing module 18 in accordance with some embodiments.
- the system code 2000 (non-transitory, computer-readable instructions) can be stored on a computer-readable medium, for example storage 24 and/or memory 1006, and executable by the hardware processor 22 of the computing module 18.
- the system code 2000 can include various custom-written software modules that carry out the steps/processes described herein, and can include, but is not limited to, a data collection module 2100 that collects the input 310 in FIG. 7; the anatomy segmentation and tracking module 300, including the encoders 2200 (the pose encoder 322 and the visual encoder 326), a state representation aggregation module 2300 for aggregating the pose representation 324 and the visual representation 328, the structure segmentation decoders 332, the classifier 334, the confidence module 346, and the tracking module 360; and a training engine 2400. Each of these components is described with respect to FIG. 7.
- the system code 2000 can be programmed using any suitable programming languages including, but not limited to, C, C++, C#, Java, Python, or any other suitable language. Additionally, the system code 2000 can be distributed across multiple computer systems in communication with each other over a communications network, and/or stored and executed on a cloud computing platform and remotely accessed by a computer system in communication with the cloud platform.
- FIG. 12 is a diagram illustrating computer hardware and network components on which the system 1100 can be implemented.
- the system 1100 can include the surgical robotic system 10, a plurality of computation servers 1102a-1102n having at least one processor (e.g., one or more graphics processing units (GPUs), microprocessors, central processing units (CPUs), tensor processing units (TPUs), application-specific integrated circuits (ASICs), etc.) and memory for executing the computer instructions and methods described above (which can be embodied as system code 2000).
- the system 1100 can also include a plurality of data storage servers 1104a-1104n for storing data.
- the computation servers 1102a-1102n, the data storage servers 1104a-1104n, and the surgical robotic system 10 accessed by a user 1112 can communicate over a communication network 1108.
Abstract
Systems and methods for anatomy segmentation and anatomical structure tracking are provided. The system receives an image from a camera assembly of the system. The image can include a representation of one or more anatomical structures of a subject. The system extracts from the image a visual representation. The system determines position and orientation data associated with a robotic assembly of the system. The position and orientation data is indicative of a pose of the robotic assembly. The system generates a state representation based at least in part on the pose and visual representations. The state representation represents a state of the system. The system identifies one or more anatomical landmarks in an anatomical space in which the robotic assembly is being operated. The system generates a plurality of segmentation maps. Each segmentation map identifies which of the anatomical structures to avoid contact with the robotic assembly.
Description
SYSTEMS AND METHODS FOR ANATOMY SEGMENTATION AND ANATOMICAL STRUCTURE TRACKING
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/430,513 filed on December 6, 2022, the entire contents of which are incorporated herein by reference.
BACKGROUND
[0002] Surgical robotic systems permit a user (also described herein as an “operator”) to perform an operation using robotically-controlled instruments to perform tasks and functions during a procedure. However, users (e.g., surgeons) are still limited to visual feedback of a patient’s interior. Such visual feedback can limit the ability of the users to navigate and orient themselves in the confined and often confusing surgical theater.
[0003] Navigation is usually performed using prior knowledge of a target area and anatomical landmarks. Although the prior knowledge can be obtained with significant preoperative planning time, recognizing anatomical landmarks can represent a challenge if users are unaware of a camera orientation relative to a patient or lack a sufficient amount of expertise. This issue can significantly increase intra-operative time and a likelihood of potential injury by, for example, dissecting the wrong structure.
SUMMARY
[0004] A surgical robotic system is presented. The surgical robotic system includes a robotic assembly. The robotic assembly includes a camera assembly configured to generate one or more images (e.g., a series of video frames, a live video footage, a series of photos captured in a burst mode, a single photo, and/or other suitable images) of an interior cavity of a subject (e.g., a patient) and a robotic arm assembly to be disposed in the interior cavity to perform a surgical operation. The surgical robotic system also includes a memory storing one or more instructions, a processor configured to or programmed to read the one or more instructions stored in the memory. The processor is operationally coupled to the robotic assembly. The processor is configured to receive an image from the camera assembly. The image can include a representation of one or more anatomical structures of the subject (e.g., organs, organ ducts, blood vessels (arteries and veins), nerves, other sensitive structures, and/or
obstruction structures obstructing the robotic arm assembly). The processor is further configured to extract from the image a visual representation thereof. The visual representation is a compact representation (e.g., a vector that represents the image by reducing the dimensionality of the image to one or any other suitable representation that represents the image in a compact manner, such as reduced data size or the like). The processor is further configured to determine position and orientation data associated with the robotic assembly. The position and orientation data is indicative of a pose of the robotic assembly. The processor is further configured to generate a pose representation of the robotic assembly based at least in part on the position and orientation data. The processor is further configured to generate a state representation based at least in part on the pose representation and the visual representation. The state representation represents a state of the surgical robotic system. The processor is further configured to identify, based at least in part on the state representation, one or more anatomical landmarks (e.g., the inguinal triangle, the triangle of doom, the triangle of pain, or the like) in an anatomical space in which the robotic assembly is being operated. The processor is further configured to generate a plurality of segmentation maps. Each segmentation map identifies which of the one or more anatomical structures to avoid contact with the robotic assembly.
[0005] In some embodiments, the processor is further configured to determine a similarity between the state representation and a previous state representation. The previous state representation is generated based at least in part on a previous pose representation extracted from previous position and orientation data and a previous visual representation extracted from a previous image. In response to determining that the similarity is equal to or greater than a similarity threshold, the processor is further configured to average the state representation and the previous state representation to generate an averaged state representation. The processor is further configured to identify one or more anatomical landmarks based at least in part on the averaged state representation. The processor is further configured to generate a plurality of segmentation maps, each segmentation map identifying which of the one or more anatomical structures to avoid contact with the robotic assembly. In some embodiments, the processor is further configured to determine a three-dimensional (3D) reconstruction of the anatomical space in which the robotic assembly is being operated. The processor is further configured to determine a location of each of one or more identified anatomical structures in the anatomical space based at least in part on the 3D reconstruction.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] These and other features and advantages of the present invention will be more fully understood by reference to the following detailed description in conjunction with the attached drawings in which like reference numerals refer to like elements throughout the different views. The drawings illustrate principles of the invention and, although not to scale, show relative dimensions.
[0007] FIG. 1 is a diagram illustrating an example surgical robotic system in accordance with some embodiments.
[0008] FIG. 2A is an example perspective view of a patient cart including a robotic support system coupled to a robotic subsystem of the surgical robotic system in accordance with some embodiments.
[0009] FIG. 2B is an example perspective view of an example operator console of a surgical robotic system of the present disclosure in accordance with some embodiments.
[0010] FIG. 3A is a diagram illustrating an example side view of a surgical robotic system performing a surgery within an internal cavity of a subject in accordance with some embodiments.
[0011] FIG. 3B is a diagram illustrating an example top view of the surgical robotic system performing the surgery within the internal cavity of the subject of FIG. 3A in accordance with some embodiments.
[0012] FIG. 4A is an example perspective view of a single robotic arm subsystem in accordance with some embodiments.
[0013] FIG. 4B is an example perspective side view of a single robotic arm of the single robotic arm subsystem of FIG. 4A in accordance with some embodiments.
[0014] FIG. 5 is an example perspective front view of a camera assembly and a robotic arm assembly in accordance with some embodiments.
[0015] FIG. 6 is an example graphical user interface of a robot pose view including a frustum view of a cavity of a patient and a pair of robotic arms of the surgical robotic system, a robot pose view, and a camera view in accordance with some embodiments.
[0016] FIG. 7 is a diagram illustrating an example anatomy segmentation and tracking module for anatomy segmentation and anatomical structure tracking in accordance with some embodiments.
[0017] FIG. 8 is a flowchart illustrating steps for anatomical structure segmentation and anatomical landmark identification carried out by a surgical robotic system in accordance with some embodiments.
[0018] FIG. 9 is a flowchart illustrating steps for anatomical structure tracking carried out by a surgical robotic system in accordance with some embodiments.
[0019] FIG. 10 is a flowchart illustrating steps for training a surgical robotic system to automatically identify anatomical structures in accordance with some embodiments.
[0020] FIG. 11A is a diagram of an example computing module that can be used to perform one or more steps of the methods provided by example embodiments.
[0021] FIG. 11B is a diagram illustrating an example system code that can be executable by the computing module of FIG. 11A in accordance with some embodiments.
[0022] FIG. 12 is a diagram illustrating computer hardware and network components on which a system can be implemented.
DETAILED DESCRIPTION
[0023] Embodiments taught and described herein provide systems and methods for anatomy segmentation and anatomical structure tracking. In some embodiments, a surgical robotic system taught and described herein can provide an augmented view of a complex surgical environment by displaying anatomically-relevant structures currently in a field of view of a camera assembly of the surgical robotic system, such as by highlighting and tracking anatomical landmarks, organs, organ ducts, blood vessels (arteries and veins), nerves, other sensitive structures, and obstruction structures (e.g., structures that obstruct the robotic arm assembly) in a video stream in real time (e.g., a live video footage). For example, the surgical robotic system can provide anatomy segmentation and tracking including anatomical landmark identification, anatomical structure segmentation, and anatomical structure tracking, which can reduce intra-operative time spent by users to orient themselves and increase the users’ awareness of a surgical stage. Because the anatomy segmentation and tracking taught and described herein can facilitate users during a surgical operation, the training time and skill required for the users using the surgical robotic system are reduced, thereby increasing accessibility of the surgical robotic system to a broader range of users. The anatomical landmark identification taught and described herein can also enable an advanced user interface that provides users with tools such as tissue and anatomical structure identification, topographical navigation, and user-definable safety keep-out zones to prevent
unwanted tissue collisions. The anatomical structure segmentation taught and described herein can further allow the surgical robotic system to automatically define safety keep-out zones that protect a subject from on and off-camera injury and can also enable the surgical robotic system to be a semi-autonomous surgical robotic system that can interact with human tissue while executing navigational and surgical procedures inside a subject, such as suturing and tissue dissection, safely and efficiently under users’ supervision.
[0024] Compared with conventional anatomy segmentation systems and methods that are limited by types of anatomical structures, anatomically-constrained areas, tracking environments, tracking objects, and data modalities, the anatomy segmentation and tracking taught and described herein can use an artificial intelligence (AI)-based framework (e.g., machine learning/deep learning models) to segment various types of anatomical structures (e.g., blood vessels, nerves, other sensitive structures, obstruction structures, or the like) and provide information about multiple anatomical structures simultaneously. The anatomy segmentation and tracking taught and described herein can also perform the anatomical structure segmentation irrespective of locations of the anatomical structures within the body by segmenting the anatomical structures at various anatomical locations. Additionally, the anatomy segmentation and tracking taught and described herein can track anatomical structures over time and simultaneously localize them in a reconstructed three-dimensional (3D) map of an internal body cavity. Further, the anatomy segmentation and tracking taught and described herein can leverage multiple data modalities (e.g., pose information of a robotic arm assembly, visual information, dyes and fluorescence data, or other suitable modality data) to perform anatomical landmark identification, anatomical structure segmentation, and anatomical structure tracking.
[0025] Prior to providing additional specific description of the anatomy segmentation and tracking with respect to FIGS. 7-12, a surgical robotic system in which some embodiments could be employed is described below with respect to FIGS. 1-6.
[0026] While various embodiments have been taught and described herein, it will be clear to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions can occur to those skilled in the art without departing from the invention. It can be understood that various alternatives to the embodiments of the invention described herein can be employed.
[0027] As used in the specification and claims, the singular form “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. It will be further understood
that the terms “comprises” and/or “comprising,” or “include” and/or “including,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0028] Unless specifically stated or obvious from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. “About” can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value. Unless otherwise clear from the context, all numerical values provided herein are modified by the term “about.” [0029] Although some example embodiments can be described herein or in documents incorporated by reference as employing a plurality of units to perform example processes, it is understood that example processes can also be performed by one or a plurality of modules. Additionally, it is understood that the term controller/controller can refer to a hardware device that includes a memory and a processor and is specifically programmed to execute the processes described herein in accordance with some embodiments. In some embodiments, the memory is configured to store the modules and the processor is specifically configured to execute said modules to perform one or more processes which are described further below. In some embodiments, multiple different controllers or controllers or multiple different types of controllers or controllers can be employed in performing one or more processes. In some embodiments, different controllers or controllers can be implemented in different portions of a surgical robotic systems.
Surgical Robotic Systems
[0030] Some embodiments can be employed with a surgical robotic system. A system for robotic surgery can include a robotic subsystem. The robotic subsystem includes at least a portion, which can also be referred to herein as a robotic assembly, that can be inserted into a patient via a trocar through a single incision point or site. The portion inserted into the patient via a trocar is small enough to be deployed in vivo at the surgical site and is sufficiently maneuverable when inserted to be able to move within the body to perform various surgical procedures at multiple different points or sites. The portion inserted into the body that performs functional tasks can be referred to herein as a surgical robotic module or a robotic assembly. The surgical robotic module can include multiple different submodules or parts that can be inserted into the trocar separately. The surgical robotic module or robotic assembly can include multiple separate robotic arms that are deployable within the patient along different or separate axes. These multiple separate robotic arms can be collectively referred to as a robotic arm assembly herein. Further, a surgical camera assembly can also be deployed along a separate axis. The surgical robotic module or robotic assembly can also include the surgical camera assembly. Thus, the surgical robotic module or robotic assembly employs multiple different components, such as a pair of robotic arms and a surgical or robotic camera assembly, each of which is deployable along a different axis and is separately manipulatable, maneuverable, and movable. The robotic arms and the camera assembly that are disposable along separate and manipulatable axes are referred to herein as the Split Arm (SA) architecture. The SA architecture is designed to simplify and increase efficiency of the insertion of robotic surgical instruments through a single trocar at a single insertion site, while concomitantly assisting with deployment of the surgical instruments into a surgical ready state as well as the subsequent removal of the surgical instruments through the trocar. By way of example, a surgical instrument can be inserted through the trocar to access and perform an operation in vivo in the abdominal cavity of a patient. In some embodiments, various surgical instruments can be used or employed, including but not limited to robotic surgical instruments, as well as other surgical instruments known in the art.
[0031] The systems, devices, and methods disclosed herein can be incorporated into and/or used with a robotic surgical device and associated system disclosed for example in United States Patent No. 10,285,765 and in PCT patent application Serial No. PCT/US2020/39203, and/or with the camera assembly and system disclosed in United States Publication No. 2019/0076199, and/or the systems and methods of exchanging surgical tools in an implantable surgical robotic system disclosed in PCT patent application Serial No. PCT/US2021/058820, where the content and teachings of all of the foregoing patents, patent applications and publications are incorporated herein by reference in their entirety. The surgical robotic module that forms part of the present invention can form part of a surgical robotic system that includes a user workstation that includes appropriate sensors and displays, and a robot support system (RSS) for interacting with and supporting the robotic subsystem of the present invention in some embodiments. The robotic subsystem includes a motor and a surgical robotic module that includes one or more robotic arms and one or more
camera assemblies in some embodiments. The robotic arms and camera assembly can form part of a single support axis robotic system, can form part of the split arm (SA) architecture robotic system, or can have another arrangement. The robot support system can provide multiple degrees of freedom such that the robotic module can be maneuvered within the patient into a single position or multiple different positions. In one embodiment, the robot support system can be directly mounted to a surgical table or to the floor or ceiling within an operating room. In another embodiment, the mounting is achieved by various fastening means, including but not limited to, clamps, screws, or a combination thereof. In other embodiments, the structure can be free standing. The robot support system can mount a motor assembly that is coupled to the surgical robotic module, which includes the robotic arm assembly and the camera assembly. The motor assembly can include gears, motors, drivetrains, electronics, and the like, for powering the components of the surgical robotic module.
[0032] The robotic arm assembly and the camera assembly are capable of multiple degrees of freedom of movement. According to some embodiments, when the robotic arm assembly and the camera assembly are inserted into a patient through the trocar, they are capable of movement in at least the axial, yaw, pitch, and roll directions. The robotic arms of the robotic arm assembly are designed to incorporate and employ a multi-degree of freedom of movement robotic arm with an end effector mounted at a distal end thereof that corresponds to a wrist area or joint of the user. In other embodiments, the working end (e.g., the end effector end) of the robotic arm is designed to incorporate and use or employ other robotic surgical instruments, such as for example the surgical instruments set forth in U.S. Pub. No. 2018/0221102, the entire contents of which are herein incorporated by reference.
[0033] Like numerical identifiers are used throughout the figures to refer to the same elements.
[0034] FIG. 1 is a schematic illustration of an example surgical robotic system 10 in which aspects of the present disclosure can be employed in accordance with some embodiments of the present disclosure. The surgical robotic system 10 includes an operator console 11 and a robotic subsystem 20 in accordance with some embodiments.
[0035] The operator console 11 includes a display 12, an image computing module 14, which can be a three-dimensional (3D) computing module, hand controllers 17 having a sensing and tracking module 16, and a computing module 18. Additionally, the operator console 11 can include a foot pedal array 19 including a plurality of pedals. The image computing module 14
can include a graphical user interface 39. The graphical user interface 39, the controller 26 or the image renderer 30, or both, can render one or more images or one or more graphical user interface elements on the graphical user interface 39. For example, a pillar box associated with a mode of operating the surgical robotic system 10, or any of the various components of the surgical robotic system 10, can be rendered on the graphical user interface 39. Live video footage captured by a camera assembly 44 can also be rendered by the controller 26 or the image renderer 30 on the graphical user interface 39.
[0036] The operator console 11 can include a visualization system 9 that includes a display 12 which can be any selected type of display for displaying information, images or video generated by the image computing module 14, the computing module 18, and/or the robotic subsystem 20. The display 12 can include or form part of, for example, a head-mounted display (HMD), an augmented reality (AR) display (e.g., an AR display, or AR glasses in combination with a screen or display), a screen or a display, a two-dimensional (2D) screen or display, a three-dimensional (3D) screen or display, and the like. The display 12 can also include an optional sensing and tracking module 16A. In some embodiments, the display 12 can include an image display for outputting an image from a camera assembly 44 of the robotic subsystem 20.
[0037] The hand controllers 17 are configured to sense a movement of the operator’s hands and/or arms to manipulate the surgical robotic system 10. The hand controllers 17 can include the sensing and tracking module 16, circuitry, and/or other hardware. The sensing and tracking module 16 can include one or more sensors or detectors that sense movements of the operator’s hands. In some embodiments, the one or more sensors or detectors that sense movements of the operator’s hands are disposed in the hand controllers 17 that are grasped by or engaged by hands of the operator. In some embodiments, the one or more sensors or detectors that sense movements of the operator’s hands are coupled to the hands and/or arms of the operator. For example, the sensors of the sensing and tracking module 16 can be coupled to a region of the hand and/or the arm, such as the fingers, the wrist region, the elbow region, and/or the shoulder region. Additional sensors can also be coupled to a head and/or neck region of the operator in some embodiments. In some embodiments, the sensing and tracking module 16 can be external and coupled to the hand controllers 17 via electrical components and/or mounting hardware. In some embodiments, the optional sensing and tracking module 16A can sense and track movement of one or more of an operator’s head, at least a portion of an operator’s head, an operator’s eyes, or an operator’s neck based, at
least in part, on imaging of the operator in addition to or instead of by a sensor or sensors attached to the operator’s body.
[0038] In some embodiments, the sensing and tracking module 16 can employ sensors coupled to the torso of the operator or any other body part. In some embodiments, the sensing and tracking module 16 can employ, in addition to the sensors, an Inertial Momentum Unit (IMU) having, for example, an accelerometer, gyroscope, magnetometer, and a motion processor. The addition of a magnetometer allows for reduction in sensor drift about a vertical axis. In some embodiments, the sensing and tracking module 16 can also include sensors placed in surgical material such as gloves, surgical scrubs, or a surgical gown. The sensors can be reusable or disposable. In some embodiments, sensors can be disposed external to the operator, such as at fixed locations in a room, such as an operating room. The external sensors 37 can generate external data 36 that can be processed by the computing module 18 and hence employed by the surgical robotic system 10.
[0039] The sensors generate position and/or orientation data indicative of the position and/or orientation of the operator’s hands and/or arms. The sensing and tracking modules 16 and/or 16A can be utilized to control movement (e.g., changing a position and/or an orientation) of the camera assembly 44 and robotic arm assembly 42 of the robotic subsystem 20. The tracking and position data 34 generated by the sensing and tracking module 16 can be conveyed to the computing module 18 for processing by at least one processor 22.
[0040] The computing module 18 can determine or calculate, from the tracking and position data 34 and 34A, the position and/or orientation of the operator’s hands or arms, and in some embodiments of the operator’s head as well, and convey the tracking and position data 34 and 34A to the robotic subsystem 20. The tracking and position data 34, 34A can be processed by the processor 22 and can be stored for example in the storage 24. The tracking and position data 34 and 34A can also be used by the controller 26, which in response can generate control signals for controlling movement of the robotic arm assembly 42 and/or the camera assembly 44. For example, the controller 26 can change a position and/or an orientation of at least a portion of the camera assembly 44, of at least a portion of the robotic arm assembly 42, or both. In some embodiments, the controller 26 can also adjust the pan and tilt of the camera assembly 44 to follow the movement of the operator’s head.
[0041] The robotic subsystem 20 can include a robot support system (RSS) 46 having a motor 40 and a trocar 50 or trocar mount, the robotic arm assembly 42, and the camera assembly 44. The robotic arm assembly 42 and the camera assembly 44 can form part of a
single support axis robot system, such as that disclosed and described in U.S. Patent No. 10,285,765, or can form part of a split arm (SA) architecture robot system, such as that disclosed and described in PCT Patent Application No. PCT/US2020/039203, both of which are incorporated herein by reference in their entirety.
[0042] The robotic subsystem 20 can employ multiple different robotic arms that are deployable along different or separate axes. In some embodiments, the camera assembly 44, which can employ multiple different camera elements, can also be deployed along a common separate axis. Thus, the surgical robotic system 10 can employ multiple different components, such as a pair of separate robotic arms and the camera assembly 44, which are deployable along different axes. In some embodiments, the robotic arm assembly 42 and the camera assembly 44 are separately manipulatable, maneuverable, and movable. The robotic subsystem 20, which includes the robotic arm assembly 42 and the camera assembly 44, is disposable along separate manipulatable axes, and is referred to herein as an SA architecture. The SA architecture is designed to simplify and increase efficiency of the insertion of robotic surgical instruments through a single trocar at a single insertion point or site, while concomitantly assisting with deployment of the surgical instruments into a surgical ready state, as well as the subsequent removal of the surgical instruments through the trocar 50 as further described below.
[0043] The RSS 46 can include the motor 40 and the trocar 50 or a trocar mount. The RSS 46 can further include a support member that supports the motor 40 coupled to a distal end thereof. The motor 40 in turn can be coupled to the camera assembly 44 and to each robotic arm of the robotic arm assembly 42. The support member can be configured and controlled to move linearly, or in any other selected direction or orientation, one or more components of the robotic subsystem 20. In some embodiments, the RSS 46 can be free standing. In some embodiments, the RSS 46 can include the motor 40 that is coupled to the robotic subsystem 20 at one end and to an adjustable support member or element at an opposed end.
[0044] The motor 40 can receive the control signals generated by the controller 26. The motor 40 can include gears, one or more motors, drivetrains, electronics, and the like, for powering and driving the robotic arm assembly 42 and the camera assembly 44 separately or together. The motor 40 can also provide mechanical power, electrical power, mechanical communication, and electrical communication to the robotic arm assembly 42, the camera assembly 44, and/or other components of the RSS 46 and robotic subsystem 20. The motor 40 can be controlled by the computing module 18. The motor 40 can thus generate signals for
controlling one or more motors that in turn can control and drive the robotic arm assembly 42, including for example the position and orientation of each robot joint of each robotic arm, as well as the camera assembly 44. The motor 40 can further provide for a translational or linear degree of freedom that is first utilized to insert and remove each component of the robotic subsystem 20 through the trocar 50. The motor 40 can also be employed to adjust the inserted depth of each robotic arm of the robotic arm assembly 42 when inserted into the patient 100 through the trocar 50.
[0045] The trocar 50 is a medical device that can be made up of an awl (which can be a metal or plastic sharpened or non-bladed tip), a cannula (essentially a hollow tube), and a seal in some embodiments. The trocar 50 can be used to place at least a portion of the robotic subsystem 20 in an interior cavity of a subject (e.g., a patient) and can withdraw gas and/or fluid from a body cavity. The robotic subsystem 20 can be inserted through the trocar 50 to access and perform an operation in vivo in a body cavity of a patient. In some embodiments, the robotic subsystem 20 can be supported, at least in part, by the trocar 50 or a trocar mount with multiple degrees of freedom such that the robotic arm assembly 42 and the camera assembly 44 can be maneuvered within the patient into a single position or multiple different positions. In some embodiments, the robotic arm assembly 42 and camera assembly 44 can be moved with respect to the trocar 50 or a trocar mount with multiple different degrees of freedom such that the robotic arm assembly 42 and the camera assembly 44 can be maneuvered within the patient into a single position or multiple different positions.
[0046] In some embodiments, the RSS 46 can further include an optional controller for processing input data from one or more of the system components (e.g., the display 12, the sensing and tracking module 16, the robotic arm assembly 42, the camera assembly 44, and the like), and for generating control signals in response thereto. The motor 40 can also include a storage element for storing data in some embodiments.
[0047] The robotic arm assembly 42 can be controlled to follow the scaled-down movement or motion of the operator’s arms and/or hands as sensed by the associated sensors in some embodiments and in some modes of operation. The robotic arm assembly 42 includes a first robotic arm including a first end effector disposed at a distal end of the first robotic arm, and a second robotic arm including a second end effector disposed at a distal end of the second robotic arm. In some embodiments, the robotic arm assembly 42 can have portions or regions that can be associated with movements associated with the shoulder, elbow, and wrist joints as well as the fingers of the operator. For example, the robotic elbow joint can follow the
position and orientation of the human elbow, and the robotic wrist joint can follow the position and orientation of the human wrist. The robotic arm assembly 42 can also have associated therewith end regions that can terminate in end-effectors that follow the movement of one or more fingers of the operator in some embodiments, such as for example the index finger as the user pinches together the index finger and thumb. In some embodiments, the robotic arm assembly 42 can follow movement of the arms of the operator in some modes of control while a virtual chest of the robotic assembly remains stationary (e.g., in an instrument control mode). In some embodiments, the position and orientation of the torso of the operator are subtracted from the position and orientation of the operator’s arms and/or hands. This subtraction allows the operator to move his or her torso without the robotic arms moving. Further disclosure regarding control of movement of individual arms of a robotic assembly is provided in International Patent Application Publications WO 2022/094000 A1 and WO 2021/231402 A1, each of which is incorporated by reference herein in its entirety.
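By way of non-limiting illustration only, the torso subtraction described above can be sketched as computing the hand pose relative to the torso pose, so that torso motion alone produces no commanded arm motion. The following minimal Python sketch uses hypothetical variable names and assumes poses are available as position vectors and rotations (here via SciPy's Rotation class); it is not the actual control code of the surgical robotic system 10.

import numpy as np
from scipy.spatial.transform import Rotation as R

def hand_pose_relative_to_torso(hand_pos, hand_rot, torso_pos, torso_rot):
    # hand_pos, torso_pos: 3-element position vectors in the tracker frame
    # hand_rot, torso_rot: Rotation objects expressed in the same frame
    rel_pos = torso_rot.inv().apply(hand_pos - torso_pos)  # position with torso motion removed
    rel_rot = torso_rot.inv() * hand_rot                   # orientation relative to the torso
    return rel_pos, rel_rot

# If the torso and hand translate together, rel_pos does not change,
# so no robotic arm motion would be commanded from that shared motion.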
[0048] The camera assembly 44 is configured to provide the operator with image data 48, such as for example a live video feed of an operation or surgical site, as well as enable the operator to actuate and control the cameras forming part of the camera assembly 44. In some embodiments, the camera assembly 44 can include one or more cameras (e.g., a pair of cameras), the optical axes of which are axially spaced apart by a selected distance, known as the inter-camera distance, to provide a stereoscopic view or image of the surgical site. In some embodiments, the operator can control the movement of the cameras via movement of the hands via sensors coupled to the hands of the operator or via hand controllers 17 grasped or held by hands of the operator, thus enabling the operator to obtain a desired view of an operation site in an intuitive and natural manner. In some embodiments, the operator can additionally control the movement of the camera via movement of the operator’s head. The camera assembly 44 is movable in multiple directions, including for example in yaw, pitch and roll directions relative to a direction of view. In some embodiments, the components of the stereoscopic cameras can be configured to provide a user experience that feels natural and comfortable. In some embodiments, the interaxial distance between the cameras can be modified to adjust the depth of the operation site perceived by the operator.
[0049] The image or video data 48 generated by the camera assembly 44 can be displayed on the display 12. In embodiments in which the display 12 includes an HMD, the display can include the built-in sensing and tracking module 16A that obtains raw orientation data for the yaw, pitch and roll directions of the HMD as well as positional data in Cartesian space (x, y,
z) of the HMD. In some embodiments, positional and orientation data regarding an operator’s head can be provided via a separate head-tracking module. In some embodiments, the sensing and tracking module 16A can be used to provide supplementary position and orientation tracking data of the display in lieu of or in addition to the built-in tracking system of the HMD. In some embodiments, no head tracking of the operator is used or employed. In some embodiments, images of the operator can be used by the sensing and tracking module 16A for tracking at least a portion of the operator’s head.
[0050] FIG. 2A depicts an example robotic assembly 20, which is also referred to herein as a robotic subsystem, of a surgical robotic system 10 incorporated into or mounted onto a mobile patient cart in accordance with some embodiments. In some embodiments, the robotic subsystem 20 includes the RSS 46, which, in turn includes the motor 40, the robotic arm assembly 42 having end-effectors 45, the camera assembly 44 having one or more cameras 47, and can also include the trocar 50 or a trocar mount.
[0051] FIG. 2B depicts an example of an operator console 11 of the surgical robotic system 10 of the present disclosure in accordance with some embodiments. The operator console 11 includes a display 12, hand controllers 17, and also includes one or more additional controllers, such as a foot pedal array 19 for control of the robotic arm assembly 42, for control of the camera assembly 44, and for control of other aspects of the system.
[0052] FIG. 2B also depicts the left hand controller subsystem 23A and the right hand controller subsystem 23B of the operator console. The left hand controller subsystem 23A includes and supports the left hand controller 17A and the right hand controller subsystem 23B includes and supports the right hand controller 17B. In some embodiments, the left hand controller subsystem 23A can releasably connect to or engage the left hand controller 17A, and the right hand controller subsystem 23B can releasably connect to or engage the right hand controller 17B. In some embodiments, the connections can be both physical and electronic so that the left hand controller subsystem 23A and the right hand controller subsystem 23B can receive signals from the left hand controller 17A and the right hand controller 17B, respectively, including signals that convey inputs received from a user selection on a button or touch input device of the left hand controller 17A or the right hand controller 17B.
[0053] Each of the left hand controller subsystem 23 A and the right hand controller subsystem 23B can include components that enable a range of motion of the respective left hand controller 17A and right hand controller 17B, so that the left hand controller 17A and right hand controller 17B can be translated or displaced in three dimensions and can
additionally move in the roll, pitch, and yaw directions. Additionally, each of the left hand controller subsystem 23A and the right hand controller subsystem 23B can register movement of the respective left hand controller 17A and right hand controller 17B in each of the foregoing directions and can send a signal providing such movement information to the processor 22 (as shown in FIG. 1) of the surgical robotic system 10.
[0054] In some embodiments, each of the left hand controller subsystem 23 A and the right hand controller subsystem 23B can be configured to receive and connect to or engage different hand controllers (not shown). For example, hand controllers with different configurations of buttons and touch input devices can be provided. Additionally, hand controllers with a different shape can be provided. The hand controllers can be selected for compatibility with a particular surgical robotic system or a particular surgical robotic procedure or selected based upon preference of an operator with respect to the buttons and input devices or with respect to the shape of the hand controller in order to provide greater comfort and ease for the operator.
[0055] FIG. 3 A schematically depicts a side view of the surgical robotic system 10 performing a surgery within an internal cavity 104 of a subject 100 in accordance with some embodiments and for some surgical procedures. FIG. 3B schematically depicts a top view of the surgical robotic system 10 performing the surgery within the internal cavity 104 of the subject 100. The subject 100 (e.g., a patient) is placed on an operation table 102 (e.g., a surgical table 102). In some embodiments, and for some surgical procedures, an incision is made in the patient 100 to gain access to the internal cavity 104. The trocar 50 is then inserted into the patient 100 at a selected location to provide access to the internal cavity 104 or operation site. The RSS 46 can then be maneuvered into position over the patient 100 and the trocar 50. In some embodiments, the RSS 46 includes a trocar mount that attaches to the trocar 50. The camera assembly 44 and the robotic arm assembly 42 can be coupled to the motor 40 and inserted individually and/or sequentially into the patient 100 through the trocar 50 and hence into the internal cavity 104 of the patient 100. Although the camera assembly 44 and the robotic arm assembly 42 can include some portions that remain external to the subject’s body in use, references to insertion of the robotic arm assembly 42 and/or the camera assembly 44 into an internal cavity of a subject and disposing the robotic arm assembly 42 and/or the camera assembly 44 in the internal cavity of the subject are referring to the portions of the robotic arm assembly 42 and the camera assembly 44 that are intended to be in the internal cavity of the subject during use. The sequential insertion method has the
advantage of supporting smaller trocars and thus smaller incisions can be made in the patient 100, thus reducing the trauma experienced by the patient 100. In some embodiments, the camera assembly 44 and the robotic arm assembly 42 can be inserted in any order or in a specific order. In some embodiments, the camera assembly 44 can be followed by a first robotic arm 42A of the robotic arm assembly 42 and then followed by a second robotic arm 42B of the robotic arm assembly 42 all of which can be inserted into the trocar 50 and hence into the internal cavity 104. Once inserted into the patient 100, the RSS 46 can move the robotic arm assembly 42 and the camera assembly 44 to an operation site manually or automatically controlled by the operator console 11.
[0056] Further disclosure regarding control of movement of individual arms of a robotic arm assembly is provided in International Patent Application Publications WO 2022/094000 A1 and WO 2021/231402 A1, each of which is incorporated by reference herein in its entirety.
[0057] FIG. 4A is a perspective view of a robotic arm subassembly 21 in accordance with some embodiments. The robotic arm subassembly 21 includes a robotic arm 42A, the end-effector 45 having an instrument tip 120 (e.g., monopolar scissors, needle driver/holder, bipolar grasper, or any other appropriate tool), and a shaft 122 supporting the robotic arm 42A. A distal end of the shaft 122 is coupled to the robotic arm 42A, and a proximal end of the shaft 122 is coupled to a housing 124 of the motor 40 (as shown in FIG. 2A). At least a portion of the shaft 122 can be external to the internal cavity 104 (as shown in FIGS. 3A and 3B). At least a portion of the shaft 122 can be inserted into the internal cavity 104 (as shown in FIGS. 3A and 3B).
[0058] FIG. 4B is a side view of the robotic arm assembly 42. The robotic arm assembly 42 includes a shoulder joint 126 forming a virtual shoulder, an elbow joint 128 having position sensors 132 (e.g., capacitive proximity sensors) and forming a virtual elbow, a wrist joint 130 forming a virtual wrist, and the end-effector 45 in accordance with some embodiments. The shoulder joint 126, the elbow joint 128, and the wrist joint 130 can include a series of hinge and rotary joints to provide each arm with seven positionable degrees of freedom, along with one additional grasping degree of freedom for the end-effector 45 in some embodiments.
[0059] FIG. 5 illustrates a perspective front view of a portion of the robotic assembly 20 configured for insertion into an internal body cavity of a patient. The robotic assembly 20 includes a robotic arm 42 A and a robotic arm 42B. The two robotic arms 42 A and 42B can define, or at least partially define, a virtual chest 140 of the robotic assembly 20 in some embodiments. In some embodiments, the virtual chest 140 (depicted as a triangle with dotted
lines) can be defined by a chest plane extending between a first pivot point 142A of a most proximal joint of the robotic arm 42A (e.g., a shoulder joint 126), a second pivot point 142B of a most proximal joint of the robotic arm 42B, and a camera imaging center point 144 of the camera(s) 47. A pivot center 146 of the virtual chest 140 lies in the middle of the virtual chest 140.
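By way of non-limiting illustration only, the pivot center 146 can be computed as the centroid of the three points that define the virtual chest 140, and the chest plane normal follows from a cross product. The Python sketch below uses hypothetical coordinate arrays and merely illustrates the geometry described above.

import numpy as np

def virtual_chest(pivot_a, pivot_b, camera_center):
    # pivot_a, pivot_b, camera_center: 3-element arrays in a common robot coordinate frame
    pivot_center = (pivot_a + pivot_b + camera_center) / 3.0       # point in the middle of the virtual chest
    normal = np.cross(pivot_b - pivot_a, camera_center - pivot_a)  # normal of the chest plane
    return pivot_center, normal / np.linalg.norm(normal)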
[0060] In some embodiments, sensors in one or both of the robotic arm 42A and the robotic arm 42B can be used by the surgical robotic system 10 to determine a change in location in three-dimensional space of at least a portion of each or both of the robotic arms 42 A and 42B. In some embodiments, sensors in one or both of the first robotic arm 42A and second robotic arm 42B can be used by the surgical robotic system 10 to determine a location in three- dimensional space of at least a portion of one robotic arm relative to a location in three- dimensional space of at least a portion of the other robotic arm.
[0061] In some embodiments, the camera assembly 44 is configured to obtain images from which the surgical robotic system 10 can determine relative locations in three-dimensional space. For example, the camera assembly 44 can include multiple cameras, at least two of which are laterally displaced from each other relative to an imaging axis, and the system can be configured to determine a distance to features within the internal body cavity. Further disclosure regarding a surgical robotic system including camera assembly and associated system for determining a distance to features can be found in International Patent Application Publication No. WO 2021/159409, entitled “System and Method for Determining Depth Perception In Vivo in a Surgical Robotic System,” and published August 12, 2021, which is incorporated by reference herein in its entirety. Information about the distance to features and information regarding optical properties of the cameras can be used by a system to determine relative locations in three-dimensional space.
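As a simplified, non-limiting illustration of how laterally displaced cameras permit distance estimation, the standard pinhole-stereo relation depth = focal length x baseline / disparity can be used; the actual method of the incorporated WO 2021/159409 publication may differ, and the numbers below are hypothetical.

def stereo_depth(focal_length_px, baseline_m, disparity_px):
    # Depth of a feature observed by two laterally displaced cameras (pinhole model).
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Example: 700 px focal length, 4 mm inter-camera distance, 20 px disparity -> 0.14 m
depth_m = stereo_depth(700.0, 0.004, 20.0)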
[0062] FIG. 6 is a graphical user interface 150 that is formatted to include a left pillar box 198 and a right pillar box 199 to the left and right, respectively, of live video footage 168 of a cavity of a patient. The graphical user interface 150 can be overlaid over the live video footage 168. In some embodiments, the live video footage 168 is formatted by the controller 26 to accommodate the left pillar box 198 and the right pillar box 199. In some embodiments, the live video footage 168 can be displayed on the display 12, with a predetermined size and location on the display 12, and the left pillar box 198 and the right pillar box 199 can be displayed on either side of the live video footage 168 with a certain size based on the remaining area on the display 12 that is not occupied by the live video footage
168. The graphical user interface 150 includes multiple different graphical user interface elements, which are described below in more detail.
[0063] Robotic arms 42B and 42A are also visible in the live video footage. The left pillar box 198 can include a status identifier 173, for example, an engaged or disengaged status identifier associated with an instrument tip 120 of the robotic arm 42B. The “engaged” status identifier 173 indicates that the user’s left hand and arm are engaged with the left hand controller 201 and therefore the instrument tip 120 is also engaged. The “disengaged” status identifier 173 indicates that the user’s left hand and arm are not engaged with the left hand controller 201 and therefore the instrument tip 120 is also disengaged. When the user’s left hand and arm are disengaged from the left hand controller 201, the surgical robotic system 10 can be completely disengaged. That is, the surgical robotic system 10 can remain on, but it is unresponsive until the user’s hands reengage with the hand controllers. The instrument tip 120 can be represented by iconographic symbol 179 that includes a name of the instrument tip 120 to provide confirmation to the user of what type of end effector or instrument tip is currently in use. In FIG. 6, the instrument tip 120 represented by the iconographic symbol 179 is a bipolar grasper. Notably, the present disclosure is not limited to the bipolar grasper or scissor shown in FIG. 6.
[0064] Similarly, the right pillar box 199 can include a status identifier 175 associated with an instrument tip 120 of the robotic arm 42A, for example, an engaged or disengaged status identifier. In some embodiments, based on the status of the end effector, the graphical user interface can also provide a visual representation of the status in addition to text. For example, the end effector iconography can be “grayed out” or made less prominent if it is not engaged.
[0065] The status identifier 175 can be “engaged” thereby indicating that the user’s right hand and arm are engaged with the right hand controller 202 and therefore the instrument tip 120 is also engaged. Alternatively, the status identifier 175 can be “disengaged” thereby indicating that the user’s right hand and arm are not engaged with the right hand controller 202 and therefore the instrument tip 120 is also disengaged. The instrument tip 120 can be represented by iconographic symbol 176 that includes a name of the instrument tip 120 to provide confirmation to the user of what type of end effector or instrument tip is currently in use. In FIG. 6, the instrument tip 120 represented by iconographic symbol 176 is a monopolar scissors. Notably, the present disclosure is not limited to the monopolar scissors shown in FIG. 6.
[0066] The left pillar box 198 can also include a robot pose view 171. The robot pose view 171 includes a simulated view of the robotic arms 42B and 42A, the camera assembly 44, and the support arm, thereby allowing the user to get a third person view of the robotic arm assembly 42, the camera assembly 44, and the robot support system 46. The simulated view of the robotic arms 42B and 42A is represented by a pair of simulated robotic arms 191 and
192. The simulated view of the camera assembly 44 is represented by a simulated camera
193. The robot pose view 171 also includes a simulated camera view associated with a cavity, or a portion of the cavity, of a patient, which is representative of the placement or location of the pair of simulated robotic arms 191 and 192 relative to a frustum 151. More specifically, the camera view can be the field of view of the camera assembly 44 and is equivalent to the frustum 151.
[0067] The right pillar box 199 can also include a robot pose view 172 that includes a simulated view of the robotic arms 42B and 42A, the camera assembly 44, and the support arm, thereby allowing the user to get a third person view of the robotic arm assembly 42, the camera assembly 44, and the support arm. The simulated view of the robotic arms 42B and 42A is represented by a pair of simulated robotic arms 165 and 166. The simulated view of the camera assembly 44 is represented by a simulated camera 193. The robot pose view 172 also includes a simulated camera view associated with a cavity, or a portion of the cavity, of the patient, which is representative of the placement or location of the pair of simulated robotic arms 165 and 166 relative to a frustum 167. More specifically, the camera view can be the camera’s field of view, which is the frustum 167. The robot pose view 172 provides elbow height awareness and situational awareness, especially when driving in up-facing/flip-facing configurations.
[0068] Situational awareness can be characterized as a way of understanding certain robotic elements with respect to time and space when the robotic arms 42A and 42B are inside the cavity of the patient. For example, as shown in the robot pose view 171, the elbow of the simulated robotic arm 192 is bent downwards, thereby providing the user with the ability to know how the elbow of the actual robotic arm 42A is actually oriented and positioned within the cavity of the patient. It should be noted that because of the positioning of the camera assembly 44 with respect to the robotic arms 42A and 42B, the entire length of the robotic arms 42A and 42B may not be visible in the live video footage 168. As a result, the user may not have visualization of how the robotic arms 42A and 42B are oriented and positioned within the cavity of the patient. The simulated robotic arms 165 and 166, as well as the simulated robotic arms 191 and 192, provide the user with the situational awareness of at least the
position and orientation of the actual robotic arms 42A and 42B within the cavity of the patient.
[0069] There can be two separate views (the robot pose view 171 and the robot pose view 172) from two different viewpoints on each side of the graphical user interface 150 that is rendered on display 12. The robot pose view 171 and the robot pose view 172 automatically update to stay centered on the trocar 50 while maintaining the robotic arms 42A and 42B in view. The robot pose views 171 and 172 also provide the user with spatial awareness.
[0070] Spatial awareness can be characterized as the placement or position of the robotic arms 42A and 42B as viewed in robot pose views 171 and 172 relative to other objects in the cavity and the cavity itself. The robot pose views 171 and 172 provide the user with the ability to determine where the actual robotic arms 42A and 42B are located within the cavity by viewing the simulated robotic arms 191 and 192 in the robot pose view 171 and simulated robotic arms 165 and 166 in robot pose view 172. For example, the robot pose view 171 illustrates the position and location of the simulated robotic arms 191 and 192 relative to the frustum 151. The robot pose view 171 depicts the simulated robotic arms 191 and 192 with respect to the frustum 151, from a side view of the support arm and the simulated robotic arms 191 and 192 that are attached to the support arm. This particular robot pose provides the user with the ability to better ascertain proximity to anatomical features within the cavity.
[0071] The robot pose view 172 can also provide the user with the ability to better ascertain how close the actual robotic arms 42A and 42B are relative to one another, or how far apart they are from one another. Further still, the robot pose view 172 can also illustrate where the actual robotic arms 42A and 42B might be positioned or located relative to anatomical features inside the cavity of the patient that are to the left and right of the robotic arms 42A and 42B, thereby providing the user with a spatial awareness of where the robotic arms 42A and 42B are within the cavity, and where they are relative to anatomical features within the cavity. As noted above, because the full length of the robotic arms 42A and 42B is not visible in the live video footage 168, the simulated robotic arms 165 and 166 can provide the user with the spatial awareness to know how close or far apart the actual robotic arms 42A and 42B are from one another. The view provided by the robot pose view 172 is a view as if the user were looking at a field of the inside of the cavity. The robot pose view 172 provides the user with the spatial awareness to know how close the virtual elbows 128 are relative to one another if the user manipulates the right hand controller 202 and the left hand controller 201 in such a way that the virtual elbows 128 are brought closer together, as well as how close the actual robotic
arms 42A and 42B are to one another. For example, as the user manipulates the left hand controller 201 and the right hand controller 202 to straighten the robotic arms 42A and 42B, simulated robotic arms 166 and 165 will become parallel to one another and the distance between the elbow of simulated robotic arm 165 and the elbow of the simulated robotic arm 166 decreases. Conversely, as the user manipulates the left hand controller 201 and the right hand controller 202 to bend the robotic arms 42A and 42B, so that the distance between the virtual elbows 128 of the robotic arms 42A and 42B is further apart, simulated robotic arms 166 and 165 will not be parallel to one another and the distance between the elbow of simulated robotic arm 165 and the elbow of the simulated robotic arm 166 will increase. The robot pose views 171 and 172 provide the user with the spatial awareness during a surgical procedure, because the live video footage 168 does not provide visualization of the entire length of the robotic arms 42A and 42B.
[0072] In FIG. 6, the simulated robotic arms 191 and 192 are shown as being within the camera assembly 44 field of view associated with frustum 151, which provides the user with a situational awareness and spatial awareness of where the robotic arm 42B and the robotic arm 42A are located or positioned within the captured portion of the actual cavity of the patient. The camera view associated with the robot pose view 171 is a simulated view of the robotic arm 42B and the robotic arm 42A as if the user were viewing the actual view of the robotic arm 42B and the robotic arm 42A from a side view within the cavity of the patient. As noted above, the camera view can be the camera assembly 44 field of view, which is the frustum 151. That is, the robot pose view 171 provides the user with a side view of simulated robotic arms 191 and 192, which are simulated views corresponding to the robotic arm 42B and the robotic arm 42A, respectively.
[0073] In some embodiments, the graphical user interface 150 can display live video footage 168 from a single vantage point including a field of view of the cavity and the robotic arm 42B and the robotic arm 42A relative to different areas within the cavity as shown in FIG. 6. As a result, the user might not always be able to determine how the virtual elbows 128 of the robotic arm 42B and the robotic arm 42A are positioned. This is because the camera assembly 44 might not always include video footage of the virtual elbows 128 of the robotic arm 42B and video footage of the elbow of the robotic arm 42A, and therefore the user may not be able to determine how to adjust the right hand controller 202 and left hand controller 201 if they wish to maneuver within the cavity of the patient. The simulated view of the robotic arm 42B (robotic arm 191) and the simulated view of the robotic arm 42 A (robotic
arm 192) provide the user with a viewpoint that allows the user to determine the positioning of the virtual elbows 128 of the robotic arm 42A and the robotic arm 42B because the left situational awareness camera view panel includes a simulated field of view of the entire length of the robotic arms 191 and 192. Because the simulated field of view of the robot pose view 171 includes a view of the virtual elbows 128 of robotic arms 191 and 192, the user can adjust the positioning of the robotic arm 42B and the robotic arm 42A by manipulating the left hand controller 201 and the right hand controller 202, and watching how the robotic arms 191 and 192 move in accordance with the manipulation of the left hand controller 201 and the right hand controller 202.
[0074] The graphical user interface 150 can include the robot pose view 172 within which there is a frustum 167 that is the field of view of the camera assembly 44 associated with a portion of the cavity of the patient, and the simulated robotic arms 165 and 166 with a simulated camera 158 and a simulated robotic support arm supporting the simulated robotic arms 165 and 166.
[0075] In FIG. 6, the simulated robotic arms 165 and 166 are shown as being within the frustum 167, which is representative of location and positioning of the robotic arm 42B and the robotic arm 42A within the actual cavity of the patient. The view shown in the robot pose view 172 is a simulated view of the robotic arm 42B and the robotic arm 42A as if the user were viewing the robotic arm 42B and the robotic arm 42A from a top down view within the cavity of the patient. That is, the robot pose view 172 provides the user with a top down view of the simulated robotic arms 165 and 166, which are simulated views corresponding to the robotic arm 42B and the robotic arm 42A, respectively. The top down view provides the user with the ability to maintain a certain level of situational awareness of the robotic arm 42B and the robotic arm 42A as the user is performing a procedure within the cavity. The view of the simulated robotic arm 165 corresponding to the robotic arm 42B and the view of simulated robotic arm 166 corresponding to the robotic arm 42 A provides the user with a top view perspective that allows them to determine the positioning of the robotic arm 42B and the robotic arm 42 A, because the robot pose view 172 includes a simulated top-down field of view of the robotic arms 165 and 166, the camera 158, as well as support arm of the robotic assembly. Because the simulated field of view of the camera assembly 44 as outlined by the frustum 167 includes a top-down view of the simulated robotic arms 165 and 166, the user can adjust the positioning of the robotic arm 42B and the robotic arm 42A by manipulating the left hand controller 201 and the right hand controller 202, and watching how the simulated robotic arms 165 and 166 move forward or move backward within a portion of the
cavity within the frustum 167 in accordance with the manipulation of the left hand controller 201 and the right hand controller 202.
[0076] The simulated view of the robotic arms 42B and 42A in the robot pose views 171 and 172 is automatically updated to stay centered on the trocar 50 while maintaining the robotic arms 42B and 42A in view. In some embodiments, this can be accomplished based on one or more sensors from the sensing and tracking module 16 that are on the robotic arms 42B and 42A providing information to the right hand controller 202 and the left hand controller 201. The sensors can be an encoder, a Hall effect sensor, or another suitable sensor.
Anatomy Segmentation and Tracking
[0077] Anatomy segmentation and tracking as described herein can be employed with any of the surgical robotic systems described above or any other suitable surgical robotic system. Further, some embodiments described herein can be employed with semi-robotic endoscopic surgical systems that are only robotic in part.
[0078] An example of anatomy segmentation and tracking as taught herein can be understood with reference to the embodiments depicted in FIGS. 7-12 described below. For convenience, like reference numbers are used to reference similar features of the various embodiments shown in the figures, unless otherwise noted.
[0079] FIG. 7 is a diagram illustrating an example anatomy segmentation and tracking module 300 for anatomy segmentation and anatomical structure tracking in accordance with some embodiments. The anatomy segmentation and tracking module 300 can be executed by the processor 22 of the computing module 18 of the surgical robotic system 10. In some embodiments, the anatomy segmentation and tracking module 300 can be part of system code 2000 as described with respect to FIGS. 11A and 11B. The anatomy segmentation and tracking module 300 can include a machine learning model 320 that is fed by an input 310 and generates an output 340, a confidence module 346, and a tracking module 360. In some embodiments, the anatomy segmentation and tracking module 300 can include different components than those shown in FIG. 7. For example, the anatomy segmentation and tracking module 300 can further include a training module to train a machine learning model. Some embodiments may include the confidence module 346 and some embodiments may not include the confidence module 346.
[0080] The input 310 includes position and orientation data 312 indicative of a pose of the robotic assembly 20 having the robotic arm assembly 42 and the camera assembly 44. For
example, the surgical robotic system 10 can determine the position and orientation data 312 using the sensors coupled to the robotic arm assembly 42 and the camera assembly 44. With respect to FIGS. 1 and 4-6, the robotic arm assembly 42 can include sensors (e.g., sensors 132 in FIG. 4B) that detect a position and an orientation (e.g., roll, pitch, yaw) of one or more portions (e.g., various joints, end-effectors, or other parts) of each of the robotic arms in 3D space (e.g., within a patient’s abdominal space). Sensors from the sensing and tracking module 16 that are on the robotic arms 42B and 42A and the camera assembly 44 can detect position and orientation data 312 of the robotic arms 42B and 42A and the camera assembly 44.
[0081] The input 310 can also include an image 314 including a representation of one or more anatomical structures of a subject. In some embodiments, as shown in FIG. 6, the image 168 (e.g., a video frame of a live video stream) captured by the camera assembly 44 in a field of view of the camera assembly 44 can include an incarcerated hernia 200 and blood vessels 210 in an anatomical space 220 (e.g., abdominal space) of a subject (e.g., a patient). For example, the anatomy segmentation and tracking module 300 can access a video stream in real time from the camera assembly 44 to retrieve a video that depicts the content in a surgical field of view (e.g., a field of view of the camera assembly 44), which can include organs, vasculature, and other tissue. The anatomy segmentation and tracking module 300 can break down the retrieved video into a sequence of video frames that are recorded at a particular rate (e.g., 30Hz). The anatomy segmentation and tracking module 300 can input each video frame into the machine learning model 320 as described below. In some embodiments, the image 314 can be a single photo or a photo of a series of photos captured in a burst mode, or other suitable image that depicts anatomical structures of the subject.
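As a non-limiting illustration of breaking a retrieved video into a sequence of frames for the model, the following Python sketch assumes the stream is exposed as a standard video source readable by OpenCV; the actual interface to the camera assembly 44 is not specified here and the source argument is hypothetical.

import cv2

def frame_generator(source):
    # source: e.g., a device index or a file path for the recorded surgical video
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()   # one frame per call, at the recorded rate (e.g., 30 Hz)
            if not ok:
                break
            yield frame              # each frame becomes one input to the machine learning model
    finally:
        cap.release()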
[0082] In some embodiments, the input 310 can also include data from other systems, such as fluorescence data (e.g., from indocyanine green (ICG) imaging, and/or from a laparoscopic system), 3D data or map of the environment being captured by the camera assembly 44, and/or other suitable data associated with an internal body cavity where the surgical robotic system 10 is being operated. Notably, the present disclosure is not limited to the input data described herein.
[0083] The machine learning model 320 can include an encoder-decoder architecture and a classifier architecture. The encoder-decoder architecture can include a pose encoder 322, a visual encoder 326, and decoders 332. The classifier architecture can include a classifier 334 (e.g., a multi-class classifier). The machine learning model 320 can segment multiple types of
anatomical structures (e.g., blood vessels, nerves, other tube-like structures, organs, other sensitive structures, and/or obstruction structures) in a surgical field of view (e.g., a field of view of the camera assembly 44) across a diverse range of anatomical locations and can also identify anatomical landmarks (e.g., in real time) as described below.
[0084] In some embodiments, the machine learning model 320 can include one or more neural networks (e.g., a convolutional neural network or other types of neural networks). The pose encoder 322, the visual encoder 326, the decoders 332, and the classifier 334 can be part of a single neural network or can be different neural networks. In some embodiments, the machine learning model 320 can be generated using a training process based on supervised learning, semi-supervised learning, unsupervised learning, reinforcement learning, and/or other suitable learning methods.
[0085] The pose encoder 322 can be fed by the position and orientation data 312 of the robotic assembly 20 and extract, from the position and orientation data 312, a compact representation of a pose of the robotic assembly 20. The extracted compact representation can be referred to as a pose representation 324. The extracted compact representation can be a vector that represents the pose of the robotic assembly 20 by reducing the dimensionality of the position and orientation data 312 or can be any other suitable representation that represents the position and orientation data 312 in a compact manner (e.g., reduced data size or the like).
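The pose encoder 322 can be realized in many ways; as a non-limiting sketch, assuming the position and orientation data 312 is flattened into a fixed-length vector (e.g., joint angles and camera pose), a small fully connected network can map it to a lower-dimensional pose representation 324. The layer sizes below are hypothetical, and the sketch is written in Python with PyTorch.

import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    # Maps raw robotic assembly pose data to a compact pose representation.
    def __init__(self, pose_dim=32, repr_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, repr_dim),
        )

    def forward(self, pose):      # pose: (batch, pose_dim)
        return self.net(pose)     # pose representation: (batch, repr_dim)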
[0086] The visual encoder 326 can be fed by the image 314 and extract, from the image 314, a compact representation of the image 314. The extracted compact representation can be referred to as a visual representation 328. The compact representation can be a vector that represents the image by reducing the dimensionality of the image 314 or can be any other suitable representation that represents the image in a compact manner (e.g., reduced data size or the like).
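Similarly, the visual encoder 326 could be any image backbone; one minimal convolutional sketch, with hypothetical layer sizes, reduces the image 314 to a visual representation 328 of the same dimensionality as the pose representation above.

import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    # Maps an RGB video frame to a compact visual representation.
    def __init__(self, repr_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global pooling to a fixed-size feature
        )
        self.fc = nn.Linear(64, repr_dim)

    def forward(self, image):                   # image: (batch, 3, H, W)
        x = self.features(image).flatten(1)     # (batch, 64)
        return self.fc(x)                       # visual representation: (batch, repr_dim)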
[0087] The machine learning model 320 can aggregate the pose representation 324 and the visual representation 328 into a single state representation 330 that reflects a current state of the robotic assembly 20 and can be used to achieve multiple tasks of interest, such as identifying anatomical landmarks and delineating sensitive structures in a surgical field of view. In some embodiments, the machine learning model 320 can aggregate the pose representation 324 and the visual representation 328 by averaging the representations and weighing each representation equally. In some embodiments, the machine learning model 320 can aggregate the pose representation 324 and the visual representation 328 using a
weighted average of the representations. The weight for each representation is determined according to a relative importance of each data modality. For example, the pose representation 324 can have a greater weight than the visual representation 328, if the robotic assembly 20 moves but the content of the image 314 changes slightly. In some embodiments, the machine learning model 320 can concatenate the representations such that the state representation 330 grows in size with the addition of each representation.
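The aggregation options described above can be illustrated, under the same hypothetical dimensions, as an equal average, a modality-weighted average, or a concatenation of the two representations.

import torch

def aggregate(pose_repr, visual_repr, mode="average", w_pose=0.5, w_visual=0.5):
    if mode == "average":
        return 0.5 * (pose_repr + visual_repr)              # each representation weighed equally
    if mode == "weighted":
        return w_pose * pose_repr + w_visual * visual_repr  # weights reflect modality importance
    if mode == "concat":
        return torch.cat([pose_repr, visual_repr], dim=-1)  # state grows with each added modality
    raise ValueError(mode)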
[0088] In some embodiments, the state representation 330 is extensible in that it can account for any number of modality-specific representations and is not limited to the pose representation 324 and the visual representation 328. For example, the machine learning model 320 can include more encoders to extract a respective compact representation from additional inputs, such as fluorescence data, 3D map data, and/or other suitable data as described with respect to the input 310.
[0089] The decoders 332 can be fed by the state representation 330 and generate multiple segmentation maps 342. Each segmentation map 342 can identify which of the one or more anatomical structures the robotic arm assembly 42 should avoid contacting. In some embodiments, the decoders 332 can identify multiple anatomical structures. For example, there can be multiple sensitive structures that operating surgeons would like to be aware of; for instance, they might want to avoid damaging nerves (anatomical structure 1) while being guided along arteries (anatomical structure 2). To enable the segmentation of multiple anatomical structures, each decoder 332 can be trained to identify a particular anatomical structure (e.g., nerves, arteries, or the like). Each decoder 332 can output a probability that each pixel in the image 314 depicts the particular anatomical structure. Pixels whose values exceed a value threshold can be considered to depict that anatomical structure in a field of view. For example, given N sensitive structures, the machine learning model 320 can have N segmentation decoders. In some embodiments, a single decoder can generate multiple segmentation maps 342.
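One non-limiting way to realize the N per-structure decoders 332, sketched below with hypothetical layer sizes, is to reshape the state representation 330 into a small spatial feature map and upsample it with transposed convolutions ending in a sigmoid, so that each output pixel is a probability of depicting that structure; thresholding then yields a binary mask. A real model would upsample to the full image resolution.

import torch
import torch.nn as nn

class StructureDecoder(nn.Module):
    # Decodes the state representation into one segmentation map for one anatomical structure.
    def __init__(self, repr_dim=128):
        super().__init__()
        self.fc = nn.Linear(repr_dim, 64 * 8 * 8)           # seed a small 8x8 feature map
        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, state):                               # state: (batch, repr_dim)
        x = self.fc(state).view(-1, 64, 8, 8)
        return torch.sigmoid(self.up(x))                    # per-pixel probabilities in [0, 1]

# One decoder per sensitive structure; pixels above a threshold are treated as that structure.
decoders = nn.ModuleList(StructureDecoder() for _ in range(2))  # e.g., nerves and arteries
state = torch.randn(1, 128)                                     # placeholder state representation
masks = [d(state) > 0.5 for d in decoders]                      # binary segmentation maps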
[0090] The classifier 334 can be fed by the state representation 330 and identify one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated. Examples of anatomical landmarks can include the inguinal triangle that refers to a region of an abdominal wall, the triangle of doom that refers to an anatomical triangle defined by the vas deferens medially, spermatic vessels laterally and peritoneal fold inferiorly, and the triangle of pain that refers to a region bound by the iliopubic tract, testicular vessels, and the peritoneal fold.
[0091] In some embodiments, the classifier 334 can output, for each of a plurality of anatomical landmarks, a probability that the area the camera assembly 44 is viewing belongs to that landmark. For example, the classifier 334 can be a multi-class classifier that classifies the area the camera assembly 44 is viewing into one of a number of anatomical landmarks. The multi-class classifier can be trained using a predefined list of anatomical landmarks in an internal body cavity (e.g., the abdominal space or another internal space). In some embodiments, the predefined list can be based on existing surgical practice and modeled after how surgeons currently navigate the abdominal space. For example, in the context of a surgery to repair an inguinal hernia, a condition whereby organs protrude through the abdominal muscle, key anatomical landmarks include the inguinal triangle, the triangle of doom, and the triangle of pain. The classifier 334 can determine to which of the inguinal triangle, the triangle of doom, and the triangle of pain the area that the camera assembly 44 is viewing belongs.
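A minimal sketch of such a multi-class landmark head is shown below. The particular landmark list mirrors the example above, while the layer sizes and the use of a softmax over a small linear head are assumptions.

```python
# Hypothetical sketch: a multi-class classifier head that assigns the current
# field of view to one landmark from a predefined list. Sizes are assumptions.
import torch
import torch.nn as nn

LANDMARKS = ["inguinal_triangle", "triangle_of_doom", "triangle_of_pain"]

classifier = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, len(LANDMARKS)),
)

state = torch.randn(1, 64)                         # state representation
probs = torch.softmax(classifier(state), dim=-1)   # probability per landmark
predicted = LANDMARKS[int(probs.argmax(dim=-1))]
```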
[0092] In some embodiments, a machine learning model can be trained by a training module (e.g., a training module 2400 in FIG. 11B) to generate the machine learning model 320. In some embodiments, the training module 2400 can include training sets having images with labeled anatomical structures and labeled anatomical landmarks and known position and orientation data. In some embodiments, the training module 2400 can access training sets stored in a remote server. The training module 2400 can feed the training sets into a machine learning model to be trained. The training module 2400 can adjust the weights and other parameters in the machine learning model during the training process to reduce the difference between an output of the machine learning model and an expected output. The trained machine learning model 320 can be stored in a database or in the anatomy segmentation and tracking module 300. In some embodiments, the training module 2400 can select a group of training sets as validation sets and apply the trained machine learning model 320 to the validation sets to evaluate the trained machine learning model 320.
[0093] The confidence module 346 can create an interdependence between various outputs 340 having the segmentation maps 342 and the identified anatomical landmarks 344. The confidence module 346 can be bidirectional, whereby the identified anatomical landmarks 344 can affect a relative confidence one may have in the segmentation maps 342 generated for the different anatomical structures. For example, the presence of a particular anatomical landmark (e.g., one that does not include nerves or has limited nerves) in a region of the surgical field of view can automatically reduce the likelihood of that region being a nerve, and the anatomy segmentation and tracking module 300 can down-weight the segmentation map 342 associated with nerves. As the confidence module 346 is bidirectional, in some cases the segmentation maps 342 can also affect a relative confidence of the identified anatomical landmarks 344. For example, if the anatomy segmentation and tracking module 300 segments multiple structures (e.g., blood vessels and nerves), the anatomy segmentation and tracking module 300 can have a higher confidence that the robotic assembly 20 is looking at a neurovascular bundle, an entity which coincides with a subset of the identified anatomical landmarks. The anatomy segmentation and tracking module 300 can down-weight the identified anatomical landmarks which are unlikely to have such neurovascular bundles (e.g., based on surgical and anatomical domain knowledge).
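Purely as an illustration of one direction of this interdependence, the sketch below down-weights structure confidences when the most likely landmark rarely contains that structure. The compatibility table, landmark names, and down-weight factor are placeholders standing in for surgical and anatomical domain knowledge, not values from this disclosure.

```python
# Hypothetical sketch of a rule-based confidence adjustment: the top-ranked
# landmark down-weights segmentation confidences for unlikely structures.
def adjust_confidences(landmark_probs: dict, structure_confidences: dict,
                       compatibility: dict, down_weight: float = 0.5) -> dict:
    """Down-weight structure confidences that conflict with the likely landmark."""
    top_landmark = max(landmark_probs, key=landmark_probs.get)
    adjusted = {}
    for structure, conf in structure_confidences.items():
        likely_present = compatibility.get((top_landmark, structure), True)
        adjusted[structure] = conf if likely_present else conf * down_weight
    return adjusted

# Placeholder example: "landmark_a" is assumed (for illustration only) to
# contain few nerves, so the nerve map's confidence is reduced.
compat = {("landmark_a", "nerve"): False}
print(adjust_confidences({"landmark_a": 0.8, "landmark_b": 0.2},
                         {"nerve": 0.9, "artery": 0.7}, compat))
```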
[0094] The tracking module 360 can update the segmentation maps 342 and identified anatomical landmarks 344 to reflect a change in a field of view of the camera assembly 44. For example, video frames are recorded over time at a particular sampling rate (e.g., 30 Hz). During the recording time period, the camera assembly 44 is moved by the user and/or objects in the field of view change positions. Accordingly, the field of view of the camera assembly 44 can change over time. The tracking module 360 can update the segmentation maps 342 and identified anatomical landmarks 344 to reflect such a change so that the anatomy segmentation and tracking module 300 can identify anatomical landmarks and segment anatomical structures over time (e.g., in real time).
[0095] The tracking module 360 can obtain a series of inputs 310 at a predetermined time interval (e.g., a frame rate of a video, a sampling rate, or a user-defined time interval) over a time period and apply the machine learning model 320 to each input 310 to generate the output 340. For example, at time slot T1, the tracking module 360 can perform data processing 362A by applying the machine learning model 320 to the input 310 acquired at T1 and storing the corresponding output 340 and intermediate outputs (e.g., the state representation 330, the pose representation 324, and/or the visual representation 328). At time slot T2, the tracking module 360 can perform data processing 362B by applying the machine learning model 320 to the input 310 acquired at T2 and storing the corresponding output 340 and intermediate outputs. At time slot TN, the tracking module 360 can perform data processing 362N by applying the machine learning model 320 to the input 310 acquired at TN and storing the corresponding output 340 and intermediate outputs.
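A hedged sketch of this per-time-slot loop is shown below: at each slot the model is applied to the newly acquired input, and the output plus the intermediate representations are stored for reuse by the tracking logic. The function `run_model` is a stand-in placeholder for the trained model, and the stored dictionary layout is an assumption.

```python
# Hypothetical sketch of the per-time-slot processing loop described above.
from typing import Any

def run_model(image: Any, pose_data: Any) -> tuple[dict, dict]:
    # Placeholder standing in for the trained model; returns dummy values.
    output = {"segmentation_maps": None, "landmarks": None}
    intermediates = {"state_rep": None, "pose_rep": None, "visual_rep": None}
    return output, intermediates

def process_stream(frames: list, poses: list) -> list:
    history = []
    for t, (image, pose_data) in enumerate(zip(frames, poses)):
        output, intermediates = run_model(image, pose_data)
        # Store outputs and intermediates for each time slot T1..TN.
        history.append({"slot": t, "output": output, "intermediates": intermediates})
    return history

history = process_stream(frames=[None, None, None], poses=[None, None, None])
```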
[0096] In some embodiments, the tracking module 360 can determine a similarity between the state representation at the current time slot (e.g., TN) and a previous state representation from a previous time slot (e.g., T2 or T1). The previous state representation can be generated using
a previous pose representation extracted from previous position and orientation data acquired at the previous time slot and a previous visual representation extracted from a previous image acquired at the previous time slot. In some embodiments, the current time slot is contiguous with the previous time slot. For example, video frames are recorded over time at a particular sampling rate or frame rate (e.g., 30 Hz), and the current time slot and the previous time slot are neighboring time slots for acquiring neighboring frames. In some embodiments, there is a predetermined time interval (e.g., based on the frame rate) between the current time slot and the previous time slot, such as time slots for acquiring non-consecutive frames. If the tracking module 360 determines that the similarity is equal to or greater than a similarity threshold, the tracking module 360 can average the state representation and the previous state representation to generate an averaged state representation. The tracking module 360 can feed the averaged state representation into the machine learning model 320 to identify one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is currently being operated and to generate a plurality of segmentation maps.
[0097] For example, the tracking module 360 can determine whether neighboring video frames over a time period depict a similar field of view (i.e., with minimal changes). The tracking module 360 can quantify a similarity of the state representations corresponding to the neighboring frames. If the similarity exceeds a threshold value, the tracking module 360 determines that the frames have minimal changes in the field of view and propagates the previous state representation of the previous frame forward in time. The tracking module 360 can average the state representations before feeding them into the machine learning model 320. Accordingly, the tracking module 360 can leverage historical information to more reliably segment anatomical structures.
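As a minimal sketch of this temporal smoothing step: cosine similarity is used below as one possible similarity measure (the disclosure does not prescribe a particular measure), and the threshold value is an assumption. When the current and previous state representations are similar enough, they are averaged before being decoded.

```python
# Hypothetical sketch: average the current and previous state representations
# when they are sufficiently similar. Metric and threshold are assumptions.
import torch
import torch.nn.functional as F

def maybe_average(state: torch.Tensor, prev_state: torch.Tensor,
                  threshold: float = 0.95) -> torch.Tensor:
    similarity = F.cosine_similarity(state, prev_state, dim=-1)
    if similarity.item() >= threshold:
        # Minimal change in the field of view: propagate history forward.
        return 0.5 * (state + prev_state)
    return state

current = torch.randn(1, 64)
previous = current + 0.01 * torch.randn(1, 64)   # nearly identical frame
smoothed = maybe_average(current, previous)
```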
[0098] In some embodiments, the tracking module 360 can enable the localization of the segmented anatomical structure within an anatomy of a subject. The tracking module 360 can provide the locations of the segmented anatomical structure relative to other structures to the users. For example, the tracking module 360 can determine a 3D reconstruction of the anatomical space in which the robotic assembly 20 is being operated. For example, the camera assembly 44 can include a light detection and ranging (LIDAR) sensor or a dot matrix projector to obtain a 3D representation or map of at least a portion of a body cavity. The tracking module 360 can use the 3D reconstruction to determine locations of the robotic assembly 20 within the anatomy and the locations of the segmented anatomical structure such that the users can easily control the robotic assembly 20 to avoid the anatomical structures and/or manipulate the anatomical structures.
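One hedged way to picture this localization step is back-projecting segmented pixels into 3D camera coordinates using a depth map and pinhole camera intrinsics. The intrinsics, the toy depth values, and the mask below are assumptions for illustration; they are not parameters of the described system.

```python
# Hypothetical sketch: back-project segmented pixels to 3D points using a
# depth map and pinhole intrinsics. All numeric values are toy assumptions.
import numpy as np

def backproject(mask: np.ndarray, depth: np.ndarray,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Return (N, 3) camera-frame points for pixels where mask is True."""
    v, u = np.nonzero(mask)            # pixel rows, columns
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

mask = np.zeros((64, 64), dtype=bool)
mask[30:34, 30:34] = True                    # toy segmented region
depth = np.full((64, 64), 0.05)              # toy depth map (5 cm everywhere)
points_3d = backproject(mask, depth, fx=500.0, fy=500.0, cx=32.0, cy=32.0)
```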
[0099] In some embodiments, the confidence module 346 and/or tracking module 360 can be included in the machine learning model 320. In some embodiments, the tracking module 360 can include the machine learning model 320.
[0100] FIG. 8 is a flowchart illustrating steps 400 for anatomical structure segmentation and anatomical landmark identification carried out by the surgical robotic system 10 in accordance with some embodiments. In step 402, the surgical robotic system 10 receives an image from the camera assembly 44 of the surgical robotic system 10. The image can include a representation of one or more anatomical structures of a subject. Examples are described with respect to the image 314 in FIG. 7.
[0101] In step 404, the surgical robotic system 10 extracts from the image a visual representation thereof. The visual representation can be a compact representation for the image. Examples are described with respect to the visual encoder 326 in FIG. 7.
[0102] In step 406, the surgical robotic system 10 determines position and orientation data associated with the robotic assembly 20. The robotic assembly 20 includes the robotic arm assembly 42 and the camera assembly 44. The position and orientation data indicates a pose of the robotic assembly 20. Examples are described with respect to the position and orientation data 312 in FIG. 7.
[0103] In step 408, the surgical robotic system 10 generates a pose representation of the robotic assembly 20 based at least in part on the position and orientation data. Examples are described with respect to the pose encoder 322 in FIG. 7.
[0104] In step 410, the surgical robotic system 10 generates a state representation based at least in part on the visual representation and the pose representation. Examples are described with respect to the state representation 330 in FIG. 7.
[0105] In step 412, the surgical robotic system 10 identifies, based at least in part on the state representation, one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated. Examples are described with respect to the classifier 334 and the identified anatomical landmarks 344 in FIG. 7.
[0106] In step 414, the surgical robotic system 10 generates a plurality of segmentation maps. Each segmentation map identifies which of the one or more anatomical structures to avoid contact with the robotic assembly 20. Examples are described with respect to the decoders 332 and the segmentation maps 342 in FIG. 7.
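Purely as an illustration, the sketch below ties the steps of FIG. 8 together in one function; the function signature and the stand-in components passed as arguments are assumptions about interfaces, not the system's actual implementation.

```python
# Hypothetical end-to-end sketch of the FIG. 8 steps with stand-in components.
import torch
import torch.nn as nn

def segment_and_identify(image, pose_data, visual_encoder, pose_encoder,
                         aggregate, classifier, decoders):
    visual_rep = visual_encoder(image)                # step 404: visual representation
    pose_rep = pose_encoder(pose_data)                # step 408: pose representation
    state = aggregate(pose_rep, visual_rep)           # step 410: state representation
    landmark_probs = torch.softmax(classifier(state), dim=-1)        # step 412
    seg_maps = {name: dec(state) for name, dec in decoders.items()}  # step 414
    return landmark_probs, seg_maps

# Toy usage with placeholder components (see the earlier sketches for fuller ones).
probs, maps = segment_and_identify(
    image=torch.randn(1, 3, 64, 64),
    pose_data=torch.randn(1, 21),
    visual_encoder=lambda img: torch.randn(1, 64),    # stand-in CNN encoder
    pose_encoder=lambda p: torch.randn(1, 64),        # stand-in MLP encoder
    aggregate=lambda pose, vis: 0.5 * (pose + vis),
    classifier=nn.Linear(64, 3),
    decoders={"nerve": nn.Sequential(nn.Linear(64, 64 * 64), nn.Sigmoid())},
)
```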
[0107] FIG. 9 is a flowchart illustrating steps 500 for anatomical structure tracking carried out by the surgical robotic system 10 in accordance with some embodiments. In step 502, the surgical robotic system 10 receives an image of a plurality of images captured by the camera assembly 44. The image includes a representation of one or more anatomical structures of a subject. Examples are described with respect to the image 314 and the tracking module 360 in FIG. 7.
[0108] In step 504, the surgical robotic system 10 extracts from the image a visual representation thereof. The visual representation can be a compact representation for the image. Examples are described with respect to the visual encoder 326 and the tracking module 360 in FIG. 7.
[0109] In step 506, the surgical robotic system 10 determines position and orientation data associated with the robotic assembly 20. Examples are described with respect to the position and orientation data 312 and tracking module 360 in FIG. 7.
[0110] In step 508, the surgical robotic system 10 generates a pose representation of the robotic assembly 20 based at least in part on the position and orientation data. Examples are described with respect to the pose encoder 322 and tracking module 360 in FIG. 7.
[0111] In step 510, the surgical robotic system 10 generates a state representation based at least in part on the visual representation and the pose representation. Examples are described with respect to the state representation 330 and tracking module 360 in FIG. 7.
[0112] In step 512, the surgical robotic system 10 determines a similarity between the state representation and a previous state representation. The previous state representation is generated based at least in part on a previous pose representation extracted from previous position and orientation data and a previous visual representation extracted from a previous image that is one image of the plurality of images and is prior to the image in time. Examples are described with respect to the tracking module 360 in FIG. 7.
[0113] In step 514, in response to determining that the similarity is equal to or greater than a similarity threshold, the surgical robotic system 10 averages the state representation and the previous state representation to generate an averaged state representation. Examples are described with respect to the tracking module 360 in FIG. 7.
[0114] In step 516, the surgical robotic system 10 identifies, based at least in part on the averaged state representation, one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated. Examples are described with respect to the
classifier 334, the identified anatomical landmarks 344, and the tracking module 360 in FIG. 7.
[0115] In step 518, the surgical robotic system 10 generates a plurality of segmentation maps. Each segmentation map identifies which of the one or more anatomical structures to avoid contact with the robotic assembly 20. Examples are described with respect to the decoders 332, the segmentation maps 342, and the tracking module 360 in FIG. 7.
[0116] FIG. 10 is a flowchart illustrating steps of a method 600 for training the surgical robotic system 10 to automatically identify anatomical structures in accordance with some embodiments.
[0117] In step 602, a computational device, for example, the computing module 18, trains a machine learning model based at least in part on a training set having a plurality of labeled images and known position and orientation data associated with the robotic assembly 20 of the surgical robotic system 10. For example, the training module 2400 of FIG. 11B can include training sets having labeled images with labeled anatomical structures and labeled anatomical landmarks and known position and orientation data. In some embodiments, the training module 2400 can access training sets stored in a non-transitory computer readable medium, for example, a server. Each labeled image can have one or more labeled anatomical structures and, in some embodiments, one or more labeled anatomical landmarks. The known position and orientation data is indicative of a pose of the robotic assembly 20, such as a pose of the robotic arm assembly 42, a pose of the camera assembly 44, or both. The position and orientation data is one of the inputs to a machine learning model in addition to an image, as described with respect to FIG. 7. The position and orientation data can provide information complementary to an image captured by the camera assembly 44, such as direction, position, orientation, or any other suitable information regarding how the robotic arm assembly 42 and the camera assembly 44 are positioned while the image is being captured. The training module 2400 can feed the training sets into a machine learning model to be trained to generate the trained machine learning model 320. The training module 2400 can adjust the weights and other parameters in the machine learning model during the training process to reduce the difference between an output of the machine learning model and an expected output. For example, the training module 2400 can adjust one or more parameters in the machine learning model to reduce a difference between one or more anatomical landmarks identified by the machine learning model and corresponding labeled anatomical landmarks and a difference between the one or more anatomical structures in each segmentation map generated by the machine
learning model and the corresponding labeled anatomical structures. The trained machine learning model 320 can be stored on a non-transitory storage medium or as a component of the anatomy segmentation and tracking module 300. In some embodiments, the training module 2400 can select a group of training sets as validation sets and apply the trained machine learning model 320 to the validation sets to evaluate the trained machine learning model 320.
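As a hedged illustration of the parameter-update step described above, the sketch below combines a landmark classification loss and per-structure segmentation losses and lets an optimizer adjust the model's weights toward the labels. The specific loss functions (cross-entropy, binary cross-entropy), the optimizer, and the toy model and shapes are assumptions, not details of the disclosed training module.

```python
# Hypothetical training-step sketch: combined landmark + segmentation loss.
import torch
import torch.nn as nn

landmark_loss_fn = nn.CrossEntropyLoss()
segmentation_loss_fn = nn.BCELoss()

def training_step(model, optimizer, image, pose_data, landmark_label, structure_masks):
    landmark_logits, seg_maps = model(image, pose_data)
    loss = landmark_loss_fn(landmark_logits, landmark_label)
    for name, pred_map in seg_maps.items():
        loss = loss + segmentation_loss_fn(pred_map, structure_masks[name])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()       # adjust weights to reduce the output/label difference
    return loss.item()

class ToyModel(nn.Module):
    """Placeholder model producing landmark logits and one segmentation map."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(21 + 3 * 8 * 8, 32)
        self.cls = nn.Linear(32, 3)
        self.seg = nn.Linear(32, 8 * 8)

    def forward(self, image, pose):
        x = torch.relu(self.backbone(torch.cat([image.flatten(1), pose], dim=1)))
        return self.cls(x), {"nerve": torch.sigmoid(self.seg(x)).view(-1, 1, 8, 8)}

model = ToyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = training_step(model, opt, torch.randn(2, 3, 8, 8), torch.randn(2, 21),
                     torch.tensor([0, 2]), {"nerve": torch.rand(2, 1, 8, 8).round()})
```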
[0118] In step 604, the surgical robotic system 10 deploys the trained machine learning model 320. For example, the trained machine learning model 320 can identify one or more anatomical landmarks in an anatomical space in which the robotic assembly 20 is being operated. Examples are described with respect to the classifier 334 and the identified anatomical landmarks 344 in FIG. 7. The trained machine learning model 320 can generate a plurality of segmentation maps, each segmentation map identifying which of the one or more anatomical structures to avoid contact with the robotic assembly 20. Examples are described with respect to the decoders 332 and the segmentation maps 342 in FIG. 7.
[0119] In some embodiments, the training process described with respect to FIG. 10 can be performed in remote computation servers 1102a-1102n in FIG. 12. The computation servers 1102a-1102n can train a machine learning model and send the trained machine learning model 320 to the surgical robotic system 10 for automatically identifying anatomical structures.
[0120] FIG. 11A is a diagram of an example computing module 18 that can be used to perform one or more steps of the methods provided by example embodiments. The computing module 18 includes one or more non-transitory computer-readable media for storing one or more computer-executable instructions or software for implementing example embodiments. The non-transitory computer-readable media can include, but are not limited to, one or more types of hardware memory, non-transitory tangible media (for example, one or more magnetic storage disks, one or more optical disks, one or more USB flash drives), and the like. For example, memory 1006 included in the computing module 18 can store computer-readable and computer-executable instructions or software for implementing example embodiments (e.g., the system code 2000). The computing module 18 also includes the processor 22 and associated core 1004 for executing computer-readable and computer-executable instructions or software stored in the memory 1006 and other programs for controlling system hardware. The processor 22 can be a single core processor or a multiple core (1004) processor.
[0121] Memory 1006 can include a computer system memory or random access memory, such as DRAM, SRAM, EDO RAM, and the like. The memory 1006 can include other types of memory as well, or combinations thereof. A user can interact with the computing module 18 through the display 12, such as a touch screen display or computer monitor, which can display the graphical user interface (GUI) 39. The display 12 can also display other aspects, transducers and/or information or data associated with example embodiments. The computing module 18 can include other I/O devices for receiving input from a user, for example, a keyboard or any suitable multi-point touch interface 1008, and a pointing device 1010 (e.g., a pen, stylus, mouse, or trackpad). The keyboard 1008 and the pointing device 1010 can be coupled to the visual display device 12. The computing module 18 can include other suitable conventional I/O peripherals.
[0122] The computing module 18 can also include one or more storage devices 24, such as a hard drive, CD-ROM, or other computer readable media, for storing data and computer-readable instructions, applications, and/or software that implement example operations/steps of the surgical robotic system 10 as described herein, or portions thereof, which can be executed to generate the GUI 39 on the display 12. Example storage devices 24 can also store one or more databases for storing any suitable information required to implement example embodiments. The databases can be updated by a user or automatically at any suitable time to add, delete, or update one or more items in the databases. Example storage device 24 can store one or more databases 1026 for storing provisioned data and other data/information used to implement example embodiments of the systems and methods described herein.
[0123] The computing module 18 can include a network interface 1012 configured to interface via one or more network devices 1020 with one or more networks, for example, a Local Area Network (LAN), Wide Area Network (WAN) or the Internet through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (for example, 802.11, T1, T3, 56kb, X.25), broadband connections (for example, ISDN, Frame Relay, ATM), wireless connections, controller area network (CAN), or some combination of any or all of the above. The network interface 1012 can include a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing module 18 to any type of network capable of communication and performing the operations described herein. Moreover, the computing module 18 can be any computer system, such as a workstation, desktop computer, server, laptop, handheld computer, tablet
computer (e.g., the iPad® tablet computer), mobile computing or communication device (e.g., the iPhone® communication device), or other form of computing or telecommunications device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described herein.
[0124] The computing module 18 can run any operating system 1016, such as any of the versions of the Microsoft® Windows® operating systems, the different releases of the Unix and Linux operating systems, any version of the MacOS® for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein. In some embodiments, the operating system 1016 can be run in native mode or emulated mode. In some embodiments, the operating system 1016 can be run on one or more cloud machine instances.
[0125] The computing module 18 can also include an antenna 1030, where the antenna 1030 can transmit wireless transmissions to a radio frequency (RF) front end and receive wireless transmissions from the RF front end.
[0126] FIG. 11B is a diagram illustrating an example system code 2000 that can be executable by the computing module 18 in accordance with some embodiments. The system code 2000 (non-transitory, computer-readable instructions) can be stored on a computer-readable medium, for example storage 24 and/or memory 1006, and executable by the hardware processor 22 of the computing module 18. The system code 2000 can include various custom-written software modules that carry out the steps/processes described herein, and can include, but is not limited to, a data collection module 2100 that collects the input 310 in FIG. 7, the anatomy segmentation and tracking module 300, encoders 2200 including the pose encoder 322 and the visual encoder 326, a state representation aggregation module 2300 for aggregating the pose representation 324 and the visual representation 328, the structure segmentation decoders 332, the classifier 334, the confidence module 346, the tracking module 360, and the training module 2400. Each of these components is described with respect to FIG. 7.
[0127] The system code 2000 can be programmed using any suitable programming languages including, but not limited to, C, C++, C#, Java, Python, or any other suitable language. Additionally, the system code 2000 can be distributed across multiple computer systems in communication with each other over a communications network, and/or stored and executed
on a cloud computing platform and remotely accessed by a computer system in communication with the cloud platform.
[0128] FIG. 12 is a diagram illustrating computer hardware and network components on which the system 1100 can be implemented. The system 1100 can include the surgical robotic system 10 and a plurality of computation servers 1102a-1102n having at least one processor (e.g., one or more graphics processing units (GPUs), microprocessors, central processing units (CPUs), tensor processing units (TPUs), application-specific integrated circuits (ASICs), etc.) and memory for executing the computer instructions and methods described above (which can be embodied as system code 2000). The system 1100 can also include a plurality of data storage servers 1104a-1104n for storing data. The computation servers 1102a-1102n, the data storage servers 1104a-1104n, and the surgical robotic system 10 accessed by a user 1112 can communicate over a communication network 1108.
Claims
1. A surgical robotic system comprising:
a robotic assembly comprising:
a camera assembly configured to generate one or more images of an interior cavity of a subject;
a robotic arm assembly to be disposed in the interior cavity to perform a surgical operation;
a memory storing one or more instructions;
a processor configured to or programmed to read the one or more instructions stored in the memory, the processor operationally coupled to the robotic assembly to:
receive an image from the camera assembly, the image including a representation of one or more anatomical structures of the subject;
extract, from the image, a visual representation thereof, the visual representation being a compact representation for the image;
determine position and orientation data associated with the robotic assembly, the position and orientation data indicative of a pose of the robotic assembly;
generate a pose representation of the robotic assembly based at least in part on the position and orientation data;
generate a state representation based at least in part on the visual representation and the pose representation, the state representation representing a state of the surgical robotic system;
identify, based at least in part on the state representation, one or more anatomical landmarks in an anatomical space in which the robotic assembly is being operated; and
generate a plurality of segmentation maps, each segmentation map identifying which of the one or more anatomical structures to avoid contact with the robotic assembly.
2. The surgical robotic system of claim 1, wherein the processor is further configured to or programmed to read the one or more instructions stored in the memory to execute a machine learning model to:
identify the one or more anatomical landmarks in the image based on the state representation; and generate the plurality of segmentation maps, each of the segmentation maps identifying which of the anatomical structures to avoid contact with the robotic assembly.
3. The surgical robotic system of claim 1, wherein the state representation is generated by aggregating the visual representation and the pose representation.
4. The surgical robotic system of claim 3, wherein the visual representation and the pose representation are aggregated by: averaging the visual and pose representations and weighing each of the visual and pose representations equally; or generating a weighted average of the visual and pose representations.
5. The surgical robotic system of claim 3, wherein the visual representation and the pose representation are aggregated by concatenating the visual and pose representations such that the state representation grows in size with an addition of each of the visual and pose representations.
6. The surgical robotic system of claim 1, wherein the one or more anatomical landmarks comprise at least one of an inguinal triangle that refers to a region of an abdominal wall, a triangle of doom that refers to an anatomical triangle defined by a vas deferens medially, spermatic vessels laterally and peritoneal fold inferiorly, or a triangle of pain that refers to a region bound by an iliopubic tract, testicular vessels, and a peritoneal fold.
7. The surgical robotic system of claim 1, wherein the processor is further configured to or programmed to read the one or more instructions stored in the memory to: create an interdependence between the plurality of segmentation maps and the identified one or more anatomical landmarks; and adjust weights of the plurality of segmentation maps and the identified one or more anatomical landmarks based at least in part on the interdependence.
8. A surgical robotic system comprising:
a robotic assembly comprising:
a camera assembly configured to generate a plurality of images of an interior cavity of a subject; and
a robotic arm assembly to be disposed in the interior cavity to perform a surgical operation;
a memory storing one or more instructions;
a processor configured to or programmed to read the one or more instructions stored in the memory, the processor operationally coupled to the robotic assembly to:
receive an image of the plurality of images, the image including a representation of one or more anatomical structures of the subject;
extract, from the image, a visual representation thereof, the visual representation being a compact representation for the image;
determine position and orientation data associated with the robotic assembly, the position and orientation data indicative of a pose of the robotic assembly;
generate a pose representation of the robotic assembly based at least in part on the position and orientation data;
generate a state representation based at least in part on the visual representation and the pose representation, the state representation representing a current state of the surgical robotic system;
determine a similarity between the state representation and a previous state representation, wherein the previous state representation is generated based at least in part on a previous pose representation extracted from previous position and orientation data and a previous visual representation extracted from a previous image that is one image of the plurality of images and is prior to the image in time;
in response to determining that the similarity is equal to or greater than a similarity threshold, average the state representation and the previous state representation to generate an averaged state representation;
identify one or more anatomical landmarks based at least in part on the averaged state representation; and
generate a plurality of segmentation maps, each segmentation map identifying which of the one or more anatomical structures to avoid contact with the robotic assembly.
9. The surgical robotic system of claim 8, wherein the processor is further configured to or programmed to read the one or more instructions stored in the memory to: determine a three-dimensional (3D) reconstruction of an anatomical space in which the robotic assembly is being operated; and determine a location of each of one or more identified anatomical structures in the anatomical space based at least in part on the 3D reconstruction.
10. The surgical robotic system of claim 8, wherein the plurality of images comprise a plurality of video frames.
11. The surgical robotic system of claim 8, wherein the processor is further configured to or programmed to read the one or more instructions stored in the memory to execute a machine learning model to: identify the one or more anatomical landmarks based at least in part on the averaged state representation; and generate the plurality of segmentation maps, each segmentation map identifying which of the one or more anatomical structures to avoid contact with the robotic assembly.
12. The surgical robotic system of claim 8, wherein the state representation is generated by aggregating the visual representation and the pose representation.
13. The surgical robotic system of claim 12, wherein the visual representation and the pose representation are aggregated by: averaging the visual and pose representations and weighing each of the visual and pose representations equally; or generating a weighted average of the visual and pose representations.
14. The surgical robotic system of claim 12, wherein the visual representation and the pose representation are aggregated by concatenating the visual and pose representations such
that the state representation grows in size with an addition of each of the visual and pose representations.
15. The surgical robotic system of claim 8, wherein the one or more anatomical landmarks comprise at least one of an inguinal triangle that refers to a region of an abdominal wall, a triangle of doom that refers to an anatomical triangle defined by a vas deferens medially, spermatic vessels laterally and peritoneal fold inferiorly, or a triangle of pain that refers to a region bound by an iliopubic tract, testicular vessels, and a peritoneal fold.
16. The surgical robotic system of claim 8, wherein the processor is further configured to or programmed to read the one or more instructions stored in the memory to: create an interdependence between the plurality of segmentation maps and the identified one or more anatomical landmarks; and adjust weights of the plurality of segmentation maps and the identified one or more anatomical landmarks based at least in part on the interdependence.
17. The surgical robotic system of claim 8, wherein the similarity between the state representation and a previous state representation is determined by: determining whether neighboring images of the plurality of images over a time period depict a similar field of view.
18. The surgical robotic system of claim 8, wherein the processor is further configured to or programmed to read the one or more instructions stored in the memory to: update the plurality of segmentation maps and the identified one or more anatomical landmarks in real time to reflect a change in a field of view of the camera assembly.
19. A computer-implemented method for training a surgical robotic system to automatically identify anatomical structures, comprising: training a machine learning model based at least in part on a training set having a plurality of labeled images and known position and orientation data associated with a robotic assembly of the surgical robotic system, each labeled image having one or more labeled anatomical structures and one or more labeled anatomical landmarks, the known position and orientation data indicative of a pose of the robotic assembly; and
deploying the trained machine learning model to: identify one or more anatomical landmarks in an anatomical space in which the robotic assembly is being operated; and generate a plurality of segmentation maps, each segmentation map identifying which of the one or more anatomical structures to avoid contact with the robotic assembly.
20. The computer-implemented method of claim 19, wherein training the machine learning model comprises adjusting one or more parameters in the machine learning model to reduce a difference between one or more anatomical landmarks identified by the machine learning model and corresponding labeled anatomical landmarks and a difference between the one or more anatomical structures in each segmentation map generated by the machine learning model and the corresponding labeled anatomical structures.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263430513P | 2022-12-06 | 2022-12-06 | |
US63/430,513 | 2022-12-06 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024123888A1 (en) | 2024-06-13 |
Family
ID=89542268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/082697 (WO2024123888A1) | Systems and methods for anatomy segmentation and anatomical structure tracking | 2022-12-06 | 2023-12-06 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024123888A1 (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009045827A2 (en) * | 2007-09-30 | 2009-04-09 | Intuitive Surgical, Inc. | Methods and systems for tool locating and tool tracking robotic instruments in robotic surgical systems |
US20180221102A1 (en) | 2017-02-09 | 2018-08-09 | Vicarious Surgical Inc. | Virtual reality surgical tools system |
US20190076199A1 (en) | 2017-09-14 | 2019-03-14 | Vicarious Surgical Inc. | Virtual reality surgical camera system |
US10285765B2 (en) | 2014-05-05 | 2019-05-14 | Vicarious Surgical Inc. | Virtual reality surgical device |
US20200273577A1 (en) * | 2019-02-21 | 2020-08-27 | Theator inc. | System for updating a predicted outcome |
US20210015560A1 (en) * | 2018-09-12 | 2021-01-21 | Orthogrid Systems Inc. | Artificial intelligence intra-operative surgical guidance system and method of use |
WO2021159409A1 (en) | 2020-02-13 | 2021-08-19 | Oppo广东移动通信有限公司 | Power control method and apparatus, and terminal |
WO2021231402A1 (en) | 2020-05-11 | 2021-11-18 | Vicarious Surgical Inc. | System and method for reversing orientation and view of selected components of a miniaturized surgical robotic unit in vivo |
WO2022094000A1 (en) | 2020-10-28 | 2022-05-05 | Vicarious Surgical Inc. | Laparoscopic surgical robotic system with internal degrees of freedom of articulation |
WO2022169990A1 (en) * | 2021-02-03 | 2022-08-11 | The Regents Of The University Of California | Surgical perception framework for robotic tissue manipulation |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23838309; Country of ref document: EP; Kind code of ref document: A1