
CN116777994A - Estimating pose in 3D space - Google Patents


Info

Publication number
CN116777994A
Authority
CN
China
Prior art keywords
image
sparse points
sparse
points
pose
Prior art date
Legal status
Pending
Application number
CN202310724682.XA
Other languages
Chinese (zh)
Inventor
A. Kaehler
G. Bradski
Current Assignee
Magic Leap Inc
Original Assignee
Magic Leap Inc
Application filed by Magic Leap Inc
Publication of CN116777994A

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/332Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H04N13/344Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/16Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using electromagnetic waves other than radio waves
    • G01S5/163Determination of attitude
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/242Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/398Synchronisation thereof; Control thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems
    • H04N23/951Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Electromagnetism (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Processing (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)
  • Body Structure For Vehicles (AREA)

Abstract

The present disclosure relates to estimating pose in 3D space. Methods and devices for estimating a position of a device within a 3D environment are described. Embodiments of these methods include sequentially receiving a plurality of image segments forming an image, the image representing a field of view (FOV) comprising a portion of an environment. The image includes a plurality of sparse points that are identifiable based in part on a corresponding subset of image segments of the plurality of image segments. The method further includes sequentially identifying one or more sparse points as each subset of image segments corresponding to the one or more sparse points is received, and estimating a position of the device within the environment based on the identified one or more sparse points.

Description

Estimating pose in 3D space
The present application is a divisional application of the application filed on May 17, 2017, with PCT International Application No. PCT/US2017/033139 and Chinese National Phase Application No. 201780053000.X, entitled "Estimating Pose in 3D Space".
Cross Reference to Related Applications
The present application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/357,285, entitled "ESTIMATING POSE IN 3D SPACE," the entire disclosure of which is incorporated herein by reference.
Technical Field
The present disclosure relates to virtual reality and augmented reality imaging and visualization systems, and more particularly to sparse pose estimation in three-dimensional (3D) space.
Background
Modern computing and display technologies have facilitated the development of systems for so-called "virtual reality" or "augmented reality" experiences, in which digitally rendered images, or portions thereof, are presented to a user in a manner that appears, or may be perceived, as real. Virtual reality, or "VR," scenarios typically involve the presentation of digital or virtual image information while being opaque to other actual real-world visual input; augmented reality, or "AR," scenarios generally involve the presentation of digital or virtual image information as an enhancement to the visualization of the real world around the user. For example, referring to FIG. 1, an augmented reality scene 1000 is depicted in which a user of AR technology sees a real-world park-like setting 1100 featuring people, trees, buildings in the background, and a concrete platform 1120. In addition to these items, the user of the AR technology also perceives that he "sees" a robotic figurine 1110 standing on the real-world platform 1120, and a flying cartoon-like avatar character 1130 that appears to be a personification of a bumble bee, even though these elements are not present in the real world. The human visual perception system has proven to be very complex, and it is challenging to produce VR or AR technology that facilitates a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world image elements. The systems and methods disclosed herein address various challenges associated with VR and AR technology.
Disclosure of Invention
An aspect of the present disclosure provides sparse pose estimation performed by an image capture device when capturing sparse points in an image frame. Thus, sparse pose estimation may be performed prior to capturing the entire image frame. In some embodiments, the sparse pose estimate may be refined or updated as the image frames are captured.
In some embodiments, systems, devices, and methods for estimating a location of an image capture device within an environment are disclosed. In some implementations, the method can include sequentially receiving a first plurality of image segments. The first plurality of image segments may form at least a portion of an image representing a field of view (FOV) from in front of the image capture device, which may include a portion of the environment surrounding the image capture device and a plurality of sparse points. Each sparse point may correspond to a subset of image segments. The method may further include identifying a first set of sparse points comprising one or more sparse points identified as the first plurality of image segments is received. The method may then include determining, by a position estimation system, a position of the image capture device within the environment based on the first set of sparse points. The method may further include sequentially receiving a second plurality of image segments, which may be received after the first plurality of image segments and form at least another portion of the image. The method may then include identifying a second set of sparse points, which may include one or more sparse points identified as the second plurality of image segments is received. The method may then include updating, by the position estimation system, the location of the image capture device within the environment based on the first set of sparse points and the second set of sparse points.
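Purely for illustration, the following minimal Python sketch mirrors the estimate-then-update flow summarized above. The toy "pose" (the centroid of the observed 2D sparse points) and all names are assumptions made for this sketch; they are not the disclosed algorithm or API.

```python
# Illustrative-only sketch of estimating a pose from a first set of sparse
# points and refining it once a second set becomes available.
import numpy as np

def centroid_pose(points):
    """Toy 'pose': the centroid of the observed 2D sparse points."""
    return np.mean(np.asarray(points, dtype=float), axis=0)

# Sparse points identified while the first group of image segments arrives.
first_set = [(12.0, 40.0), (85.0, 33.0), (47.0, 91.0)]
initial_pose = centroid_pose(first_set)                # early estimate

# Sparse points identified from later image segments of the same frame.
second_set = [(60.0, 120.0), (30.0, 75.0)]
updated_pose = centroid_pose(first_set + second_set)   # refined estimate

print(initial_pose, updated_pose)
```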
In some embodiments, systems, devices, and methods for estimating a location of an image capture device within an environment are disclosed. In some implementations, a method may include sequentially receiving a plurality of image segments, which may form an image representing a field of view (FOV) from in front of the image capture device. The FOV may include a portion of the environment surrounding the image capture device and a plurality of sparse points. Each sparse point may be identifiable based in part on a corresponding subset of image segments of the plurality of image segments. The method may further comprise sequentially identifying one or more sparse points of the plurality of sparse points as each subset of image segments corresponding to the one or more sparse points is received. The method may then include estimating a location of the image capture device within the environment based on the identified one or more sparse points.
In some embodiments, systems, devices, and methods for estimating a location of an image capture device within an environment are disclosed. In some embodiments, the image capture device may include an image sensor configured to capture an image. An image may be captured by sequentially capturing a plurality of image segments representing a field of view (FOV) of the image capture device. The FOV may include a portion of the environment surrounding the image capture device and a plurality of sparse points. Each sparse point may be identifiable based in part on a corresponding subset of the plurality of image segments. The image capture device may further include a memory circuit configured to store the subsets of image segments corresponding to one or more sparse points, and a computer processor operatively coupled to the memory circuit. The computer processor may be configured to sequentially identify one or more sparse points of the plurality of sparse points as each subset of image segments corresponding to the one or more sparse points is received by the image capture device. The computer processor may be further configured to extract the sequentially identified one or more sparse points so as to estimate a location of the image capture device within the environment based on the identified one or more sparse points.
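As a non-authoritative sketch of the device composition described above (an image sensor streaming segments, a memory circuit storing the received subsets, and a processor identifying sparse points), the toy Python class below uses a simple brightness threshold as a stand-in detector; the class and method names are hypothetical and not part of the disclosure.

```python
# Illustrative-only toy "image capture device": store each received segment,
# identify sparse points from it, and make an early pose estimate available.
import numpy as np

class ToyPoseDevice:
    def __init__(self):
        self.memory = []          # stands in for the memory circuit
        self.points = []          # sparse points identified so far

    def on_segment(self, row, segment):
        self.memory.append(segment)                        # store the subset
        bright = np.nonzero(np.asarray(segment) > 200)[0]  # toy detector
        self.points.extend((row, int(c)) for c in bright)  # identify points
        if self.points:                                    # estimate early
            return np.mean(np.asarray(self.points, float), axis=0)
        return None

device = ToyPoseDevice()
for row, segment in enumerate(np.random.randint(0, 256, (4, 32))):
    pose = device.on_segment(row, segment)
```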
In some embodiments, systems, devices, and methods for estimating a location of an image capture device within an environment are disclosed. In some implementations, an augmented reality system is disclosed. The augmented reality system may include an outward facing imaging device, computer hardware, and a processor operatively coupled to the computer hardware and the outward facing imaging device. The processor may be configured to execute instructions to perform at least a portion of the methods disclosed herein.
In some embodiments, systems, devices, and methods for estimating a location of an image capture device within an environment are disclosed. In some embodiments, an autonomous entity is disclosed. The autonomous entity may include an outward facing imaging device, computer hardware, and a processor operatively coupled to the computer hardware and the outward facing imaging device. The processor may be configured to execute instructions to perform at least a portion of the methods disclosed herein.
In some embodiments, systems, devices, and methods for estimating a location of an image capture device within an environment are disclosed. In some embodiments, a robotic system is disclosed. The robotic system may include an outward-facing imaging device, computer hardware, and a processor operatively coupled to the computer hardware and the outward-facing imaging device. The processor may be configured to execute instructions to perform at least a portion of the methods disclosed herein.
The various embodiments of the method and apparatus within the scope of the appended claims each have several aspects, none of which are solely responsible for the desirable attributes described herein. Without limiting the scope of the appended claims, some prominent features are described herein.
The details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Neither this summary nor the following detailed description is intended to limit or restrict the scope of the inventive subject matter.
Drawings
Fig. 1 depicts a diagram of an augmented reality scene with certain virtual reality objects and certain actual real-world objects viewed by a person.
Fig. 2 schematically illustrates an example of a wearable display system.
Fig. 3 schematically shows an example of a plurality of positions of an imaging device as the imaging device moves in a 3D space (in this example, a room).
Fig. 4A and 4B schematically show examples of the shearing effect for image frames.
Fig. 5A and 5B schematically illustrate examples of the shearing effect of fig. 4A and 4B for a plurality of sparse points.
Fig. 6 is a block diagram of an exemplary AR architecture.
FIG. 7 is an exemplary coordinate system for pose.
FIG. 8 is a process flow diagram of an example of a method of determining a pose of an imaging device within a 3D space.
Fig. 9A and 9B schematically illustrate an example of extracting one or more sparse points from an image frame based on receiving a plurality of image segments.
FIG. 10 is a process flow diagram of another example of a method of determining a pose of an imaging device within a 3D space.
Throughout the drawings, reference numerals may be repeated to indicate corresponding relationships between the elements mentioned. The drawings, which are not to scale, are provided to illustrate the exemplary embodiments described herein and are not intended to limit the scope of the disclosure.
Detailed Description
Overview
When using an AR device or other device that moves within a three-dimensional (3D) space, the device may need to track its movement through the 3D space and map the 3D space. For example, the AR device may move around within the 3D space due to movement of the user or independently of the user (e.g., a robot or other autonomous entity), and it may be beneficial to map the 3D space and determine one or more of the location, position, or orientation of the device within the 3D space for subsequent processing, so that virtual image elements can be presented accurately relative to other virtual image elements or real-world image elements. For example, to accurately present virtual and real-world image elements, the device may need to know where it is located and how it is oriented in the real world, so that it can present a virtual image at a particular location when the device is at a particular orientation within the real-world space. In another embodiment, it may be desirable to reproduce the trajectory of the device through the 3D space. Thus, as the device moves around within the 3D space, it may be desirable to determine the position, location, or orientation of the device within the 3D space (hereinafter collectively referred to as "pose") in real time. In some implementations, sparse pose estimates within the 3D space may be determined from a continuous stream of image frames from an imaging device included as part of, for example, an AR device. Each image frame in the continuous stream may be stored for processing and used to estimate the pose of the device for inclusion in the sparse pose estimation. However, these techniques may introduce delays in estimating pose, because each frame must be transferred in its entirety to memory before subsequent processing can be performed.
The present disclosure provides example devices and methods configured to estimate a pose of a device (e.g., an AR device or an autonomous device such as a robot) within a 3D space. As one example, as the device moves through 3D space, the device may perform sparse pose estimation based on receiving multiple image frames and estimating the pose of the device from each image frame. Each image frame may represent a portion of 3D space in front of the device that indicates the orientation of the device within the 3D space. In some embodiments, each image frame may include one or more features or objects that can be represented by sparse points, keypoints, point clouds, or other types of mathematical representations. For each image frame, the image frame may be captured by sequentially receiving a plurality of image segments that, when combined, make up the entire image frame. Thus, the device may be configured to identify sparse points within the image frame upon receipt of an image segment comprising each sparse point. The device may extract a first set of sparse points including one or more sparse points. The first set of sparse points may be at least one input to a sparse pose estimation process. The device may then identify and extract a second set of sparse points and update a sparse pose estimate based on the second set of sparse points. In one exemplary embodiment, the pose of the device may be estimated using the first set of sparse points before the subsequent sparse points (e.g., the second set of sparse points) are identified. Subsequent sparse points, when identified, may be used to update the sparse pose estimate.
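The "identify sparse points as the segments arrive" idea can be pictured with the following toy Python generator, which yields a progressively refined (toy) estimate after each received segment. It is only a sketch under assumed names and a stand-in detector, not the disclosed implementation.

```python
# Illustrative sketch: refine a toy pose estimate "on the fly" as image
# segments stream in, rather than waiting for the whole frame.
import numpy as np

def streaming_pose_estimates(segments, detect):
    points = []
    for row_index, segment in enumerate(segments):
        points.extend(detect(row_index, segment))
        if points:                  # estimate as soon as any points exist
            yield np.mean(np.asarray(points, float), axis=0)

# Stand-in detector: bright pixels become sparse points.
detect = lambda r, seg: [(r, int(c)) for c in np.nonzero(seg > 200)[0]]
frame = np.random.randint(0, 256, size=(6, 32))
for pose in streaming_pose_estimates(frame, detect):
    pass  # each iteration gives a progressively refined (toy) estimate
```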
Although embodiments of methods, devices, and systems are described herein with reference to an AR device, this is not intended to limit the scope of the present disclosure. The methods and devices described herein are not limited to AR devices or head mounted devices; other devices are possible (e.g., mobile robots, digital cameras, autonomous entities, etc.). Suitable devices include, but are not limited to, devices that are capable of moving through 3D space independently or with user intervention. For example, the methods described herein may be applied to objects moving around in 3D space that are tracked by a camera remote from the object. In some embodiments, the processing may also be performed remotely from the object.
Exemplary AR devices moving within 3D space
In order for a 3D display to facilitate a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world image elements, it is desirable to map the real world around the display and to reproduce the trajectory of the display through the 3D space. For example, a sparse pose estimation process may be performed to determine a map of the 3D space. If the sparse pose estimation is not performed in real time with minimal delay, the user may experience unstable imaging, harmful eye strain, headaches, and a generally unpleasant VR or AR viewing experience. Accordingly, various embodiments described herein are configured to determine or estimate one or more of a position, location, or orientation of an AR device.
Fig. 2 shows an example of a wearable display system 100. The display system 100 includes a display 62 and various mechanical and electronic modules and systems that support the functionality of the display 62. The display 62 may be coupled to a frame 64 that may be worn by a display system user, wearer, or viewer 60 and may be configured to position the display 62 in front of the eyes of the user 60. The display system 100 may include a head mounted display (HMD) that is worn on the head of a wearer. An augmented reality display (ARD) may include the wearable display system 100. In some embodiments, a speaker 66 is coupled to the frame 64 and positioned near the user's ear canal (in some embodiments, another speaker (not shown) may be positioned near the user's other ear canal to provide stereo/shapeable sound control). The display system 100 may include one or more outward facing imaging systems 110 that view the world in the environment (e.g., 3D space) surrounding the wearer. The display 62 may be operatively coupled to the local processing and data module 70 by a communication link 68, such as by a wired lead or wireless connection, and the local processing and data module 70 may be mounted in various configurations, such as fixedly attached to the frame 64, fixedly attached to a helmet or hat worn by the user, embedded in headphones, or otherwise removably attached to the user 60 (e.g., in a backpack configuration, in a belt-coupled configuration).
The display system 100 may include one or more outwardly facing imaging systems 110a or 110b (hereinafter referred to individually or collectively as "110") disposed on the frame 64. In some embodiments, the outward facing imaging system 110a may be disposed at approximately a central portion of the frame 64 between the eyes of the user. In another embodiment, alternatively or in combination, the outward facing imaging system 110b may be disposed on one or more sides of the frame adjacent to one or both eyes of the user. For example, outward facing imaging systems 110b may be located on the left and right sides of the user, adjacent to both eyes. While an exemplary arrangement of outward facing cameras 110 is provided above, other configurations are possible. For example, the outward facing imaging system 110 may be positioned in any orientation or position relative to the display system 100.
In some embodiments, the outward facing imaging system 110 captures images of a portion of the world in front of the display system 100. The entire area available for viewing or imaging by a viewer may be referred to as the field of regard (FOR). In some implementations, the FOR may include substantially all of the solid angle around the display system 100, because the display may be moved around within the environment to image objects around the display (in front of, behind, above, below, or to the sides of the wearer). The portion of the FOR in front of the display system may be referred to as the field of view (FOV), and the outward facing imaging system 110 is sometimes referred to as an FOV camera. The images obtained from the outward facing imaging system 110 may be used to identify sparse points of the environment, to estimate pose for the sparse pose estimation process, and so forth.
In some implementations, the outward facing imaging system 110 may be configured as a digital camera including an optical lens system and an image sensor. For example, light from the world in front of the display 62 (e.g., from the FOV) may be focused onto the image sensor by the lens of the outward facing imaging system 110. In some embodiments, the outward facing imaging system 110 may be configured to operate in the infrared (IR) spectrum, the visible spectrum, or any other suitable wavelength range or range of wavelengths of electromagnetic radiation. In some embodiments, the imaging sensor may be configured as a CMOS (complementary metal oxide semiconductor) or CCD (charge coupled device) sensor. In some embodiments, the image sensor may be configured to detect light in the IR spectrum, the visible spectrum, or any other suitable wavelength range or range of wavelengths of electromagnetic radiation. In some embodiments, the frame rate of the digital camera may relate to the rate at which image data can be sent from the digital camera to a memory or storage unit (e.g., the local processing and data module 70). For example, if the frame rate of the digital camera is 30 hertz, the data captured by the pixels of the image sensor may be read into memory every 30 milliseconds (e.g., clocked off the sensor). Accordingly, the frame rate of the digital camera may introduce delays in the storage and subsequent processing of the image data.
In some embodiments, where the outward facing imaging system 110 is a digital camera, the outward facing imaging system 110 may be configured as a global shutter camera or a rolling shutter camera (e.g., also referred to as a progressive scan camera). For example, where the outward facing imaging system 110 is a global shutter camera, the image sensor may be a CCD sensor configured to capture an entire image frame representing the FOV in front of the display 62 in a single operation. The entire image frame may then be read into the local processing and data module 70 to perform processing, such as performing sparse pose estimation as described herein. Thus, in some embodiments, delays in pose estimation may result from the frame rate and from waiting for the entire image frame to be stored, for example, as described above. For example, a global shutter digital camera with a frame rate of 30 hertz may produce a 30 millisecond delay before any pose estimation can be performed.
In other embodiments, where the outward facing imaging system 110 is configured as a rolling shutter camera, the image sensor may be a CMOS sensor configured to sequentially capture a plurality of image segments and scan across the scene to transmit image data of the captured image segments. When combined in the order of capture, the image segments constitute an image frame of the FOV of the outward facing imaging system 110. In some embodiments, the scan direction may be horizontal; for example, the outward facing imaging system 110 may capture a plurality of vertical image segments that are horizontally adjacent in a leftward or rightward direction. In another embodiment, the scan direction may be vertical; for example, the outward facing imaging system 110 may capture a plurality of horizontal image segments that are vertically adjacent in an upward or downward direction. Each image segment may be sequentially read into the local processing and data module 70 as the corresponding image segment is captured at the image sensor. Thus, in some embodiments, as described above, the delay caused by the frame rate of the digital camera may be reduced or minimized by sequentially transmitting image segments as they are captured by the digital camera.
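A rough back-of-the-envelope comparison, using assumed example numbers (a 30 Hz frame rate and a 480-row sensor, neither of which is specified by the disclosure), illustrates why per-segment processing can reduce the delay relative to waiting for a full frame:

```python
# Back-of-the-envelope latency comparison (assumed example numbers only).
frame_rate_hz = 30.0
rows_per_frame = 480                              # assumed sensor height
frame_period_ms = 1000.0 / frame_rate_hz          # ~33 ms per full frame
row_period_ms = frame_period_ms / rows_per_frame  # time per image segment (row)

# Global-shutter style: sparse points usable only after the whole frame.
full_frame_latency_ms = frame_period_ms

# Rolling-shutter style: a sparse point in row k is usable after k+1 rows.
k = 100
segment_latency_ms = (k + 1) * row_period_ms
print(full_frame_latency_ms, segment_latency_ms)
```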
The local processing and data module 70 may include one or more hardware processors, as well as digital memory, such as non-volatile memory (e.g., flash memory), both of which may be used to facilitate the processing, buffering, caching, and storage of data. The data may include a) data captured from sensors (which may be, for example, operatively coupled to the frame 64 or otherwise attached to the user 60), such as image capture devices (e.g., the outward facing imaging system 110), microphones, inertial measurement units (IMUs), accelerometers, compasses, Global Positioning System (GPS) units, radios, and/or gyroscopes; and/or b) data acquired and/or processed using the remote processing module 72 and/or the remote data repository 74, possibly for transmission to the display 62 after such processing or retrieval. The local processing and data module 70 may be operatively coupled to the remote processing module 72 and/or the remote data repository 74 by communication links 76 and/or 78 (such as via wired or wireless communication links) such that these remote modules may serve as resources for the local processing and data module 70. Further, the remote processing module 72 and the remote data repository 74 may be operatively coupled to each other. In some embodiments, the local processing and data module 70 may be operably coupled to one or more of an image capture device, a microphone, an inertial measurement unit, an accelerometer, a compass, a GPS unit, a radio, and/or a gyroscope. In some other embodiments, one or more of these sensors may be attached to the frame 64, or may be stand-alone structures that communicate with the local processing and data module 70 through wired or wireless communication paths.
In some embodiments, the digital memory of the local processing and data module 70, or a portion thereof, may be configured to store one or more elements of data for a temporary period of time (e.g., as a non-transitory buffer storage). For example, the digital memory may be configured to receive some or all of the data and store some or all of the data for a short period of time while the data moves between processes of the local processing and data module 70. In some implementations, a portion of the digital memory may be configured as a buffer that sequentially receives one or more image segments from the outward facing imaging system 110. Thus, the buffer may be a non-transitory data buffer configured to store a set number of image segments (as described below with reference to FIGS. 9A and 9B) before they are sent to the local processing and data module 70 (or the remote data repository 74) for permanent storage or subsequent processing.
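A minimal sketch of such a fixed-size segment buffer, assuming segments are simple pixel rows and using hypothetical names (this is not the disclosed buffer implementation), might look like the following:

```python
# Illustrative sketch of a fixed-size, non-persistent segment buffer.
from collections import deque

class SegmentBuffer:
    """Holds the most recent N image segments before further processing."""
    def __init__(self, capacity=16):
        self._segments = deque(maxlen=capacity)

    def push(self, segment):
        self._segments.append(segment)

    def snapshot(self):
        # Hand the currently buffered segments to downstream processing
        # (e.g., sparse point identification) without retaining them here.
        return list(self._segments)

buf = SegmentBuffer(capacity=4)
for i in range(10):
    buf.push([i] * 8)        # only the last 4 segments remain buffered
```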
In some embodiments, the remote processing module 72 may include one or more hardware processors configured to analyze and process data and/or image information. In some embodiments, the remote data repository 74 may include a digital data storage facility, which may be available through the Internet or other network configuration in a "cloud" resource configuration. In some embodiments, the remote data repository 74 may include one or more remote servers that provide information (e.g., information used to generate augmented reality content) to the local processing and data module 70 and/or the remote processing module 72. In some embodiments, all data is stored and all computations are performed in the local processing and data module 70, allowing fully autonomous use from any remote module.
Although an exemplary AR device is described herein, it should be understood that the methods and devices disclosed herein are not limited to AR devices or head mounted devices. Other configurations are possible, such as mobile robots, digital cameras, autonomous entities, and the like. Suitable devices include, but are not limited to, devices that can move through 3D space independently or with user intervention.
Exemplary trajectories of AR devices through 3D space
Fig. 3 schematically shows an imaging device 310 moving through a 3D space 300. For example, FIG. 3 shows the imaging device 310 at a plurality of positions and orientations 312 (e.g., 312a, 312b, 312c, and 312d) within the environment 300 as the imaging device 310 moves along the dashed line schematically representing trajectory 311. At each position 312, the imaging device 310 may be configured to capture an image frame of the environment 300 from that particular location and orientation, and the image frames may be used as a continuous stream of image frames, for example, to perform sparse pose estimation. The trajectory 311 may be any trajectory or path along which the device moves through the environment 300. Although FIG. 3 shows four positions 312, the number of positions may be different. For example, the number of positions 312 may be as few as two, or as many positions (e.g., 5, 6, 7, etc.) as are needed to perform sparse pose estimation with an acceptable level of certainty. In some embodiments, the imaging device 310 may be configured to capture a series of image frames, e.g., as in a video, where each image frame of the video may be used to perform sparse pose estimation by the computer vision techniques described herein.
In some embodiments, the imaging device 310 may be configured as the display system 100 of FIG. 2 including the outward facing imaging system 110, as a mobile robot incorporating an imaging system, or as a stand-alone imaging device. The imaging device 310 may be configured to capture an image frame at each position 312 as it moves through the environment 300, the image frame depicting the portion of the environment 300 in front of the imaging device 310. As described above, the portion of the environment 300 captured by the imaging device at each position 312 and orientation may be the FOV in front of the imaging device 310. For example, the FOV at position 312a is schematically shown as FOV 315a. Each subsequent position and orientation (e.g., 312b, 312c, and 312d) of the imaging device 310 has a corresponding FOV 315 (e.g., FOVs 315b, 315c, and 315d). Computer vision techniques may be performed on each image frame obtained from the imaging device 310 to estimate the pose of the imaging device 310 at each position 312. The pose estimates may be, for example, inputs to a sparse pose estimation process for determining or generating a map (or a portion thereof) of the environment 300 and for tracking movement of the imaging device 310 through the environment 300.
The environment 300 may be any 3D space, such as an office (as shown in FIG. 3), a living room, an outdoor space, and the like. The environment 300 may include a plurality of objects 325 (e.g., furniture, personal items, surrounding structures, textures, detectable patterns, etc.) disposed throughout the environment 300. An object 325 may be an individual object that is uniquely identifiable as compared to other features in the environment (e.g., each wall may not be uniquely identifiable). Further, an object 325 may be a common feature captured in two or more image frames. For example, FIG. 3 shows an object 325a (in this example, a lamp) located in each FOV 315 of the imaging device 310 at each position 312 along a respective line of sight 330a-d (shown as dashed lines for illustrative purposes). Thus, for each position 312 (e.g., 312a), the image frame representing the corresponding FOV 315 (e.g., 315a) includes the object 325a imaged along the line of sight 330 (e.g., 330a).
The imaging device 310 may be configured to detect and extract a plurality of sparse points 320 from each image frame representing the FOV 315, each sparse point 320 (or plurality of sparse points) corresponding to an object 325 or a portion, texture, or pattern of the object 325. For example, the imaging device 310 may extract sparse points 320a corresponding to the object 325 a. In some embodiments, the object 325a may be associated with one or more sparse points 320, where each sparse point 320 may be associated with a different portion of the object 325 (e.g., corner of a lamp, top, bottom, side, etc.). Thus, each sparse point 320 may be uniquely identified within an image frame. Computer vision techniques may be used to extract and identify each sparse point 320 from an image frame or image segment corresponding to each sparse point 320 (e.g., as described below in connection with fig. 9A and 9B).
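As one illustration of extracting sparse points and descriptors from an image or image segment, the snippet below uses OpenCV's ORB detector on a synthetic stand-in image. The disclosed embodiments are not limited to ORB; this only shows the general idea of turning image content into keypoints plus descriptors.

```python
# Illustrative sparse point extraction with OpenCV's ORB detector.
import cv2
import numpy as np

image = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in frame
strip = image[0:120, :]                 # a horizontal "image segment"

orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(strip, None)

# Each keypoint is a candidate sparse point; descriptors help match the
# same point across consecutive frames.
sparse_points = [kp.pt for kp in keypoints]
```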
In some embodiments, the sparse points 320 may be used to estimate a position, location, or orientation of the imaging device 310 within the environment 300. For example, the imaging device 310 may be configured to extract a plurality of sparse points 320 as input to a sparse pose estimation process. An exemplary computer vision technique for sparse pose estimation is a simultaneous localization and mapping (SLAM, or V-SLAM, referring to a configuration in which the input is images/visual only) process or algorithm. Such exemplary computer vision techniques may be used to output a sparse point representation of the world surrounding the imaging device 310, as described in more detail below. In a conventional sparse pose estimation system using image frames from multiple positions 312, sparse points 320 may be collected from each image frame, the correspondences between successive image frames (e.g., between positions 312a and 312b) may be calculated, and the pose change may be estimated based on the discovered correspondences. Thus, in some embodiments, the position and/or orientation of the imaging device 310 may be determined. In some embodiments, a 3D map of the sparse point locations may be required for the estimation process, or may be generated as a byproduct of identifying sparse points across one or more image frames. In some embodiments, the sparse points 320 may be associated with one or more descriptors, which may be configured as digital representations of the sparse points 320. In some embodiments, the descriptors may be configured to facilitate computation of the correspondences between successive image frames. In some embodiments, the pose determination may be performed by a processor located on the imaging device (e.g., the local processing and data module 70) or remotely from the imaging device (e.g., the remote processing module 72).
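For illustration only (and not as the patented method), the pose change between two consecutive frames can be recovered from matched sparse points with standard two-view geometry. The OpenCV sketch below uses synthetic 3D points, an assumed camera matrix K, and an assumed camera translation:

```python
# Illustrative two-view relative pose from corresponding sparse points.
import cv2
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Synthetic 3D sparse points seen from two nearby camera positions.
pts3d = np.random.uniform([-1, -1, 4], [1, 1, 8], size=(30, 3))

def project(points, t):
    cam = points + t                      # identity rotation, translation t
    uv = cam[:, :2] / cam[:, 2:3]
    return (uv @ np.diag([500.0, 500.0]) + [320.0, 240.0]).astype(np.float32)

pts_prev = project(pts3d, np.array([0.0, 0.0, 0.0]))
pts_curr = project(pts3d, np.array([-0.2, 0.0, 0.0]))   # camera moved right

E, inliers = cv2.findEssentialMat(pts_prev, pts_curr, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K)
# R (rotation) and t (translation direction) describe the pose change
# between the two frames, up to scale.
```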
In some embodiments, a computer vision module may be included in operable communication with the imaging device 310 (e.g., as part of the local processing and data module 70 or the remote processing module and data repository 72, 74). An exemplary computer vision module may implement one or more computer vision techniques and may be used to analyze the image segments obtained by the outward facing imaging camera, e.g., to identify sparse points, determine pose, etc., for example, as described with reference to the methods 800 and 1000 of FIGS. 8 and 10. The computer vision module may recognize objects in the environment surrounding the imaging device 310, such as those described in connection with FIG. 3. As the imaging device moves through the environment, the computer vision module may extract sparse points from the image frames and use the extracted sparse points to track and recognize objects across the various image frames. For example, the sparse points of a first image frame may be compared to the sparse points of a second image frame to track movement of the imaging device. In some embodiments, the one or more sparse points of the second image frame may include one or more sparse points of the first image frame, e.g., as reference points for tracking between the first and second image frames. Third, fourth, fifth, and further image frames may be used similarly and compared to previous and subsequent image frames. The computer vision module may process the sparse points to estimate a position or orientation of the imaging device within the environment based on the identified sparse points. Non-limiting examples of computer vision techniques include: scale-invariant feature transform (SIFT), speeded up robust features (SURF), oriented FAST and rotated BRIEF (ORB), binary robust invariant scalable keypoints (BRISK), fast retina keypoint (FREAK), the Viola-Jones algorithm, the Eigenfaces approach, the Lucas-Kanade algorithm, the Horn-Schunck algorithm, the Mean-shift algorithm, visual simultaneous localization and mapping (vSLAM) techniques, sequential Bayesian estimators (e.g., Kalman filters, extended Kalman filters, etc.), bundle adjustment, adaptive thresholding (and other thresholding techniques), Iterative Closest Point (ICP), Semi-Global Matching (SGM), Semi-Global Block Matching (SGBM), feature point histograms, various machine learning algorithms (e.g., support vector machines, k-nearest neighbors algorithms, Naive Bayes, neural networks (including convolutional or deep neural networks), or other supervised/unsupervised models, etc.), and the like.
As described above, current pose estimation processes may involve a delay in estimating the pose of the imaging device. For example, the frame rate of the imaging device may cause delays due in part to the transfer of the entire image frame from the imaging device to memory. Without being bound by any particular scientific theory, sparse pose estimation may be delayed because sparse points are not extracted from an image frame until the entire image frame has been read from the imaging device into memory. Therefore, transmitting the entire image frame, which is limited in part by the frame rate capability of the imaging device, may become a factor in the delay experienced by the sparse pose estimation. One non-limiting advantage of some of the systems and devices described herein is that extraction or identification of sparse points for estimating pose may be performed on the fly as portions of an image frame are read off the image sensor or into memory, and thus the pose may be estimated at an earlier point in time than is possible when the entire image frame is used. In addition, since only a portion of the frame is analyzed for keypoints, processing speed and efficiency may be improved.
While the foregoing description describes the sparse points 320 in the context of physical objects in the environment 300, this is not intended to be limiting, and other implementations are possible. In some embodiments, an object 325 may refer to any feature of the environment (e.g., a real-world object, a virtual object, an invisible object or feature, etc.). For example, a projection device may be configured to project a plurality of indicators, textures, identifiers, etc. throughout the environment, which may be visible or invisible (e.g., projected in the IR spectrum, the near-infrared spectrum, the ultraviolet spectrum, or any other suitable wavelength range or range of wavelengths of electromagnetic radiation). The indicators, textures, identifiers, etc. may be unique features or shapes that can be detected by the imaging device 310. The imaging device 310 may be configured to detect these indicators and extract sparse points 320 from the plurality of indicators. For example, the indicators may be projected onto a wall of the environment in the IR spectrum of electromagnetic radiation, and the imaging device 310 may be configured to operate in the IR spectrum to recognize the indicators and extract sparse points therefrom. In another embodiment, the imaging device 310 may, alternatively or in combination, be included in an AR device configured to display virtual image elements (e.g., on the display 62). The imaging device or AR device may be configured to recognize the virtual image elements and extract sparse points 320 therefrom. The AR device may be configured to use these sparse points 320 to determine a pose of the AR device relative to the virtual image elements.
Examples of the shearing effect imparted to exemplary image frames and sparse points
As described above, the outward facing imaging system 110 may be implemented as a rolling shutter camera. One non-limiting advantage of a rolling shutter camera is the ability to send portions of a captured scene (e.g., image segments) while other portions of the scene are still being captured (e.g., not all portions of an image frame are captured at exactly the same time). However, this may result in distortion of objects that move relative to the camera while the image frame is being captured, because the imaging device is not at the same position relative to the object during the entire time the image is captured.
For example, FIGS. 4A and 4B are schematic illustrations of the rolling shutter effect (e.g., sometimes referred to herein as "shearing," "shifting," or "distortion") imparted to an image of a scene. FIG. 4A schematically illustrates a scene 400a including an object 425a (in this example, a square). The scene may be the FOV of an image capture device (e.g., the outward facing imaging system 110 of FIG. 2). In the embodiment shown in FIG. 4A, the scene may be moving in a direction 430 relative to the image capture device. FIG. 4B illustrates the resulting image 400b of the captured scene 400a, which may be stored in a memory or storage unit (e.g., the local processing and data module 70). As shown in FIG. 4B, the resulting image 400b contains a distorted object 425b (e.g., shown as a sheared square or rhombus) due to the relative movement of the object 425a, where the portions of the distorted object shown in broken lines are not captured in the resulting image 400b. Without being bound by any particular scientific theory, this may be caused by the imaging device scanning downward line by line, so that the top of the object is captured first and exhibits less distortion than the bottom of the object.
FIGS. 5A and 5B are schematic illustrations of the rolling shutter effect imparted to a plurality of sparse points included in a FOV (e.g., FOV 315a, 315b, 315c, or 315d of FIG. 3) captured by an imaging device. For example, as the AR device moves around within the 3D space, the various sparse points move relative to the AR device and are distorted, as schematically shown in FIG. 5B, in a manner similar to that described above in connection with FIG. 4B. FIG. 5A shows a scene (e.g., similar to the scene 300 of FIG. 3) that includes a plurality of sparse points 320 (e.g., 320a, 320b, and 320c). FIG. 5B schematically illustrates that the resulting captured image frame includes distorted sparse points 525 (e.g., 525a, 525b, and 525c). For example, each distorted sparse point 525 is associated with an illustrative corresponding arrow 522. For illustrative purposes only, the size of each arrow 522 is proportional to the amount of distortion imparted to the corresponding sparse point 525. Thus, similar to what is described above in connection with FIG. 4B, arrow 522a is smaller than arrow 522e, which may indicate that the distortion of the sparse point 525a associated with arrow 522a is less severe than that of the sparse point 525e.
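A toy model of this row-dependent distortion, with assumed sensor timing and image-motion values (not taken from the disclosure), is sketched below; points read out later (lower rows) are displaced more:

```python
# Toy illustration of the row-dependent "shear" a rolling shutter imparts
# to sparse points; velocity and timing values are assumed for illustration.
rows_per_frame = 480
row_readout_s = (1.0 / 30.0) / rows_per_frame   # assumed 30 Hz sensor
velocity_px_per_s = 900.0                       # assumed horizontal image motion

def distorted(point):
    x, y = point                                # y indexes the readout row
    shift = velocity_px_per_s * y * row_readout_s
    return (x + shift, y)

sparse_points = [(100.0, 10.0), (100.0, 240.0), (100.0, 470.0)]
print([distorted(p) for p in sparse_points])    # lower rows shift more
```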
Exemplary AR architecture
Fig. 6 is a block diagram of an example of an AR architecture 600. The AR architecture 600 is configured to receive input from one or more imaging systems (e.g., visual input from the outward facing imaging system 110, input from a room camera, etc.). The imaging devices not only provide images from the FOV camera, but may also be equipped with various sensors (e.g., accelerometer, gyroscope, temperature sensor, motion sensor, depth sensor, GPS sensor, etc.) to determine various other attributes of the user's location and environment. This information may be further augmented by information from fixed cameras in the room, which may provide images and/or various cues from different viewpoints.
The AR architecture 600 may include a plurality of cameras 610. For example, the AR architecture 600 may include the outward facing imaging system 110 of FIG. 2, which is configured to input a plurality of captured images from the FOV in front of the wearable display system 100. In some embodiments, the cameras 610 may include a relatively wide field of view or passive pair of cameras disposed on the sides of the user's face, as well as a different pair of cameras positioned in front of the user to handle a stereoscopic imaging process. However, other imaging systems, cameras, and arrangements are also possible.
The AR architecture 600 may also include a map database 630 that includes map data for the world. In one embodiment, map database 630 may reside partially on a user-wearable system (e.g., local processing and data module 70) or may reside partially at a network storage location (e.g., remote data repository 74) accessible through a wired or wireless network. In some embodiments, map database 630 may include real world map data or virtual map data (e.g., including virtual image elements defining a virtual map or overlaid on a real world environment). In some embodiments, computer vision techniques may be used to generate map data. In some embodiments, map database 630 may be a pre-existing map of the environment. In other embodiments, the map database 630 may be populated based on recognized sparse points that are read into memory and stored for comparison and processing relative to subsequently recognized sparse points. In another embodiment, map database 630 may be a pre-existing map that is dynamically updated based on recognized sparse points from one or more image frames (or portions of frames of a rolling shutter camera system), alone or in combination. For example, one or more sparse points may be used to identify objects in the environment (e.g., object 325 in fig. 3) and to populate the map with identifying features of the environment.
The AR architecture 600 may further include a buffer 620 configured to receive input from the cameras 610. The buffer 620 may be a non-transitory data buffer, e.g., separate from or part of a non-transitory data storage device (e.g., the local processing and data module 70 of FIG. 2), and configured to temporarily store image data. The buffer 620 may then temporarily store some or all of the received input. In some embodiments, the buffer 620 may be configured to store one or more portions or segments of the received data (e.g., as described below in connection with FIGS. 9A and 9B) before, for example, further processing is performed and the data is moved to another component of the AR architecture 600. In some embodiments, the image data collected by the cameras 610 may be read into the buffer 620 as the user experiences the wearable display system 100 operating in an environment. Such image data may include images or image segments captured by the cameras 610. The image data representing the images or image segments may then be first sent to and stored in the buffer 620, and sent to the display 62 for visualization and presentation to the user of the wearable display system 100, before being processed by the local processing and data module 70. Alternatively or in combination, the image data may also be stored in the map database 630. Alternatively, the data may be removed from memory (e.g., the local processing and data module 70 or the remote data repository 74) after being stored in the buffer 620. In one embodiment, the buffer 620 may reside partially on a user-wearable system (e.g., the local processing and data module 70) or may reside partially at a network storage location (e.g., the remote data repository 74) accessible through a wired or wireless network.
The AR architecture 600 may further include one or more object identifiers 650. The object identifiers may be configured to crawl the received data and identify and/or tag objects, and then attach information to the objects with the aid of the map database 630, e.g., via computer vision techniques. For example, an object identifier may scan or crawl through the image data or image segments stored in the buffer 620 and identify objects captured in the image data (e.g., the object 325 of FIG. 3). Objects identified in the buffer may be tagged with reference to the map database, or descriptive information may be attached to them. The map database 630 may accumulate the various objects identified over time from the captured image data (e.g., by comparing objects identified in a first image frame with objects identified in a subsequent image frame), which may be used to generate or update a map of the environment. In some embodiments, the map database 630 may be populated with a pre-existing map of the environment. In some embodiments, the map database 630 is stored on the AR device (e.g., the local processing and data module 70). In other embodiments, the AR device and the map database may be connected to each other over a network (e.g., LAN, WAN, etc.) to access cloud storage (e.g., the remote data repository 74).
In some embodiments, the AR architecture 600 includes a pose estimation system 640 configured to execute instructions to perform a pose estimation process to determine the position and orientation of the wearable computing hardware or device based in part on data stored in the buffer 620 and the map database 630. For example, as the user experiences the wearable device and operates in the world, position, location, or orientation data may be calculated from the data collected by the cameras 610 as that data is read into the buffer 620. For example, based on the information and the collection of objects identified from the data stored in the buffer 620, the object identifiers 650 may recognize objects 325 and extract them as sparse points 320 to a processor (e.g., the local processing and data module 70). In some embodiments, sparse points 320 may be extracted as one or more image segments of a given image frame are read into the buffer 620 and used to estimate the pose of the AR device for the associated image frame. The estimate of the pose may be updated as additional image segments of the image frame are read into the buffer 620 and used to identify additional sparse points. Optionally, in some embodiments, the pose estimation system 640 may access the map database 630, retrieve sparse points 320 identified in previously captured image segments or image frames, and compare corresponding sparse points 320 between previous and subsequent image frames as the AR device moves through the 3D space, thereby tracking the movement, position, or orientation of the AR device in the 3D space. For example, referring to FIG. 3, an object identifier 650 may identify the sparse point 320a as the lamp 325a in each of a plurality of image frames. The AR device may attach descriptor information to associate the sparse point 320a in one image frame with the corresponding sparse point 320a in other image frames and store this information in the map database 630. The object identifiers 650 may be configured to identify objects for any number of sparse points 320 (e.g., 1, 2, 3, 4, etc. sparse points).
Once the objects are identified, pose estimation system 640 may use this information to determine the pose of the AR device. In one embodiment, the object identifier 650 may identify sparse points corresponding to an image segment when that image segment is received, and may then identify additional sparse points when subsequent image segments of the same image frame are received. The pose estimation system 640 may execute instructions to estimate a pose based on the first identified sparse points and update the estimate by integrating subsequently identified sparse points into the estimation process. In another embodiment, the object identifier 650 may identify two sparse points 320a, 320b of two objects in a first frame (e.g., object 325a and another object shown in fig. 3), and then identify the same two sparse points, alone or in combination, in a second frame and subsequent frames (e.g., any number of subsequent frames may be considered). Based on a comparison of the sparse points between two or more frames, the pose (e.g., orientation and position) of the device within the 3D space may be estimated and its movement tracked through the 3D space.
In some embodiments, the accuracy of the pose estimation, or the amount of noise in the pose estimation results, may depend on the number of sparse points identified by object identifier 650. For example, in 3D space, the position, location, or orientation of the imaging device may be described by translational and rotational coordinates within the environment. Such coordinates may include, for example, X, Y, and Z translational coordinates or yaw, roll, and pitch rotational coordinates, as described below in connection with fig. 7. In some embodiments, one sparse point extracted from an image frame may not convey the complete pose of the imaging device. However, a single sparse point may provide at least one constraint for the pose estimation, for example, by providing information about one or more coordinates. As the number of sparse points increases, the accuracy of the pose estimation may increase, or the noise or error in the pose estimation may decrease. For example, two sparse points may indicate the X and Y position of the imaging device in 3D space based on the objects represented by the sparse points. However, the imaging device may not be able to determine its Z position relative to the objects (e.g., in front of or behind the objects) or its roll coordinate. Thus, in some embodiments, three sparse points may be used to determine the pose; however, any number of sparse points may be used (e.g., 1, 2, 4, 5, 6, 7, 10 or more, etc.).
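As one hedged illustration of how a handful of sparse points can constrain the six pose coordinates, the sketch below runs a perspective-n-point (PnP) solve with OpenCV. The disclosure does not require PnP, and the camera intrinsics and the 2D-3D correspondences here are invented solely for the example.

```python
import numpy as np
import cv2

# 3D positions of sparse points from a map (meters); hypothetical values.
object_points = np.array([
    [0.0, 0.0, 2.0], [0.5, 0.1, 2.2], [-0.4, 0.3, 1.8],
    [0.2, -0.5, 2.5], [-0.3, -0.2, 2.1], [0.6, 0.4, 1.9],
], dtype=np.float64)
# Pixel locations where those points appear in the current frame; hypothetical.
image_points = np.array([
    [320.0, 240.0], [400.0, 260.0], [250.0, 290.0],
    [355.0, 150.0], [270.0, 200.0], [430.0, 300.0],
], dtype=np.float64)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume no lens distortion for this sketch

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
# rvec/tvec encode rotation (yaw/pitch/roll) and translation (X, Y, Z) of the
# camera relative to the mapped sparse points; more points generally means less noise.
print(ok, rvec.ravel(), tvec.ravel())
```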
In some embodiments, the pose determination may be performed by a processor (e.g., local processing and data module 70) on the AR device. The extracted sparse points may be input into a pose estimation system 640 configured to perform computer vision techniques. In some embodiments, the pose estimation system 640 may include a SLAM or V-SLAM system (the latter referring to a configuration in which the input is images/visual only), and the pose estimation system 640 may then output a sparse point representation 670 of the world surrounding the AR device. In some embodiments, the pose estimation system 640 may be configured as a recursive Bayesian estimator (e.g., a Kalman filter) that performs continuous updates. However, the Bayesian estimator is intended only as an illustrative example of one method for performing pose estimation by the pose estimation system 640; other methods and processes are contemplated within the scope of the present disclosure. The system may be configured to find not only the location of the various components in the world, but also what the world is made of. Pose estimation may be a building block that achieves many goals, including populating map database 630 and using data from map database 630. In other embodiments, the AR device may be connected to a processor configured to access cloud storage (e.g., remote data repository 74) over a network (e.g., LAN, WAN, etc.) to perform the pose estimation.
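The recursive Bayesian idea can be made concrete with a linear Kalman filter over a single pose coordinate using a constant-velocity model, as sketched below. This is only a toy illustration of the continuous-update behavior described above; a real SLAM/V-SLAM estimator tracks the full 6-DOF state, and the noise values and measurements here are arbitrary assumptions.

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition: position, velocity
H = np.array([[1.0, 0.0]])               # only position is measured
Q = np.eye(2) * 1e-3                     # process noise (assumed)
R = np.array([[0.05]])                   # measurement noise (assumed)

x = np.zeros((2, 1))                     # initial state estimate
P = np.eye(2)                            # initial covariance

def kalman_update(x, P, z):
    """One predict/update cycle given a new position measurement z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = np.array([[z]]) - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

for z in [0.0, 0.11, 0.19, 0.32, 0.41]:  # simulated position measurements
    x, P = kalman_update(x, P, z)
print("estimated position/velocity:", x.ravel())
```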
In some embodiments, one or more remote AR devices may be configured to determine their poses based on the pose determination of a single AR device comprising AR architecture 600. For example, one or more AR devices may be in wired or wireless communication with a first AR device comprising AR architecture 600. The first AR device may perform the pose determination based on sparse points extracted from the environment, as described herein. The first AR device may also be configured to transmit an identification signal (e.g., an IR signal or other suitable medium) that may be received by one or more remote AR devices (e.g., a second AR device). In some embodiments, the second AR device may seek to display content similar to the first AR device and may receive the identification signal from the first AR device. By interpreting or processing the identification signal, the second AR device is able to determine its pose relative to the first AR device without extracting sparse points and performing pose estimation on the second AR device. One non-limiting advantage of this arrangement is that, by linking the two AR devices, differences in the virtual content displayed on the first and second AR devices can be avoided. Another non-limiting advantage of this arrangement is that the second AR system is able to update its estimated pose based on the identification signal received from the first AR device.
Examples of imaging device poses and coordinate System
Fig. 7 is an example of a coordinate system for the pose of an imaging device. The device 700 may have multiple degrees of freedom. The position, location, or orientation of the device 700 changes relative to the starting position 720 as the device 700 moves in different directions. The coordinate system in fig. 7 shows three translational directions of movement (e.g., the X, Y, and Z directions) that may be used to measure device movement relative to the starting position 720 of the device to determine a position within the 3D space. The coordinate system in fig. 7 also shows three degrees of angular freedom (e.g., yaw, pitch, and roll) that may be used to measure the orientation of the device relative to the starting orientation 720 of the device. As shown in fig. 7, the device 700 may be moved horizontally (e.g., in the X or Z direction) or vertically (e.g., in the Y direction). The device 700 may also tilt forward and backward (e.g., pitch), turn left and right (e.g., yaw), and tilt from side to side (e.g., roll). In other embodiments, other techniques or angular representations for measuring head pose may be used, for example, any other type of Euler angle system.
Fig. 7 illustrates a device 700, which may be implemented, for example, as a wearable display system 100, an AR device, an imaging device, or any other device described herein. As described throughout this disclosure, the device 700 may be used to determine a pose. For example, where the device 700 is an AR device comprising the AR architecture 600 in fig. 6, the pose estimation system 640 may use image segment inputs to extract sparse points for the pose estimation process, as described above, to track the movement of the device in the X, Y or Z directions or to track angular movements of yaw, pitch, or roll.
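For reference, the six degrees of freedom shown in fig. 7 are often packed into a single rigid-body transform. The sketch below builds a 4x4 pose matrix from X/Y/Z translation and yaw/pitch/roll angles under one common rotation-order convention; this convention is an assumption for illustration, since the disclosure does not prescribe one and other Euler angle orders are equally valid.

```python
import numpy as np

def pose_matrix(x, y, z, yaw, pitch, roll):
    """Build a 4x4 homogeneous transform from a translation plus Euler angles (radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])   # yaw
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])   # pitch
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])   # roll
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # rotation from the three angular degrees of freedom
    T[:3, 3] = [x, y, z]       # translation from the three linear degrees of freedom
    return T

# Hypothetical small motion: 0.3 m of translation plus a 10-degree yaw.
print(pose_matrix(0.0, 0.0, 0.3, np.deg2rad(10.0), 0.0, 0.0))
```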
Exemplary routines for estimating pose in 3D space
Fig. 8 is a process flow diagram of an illustrative routine for determining the pose of an imaging device (e.g., outward facing imaging system 110 of fig. 2) within a 3D space (e.g., fig. 3) in which the imaging device is moving. Routine 800 describes how sparse points may be extracted from an image frame representing a FOV (e.g., FOV 315a, 315b, 315c, or 315d) to determine the position, location, or orientation of the imaging device in the 3D space.
At block 810, the imaging device may capture an input image of the surroundings of the AR device. For example, the imaging device may sequentially capture a plurality of image segments of the input image based on light received from the surrounding environment. This may be accomplished through various input devices (e.g., a digital camera on the AR device or a digital camera remote from the AR device). The input may be an image representing the FOV (e.g., FOV 315a, 315b, 315c, or 315d) and include a plurality of sparse points (e.g., sparse points 320). As the imaging device captures image segments, the FOV camera, sensors, GPS, etc. may communicate information, including image data of the sequentially captured image segments, to the system (block 810).
At block 820, the AR device may receive the input image. In some embodiments, the AR device may sequentially receive a plurality of image segments forming part of the image captured at block 810. For example, as described above, the outward facing imaging system 110 may be a rolling shutter camera configured to sequentially scan a scene, capturing a plurality of image segments and reading the image data out to a storage unit as it is captured. This information may be stored on the user-wearable system (e.g., local processing and data module 70) or may reside in part at a network storage location (e.g., remote data repository 74) accessible via a wired or wireless network. In some embodiments, the information may be temporarily stored in a buffer included within the storage unit.
At block 830, the AR device may identify one or more sparse points based on the received image segments. For example, the object identifier may crawl through image data corresponding to received image segments and identify one or more objects (e.g., object 325). In some embodiments, identifying one or more sparse points may be based on receiving image segments corresponding to the one or more sparse points, as described below with reference to fig. 9A and 9B. The object identifier may then extract sparse points, which may be used as inputs for determining pose data (e.g., imaging device pose within 3D space). This information may then be transferred to the pose estimation process (block 840), and the AR device may accordingly utilize the pose estimation system to map the AR device through the 3D space (block 850).
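The overall control flow of routine 800 can be sketched as below. Every function name here is a hypothetical placeholder standing in for the camera, the object identifier 650, and the pose estimation system 640; the sketch only mirrors the ordering of blocks 810 through 850, not any particular implementation.

```python
def capture_image_segments():
    """Stand-in for blocks 810-820: yield scan lines as the sensor reads them out."""
    for row in range(480):
        yield ("segment", row)

def identify_sparse_points(segments):
    """Stand-in for block 830: pretend every 60th scan line completes a sparse point."""
    return [seg for seg in segments if seg[1] % 60 == 0]

def estimate_pose(sparse_points):
    """Stand-in for block 840: a real system would run SLAM/V-SLAM here."""
    return {"n_points": len(sparse_points), "pose": "x, y, z, yaw, pitch, roll"}

buffered = list(capture_image_segments())                              # blocks 810-820
points = identify_sparse_points(buffered)                              # block 830
pose = estimate_pose(points)                                           # block 840
print("mapped device pose using", pose["n_points"], "sparse points")   # block 850
```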
In various embodiments, the routine 800 may be performed by a hardware processor (e.g., the local processing and data module 70 of FIG. 2) configured to execute instructions stored in a memory or storage unit. In other embodiments, a remote computing device (in network communication with a display device) having computer-executable instructions may cause the display device to perform aspects of routine 800.
As described above, current pose estimation processes may incur delays in estimating the pose of the AR device due to the transfer of data (e.g., extracted sparse points) from the image capture device to the pose estimation system. For example, current implementations may require that the entire image frame be transferred from the image capture device to a pose estimator (e.g., SLAM, VSLAM, or the like). Only once the entire image frame has been transmitted can the object identifier recognize the sparse points and extract them to the pose estimator. Transmitting the entire image frame may thus be one contributor to the delay in estimating pose.
Examples of extracting sparse points from image frames
Figs. 9A and 9B schematically illustrate an example of extracting one or more sparse points from an image frame based on receiving a plurality of image segments. In some implementations, figs. 9A and 9B may also schematically illustrate an exemplary method of minimizing delays in estimating the pose of an imaging device (e.g., outward facing imaging device 110 in fig. 2) through 3D space. In some embodiments, figs. 9A and 9B also schematically illustrate examples of identifying one or more sparse points of the image frame 900. In some embodiments, figs. 9A and 9B illustrate an image frame being read from the imaging device into the memory unit by a rolling shutter camera, as described above. The image frame 900 may be captured by an outward facing imaging system 110 configured as a progressive scan imaging device. The image frame may include a plurality of image segments (sometimes referred to as scan lines) 905a through 905n that are read into a storage unit (e.g., local processing and data module 70) from the imaging device as the imaging device captures them. The image segments may be arranged horizontally (as shown in fig. 9A) or vertically (not shown). Although 15 image segments are shown, the number of image segments is not limited thereto, and any number of image segments 905a through 905n may be used depending on the needs of a given application or the capabilities of the imaging system. In some implementations, the image segments can be lines (e.g., rows or columns) in a raster scan pattern; for example, the image segments can be rows or columns of pixels in a raster scan of an image captured by outward-facing imaging device 110. The raster scan pattern may be performed or implemented by a rolling shutter camera, as described throughout this disclosure.
Referring again to fig. 9A, an image frame 900 may include a plurality of image segments 905 that are sequentially captured and read into a memory unit. The image segments 905 may be combined to represent a field of view (FOV) captured by the imaging device. The image frame 900 may further include a plurality of sparse points 320, for example, as described above with reference to fig. 3. In some implementations, as shown in fig. 9A, each sparse point 320 may be generated from one or more image segments 905. For example, sparse point 320a may be generated by, and thus associated with, a subset 910 of the image segments 905. Thus, as image segments are received at the storage unit, each sparse point may be identified once the subset of image segments 905 corresponding to that sparse point has been received. For example, upon receipt of image segments 906a through 906n at the storage unit of the AR device, sparse point 320a may be recognized by an object identifier (e.g., object identifier 650). Image segments 906a through 906n may correspond to the subset 910 of image segments 905 representing sparse point 320a. Thus, upon receiving the corresponding image segments from the image capture device (e.g., a progressive scan camera), the AR device is able to determine individual sparse points. The subset 910 of image segments 905 may include image segments 906a through 906n. In some implementations, the number of image segments 906 can be based on the number of sequentially received image segments required to resolve or capture the entire sparse point in the vertical direction. Although fig. 9B shows 7 image segments associated with sparse point 320a, this need not be the case, and any number of image segments (e.g., 2, 3, 4, 5, 6, 8, 9, 10, 11, etc.) may be associated with sparse point 320a to identify the object 325a corresponding to sparse point 320a.
In an exemplary embodiment, sparse points 320 may be identified by implementing a circular or rolling buffer. For example, the buffer may be similar to buffer 620 in fig. 6. The buffer may be constructed as part of a memory or storage unit on the AR device (e.g., local processing and data module 70), or may be remote from the AR device (e.g., remote data repository 74). The buffer may be configured to receive image information from an image capture device (e.g., outward facing imaging system 110 in fig. 2). For example, as the image sensor captures each sequential image segment, the buffer may sequentially receive image data representing the image segments from the image sensor. The buffer may also be configured to store a portion of the image data for subsequent processing and recognition of the image content. In some embodiments, the buffer may be configured to store one or more image segments, where the number of image segments may be less than the total number of image segments in image frame 900. In some embodiments, the number of image segments stored in the buffer may be a predetermined number, such as the number in subset 910. In some embodiments, the buffer may, alternatively or in combination, be configured to store a subset 910 of image segments corresponding to a sparse point. For example, referring to fig. 9B, sparse point 320a may require a 7 x 7 pixel window (e.g., spanning 7 image segments 906, or rows of pixels, where 7 pixels from each image segment are used). In this embodiment, the buffer may be configured to be large enough to store the subset 910 of image segments 906, such as the 7 image segments shown.
As described above, the buffer may be configured to temporarily store image data. Thus, when a new image segment is received from the image capture device, the oldest image segment is removed from the buffer. For example, a first image segment 906a may be received at the buffer, followed by the subsequent image segments corresponding to sparse point 320a. Once all of the image segments 906a through 906n are received, sparse point 320a may be identified. Subsequently, a new image segment (e.g., 906n+1) is received, and image segment 906a is removed from the buffer. In some embodiments, segment 906a is moved from the buffer into digital memory (e.g., local processing and data module 70) for further processing.
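A minimal sketch of this rolling-buffer behavior is shown below. The 7-scan-line window, the synthetic frame, and the peak-to-peak contrast test are all assumptions chosen for illustration; the point is simply that a sparse point can be reported as soon as its subset of segments has arrived, long before the full frame is read out.

```python
from collections import deque
import numpy as np

WINDOW_LINES = 7                      # e.g., a feature spanning 7 scan lines (fig. 9B)
buffer = deque(maxlen=WINDOW_LINES)   # older segments fall out automatically

def segment_completes_sparse_point(window):
    """Hypothetical test: high contrast inside a 7x7 patch of the buffered window."""
    patch = window[:, :WINDOW_LINES]  # crude stand-in for a real feature test
    return np.ptp(patch) > 200

frame = (np.random.rand(480, 640) * 255).astype(np.uint8)  # synthetic image frame
found = []
for row_index, scan_line in enumerate(frame):   # segments arrive in capture order
    buffer.append(scan_line)
    if len(buffer) == WINDOW_LINES:
        window = np.stack(list(buffer))
        if segment_completes_sparse_point(window):
            found.append(row_index)   # a sparse point could be reported here,
                                      # well before the full frame is read out
print(f"candidate sparse-point rows: {len(found)}")
```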
Exemplary routines for estimating pose in 3D space
Fig. 10 is a process flow diagram of an illustrative routine for determining the pose of an imaging device (e.g., outward facing imaging system 110 of fig. 2) within a 3D space (e.g., fig. 3) in which the imaging device is moving. Routine 1000 describes an example of how a first set of sparse points may be extracted from an image frame as the image segments corresponding to those sparse points are received. In various embodiments, the corresponding image segments may be captured before the entire image frame representing the FOV of the imaging device has been captured. Routine 1000 also describes how subsequent sparse points, or a second set of sparse points, may be extracted and integrated to update the pose determination. As described above, routine 1000 may be performed by a hardware processor (e.g., local processing and data module 70 in fig. 2) operatively coupled to an outward facing imaging system (e.g., outward facing imaging system 110) and a digital memory or buffer. The outward facing imaging system 110 may include a rolling shutter camera.
At block 1010, the imaging device may capture an input image of the surroundings of the AR device. For example, the imaging device may sequentially capture a plurality of image segments of the input image based on light received from the surrounding environment. This may be accomplished through various input devices (e.g., a digital camera on the AR device or a digital camera remote from the AR device). The input may be an image frame representing the FOV (e.g., FOV 315a, 315b, 315c, or 315d) and include a plurality of sparse points (e.g., sparse points 320). As the imaging device captures image segments, the FOV camera, sensors, GPS, etc. may communicate information, including image data of the sequentially captured image segments, to the system (block 1010).
At block 1020, the AR device may receive an input image. In some embodiments, the AR device may sequentially receive a first plurality of image segments forming a portion of the image captured at block 1010. For example, the imaging device may be configured to sequentially scan the scene to sequentially capture the first plurality of image segments, as described above with reference to fig. 9A and 9B. The image sensor may also sequentially read out image data to the memory unit as the data is captured. This information may be stored on the user-wearable system (e.g., local processing and data module 70) or may reside in part at a network storage location (e.g., remote data repository 74) accessible via a wired or wireless network. In some embodiments, the information may be temporarily stored in a buffer included within the storage unit.
At block 1030, the AR device may identify a first set of sparse points based on receiving the first plurality of image segments (sometimes referred to as a "pre-list") corresponding to each sparse point. For example, referring to figs. 9A and 9B, the AR device may identify one or more sparse points 320 based on receiving a subset 910 (e.g., a first plurality of image segments) of image segments 905 corresponding to the one or more sparse points 320, as described above with reference to figs. 9A and 9B. Once the subset 910 of image segments 905 (e.g., image segments 906) corresponding to a sparse point 320 is received at the storage unit (e.g., local processing and data module 70), that sparse point 320 may be identified.
In some embodiments, the first set of sparse points includes any number (N1) of sparse points. The number (N1) may be any number of sparse points selected for estimating the pose of the AR device within the environment. In some embodiments, the number (N1) may be no less than three sparse points. In other embodiments, the number (N1) may be between 10 and 20 sparse points. One non-limiting advantage of a larger number (N1) is that outlier data points may be rejected, so that the inlier data points can provide a pose determination with some robustness to noise. For example, the imaging device may be jolted or shaken by an event acting on the physical imaging device, or the recorded scene may change temporarily (e.g., a person moving in the foreground). Such an event may affect only a small set of sparse points in one or more image frames. Using a larger number (N1) of sparse points and updating the pose estimate as described herein may at least partially reduce noise in the pose estimate due to these outliers or single-instance events.
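As a hedged illustration of why a larger (N1) helps, the sketch below projects synthetic sparse points through a known pose, corrupts two of them to mimic an outlier event such as a person crossing the foreground, and recovers the pose with a RANSAC-based PnP solver that discards the outliers. The solver choice and every numeric value are assumptions for the example, not the disclosure's required method.

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)
true_rvec = np.array([[0.05], [0.10], [0.0]])
true_tvec = np.array([[0.1], [-0.05], [2.0]])

object_points = rng.uniform([-1, -1, 3], [1, 1, 5], size=(15, 3))   # N1 = 15 points
image_points, _ = cv2.projectPoints(object_points, true_rvec, true_tvec, K, dist)
image_points = image_points.reshape(-1, 2)
image_points[:2] += 80.0          # corrupt two points so they act as outliers

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points, K, dist)
print("recovered translation:", tvec.ravel())
print("inliers kept:", 0 if inliers is None else len(inliers), "of", len(object_points))
```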
In one implementation, a first set of sparse points may be extracted from an image frame (e.g., by object identifier 650) and transmitted to a pose estimation system (e.g., pose estimation system 640 in fig. 6) configured to perform the pose determination (e.g., SLAM, VSLAM, or the like, as described above) (block 1040). In various embodiments, the first set of sparse points is transmitted to the pose estimation system after the number (N1) of sparse points has been identified. Thus, the first set of sparse points may be transmitted when only a portion of the image frame has been received, because the imaging device has not yet received the entire image frame; subsequent image segments (e.g., a second plurality of image segments obtained after the first plurality of image segments) remain to be received. In one embodiment, each sparse point of the first set is extracted (e.g., from a memory unit of the AR device or a portion thereof, such as a buffer) as soon as it is identified based on scanning the corresponding subset of image segments. In another embodiment, once the number (N1) of sparse points has been identified, the first set of sparse points may be extracted (e.g., from a memory unit or buffer of the AR device) and sent in a single process.
At block 1045, the AR device may receive a second plurality of image segments (sometimes referred to as a "follow-list"). In some embodiments, after receiving the first plurality of image segments at block 1020, the AR device may also sequentially obtain a second plurality of image segments. For example, the imaging device may be configured to sequentially scan the scene to sequentially capture a first plurality of image segments (e.g., block 1020), followed by sequentially scanning the scene after block 1030 or during block 1030 to obtain a second plurality of image segments, as described above with reference to fig. 9A and 9B. In another embodiment, the second plurality of image segments, or a portion thereof, may be obtained from a second image captured by the imaging device, the second image being captured after the first image. The information may be stored on the AR device (e.g., local processing and data module 70) or may reside in part at a network storage location (e.g., remote data repository 74) accessible via a wired or wireless network. In some embodiments, the information may be temporarily stored in a buffer included within the storage unit.
Referring again to fig. 10, at block 1050 the AR device may identify a second set of sparse points based on the second plurality of image segments. For example, in one embodiment, the entire image frame has not yet been received when the pose is determined at block 1040, and at block 1045 a second plurality of image segments may be received from the imaging device. Thus, the AR device may recognize one or more new sparse points based on receiving the second plurality of image segments corresponding to the one or more new sparse points (e.g., a second set of sparse points), as described above with reference to figs. 9A and 9B. In another embodiment, after capturing the first image at block 1010, the imaging device may capture a second image, and the second plurality of image segments may be obtained from the second image. Thus, the AR device may identify one or more new sparse points based on receiving a second plurality of image segments from the second image, which may correspond to the second set of sparse points. In some embodiments, the second set of sparse points may include any number of new sparse points (e.g., 1, 2, 3, etc.). In one embodiment, the second set of sparse points may be extracted and integrated into the pose determination, for example, by transmitting the second set of sparse points to the pose estimation system. The following are exemplary methods of integrating the second set of sparse points with the first set of sparse points in the routine of fig. 10. For example, the exemplary integration methods described herein may be referred to as re-integration, sliding scale integration, or block integration. However, these exemplary integration methods are not intended to be exhaustive; other methods that minimize errors and reduce delays in the pose determination are also possible.
At block 1060, the pose estimation system may be configured to update the pose determination based on the pose determined at block 1040 and the second set of sparse points identified at block 1050.
One non-limiting advantage of the routine 1000 described above may be a reduction in the delay that extracting sparse points from image frames would otherwise add before the pose estimation process. For example, by computing and recognizing sparse points as the image segments corresponding to the individual sparse points are received at buffer 620, individual sparse points or a selected set of sparse points may be extracted to, and processed by, the pose estimation system without waiting for the entire image frame to be captured. Thus, pose estimation can be performed well before the entire image is transferred to memory and before all sparse points can be extracted from the entire image. Nonetheless, once the first and all subsequent sets of sparse points of a particular image frame have been extracted, the entire image frame may be used for pose estimation.
In various implementations, the second set of sparse points may include a set number of sparse points identified after the pose is determined at block 1040. In some embodiments, the set number may be one sparse point. For example, each time a subsequent sparse point is identified, that sparse point may be transmitted to the pose estimation system and a new pose estimation process performed to update one or more of the position, location, or orientation of the AR device at block 1060. This approach may sometimes be referred to as a re-integration approach. Here, each subsequently identified sparse point may represent a subsequent set of sparse points (e.g., a second, third, fourth, etc. set of sparse points). In another embodiment, the set number may be any number of subsequently identified sparse points (e.g., 2, 3, 4, etc.). For example, with a set number of 3, each time 3 new sparse points (e.g., a subsequent set of sparse points) are identified, the set is transmitted to the pose estimation system at block 1050 and a new pose estimation process is performed at block 1060. Thus, the pose estimation process may ultimately utilize all sparse points included in the entire image frame.
In other embodiments, the integration method may be configured to account for the rolling shutter effect described above with reference to figs. 4A through 5B. For example, the pose determination may be performed on a fixed number (N2) of sparse points. This approach may sometimes be referred to as a sliding scale integration approach. In this embodiment, the second set of sparse points may include a selected number (k2) of sparse points identified after the pose is determined at block 1040. Each time (k2) new sparse points are identified, the pose determination may be updated. However, only the most recent (N2) sparse points are used to update the pose at block 1060. In some embodiments, the method utilizes the most recent (N2) sparse points regardless of which set they correspond to. For example, if N1 is set to 10, N2 is set to 15, and k2 is set to 5, the first set of sparse points includes the first 10 sparse points identified at block 1030. Thus, a pose is determined at block 1040 based on the first 10 sparse points. Subsequently, new sparse points are identified, but the pose is not yet updated. Once 5 new sparse points comprising a second set of sparse points are identified, the pose is updated based on the first (N1) and second (k2) sets of sparse points. If a third set of sparse points (e.g., the 5 sparse points after the second set) is identified, the pose is updated again at block 1060; however, this update may be based on part of the first set (e.g., sparse points 6 through 10), the second set of sparse points (e.g., sparse points 11 through 15), and the third set of sparse points (e.g., sparse points 16 through 20). Thus, the integration may be viewed as a sliding window or sliding list of sparse points, whereby only a set number of sparse points is used to estimate the pose, and the sparse points used slide from the first set to the second and third sets. One non-limiting advantage of this approach may be that sparse points identified from earlier received image segments are dropped from the pose determination at block 1060 as they become old or stale. In some cases, if the AR device moves relative to the sparse points, the rolling shutter effect may be reduced by removing old sparse points and by capturing the pose change between the newly identified sparse points.
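The sliding-list bookkeeping can be sketched with a fixed-length queue, using the example values above (N1 = 10, N2 = 15, k2 = 5). The `estimate_pose` function is a placeholder for the pose estimation system 640; after sparse points 16 through 20 arrive, the window holds points 6 through 20, matching the worked example.

```python
from collections import deque

N1, N2, K2 = 10, 15, 5
window = deque(maxlen=N2)          # only the most recent N2 sparse points survive

def estimate_pose(points):
    return f"pose from sparse points {points[0]}..{points[-1]}"   # stand-in result

stream = list(range(1, 21))        # sparse points 1..20 as they are identified
since_update = 0
pose = None
for sp in stream:
    window.append(sp)
    if len(window) == N1 and pose is None:
        pose = estimate_pose(list(window))          # initial pose (block 1040)
        continue
    if pose is not None:
        since_update += 1
        if since_update == K2:                      # every k2 new points...
            pose = estimate_pose(list(window))      # ...update on the most recent N2
            since_update = 0
print(pose)   # after points 16-20 arrive, the window holds points 6..20
```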
In some embodiments, the foregoing integration methods may be utilized between image frames, for example, as the outward facing imaging system 110 moves between capturing an image frame of the FOV 315a of fig. 3 and capturing an image frame of the FOV 315b of fig. 3. For example, a first set of sparse points may be received from an image frame associated with a first orientation 312a (e.g., FOV 315a) and a second set of sparse points may be received from an image frame associated with a second orientation 312b (e.g., FOV 315b). A sliding list approach may be implemented to reduce the rolling shutter effect between these image frames. However, in some embodiments, it is not necessary to retain more than the most recent (N2 - 1) sparse points of the previous frame.
In another embodiment, the pose determination at block 1060 may be performed on a fixed number of sparse points, or block of sparse points. This approach may sometimes be referred to as a block integration approach. In some embodiments, each of the plurality of sets of sparse points may include a number of sparse points equal to the block size. For example, if the block size is set to 10, the fixed number (N1) of the first set is 10, and a pose is determined at block 1040 after the first set is identified and extracted. Subsequently, a second set of sparse points including the next 10 sparse points may be identified, and the pose is updated at block 1060 using the second set of sparse points. In some embodiments, the process may continue for multiple sets (e.g., third, fourth, fifth, etc.). In some embodiments, the buffer in which the image segments are stored (e.g., buffer 620 of fig. 6) may be selected and configured to store at least the number of sparse points that may be included in a block (e.g., in the above example, the buffer may be selected to have a size configured to store at least 10 sparse points). In some embodiments, the buffer may have a size limited to storing only the number of sparse points included in a block.
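For comparison, the block-integration variant is sketched below with a block size of 10, matching the example above. Again, `estimate_pose` is only a hypothetical placeholder for the pose estimation system 640; the buffer never needs to hold more than one block of sparse points.

```python
BLOCK_SIZE = 10

def estimate_pose(block, previous_pose=None):
    basis = "initial" if previous_pose is None else "updated"
    return f"{basis} pose from sparse points {block[0]}..{block[-1]}"  # stand-in

pose = None
block = []
for sp in range(1, 31):            # sparse points as they are identified
    block.append(sp)
    if len(block) == BLOCK_SIZE:   # a full block has accumulated in the buffer
        pose = estimate_pose(block, pose)     # block 1040, then block 1060 updates
        block = []                 # the buffer only ever needs to hold one block
print(pose)                        # -> updated pose from sparse points 21..30
```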
While various embodiments of the methods, devices, and systems are described throughout this disclosure with reference to a head mounted display device or an AR device, this is not intended to limit the scope of the application and is used only as an illustrative example. The methods and apparatus described herein may be applied to other devices, such as robots, digital cameras, and other autonomous entities, which may implement them to map the 3D space in which the device is located and to track the device's movement through that 3D environment.
Other aspects
In a 1 st aspect, a method for estimating an orientation of an image capturing device within an environment is disclosed. The method comprises the following steps: sequentially receiving a first plurality of image segments forming at least a portion of an image representing a field of view (FOV) of the image capture device, the FOV including a portion of an environment surrounding the image capture device and including a plurality of sparse points, wherein each sparse point corresponds to a subset of image segments; identifying a first set of sparse points, the first set of sparse points comprising one or more sparse points identified when receiving the first plurality of image segments; determining, by a position estimation system, a position of the image capture device within the environment based on the first set of sparse points; sequentially receiving a second plurality of image segments received subsequent to the first plurality of image segments and forming at least another portion of the image; identifying a second set of sparse points, the second set of sparse points comprising one or more sparse points identified when receiving the second plurality of image segments; and updating, by the position estimation system, a position of the image capture device within the environment based on the first set of sparse points and the second set of sparse points.
In aspect 2, the method of aspect 1, further comprising sequentially capturing the plurality of image segments at an image sensor of the image capturing device.
In aspect 3, the method of aspect 1 or aspect 2, wherein the image sensor is a rolling shutter image sensor.
In aspect 4, the method according to any one of aspects 1 to 3, further comprising: the first plurality of image segments and the second plurality of image segments are stored in a buffer having a size corresponding to a number of image segments in the subset of image segments while the image segments are sequentially received.
In aspect 5, the method of any one of aspects 1 to 4, further comprising extracting the first set of sparse points and the second set of sparse points to the position estimation system.
In aspect 6, the method of any one of aspects 1 to 5, wherein the first set of sparse points comprises a number of sparse points.
In aspect 7, the method of aspect 6, wherein the number of sparse points is between 10 and 20 sparse points.
In aspect 8, the method of any one of aspects 1 to 7, wherein the second set of sparse points comprises a second number of sparse points.
In a 9 th aspect, the method of any one of aspects 1-8, wherein the updating of the position of the image capture device is based on a number of most recently identified sparse points, wherein the most recently identified sparse points are one or more sparse points from at least one of the first set, the second set, or both the first set and the second set.
In a 10 th aspect, the method of 9 th aspect, wherein the number of most recently identified sparse points is equal to the number of sparse points in the first set of sparse points.
In an 11 th aspect, the method of any one of the 1 st to 10 th aspects, wherein the position estimation system is configured to perform visual simultaneous localization and mapping (V-SLAM).
In the 12 th aspect, the method of any one of the 1 st to 11 th aspects, wherein the plurality of sparse points are extracted based on at least one of real world objects, virtual image elements, and invisible indicators projected into the environment.
In a 13 th aspect, a method for estimating an orientation of an image capturing device within an environment is disclosed. The method comprises the following steps: sequentially receiving a plurality of image segments forming an image representing a field of view (FOV) of the image capture device, the FOV including a portion of an environment surrounding the image capture device and including a plurality of sparse points, wherein each sparse point is identifiable based in part on a corresponding subset of image segments of the plurality of image segments; sequentially identifying one or more sparse points of the plurality of sparse points upon receiving each subset of image segments corresponding to the one or more sparse points; and estimating an orientation of the image capture device within the environment based on the identified one or more sparse points.
In aspect 14, the method of aspect 13, wherein sequentially receiving the plurality of image segments further comprises receiving a number of image segments and storing the number of image segments in a buffer.
In a 15 th aspect, the method of either the 13 th or 14 th aspect, wherein sequentially receiving the plurality of image segments includes receiving at least a first image segment and a second image segment, wherein the first image segment is stored in the buffer.
In aspect 16, the method according to any one of aspects 13 to 15, further comprising: updating the buffer upon receipt of a second image segment; storing the second image segment in the buffer; and removing the first image segment upon receiving the second image segment.
In aspect 17, the method of aspect 16, wherein sequentially identifying one or more sparse points further comprises scanning the image segments stored in the buffer as the buffer is updated.
In an 18 th aspect, the method of any one of the 13 th to 17 th aspects, wherein sequentially identifying one or more sparse points of the plurality of sparse points as each subset of image segments corresponding to the one or more sparse points is received further comprises: sequentially identifying a first set of one or more sparse points upon receiving a first plurality of image segments corresponding to the first set of one or more sparse points; and sequentially identifying a second set of one or more sparse points upon receiving a second plurality of image segments corresponding to the second set of one or more sparse points, wherein the second plurality of image segments is received subsequent to the first plurality of image segments.
In a 19 th aspect, the method of any one of the 13 th to 18 th aspects, wherein estimating the position of the image capturing device is based on identifying one or more sparse points of the first set, wherein the first set comprises a number of sparse points.
In aspect 20, the method of aspect 19, wherein the number of sparse points is between 2 and 20.
In aspect 21, the method of aspect 19, wherein the number of sparse points is between 10 and 20.
In a 22 nd aspect, the method of any of the 13 th to 21 st aspects, further comprising updating the position of the image capturing device based on identifying the second set of one or more sparse points.
In a 23 rd aspect, the method of any one of the 13 th to 22 nd aspects, wherein the second set of one or more sparse points comprises a second number of sparse points.
In a 24 th aspect, the method of any one of the 13 th to 23 rd aspects, further comprising updating the orientation of the image capturing device based on identifying a number of sequentially identified sparse points.
In aspect 25, the method of aspect 24, wherein the number of sequentially identified sparse points is equal to the number of sparse points.
In a 26 th aspect, the method of 24 th aspect, wherein the number of sequentially identified sparse points includes at least one sparse point in the first set of sparse points.
In the 27 th aspect, the method of any one of the 13 th to 26 th aspects, wherein the plurality of sparse points are extracted based on at least one of real world objects, virtual image elements, and invisible indicators projected into the environment.
In aspect 28, the method according to any one of aspects 13 to 27, further comprising: extracting the sequentially identified sparse points from the buffer; and sending the sequentially identified sparse points to a visual simultaneous localization and mapping (VSLAM) system, wherein the VSLAM system estimates a position of the image capture device based on the sequentially identified one or more sparse points.
In a 29 th aspect, an Augmented Reality (AR) system is disclosed. The AR system includes an outward-facing imaging device, computer hardware, and a processor operatively coupled to the computer hardware and the outward-facing imaging device and configured to execute instructions to perform the method according to any one of aspects 1-28.
In aspect 30, the AR system of aspect 29, wherein the outward facing imaging device is configured to detect light in the invisible spectrum.
In aspect 31, the AR system of aspect 29 or aspect 30, wherein the AR system is configured to display one or more virtual image elements.
In aspect 32, the AR system of any one of aspects 29 to 31, further comprising: a transceiver configured to send an identification signal indicative of an estimated position of the AR system to a remote AR system, wherein the remote AR system is configured to update its estimated position based on the received identification signal.
In aspect 33, an autonomous entity is disclosed. The autonomous entity includes an outward-facing imaging device, computer hardware, and a processor operatively coupled to the computer hardware and the outward-facing imaging device and configured to execute instructions to perform the method of any of aspects 1-28.
In aspect 34, the autonomous entity of aspect 33, wherein the outward facing imaging device is configured to detect light in the invisible spectrum.
In a 35 th aspect, a robotic system is disclosed. The robotic system includes an outward-facing imaging device, computer hardware, and a processor operatively coupled to the computer hardware and the outward-facing imaging device and configured to execute instructions to perform the method according to any one of aspects 1-28.
In a 36 th aspect, an image capture device for estimating an orientation of the image capture device within an environment is disclosed. The image capturing apparatus includes: an image sensor configured to capture an image by sequentially capturing a plurality of image segments, the image representing a field of view (FOV) of the image capture device, the FOV including a portion of an environment surrounding the image capture device, including a plurality of sparse points, wherein each sparse point is identifiable based in part on a corresponding subset of the plurality of image segments; a memory circuit configured to store a subset of image segments corresponding to one or more sparse points; a computer processor operably coupled to the memory circuit and configured to: sequentially identifying one or more sparse points of the plurality of sparse points upon receiving each subset of image segments corresponding to the one or more sparse points; and extracting one or more sparse points that are sequentially identified to estimate a position of the image capture device within the environment based on the one or more sparse points that are identified.
In a 37 th aspect, the image capture device of aspect 36, further comprising a position estimation system configured to: receive the sequentially identified one or more sparse points; and estimate a position of the image capture device within the environment based on the identified one or more sparse points.
In aspect 38, the image capture device of aspect 36 or 37, wherein the position estimation system is a visual simultaneous localization and mapping (VSLAM) system.
In an 39 th aspect, the image capturing device according to any of the 36 th to 38 th aspects, wherein the image sensor is configured to detect light in the invisible spectrum.
In a 40 th aspect, the image capture device of any one of the 36 th to 39 th aspects, further comprising: a transceiver configured to transmit an identification signal indicative of its estimated position to a remote image capture device, wherein the remote image capture device is configured to update its estimated position based on the received identification signal.
Other considerations
Each of the processes, methods, and algorithms described herein and/or depicted in the accompanying figures may be embodied in, and fully or partially automated by, one or more physical computing systems, hardware computer processors, special purpose circuits, and/or code modules that are executed by electronic hardware configured to execute specific and particular computer instructions. For example, a computing system may include a general purpose computer (e.g., a server) or special purpose computer programmed with specific computer instructions, dedicated circuitry, and so forth. The code modules may be compiled and linked into an executable program, installed in a dynamically linked library, or written in an interpreted programming language. In some implementations, certain operations and methods may be performed by circuitry that is specific to a given function.
Moreover, certain embodiments of the functionality of the present disclosure are sufficiently complex mathematically, computationally, or technically such that dedicated hardware or one or more physical computing devices (utilizing appropriate dedicated executable instructions) or dedicated graphics processing units may be required to perform the functionality, e.g., due to the amount or complexity of the computations involved or in order to provide results, e.g., pose estimation inputs, in substantially real time. For example, video may include many frames of millions of pixels each, and specially programmed computer hardware is required to process video data to provide a desired image processing task or application in a commercially reasonable amount of time.
The code modules or any type of data may be stored on any type of non-transitory computer readable medium, such as physical computer memory, including hard drives, solid state memory, random Access Memory (RAM), read Only Memory (ROM), optical disks, volatile or non-volatile memory, combinations of the same, and/or the like. The methods and modules (or data) may also be transmitted as a generated data signal (e.g., as part of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable transmission media, including both wireless-based and wire/cable-based media, and may take a variety of forms (e.g., as part of a single or multiplexed analog signal, or as multiple discrete digital packets or frames). The results of the disclosed processes or process steps may be stored permanently or otherwise in any type of non-transitory, tangible computer memory or may be transmitted via a computer-readable transmission medium.
Any process, block, state, step, or function in the flowcharts described herein and/or depicted in the figures should be understood to potentially represent code modules, code segments, or code portions that include one or more executable instructions for implementing the specified functions (e.g., logic or arithmetic). A process, block, state, step, or function may be combined, rearranged, added, deleted, modified, or otherwise altered with the illustrative examples provided herein. In some embodiments, additional or different computing systems or code modules may perform some or all of the functions described herein. The methods and processes described herein are also not limited to any particular order, and the blocks, steps, or states associated therewith may be performed in any other order, such as serially, in parallel, or in some other manner, as appropriate. Tasks or events may be added to or removed from the disclosed example embodiments. Furthermore, the separation of various system components in the embodiments described herein is for illustrative purposes and should not be understood as requiring such separation in all embodiments. It should be appreciated that the described program components, methods, and systems may generally be integrated together in a single computer product or packaged into multiple computer products. Many implementation variations are possible.
The processes, methods, and systems may be implemented in a network (or distributed) computing environment. Network environments include enterprise-wide computer networks, intranets, local Area Networks (LANs), wide Area Networks (WANs), personal Area Networks (PANs), cloud computing networks, crowd-sourced computing networks, the internet, and the world wide web. The network may be a wired or wireless network or any other type of communication network.
The systems and methods of the present disclosure each have several inventive aspects, no single one of which is solely responsible for or required by the desirable attributes disclosed herein. The various features and processes described above may be used independently of each other or may be combined in various ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. Various modifications to the embodiments described in the disclosure may be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the claims are not intended to be limited to the embodiments shown herein but are to be accorded the widest scope consistent with the disclosure, principles and novel features disclosed herein.
Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination. No single feature or group of features is essential or necessary to every embodiment.
Conditional language used herein, such as "can," "could," "might," "may," "e.g.," and the like, unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more embodiments, or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements, and/or steps are included in or are to be performed in any particular embodiment. The terms "comprising," "including," "having," and the like are synonymous and are used inclusively in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Furthermore, the term "or" is used in its inclusive sense (rather than its exclusive sense) such that when used, for example, in connection with a list of elements, the term "or" means one, some, or all of the elements in the list. In addition, the articles "a," "an," and "the" as used in this disclosure and the appended claims should be construed to mean "one or more" or "at least one" unless otherwise indicated.
As used herein, a phrase referring to "at least one of" a list of items refers to any combination of those items, including single members. As an example, "at least one of A, B, or C" is intended to encompass: A; B; C; A and B; A and C; B and C; and A, B, and C. Unless specifically stated otherwise, conjunctive language such as the phrase "at least one of X, Y, and Z" is otherwise understood with the context as used in general to convey that an item, term, etc. may be at least one of X, Y, or Z. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.
Similarly, although operations may be illustrated in a particular order in the figures, it should be understood that such operations need not be performed in the particular order illustrated or in sequential order, or that all illustrated operations need not be performed, to achieve desirable results. Furthermore, the figures may schematically depict one or more example processes in the form of a flow chart. However, other operations not shown may be incorporated into the exemplary methods and processes schematically illustrated. For example, one or more additional operations may be performed before, after, concurrently with, or between any of the illustrated operations. Additionally, in other embodiments, operations may be rearranged or reordered. In certain situations, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products. In addition, other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results.

Claims (5)

1. An imaging system, comprising:
an image capturing device configured to:
sequentially capturing a first plurality of image segments of an image representing a field of view (FOV) of the image capture device, the first plurality of image segments forming less than the entire image, the FOV including a plurality of sparse points, and
sequentially capturing a second plurality of image segments captured subsequent to the first plurality of image segments and forming at least another portion of the image;
a hardware processor programmed to:
identify a first set of sparse points based in part on the first plurality of image segments,
determine at least one of a position or an orientation of the image capture device within an environment of the imaging system based on the first set of sparse points,
identify a second set of sparse points based in part on the second plurality of image segments,
update the at least one of the position or orientation of the image capture device within the environment based at least in part on a rolling set of sparse points, the rolling set of sparse points comprising a predetermined number of most recently recognized sparse points selected first from the second set of sparse points and then from the first set of sparse points,
identify a third set of sparse points based in part on a third plurality of image segments, and
update the at least one of the position or orientation of the image capture device within the environment based at least in part on an updated rolling set of sparse points including the predetermined number of most recently recognized sparse points selected first from the third set of sparse points, then from the second set of sparse points, and then from the first set of sparse points.
2. The imaging system of claim 1, wherein the image capture device comprises a rolling shutter image sensor.
3. The imaging system of claim 1, further comprising a non-transitory buffer configured to sequentially receive the first plurality of image segments and the second plurality of image segments as the image segments are captured by the image capture device, the non-transitory buffer having a storage capacity based at least in part on a number of image segments included in the first plurality of image segments or the second plurality of image segments.
4. The imaging system of claim 1, wherein the first set of sparse points or the second set of sparse points comprises a number of sparse points between 10 sparse points and 20 sparse points.
5. The imaging system of claim 1, wherein the predetermined number of the most recently identified sparse points is equal to a number of sparse points in the first set of sparse points.
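For orientation, the rolling-set bookkeeping recited in claim 1 can be sketched in a few lines of Python. This is a minimal, hypothetical illustration only: the class name RollingSparsePointTracker, the window size of 15, and the solve_pose_stub placeholder are assumptions made for exposition, not the claimed implementation; a real system would substitute an actual pose solver (e.g., a PnP estimator) operating on the tracked sparse points.

from collections import deque
import numpy as np

def solve_pose_stub(points):
    # Hypothetical stand-in for a real pose solver (e.g., a PnP estimator);
    # here it simply returns the centroid of the tracked sparse points.
    return np.mean(np.asarray(points), axis=0) if points else None

class RollingSparsePointTracker:
    """Keeps the N most recently identified sparse points across image segments."""

    def __init__(self, window_size=15):
        # deque(maxlen=...) silently drops the oldest entries once full, which
        # mirrors selecting points first from the newest set, then from older sets.
        self.rolling_set = deque(maxlen=window_size)

    def add_segment_points(self, new_points):
        # Sparse points identified from the latest plurality of image segments
        # are inserted at the front so they take precedence over older points.
        for point in new_points:
            self.rolling_set.appendleft(point)

    def estimate_pose(self, solver=solve_pose_stub):
        # The position and/or orientation is re-estimated from the current
        # rolling set each time a new plurality of image segments arrives.
        return solver(list(self.rolling_set))

# Example with three successive sets of sparse points (dummy 3D coordinates).
tracker = RollingSparsePointTracker(window_size=15)
tracker.add_segment_points([(1.0, 2.0, 5.0), (0.5, 1.5, 4.0)])    # first set
pose_initial = tracker.estimate_pose()
tracker.add_segment_points([(1.1, 2.1, 5.1), (0.6, 1.4, 3.9)])    # second set
pose_updated = tracker.estimate_pose()                            # rolling update
tracker.add_segment_points([(1.2, 2.2, 5.2)])                     # third set
pose_updated = tracker.estimate_pose()                            # updated rolling set

On this reading, claim 5 corresponds to choosing the window size equal to the number of sparse points in the first set, though the claims themselves do not prescribe any particular data structure.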
CN202310724682.XA 2016-06-30 2017-05-17 Estimating pose in 3D space Pending CN116777994A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662357285P 2016-06-30 2016-06-30
US62/357,285 2016-06-30
CN201780053000.XA CN109643373B (en) 2016-06-30 2017-05-17 Estimating pose in 3D space
PCT/US2017/033139 WO2018004863A1 (en) 2016-06-30 2017-05-17 Estimating pose in 3d space

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201780053000.XA Division CN109643373B (en) 2016-06-30 2017-05-17 Estimating pose in 3D space

Publications (1)

Publication Number Publication Date
CN116777994A true CN116777994A (en) 2023-09-19

Family

ID=60785196

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310724682.XA Pending CN116777994A (en) 2016-06-30 2017-05-17 Estimating pose in 3D space
CN201780053000.XA Active CN109643373B (en) 2016-06-30 2017-05-17 Estimating pose in 3D space

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201780053000.XA Active CN109643373B (en) 2016-06-30 2017-05-17 Estimating pose in 3D space

Country Status (10)

Country Link
US (3) US10163011B2 (en)
EP (1) EP3479160B1 (en)
JP (3) JP7011608B2 (en)
KR (2) KR20210107185A (en)
CN (2) CN116777994A (en)
AU (2) AU2017291131B2 (en)
CA (1) CA3029541A1 (en)
IL (2) IL280983B (en)
NZ (1) NZ749449A (en)
WO (1) WO2018004863A1 (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210107185A (en) 2016-06-30 2021-08-31 매직 립, 인코포레이티드 Estimating pose in 3d space
CN108428242B (en) 2017-02-15 2022-02-08 宏达国际电子股份有限公司 Image processing apparatus and method thereof
US10048753B1 (en) * 2017-04-20 2018-08-14 Robert C. Brooks Perspective or gaze based visual identification and location system
CN116203731A (en) 2017-05-01 2023-06-02 奇跃公司 Matching of content to a spatial 3D environment
US10621751B2 (en) * 2017-06-16 2020-04-14 Seiko Epson Corporation Information processing device and computer program
CN108229290B (en) * 2017-07-26 2021-03-02 北京市商汤科技开发有限公司 Video object segmentation method and device, electronic equipment and storage medium
US20190057180A1 (en) * 2017-08-18 2019-02-21 International Business Machines Corporation System and method for design optimization using augmented reality
WO2019126238A1 (en) 2017-12-22 2019-06-27 Magic Leap, Inc. Methods and system for managing and displaying virtual content in a mixed reality system
US10970425B2 (en) 2017-12-26 2021-04-06 Seiko Epson Corporation Object detection and tracking
US12008465B1 (en) 2017-12-29 2024-06-11 Perceive Corporation Dynamic generation of data sets for training machine-trained network
CN108227929B (en) * 2018-01-15 2020-12-11 廖卫东 Augmented reality lofting system based on BIM technology and implementation method
CA3089646A1 (en) 2018-02-22 2019-08-20 Magic Leap, Inc. Browser for mixed reality systems
CN111801641A (en) 2018-02-22 2020-10-20 奇跃公司 Object creation with physical manipulation
CN112219205B (en) 2018-06-05 2022-10-25 奇跃公司 Matching of content to a spatial 3D environment
US11624909B2 (en) 2018-06-18 2023-04-11 Magic Leap, Inc. Head-mounted display systems with power saving functionality
US11694435B2 (en) 2018-06-18 2023-07-04 Magic Leap, Inc. Systems and methods for temporarily disabling user control interfaces during attachment of an electronic device
EP3807710B1 (en) 2018-06-18 2024-01-17 Magic Leap, Inc. Augmented reality display with frame modulation functionality
US11103763B2 (en) 2018-09-11 2021-08-31 Real Shot Inc. Basketball shooting game using smart glasses
US11141645B2 (en) 2018-09-11 2021-10-12 Real Shot Inc. Athletic ball game using smart glasses
EP3853765A1 (en) * 2018-09-27 2021-07-28 Google LLC Training a deep neural network model to generate rich object-centric embeddings of robotic vision data
US10764558B2 (en) * 2018-09-27 2020-09-01 Valve Corporation Reduced bandwidth stereo distortion correction for fisheye lenses of head-mounted displays
US11544320B2 (en) * 2018-11-29 2023-01-03 Entigenlogic Llc Image processing utilizing an entigen construct
US11361511B2 (en) * 2019-01-24 2022-06-14 Htc Corporation Method, mixed reality system and recording medium for detecting real-world light source in mixed reality
EP3948747A4 (en) 2019-04-03 2022-07-20 Magic Leap, Inc. Managing and displaying webpages in a virtual three-dimensional space with a mixed reality system
CN112013844B (en) * 2019-05-31 2022-02-11 北京小米智能科技有限公司 Method and device for establishing indoor environment map
US10916062B1 (en) * 2019-07-15 2021-02-09 Google Llc 6-DoF tracking using visual cues
JP7327083B2 (en) * 2019-10-30 2023-08-16 富士通株式会社 Region clipping method and region clipping program
US11367212B2 (en) 2019-11-21 2022-06-21 Ford Global Technologies, Llc Vehicle pose detection with fiducial marker
CN115023743A (en) * 2020-02-13 2022-09-06 Oppo广东移动通信有限公司 Surface detection and tracking in augmented reality sessions based on sparse representations
CN112991414A (en) * 2021-02-07 2021-06-18 浙江欣奕华智能科技有限公司 Vslam feature point depth determination device

Family Cites Families (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1073936C (en) * 1990-04-20 2001-10-31 佳能株式会社 Recording apparatus
US6222525B1 (en) 1992-03-05 2001-04-24 Brad A. Armstrong Image controllers with sheet connected sensors
US5670988A (en) 1995-09-05 1997-09-23 Interlink Electronics, Inc. Trigger operated electronic device
CA2478671C (en) 2002-03-13 2011-09-13 Imax Corporation Systems and methods for digitally re-mastering or otherwise modifying motion pictures or other image sequences data
USD514570S1 (en) 2004-06-24 2006-02-07 Microsoft Corporation Region of a fingerprint scanning device with an illuminated ring
US7460730B2 (en) 2005-08-04 2008-12-02 Microsoft Corporation Video registration and image sequence stitching
US8696113B2 (en) 2005-10-07 2014-04-15 Percept Technologies Inc. Enhanced optical and perceptual digital eyewear
US11428937B2 (en) 2005-10-07 2022-08-30 Percept Technologies Enhanced optical and perceptual digital eyewear
US20070081123A1 (en) 2005-10-07 2007-04-12 Lewis Scott W Digital eyewear
US7925049B2 (en) * 2006-08-15 2011-04-12 Sri International Stereo-based visual odometry method and system
NO327279B1 (en) 2007-05-22 2009-06-02 Metaio Gmbh Camera position estimation device and method for augmented reality imaging
WO2009078056A1 (en) * 2007-12-14 2009-06-25 Fujitsu Limited Moving object detecting apparatus and moving object detecting program
JP2011043419A (en) * 2009-08-21 2011-03-03 Sony Corp Information processor, information processing method, and program
US8345984B2 (en) 2010-01-28 2013-01-01 Nec Laboratories America, Inc. 3D convolutional neural networks for automatic human action recognition
JP2011192141A (en) * 2010-03-16 2011-09-29 Sony Corp Moving body detecting device and moving body detection method and program
JP2013141049A (en) * 2010-03-24 2013-07-18 Hitachi Ltd Server and terminal utilizing world coordinate system database
WO2012063469A1 (en) * 2010-11-11 2012-05-18 パナソニック株式会社 Image processing device, image processing method and program
US9304319B2 (en) 2010-11-18 2016-04-05 Microsoft Technology Licensing, Llc Automatic focus improvement for augmented reality displays
CN102129708A (en) * 2010-12-10 2011-07-20 北京邮电大学 Fast multilevel imagination and reality occlusion method at actuality enhancement environment
US10156722B2 (en) 2010-12-24 2018-12-18 Magic Leap, Inc. Methods and systems for displaying stereoscopy with a freeform optical system with addressable focus for virtual and augmented reality
CN107179607B (en) 2010-12-24 2019-12-13 奇跃公司 ergonomic head-mounted display device and optical system
JP6316186B2 (en) 2011-05-06 2018-04-25 マジック リープ, インコーポレイテッド Magic Leap, Inc. Wide-area simultaneous remote digital presentation world
EP2760363A4 (en) 2011-09-29 2015-06-24 Magic Leap Inc Tactile glove for human-computer interaction
RU2017115669A (en) 2011-10-28 2019-01-28 Мэджик Лип, Инк. SYSTEM AND METHOD FOR ADDITIONAL AND VIRTUAL REALITY
KR102440195B1 (en) 2011-11-23 2022-09-02 매직 립, 인코포레이티드 Three dimensional virtual and augmented reality display system
WO2013152205A1 (en) 2012-04-05 2013-10-10 Augmented Vision Inc. Wide-field of view (fov) imaging devices with active foveation capability
US9671566B2 (en) 2012-06-11 2017-06-06 Magic Leap, Inc. Planar waveguide apparatus with diffraction element(s) and system employing same
CN107817556B (en) 2012-06-11 2020-01-31 奇跃公司 Multi-depth planar three-dimensional display using a waveguide reflector array projector
US9740006B2 (en) 2012-09-11 2017-08-22 Magic Leap, Inc. Ergonomic head mounted display device and optical system
US9996150B2 (en) 2012-12-19 2018-06-12 Qualcomm Incorporated Enabling augmented reality using eye gaze tracking
EP2946236B1 (en) 2013-01-15 2021-06-16 Magic Leap, Inc. Ultra-high resolution scanning fiber display
US9503653B2 (en) 2013-02-18 2016-11-22 Tsinghua University Method for determining attitude of star sensor based on rolling shutter imaging
KR102270699B1 (en) 2013-03-11 2021-06-28 매직 립, 인코포레이티드 System and method for augmented and virtual reality
US9183746B2 (en) * 2013-03-12 2015-11-10 Xerox Corporation Single camera video-based speed enforcement system with a secondary auxiliary RGB traffic camera
US9898866B2 (en) 2013-03-13 2018-02-20 The University Of North Carolina At Chapel Hill Low latency stabilization for head-worn displays
NZ751593A (en) 2013-03-15 2020-01-31 Magic Leap Inc Display system and method
CN103247045B (en) * 2013-04-18 2015-12-23 上海交通大学 A kind of method obtaining artificial scene principal direction and image border from multi views
US20140323148A1 (en) * 2013-04-30 2014-10-30 Qualcomm Incorporated Wide area localization from slam maps
US10262462B2 (en) 2014-04-18 2019-04-16 Magic Leap, Inc. Systems and methods for augmented and virtual reality
US9874749B2 (en) * 2013-11-27 2018-01-23 Magic Leap, Inc. Virtual and augmented reality systems and methods
US20140380249A1 (en) 2013-06-25 2014-12-25 Apple Inc. Visual recognition of gestures
US9514571B2 (en) * 2013-07-25 2016-12-06 Microsoft Technology Licensing, Llc Late stage reprojection
US9646384B2 (en) 2013-09-11 2017-05-09 Google Technology Holdings LLC 3D feature descriptors with camera pose information
WO2015036056A1 (en) * 2013-09-16 2015-03-19 Metaio Gmbh Method and system for determining a model of at least part of a real object
EP2851868A1 (en) * 2013-09-20 2015-03-25 ETH Zurich 3D Reconstruction
US20150092048A1 (en) 2013-09-27 2015-04-02 Qualcomm Incorporated Off-Target Tracking Using Feature Aiding in the Context of Inertial Navigation
KR102462848B1 (en) 2013-10-16 2022-11-03 매직 립, 인코포레이티드 Virtual or augmented reality headsets having adjustable interpupillary distance
NZ755272A (en) 2013-11-27 2020-05-29 Magic Leap Inc Virtual and augmented reality systems and methods
US9857591B2 (en) 2014-05-30 2018-01-02 Magic Leap, Inc. Methods and system for creating focal planes in virtual and augmented reality
AU2013407879B2 (en) * 2013-12-19 2017-08-10 Apple Inc. Slam on a mobile device
US20150193971A1 (en) * 2014-01-03 2015-07-09 Motorola Mobility Llc Methods and Systems for Generating a Map including Sparse and Dense Mapping Information
WO2015117039A1 (en) 2014-01-31 2015-08-06 Magic Leap, Inc. Multi-focal display system and method
EP3100099B1 (en) 2014-01-31 2020-07-01 Magic Leap, Inc. Multi-focal display system and method
US10203762B2 (en) * 2014-03-11 2019-02-12 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
WO2015161307A1 (en) * 2014-04-18 2015-10-22 Magic Leap, Inc. Systems and methods for augmented and virtual reality
US9652893B2 (en) 2014-04-29 2017-05-16 Microsoft Technology Licensing, Llc Stabilization plane determination based on gaze location
US20150324568A1 (en) 2014-05-09 2015-11-12 Eyefluence, Inc. Systems and methods for using eye signals with secure mobile communications
USD759657S1 (en) 2014-05-19 2016-06-21 Microsoft Corporation Connector with illumination region
WO2015184413A1 (en) 2014-05-30 2015-12-03 Magic Leap, Inc. Methods and systems for generating virtual content display with a virtual or augmented reality apparatus
USD752529S1 (en) 2014-06-09 2016-03-29 Comcast Cable Communications, Llc Electronic housing with illuminated region
US10484697B2 (en) * 2014-09-09 2019-11-19 Qualcomm Incorporated Simultaneous localization and mapping for video coding
US9940533B2 (en) * 2014-09-30 2018-04-10 Qualcomm Incorporated Scanning window for isolating pixel values in hardware for computer vision operations
CN104463842A (en) * 2014-10-23 2015-03-25 燕山大学 Automobile accident process reappearing method based on motion vision
GB2532194A (en) * 2014-11-04 2016-05-18 Nokia Technologies Oy A method and an apparatus for automatic segmentation of an object
USD758367S1 (en) 2015-05-14 2016-06-07 Magic Leap, Inc. Virtual reality headset
USD805734S1 (en) 2016-03-04 2017-12-26 Nike, Inc. Shirt
USD794288S1 (en) 2016-03-11 2017-08-15 Nike, Inc. Shoe with illuminable sole light sequence
KR20210107185A (en) 2016-06-30 2021-08-31 매직 립, 인코포레이티드 Estimating pose in 3d space

Also Published As

Publication number Publication date
US20180005034A1 (en) 2018-01-04
EP3479160B1 (en) 2024-07-24
KR20190026762A (en) 2019-03-13
US11765339B2 (en) 2023-09-19
AU2017291131A1 (en) 2019-01-17
JP2023175052A (en) 2023-12-08
JP2022051761A (en) 2022-04-01
AU2022204584A1 (en) 2022-07-21
KR20210107185A (en) 2021-08-31
NZ749449A (en) 2023-06-30
US20220101004A1 (en) 2022-03-31
JP7576054B2 (en) 2024-10-30
IL280983B (en) 2022-07-01
US20190087659A1 (en) 2019-03-21
IL263872A (en) 2019-01-31
US10163011B2 (en) 2018-12-25
CN109643373B (en) 2023-06-27
JP7011608B2 (en) 2022-01-26
IL263872B (en) 2021-02-28
EP3479160A4 (en) 2020-03-25
AU2017291131B2 (en) 2022-03-31
CN109643373A (en) 2019-04-16
CA3029541A1 (en) 2018-01-04
KR102296267B1 (en) 2021-08-30
EP3479160A1 (en) 2019-05-08
US11200420B2 (en) 2021-12-14
WO2018004863A1 (en) 2018-01-04
JP2019522851A (en) 2019-08-15
IL280983A (en) 2021-04-29

Similar Documents

Publication Publication Date Title
CN109643373B (en) Estimating pose in 3D space
US10628675B2 (en) Skeleton detection and tracking via client-server communication
CN108139204B (en) Information processing apparatus, method for estimating position and/or orientation, and recording medium
CN105814611B (en) Information processing apparatus and method, and non-volatile computer-readable storage medium
US11086395B2 (en) Image processing apparatus, image processing method, and storage medium
EP1960970B1 (en) Stereo video for gaming
JP6456347B2 (en) INSITU generation of plane-specific feature targets
CN108492316A (en) A kind of localization method and device of terminal
WO2004088348A1 (en) Eye tracking system and method
CN109453517B (en) Virtual character control method and device, storage medium and mobile terminal
US10838515B1 (en) Tracking using controller cameras
CN112207821B (en) Target searching method of visual robot and robot
US20180227601A1 (en) Client-server communication for live filtering in a camera view
CN110520904B (en) Display control device, display control method, and program
CN110969706B (en) Augmented reality device, image processing method, system and storage medium thereof
EP3805899A1 (en) Head mounted display system and scene scanning method thereof
WO2021065607A1 (en) Information processing device and method, and program
WO2023149125A1 (en) Information processing device and information processing method
KR20230067311A (en) Method for processing image for 3D avatar movement in 3D modeling space
JP2001175860A (en) Device, method, and recording medium for three- dimensional body recognition including feedback process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination