DE102016123149A1 - IMAGE-DATA-BASED RECONSTRUCTION OF THREE-DIMENSIONAL SURFACES

Info

Publication number: DE102016123149A1
Application number: DE102016123149.5A
Authority: DE
Inventors: Lars Dornheim; Sebastian Heerwald; Marc Andreas Mörig
Original assignee: Dornheim Medical Images GmbH
Current assignee: Dornheim Medical Images GmbH
Priority date: 2016-11-30
Filing date: 2016-11-30
Publication date: 2018-05-30

Abstract

A computer-implemented method, a storage medium, and an associated computer system are provided for computing data that represent a three-dimensional surface using images of the surface. A series of images of the surface is received, the images having been captured in temporal succession, with temporally adjacent images each depicting an overlapping surface region. Image features are detected in at least a portion of the received images and tracked from image to image. Using the tracked image features, a camera position and camera orientation in a three-dimensional global coordinate system are then computed for each image of the at least a portion of the received images. Finally, using the computed camera positions and camera orientations, pixels from images of the at least a portion of the received images are transformed into coordinates of the three-dimensional global coordinate system. Reconstruction of the camera's motion path is also possible.

Description

Field of the invention

The invention relates to computer-implemented methods and associated storage media and computer systems for computing data that represent a three-dimensional surface using images of the surface. The invention is particularly advantageous in the field of medical technology.

Background

Imaging procedures in medical technology, and in other fields as well, produce large amounts of image material. During an endoscopic examination, for example, the endoscopist sees on the screen a visual representation of the interior of the organ or cavity under examination. The displayed surface, for example the inner wall of an organic or technical cavity, can then be examined or even manipulated. The endoscopist, however, can use only the image information that is currently captured by the endoscope camera and shown on the screen. The images can be recorded so that the endoscopist can view them again during or after the endoscopy, but even then it remains difficult to extract more than purely visual information from the image material.

The endoscopic application described is only one of many examples in which it would be desirable to subject the resulting image material to computer-aided evaluation. If, for example, it were possible to derive from such image material data that represent a depicted surface, the surface represented in this way could be displayed or measured on a computer, either fully automatically or under user guidance. The data could also be used to manufacture shapes that fit the surface precisely.

In general, image data (e.g., in the form of still images or videos) that are captured with optical recording devices (e.g., cameras) or otherwise generated contain visual properties of the depicted surfaces (e.g., texture, color), so that these surfaces remain (spatially) perceptible to humans when shown on suitable visual output devices. The depth information inherent in the original surface, however, is no longer available for subsequent process steps that require spatial information (e.g., the production of physical models, measurement, documentation). This makes the computer-aided evaluation mentioned above considerably more difficult.

In the case of monocular image data (images from a single camera), the viewer often grasps the depth information only through the movement of the camera during recording. Even when stereoscopic image data are viewed, the original surface arises only in the viewer's mind. Objective statements about the spatial structure of a surface that has been captured, for example, with a (stereoscopic) camera moving along a path are therefore no longer possible.

Overview of the invention

The object of the invention is therefore to provide a computer-implemented method and an associated storage medium and computer system that allow data representing a three-dimensional surface to be computed reliably and robustly.

In one embodiment, the invention relates to a computer-implemented method for computing data that represent a three-dimensional surface using images of the surface. The method comprises receiving a series of images of the surface, the images having been captured in temporal succession, with temporally adjacent images each depicting an overlapping surface region. The method further comprises detecting image features in at least a portion of the received images and tracking the detected image features from image to image. In the method according to the invention, a camera position and camera orientation in a three-dimensional global coordinate system are then computed, using the tracked image features, for each image of the at least a portion of the received images. The method further comprises transforming pixels from images of the at least a portion of the received images into coordinates of the three-dimensional global coordinate system using the computed camera positions and camera orientations.
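The final transformation step can be illustrated with a minimal sketch. It assumes a pinhole camera model, a known depth value for the pixel, and a camera pose given as a rotation matrix and a position vector; none of these specifics, nor any of the function or parameter names, are taken from the patent text.

```python
# Illustrative sketch only: the pinhole model and all names are assumptions.

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth into camera coordinates."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def camera_to_global(point, rotation, position):
    """Apply the camera pose: p_global = R * p_camera + t.

    rotation is a 3x3 matrix (list of rows) describing the camera orientation,
    position is the camera position, both in the global coordinate system.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + position[i]
        for i in range(3)
    )


# Example pose: camera rotated 90 degrees about the global z axis, at (1, 0, 0).
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = (1.0, 0.0, 0.0)

p_cam = pixel_to_camera(u=420.0, v=240.0, depth=2.0,
                        fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_global = camera_to_global(p_cam, R, t)
```

Repeating this for every pixel with a valid depth value, using the pose computed for the respective image, would accumulate a point cloud in the global coordinate system.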

In one embodiment of the invention, the series of images of the surface is received in real time as the images are captured. In another embodiment, the step of receiving the series of images of the surface comprises reading out archived or otherwise stored image data.

The images of the series are preferably monocular images, in which case the method may further comprise computing a depth map by means of a bundle adjustment technique.

In an equally preferred embodiment of the invention, the images of the series are stereoscopic images, in which case the method may further comprise computing a depth map by means of a block matching technique.

The received images may be preprocessed before image features are detected, the preprocessing preferably comprising a calibration.

If the images of the series are stereoscopic images, computing the camera position and camera orientation may further comprise dynamically shortening or lengthening the virtual distance between the left and right images in order to compensate for asynchronous capture of the individual images.

In one embodiment of the invention, computing the camera position and camera orientation comprises applying a Kalman filter.

If the images of the series are stereoscopic images, the Kalman filter may use a state vector that contains, for asynchronicity correction, an additional variable by means of which the position of the left or right image is shifted. This additional variable of the state vector may have an additive noise component.

In one embodiment of the invention, the Kalman filter uses a state vector with a velocity component and a rotational velocity component, both of which have additive noise components.

In each of the described embodiments, computing the camera position and camera orientation may comprise integrating variables that indicate a camera velocity and a camera rotational velocity.
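One way to picture this integration is the following sketch. It assumes that the orientation is stored as a unit quaternion, that the rotational velocity is expressed in the camera frame, and that both velocities are constant over a time step; these choices and all names are illustrative assumptions, not details stated in the patent.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def integrate_pose(position, orientation, velocity, angular_velocity, dt):
    """Advance the camera pose by one time step of length dt.

    position, velocity: 3-tuples in global coordinates.
    orientation: unit quaternion (w, x, y, z), camera frame to global frame.
    angular_velocity: rotation rate vector in rad/s, in the camera frame.
    """
    new_position = tuple(p + v * dt for p, v in zip(position, velocity))

    # Turn the rotation accumulated during dt into an incremental quaternion.
    rate = math.sqrt(sum(w * w for w in angular_velocity))
    if rate > 0.0:
        half_angle = 0.5 * rate * dt
        scale = math.sin(half_angle) / rate
        delta = (math.cos(half_angle),) + tuple(w * scale for w in angular_velocity)
    else:
        delta = (1.0, 0.0, 0.0, 0.0)

    # Right-multiplication applies the increment in the camera frame.
    q = quat_multiply(orientation, delta)
    norm = math.sqrt(sum(c * c for c in q))
    return new_position, tuple(c / norm for c in q)


# Ten steps of 0.1 s: 0.1 m/s along x and a quarter turn per second about z.
pose = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
for _ in range(10):
    pose = integrate_pose(pose[0], pose[1], (0.1, 0.0, 0.0),
                          (0.0, 0.0, math.pi / 2.0), 0.1)
```

In a filter-based setup, such a step would typically form the prediction part, with the velocities themselves being state variables that are corrected by the measurements.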

Furthermore, the method may comprise performing a Z-drift correction of pixels to be transformed. Performing the Z-drift correction preferably comprises: performing a ray casting with rays that, starting from the pixels newly to be transformed, pass through the point cloud of previously transformed pixels; collecting pixels located on or near the rays; forming an average of the pixels collected per ray; forming a transformation using the averages formed and the global coordinates of the point cloud; and applying the transformation formed to the pixels newly to be transformed.
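The sequence of steps can be sketched as follows, under strongly simplifying assumptions that are not taken from the patent: each ray runs from an assumed camera center through the new point, "near" means within a fixed radius of the ray, and the transformation is reduced to a pure translation (a fuller implementation might estimate a rigid transformation instead). All names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def collect_along_ray(origin, through, cloud, radius):
    """Return the cloud points within `radius` of the ray origin -> through."""
    direction = _sub(through, origin)
    length = math.sqrt(_dot(direction, direction))
    direction = tuple(d / length for d in direction)
    hits = []
    for q in cloud:
        rel = _sub(q, origin)
        along = _dot(rel, direction)
        if along <= 0.0:
            continue  # point lies behind the ray origin
        offset = _sub(rel, tuple(along * d for d in direction))
        if math.sqrt(_dot(offset, offset)) <= radius:
            hits.append(q)
    return hits


def mean_point(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def estimate_drift_translation(camera_center, new_points, cloud, radius):
    """Average, over all rays with hits, the offset from new point to hit mean."""
    offsets = []
    for p in new_points:
        hits = collect_along_ray(camera_center, p, cloud, radius)
        if hits:
            offsets.append(_sub(mean_point(hits), p))
    return mean_point(offsets) if offsets else (0.0, 0.0, 0.0)


def apply_translation(points, translation):
    return [tuple(x + t for x, t in zip(p, translation)) for p in points]


# Existing cloud: a flat patch at z = 10. New points have drifted to z = 10.5.
cloud = [(float(x), float(y), 10.0) for x in range(-2, 3) for y in range(-2, 3)]
new_points = [(0.0, 0.0, 10.5), (1.0, 0.0, 10.5), (0.0, 1.0, 10.5)]
shift = estimate_drift_translation((0.0, 0.0, 0.0), new_points, cloud, radius=0.2)
corrected = apply_translation(new_points, shift)
```

In this toy setup the estimated shift pulls the new points back onto the existing patch along the viewing direction.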

Computing the camera position and camera orientation may further comprise applying a Kalman filter that uses a state vector with several variables. In this case, the transformation formed may additionally be applied to correct one or more of the variables of the state vector.

In the step of detecting image features, preferably those image features are detected that have a predefined minimum distance from one another.
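A simple way to enforce such a minimum distance is a greedy selection over scored candidates, sketched below. The scoring, the greedy strategy, and the function name are assumptions made for illustration; the patent does not prescribe how the minimum distance is enforced.

```python
def select_spaced_features(candidates, min_distance):
    """Greedily keep the strongest candidates that are at least min_distance apart.

    candidates: iterable of (x, y, score) tuples; higher score = stronger feature.
    """
    kept = []
    min_sq = min_distance * min_distance
    for x, y, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        # Accept the candidate only if it is far enough from every kept feature.
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_sq for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept


candidates = [(10, 10, 0.9), (12, 11, 0.8), (40, 40, 0.7), (41, 45, 0.95)]
selected = select_spaced_features(candidates, min_distance=8)
# selected == [(41, 45, 0.95), (10, 10, 0.9)]
```

Spreading the features over the image in this way tends to give the subsequent pose estimation better geometric coverage than a cluster of nearby points would.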

In a preferred embodiment of the invention, relative or absolute capture time information is associated with the images of the series. The step of computing the camera positions and camera orientations in the three-dimensional global coordinate system can then use this capture time information to compute a camera motion path.

In embodiments of the invention, detecting image features comprises applying a Harris corner detector, and tracking the detected image features comprises applying a Lucas-Kanade tracker.
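For orientation, the corner measure commonly associated with the Harris detector can be written as follows. The notation (window weights w, image gradients I_x and I_y, constant k) is the usual textbook form and is not taken from the patent text.

```latex
M = \sum_{(x,y) \in W} w(x,y)
    \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix},
\qquad
R = \det(M) - k \,\bigl(\operatorname{tr} M\bigr)^2
```

A pixel is treated as a corner candidate when R is large and positive; k is an empirical constant, often chosen in the range of about 0.04 to 0.06.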

The invention further provides a computer-readable storage medium that stores computer-executable instructions which, when executed by one or more processors of a computer, cause the one or more processors to carry out a method as described above or as described below with reference to the figures.

The invention furthermore relates to a computer system for computing data that represent a three-dimensional surface using images of the surface. The computer system comprises an image data receiving unit for receiving a series of images of the surface, the images having been captured in temporal succession, with temporally adjacent images each depicting an overlapping surface region. The computer system further comprises a data processing unit for detecting image features in at least a portion of the received images and tracking the detected image features from image to image, for computing a camera position and camera orientation in a three-dimensional global coordinate system for each image of the at least a portion of the received images using the tracked image features, and for transforming pixels from images of the at least a portion of the received images into coordinates of the three-dimensional global coordinate system using the computed camera positions and camera orientations.

The computer system may further be configured to carry out the method described above or the method shown in the figures and set out in the associated description.

The invention described in the embodiments makes it possible to robustly reconstruct depth information from monocular or stereoscopic input data and thus enables the high-precision measurement of general surfaces on the basis of monocular or stereoscopic still-image and/or video data. This permits the complete reconstruction of the boundary surfaces of any physical objects or object components (e.g., in medicine, biology, materials engineering, or materials testing) as well as the subsequent measurement or manufacture of the reconstructed objects or object components, including shapes derived from them (e.g., by means of rapid prototyping methods).

List of figures

Preferred embodiments of the invention are described in more detail below with reference to the figures, in which:

Fig. 1 is a flow chart that serves to describe the functioning of the invention in a preferred embodiment;
Fig. 2 is a flow chart showing a method according to the invention for computing data that represent a three-dimensional surface, according to one embodiment;
Fig. 3 is a diagram illustrating the image shift for synchronously moving stereoscopic individual images;
Fig. 4 is a diagram illustrating the image shift for asynchronously moving stereoscopic individual images;
Fig. 5 is a flow chart showing a method according to the invention for Z-drift correction, according to one embodiment;
Figs. 6A and 6B illustrate how the Z-drift correction is carried out according to one embodiment; and
Fig. 7 is a block diagram depicting a computer system in one embodiment of the invention.

Detailed description of preferred embodiments

As is apparent from the foregoing and from what follows, the invention relates to a technique that generates depth information from image-based input data. The surface regions recorded by means of one or more cameras or by other imaging methods are thus reconstructed three-dimensionally, for example in shape and texture. In the broadest sense, all that is required for this are overlapping images, that is, images that show spatially contiguous surface regions and that depict, at least in part, the same surface regions. These images may be the (time-varying) data of a single camera (mono images) or of a stereo camera system (stereo images). The technique is therefore particularly well suited to processing videos such as those produced when endoscopy recordings are archived.

The invention does not require tracking of the recording camera or cameras. Instead, it includes determining the position of the camera(s) on the basis of the captured data, which can also be done after the fact. This works both directly in live mode, i.e., with direct processing of the data recorded by the camera or cameras, and retrospectively, i.e., even years later and without loss of quality, on archived image and video data.

According to the invention, embodiments can also use stereo camera systems (e.g., stereo endoscopes) that do not record their stereo images synchronously in time, whereas previous 3D reconstruction methods presuppose synchronous recording. The invention, by contrast, can also work with stereo images captured with a time offset. Additional stable tracking of relevant image features over time, together with statistical methods, makes possible the precise reconstruction of depth information and of the corresponding object boundary surfaces. It also becomes possible to generate a semantically coherent object boundary surface as a representation of a physical object from time-varying camera data and to derive characteristic quantitative shape features from it (e.g., sizes, distances, angles).

Preferred embodiments of the invention are explained in more detail below with reference to the figures. Reference is first made in particular to Fig. 1.

The images 100, 105 acquired by means of a monocular or stereoscopic camera are preferably first preprocessed. For monocular images 100, an undistortion 115 can be carried out on the basis of a calibration that is performed initially. For example, the technique described in Zhengyou Zhang, "A flexible new technique for camera calibration", Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(11):1330-1334, 2000 can be used for this purpose. In addition to the undistortion parameters, the calibration preferably also provides the intrinsic camera parameters (intrinsics), which can be used in later processing steps.
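To make the roles of the undistortion parameters and the intrinsics concrete, the sketch below assumes a radial distortion model with two coefficients (k1, k2) acting on normalized image coordinates and inverts it by fixed-point iteration. The model choice, the iteration scheme, and all names are assumptions for illustration and are not prescribed by the patent.

```python
def distort(x, y, k1, k2):
    """Apply radial distortion to normalized image coordinates (x, y)."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration (moderate distortion)."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2):
    """Map a distorted pixel to its undistorted pixel position via the intrinsics."""
    xd = (u - cx) / fx
    yd = (v - cy) / fy
    x, y = undistort(xd, yd, k1, k2)
    return fx * x + cx, fy * y + cy


# Round trip: distort a known point, then recover it.
xd, yd = distort(0.3, -0.2, k1=-0.2, k2=0.05)
x_rec, y_rec = undistort(xd, yd, k1=-0.2, k2=0.05)
```

The focal lengths fx, fy and the principal point cx, cy used here are the intrinsic parameters that later steps, such as back-projecting pixels into 3D, can reuse.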

For stereoscopic data, a calibration 110 (such as the one mentioned above) of the camera setup can be used to determine the extrinsic parameters and likewise the intrinsic parameters of the stereo camera system. If the cameras are not aligned parallel to each other, the stereo images are preferably corrected so that they correspond to images from a parallel stereo camera setup. For this purpose, a rectification according to Richard I Hartley, "Theory and practice of projective rectification", International Journal of Computer Vision, 35(2):115-127, 1999 can be applied, which warps the images so that the cameras are, purely virtually, aligned in parallel. As a result, the epipolar lines of the two stereo images preferably fall on the same image row. Matching algorithms for stereo reconstruction in particular can benefit from this step.

The (corrected) images are now used in two ways in the embodiment of the invention shown in Fig. 1. First, a depth map is computed (block 140). In the case of stereoscopic images, this is preferably done via semi-global block matching, e.g., that of Heiko Hirschmüller, "Stereo processing by semiglobal matching and mutual information", Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(2):328-341, 2008. Potential outliers in this depth map, which may be caused by mismatches, can be removed from the depth map (block 145). Second, a tracking 125 can be performed on the basis of the (corrected) stereo images, which follows salient points (referred to below as keypoints or image features) stably both spatially (between the stereo images) and over time. Techniques that can preferably be used for this are described in Jianbo Shi et al., "Good features to track", Computer Vision and Pattern Recognition, 1994, Proceedings CVPR'94, 1994 IEEE Computer Society Conference on, pages 593-600, IEEE, 1994 and Jean-Yves Bouguet, "Pyramidal implementation of the affine lucas kanade feature tracker description of the algorithm", Intel Corporation, 5(1-10):4, 2001.

These keypoints are then preferably used for the stereo-image-based reconstruction of the camera trajectory. A statistical method 130 (preferably an extended Kalman filter) can process the keypoints from the 2D images of the stereo cameras. An estimate of the real camera movement is preferably generated by means of statistical methods and model knowledge. This information can be used to reduce noise in the data (e.g. caused by natural acquisition errors and matching inaccuracies). In contrast to the technique described in Oscar G Grasa et al., "Visual slam for handheld monocular endoscope", IEEE Transactions on Medical Imaging, 33(1):135-146, 2014, embodiments of the invention also allow stereo image data to be processed.

For the monocular images, a tracking 125 of the keypoints can likewise be performed. However, since no depth map is readily available at the start, an initial estimate 120 of the 3D positions of the points is performed first. This can be done within the first frames of the video by a bundle adjustment method, in which the camera position and the point positions are initially determined from the 2D images and a known length measure in the scene. After the initial calculation during the first frames, a statistical method 130 (e.g. an extended Kalman filter) can be used analogously to the stereo variant. With the initial estimate 120 of the 3D positions of the keypoints, this model can also continue to operate on 2D images.

The depth map of the monocular images can then preferably be determined from successive images (block 140). If the camera moves between the individual frames, a virtual stereo camera can be created. Since the current position and the previous position are known, a depth image can again be calculated. Preferably, however, the depth reconstruction does not use the immediately preceding image but selects an image in which the camera has a minimum spatial distance from the respective second comparison image. A depth map can then be determined from these images using the known positions and a rectification of the virtual stereo camera. Because this provides a virtual approximation of stereoscopic image data, the following description does not explicitly distinguish between monocular and stereoscopic images; unless explicitly stated otherwise, "stereoscopic data" is used synonymously for both.
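By way of illustration only, the selection of a comparison image with a minimum spatial camera distance could be sketched as follows. The function name `select_reference_frame`, the threshold `min_distance` and the policy of taking the most recent qualifying frame are assumptions made for this sketch; the description only requires that the selected image satisfies a minimum spatial distance.

```python
import math


def select_reference_frame(positions, current_index, min_distance):
    """Return the index of the most recent earlier frame whose camera
    position is at least min_distance away from the current position.

    positions: list of (x, y, z) camera positions, one per frame.
    Returns None if no earlier frame is far enough away.
    """
    current = positions[current_index]
    # Walk backwards in time so that the newest suitable frame is chosen
    # (hypothetical policy, not prescribed by the description).
    for idx in range(current_index - 1, -1, -1):
        if math.dist(positions[idx], current) >= min_distance:
            return idx
    return None


# Camera positions estimated for four consecutive frames.
positions = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.5, 0.0, 0.0), (0.6, 0.0, 0.0)]
ref = select_reference_frame(positions, current_index=3, min_distance=0.3)
```

In this example the directly preceding frame is skipped because the camera moved only 0.1 units, and the frame at index 1 is used as the second view of the virtual stereo camera.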

The approximation of the real camera movement provides an estimate of the position and orientation of the camera, which can be applied to each individual frame. This allows the reconstructed depth map to be embedded in a model in the global three-dimensional coordinate system (hereinafter called the model) (registration 150).

In one embodiment of the invention, a Z-drift correction is also preferably performed, which compensates for a slight shift in depth (Z drift) that arises in the reconstruction of the camera movement over the temporal course of the stereo image data. For this, the depth reconstruction 140 is fitted once more to the model reconstructed so far. Preferably, a transformation is determined that can be included as a correction in the registration 150 of the generated point clouds. Furthermore, the model used by the statistical method can be adapted to the now corrected camera position. This is explained in more detail below with reference to Figs. 5, 6A and 6B.

In a further preferred embodiment, the invention can also make it possible to process images from stereo cameras that do not deliver exactly synchronized stereo images. The effect caused by asynchronous stereo frames is comparable to a virtual displacement of the individual cameras relative to each other. To prevent this from causing additional errors in the depth reconstruction 140, a correction 135 that contains an estimate of this virtual camera displacement is included in the model of the statistical method 130. Preferably, the model state is estimated from the measurement data, and it has proven advantageous to use the modification presented in Peter Hansen et al., "Online continuous stereo extrinsic parameter estimation", Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pages 1059-1066, IEEE, 2012. The correction 135 is based on the statistical method 130 and can be fed directly into the depth reconstruction 140 in the form of a corrected camera distance.

After a registration step 150, the preceding steps yield a point cloud, which can be used, for example, for the further reconstruction 160 of a surface geometry with subsequent measurement tasks. It also becomes possible to texture the surface plausibly by texture mapping 165, using the color values available from the stereo images.

Fig. 2 shows a method according to the invention for calculating data representing a three-dimensional surface, according to one embodiment. First, in step 200, a series of images 100, 105 of the surface is received, the images having been taken in temporal succession and temporally adjacent images each depicting an overlapping surface area. As already described, the images can be single images or video frames and can be monocular or stereoscopic. In step 210, image features (keypoints) are detected in at least a portion of the received images and tracked from image to image, i.e. the keypoint tracking 125 is performed. Then, in step 220, preferably using statistical methods 130 and the tracked image features, a camera position and camera orientation in a three-dimensional global coordinate system are calculated for each image of the at least a portion of the received images. In step 230, the depth reconstruction 140 then uses the calculated camera positions and camera orientations to transform pixels from images of the at least a portion of the received images into coordinates of the three-dimensional global coordinate system. Finally, in step 240, the registration 150 of the pixels takes place. It should be noted that the method of Fig. 2 can be supplemented by further steps corresponding to the other blocks of Fig. 1. The tracking 125 performed in step 210 is described in more detail below for a preferred embodiment of the invention.

To allow the camera trajectory to be determined later, characteristic points (keypoints) are advantageously detected in the stereo images. These keypoints are preferably chosen so that they can be detected as stably as possible across successive images.

Preferably, a robust feature detector that is fast to compute is used for this, such as the Harris corner detector. First, keypoints are initialized and filtered according to several criteria in order to obtain a set of characteristic points distributed as evenly as possible over the image. When the Harris corner detector is used, the initially detected keypoint candidates are filtered by the length of the Harris eigenvector. Preferably, the minimum distance to other keypoints and the minimum color variance assigned to the keypoint are also taken into account. For the subsequent tracking, preferably only a small subset of keypoints (e.g. 20) is retained.

The keypoints are preferably selected so that they lie at particularly distinctive pixels. Likewise preferably, the immediate surroundings of a keypoint have a high information content. These surroundings can also be analyzed later and tracked across several images. The information content around a keypoint is calculated, for example, via the color variance, which indicates how strongly the color values in a region are scattered. When regions of interest are estimated by means of the color variance, one histogram per color channel is preferably computed over a region around a keypoint. The variance can then be determined for each histogram. The result of estimating the relevance of a region is then preferably the maximum of the variances of the color channels (see formula (1)). Regions with the largest color variance in their local histogram can be used, for example, for initializing the keypoints and the tracking algorithm.

$V_{c}(H_{c}) = \left(\frac{1}{N} \sum_{i=0}^{255} H_{c;i}^{2}\right) - \left(\frac{1}{N} \sum_{i=0}^{255} H_{c;i}\right)^{2} \quad (1)$

$V(H_{r,g,b}) = \max\left(V_{r}(H_{r}), V_{g}(H_{g}), V_{b}(H_{b})\right) \quad (2)$
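As a non-authoritative illustration, the two formulas can be evaluated literally on a small patch of 8-bit RGB pixels. The function names are hypothetical, and because the description does not define N, the sketch assumes that N is the number of histogram bins (256). The sketch therefore computes the variance of the histogram entries exactly as formula (1) is written.

```python
def histogram(values, bins=256):
    """Count how often each 8-bit value occurs (H_c in the formulas)."""
    counts = [0] * bins
    for v in values:
        counts[v] += 1
    return counts


def histogram_variance(hist):
    """Formula (1): mean of squared entries minus squared mean of entries.

    Assumption: N is taken to be the number of histogram bins.
    """
    n = len(hist)
    mean_of_squares = sum(h * h for h in hist) / n
    mean = sum(hist) / n
    return mean_of_squares - mean * mean


def color_variance(patch):
    """Formula (2): maximum of the per-channel histogram variances.

    patch: iterable of (r, g, b) tuples with integer values in 0..255.
    """
    channels = zip(*list(patch))  # regroup pixels into r, g and b sequences
    return max(histogram_variance(histogram(ch)) for ch in channels)


# Four pixels from the neighborhood of a hypothetical keypoint.
patch = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]
score = color_variance(patch)
```

The resulting score can then be compared between candidate regions when keypoints are initialized.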

Here, $c$ denotes a color channel (r: red, g: green, b: blue). $H_c$ is the histogram over the color value $c$ in a neighborhood of a pixel. $H_{c;i}$ is the $i$-th entry of the histogram $H_c$. $V_c(H_c)$ denotes the color variance. $V(H_{r,g,b})$ is the variance of a pixel, determined from the individual variances ($V_r$, $V_g$, $V_b$) of the color histograms $H_r$, $H_g$ and $H_b$.

When keypoints are initialized, a preferred embodiment of the invention ensures that the keypoints keep a certain minimum distance from one another, so that they are spread across the image. This even distribution of keypoints helps to stabilize the subsequent trajectory estimation. If local clusters of keypoints occur during initialization, the keypoints with the strongest color variance are preferably selected from them.
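One possible way to enforce such a minimum distance is a greedy selection, sketched below under stated assumptions: candidates are visited in order of decreasing color variance, and a candidate is kept only if it is far enough from every keypoint already kept. The greedy strategy and the name `filter_by_min_distance` are illustrative choices, not taken from the description.

```python
import math


def filter_by_min_distance(candidates, min_distance):
    """Keep the strongest candidates that respect a minimum mutual distance.

    candidates: list of (x, y, variance) tuples.
    Returns the kept candidates, strongest first.
    """
    kept = []
    # Strongest color variance first, so that it wins within a local cluster.
    for x, y, var in sorted(candidates, key=lambda c: c[2], reverse=True):
        if all(math.dist((x, y), (kx, ky)) >= min_distance for kx, ky, _ in kept):
            kept.append((x, y, var))
    return kept


# Two local clusters of two candidates each.
candidates = [(10, 10, 5.0), (12, 11, 9.0), (50, 50, 3.0), (51, 50, 1.0)]
selected = filter_by_min_distance(candidates, min_distance=5.0)
```

From each cluster only the candidate with the larger variance survives, which spreads the remaining keypoints over the image.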

After the initial detection, the keypoints are tracked across several images, which can be done, for example, by means of a Lucas-Kanade tracker. Preferably, a Lucas-Kanade tracker as described in Jean-Yves Bouguet, "Pyramidal implementation of the affine lucas kanade feature tracker description of the algorithm", Intel Corporation, 5(1-10):4, 2001 is used.

The Lucas-Kanade tracker of the embodiment calculates the optical flow for a region around the keypoint. Using a pyramidal scale, the optical flow is preferably first calculated on more coarsely resolved regions and then refined for the points to be tracked. Within the tracker, the partial derivatives of the brightness of the pixels [1..n] of the region in the x, y and time directions ($I_x$, $I_y$, $I_t$) are calculated. The direction of the optical flow is denoted by $V_x$ and $V_y$. This is done for a region of pixels, assuming that the optical flow is constant within this region. The flow can be calculated from an overdetermined system of equations (3), which can be solved as a least-squares problem. As a consequence, the motion of the optical flow is smoothed by the neighborhood and noise is minimized, which makes the method particularly suitable for tracking the keypoints.

$\begin{bmatrix} I_{x_{1}} & I_{y_{1}} \\ I_{x_{2}} & I_{y_{2}} \\ \vdots & \vdots \\ I_{x_{n}} & I_{y_{n}} \end{bmatrix} \begin{bmatrix} V_{x} \\ V_{y} \end{bmatrix} = \begin{bmatrix} -I_{t_{1}} \\ -I_{t_{2}} \\ \vdots \\ -I_{t_{n}} \end{bmatrix} \quad (3)$
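The least-squares solution of the system of equations (3) can be sketched for a single region and a single resolution level by solving the 2x2 normal equations. This is a minimal sketch, not the pyramidal tracker of the cited reference; the function name `solve_flow` and the singularity threshold are assumptions.

```python
def solve_flow(ix, iy, it):
    """Least-squares optical flow (V_x, V_y) for one region.

    ix, iy, it: sequences with the partial derivatives I_x, I_y and I_t
    of the n pixels in the region.
    Returns None if the normal equations are (nearly) singular.
    """
    # Entries of A^T A and A^T I_t for the system A [V_x, V_y]^T = -I_t.
    sxx = sum(a * a for a in ix)
    sxy = sum(a * b for a, b in zip(ix, iy))
    syy = sum(b * b for b in iy)
    sxt = sum(a * t for a, t in zip(ix, it))
    syt = sum(b * t for b, t in zip(iy, it))

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:  # hypothetical threshold for textureless regions
        return None

    # Closed-form inverse of the symmetric 2x2 matrix A^T A.
    vx = (-syy * sxt + sxy * syt) / det
    vy = (sxy * sxt - sxx * syt) / det
    return vx, vy


# Synthetic gradients consistent with a true flow of (0.5, -0.25).
ix = [1.0, 0.0, 1.0, 2.0]
iy = [0.0, 1.0, 1.0, -1.0]
it = [-0.5, 0.25, -0.25, -1.25]
flow = solve_flow(ix, iy, it)
```

Because all pixels of the region contribute to the sums, noise in individual derivatives is averaged out, which corresponds to the smoothing effect described above.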

Each tracked keypoint can be accumulated in a list and preferably receives a unique identification number, which makes identification across several images possible. Newly found keypoints are preferably assigned new unique identification numbers. The list of all tracked keypoints is then passed to the reconstruction algorithm for the camera trajectory.
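The bookkeeping of tracked keypoints with unique identification numbers might look like the following sketch. The class `KeypointRegistry` and its method names are hypothetical and serve only to illustrate the idea.

```python
import itertools


class KeypointRegistry:
    """Stores the observations of each keypoint under a unique ID."""

    def __init__(self):
        self._next_id = itertools.count()  # IDs are never reused
        self.tracks = {}  # keypoint ID -> list of (frame, x, y) observations

    def add_new(self, frame, x, y):
        """Register a newly found keypoint and return its new ID."""
        kp_id = next(self._next_id)
        self.tracks[kp_id] = [(frame, x, y)]
        return kp_id

    def update(self, kp_id, frame, x, y):
        """Append the position of an already known keypoint in a new frame."""
        self.tracks[kp_id].append((frame, x, y))


registry = KeypointRegistry()
first = registry.add_new(frame=0, x=10.0, y=20.0)
second = registry.add_new(frame=0, x=30.0, y=40.0)
registry.update(first, frame=1, x=11.0, y=20.5)
```

The dictionary `registry.tracks` then plays the role of the list that is handed to the trajectory reconstruction.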

As already explained above, in step 220, preferably using statistical methods 130 and the image features detected and tracked in step 210, a camera position and camera orientation in a three-dimensional global coordinate system are calculated for each image of the at least a portion of the received images. An extended Kalman filter is preferably used for this. In addition, a dynamic asynchrony correction 135 can be performed. Both are described in more detail below.

Based on the keypoints tracked in the stereo images, the Kalman filter determines the position and orientation of the (possibly virtual) stereo camera in space. The model assumes a projection of the 3D positions of the keypoints onto the camera image surface. With the help of this model, matching errors can be damped.

If a stereo camera is actually used, movement of the stereo camera combined with asynchrony of the individual image captures produces a virtual displacement between the associated individual images. Depending on the direction of movement of the stereo camera, this difference can be larger or smaller than the initial stereo calibration $T_s$.

Fig. 3 shows that no change of the extrinsics occurs with synchronous cameras. The arrows indicate the direction of movement. Fig. 4 shows asynchronous cameras, where the left camera captures an image before the right one. If the setup moves to the left (left part of the figures), the camera setup is pushed together. If it moves to the right (right part of the figures), the cameras move apart.

To compensate for this effect, one embodiment of the invention continuously recalibrates the extrinsics $T_s$ of the stereo setup. During horizontal movements of the stereo camera, the Kalman filter model estimates how the left and right cameras of the stereo setup would have to be virtually displaced in order to compensate for the image shift caused by the asynchrony during movement. This allows the virtual distance between the left and right images to be shortened or lengthened dynamically, and thus compensates for the deviation from the initial stereo calibration, which corresponds to the system at rest.
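The geometric effect that is being compensated can be made concrete with a small worked example. It assumes, purely for illustration, that the horizontal velocity and the time offset between the two exposures are known, that the x axis points from the left to the right camera, and that the left camera exposes first. In the described embodiment the displacement is instead estimated by the Kalman filter; the function `virtual_baseline` is hypothetical.

```python
def virtual_baseline(calibrated_baseline, velocity_x, time_offset):
    """Effective distance between the two views of an asynchronous pair.

    calibrated_baseline: camera distance at rest (from the calibration).
    velocity_x: horizontal velocity of the setup, positive to the right.
    time_offset: delay of the right exposure relative to the left one.
    """
    # The right camera has moved by velocity_x * time_offset when it exposes.
    return calibrated_baseline + velocity_x * time_offset


# 5 mm baseline, 10 ms delay, 20 mm/s motion (all values in metres/seconds).
moving_left = virtual_baseline(0.005, -0.02, 0.01)   # views pushed together
moving_right = virtual_baseline(0.005, 0.02, 0.01)   # views pulled apart
at_rest = virtual_baseline(0.005, 0.0, 0.01)         # equals the calibration
```

With these example values the virtual distance shrinks to 4.8 mm for motion to the left and grows to 5.2 mm for motion to the right, matching the behavior described for Fig. 4.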

If monocular images are used, the baseline correction 135 can be omitted, since the position of the virtual stereo setup is already determined directly by the Kalman filter. The Kalman filter nevertheless continues to calculate the 3D positions of the points as well as the orientation and position of the camera, using the respective 2D image instead of the two stereo images.

The Kalman filter of the embodiment can use a state vector $x_k$ as shown in equation (4). The state vector consists of data such as the camera position and motion as well as data about the keypoints in the environment. $p_k$ is the position of the camera at step $k$, and $v_k$ is correspondingly its velocity. In addition, the orientation $\phi_k$ and its change $\omega_k$ are stored as angles and angular velocities. The remainder of the state vector stores the positions $y_{k,i}$ of all currently visible keypoints.

$x_{k} = \begin{pmatrix} p_{k} \\ \phi_{k} \\ v_{k} \\ \omega_{k} \\ b_{k} \\ y_{k,1} \\ \vdots \\ y_{k,n} \end{pmatrix} \quad (4)$

At every time step $k$, the positions of the keypoints as well as the position and orientation of the camera can thus be read off. So that the Kalman filter can calculate the displacement for the asynchrony correction, the state is extended by a variable $b_k$. When monocular images are used, $b_k$ is preferably not used but may still be present in the state vector.

When the motion of the camera is modeled, the velocities (v_k for the translation and ω_k for the rotation) are preferably integrated. In one embodiment of the invention, the variation of the velocities is modeled by normally distributed noise w_v and w_ω. This directly yields the position p_k and the orientation Φ_k of the camera. Under the assumption that the keypoints are static, the existing keypoint positions are preferably carried over unchanged. From here on, the keypoints are used with their global coordinates. The origin is preferably located where the camera started. The baseline correction 135 preferably also receives a noise term w_b so that it can vary over time. In the case of monocular images, b_k is not used, so that the term b_k + w_b can be omitted entirely.

$x_{k|k-1} = f(x_{k-1}) = \begin{pmatrix} p_{k} + v_{k} \\ \phi_{k} + \omega_{k} \\ v_{k} + w_{v} \\ \omega_{k} + w_{\omega} \\ b_{k} + w_{b} \\ y_{k,1} \\ \vdots \\ y_{k,n} \end{pmatrix}$
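The prediction step can be illustrated with a short Python/NumPy sketch. It is only a sketch under stated assumptions: the function name `predict_state`, the state layout (five 3-entry camera blocks followed by the keypoints) and the noise standard deviations are hypothetical. The noise is sampled here purely to mirror the terms of the equation; in a typical Kalman filter implementation the noise enters through the process noise covariance instead of being added to the predicted mean.

```python
import numpy as np

def predict_state(x, rng, sigma_v=0.0, sigma_omega=0.0, sigma_b=0.0):
    """Apply f(x): integrate the velocities, perturb v, omega and b, keep keypoints."""
    # Assumed layout: p = x[0:3], phi = x[3:6], v = x[6:9],
    # omega = x[9:12], b = x[12:15], keypoints = x[15:].
    x_pred = np.array(x, dtype=float)                 # work on a copy
    x_pred[0:3] += x_pred[6:9]                        # p + v
    x_pred[3:6] += x_pred[9:12]                       # phi + omega
    x_pred[6:9] += rng.normal(0.0, sigma_v, 3)        # v + w_v
    x_pred[9:12] += rng.normal(0.0, sigma_omega, 3)   # omega + w_omega
    x_pred[12:15] += rng.normal(0.0, sigma_b, 3)      # b + w_b
    return x_pred                                     # static keypoints are unchanged
```

Calling `predict_state(x, np.random.default_rng(0))` with all standard deviations left at zero reduces the sketch to a pure constant-velocity integration of position and orientation.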

The measurement model according to one embodiment of the invention preferably includes the projection of the keypoints y_{k,i}, so that the 3D points represented by the state vector can be compared with the 2D measurement data. Because the keypoints are now given in the global coordinate system, they are first transformed back into the camera coordinate system and only then projected back into the image planes of the stereo camera. For this purpose, the keypoints are transformed back into the camera coordinate system using the current camera position p and the current rotation R_Φ and, in the case of the right camera, the extrinsics (e_t and E_rot). The projection into the left and right cameras is then performed using the intrinsics (I_Cl and I_Cr). Since the projection yields non-normalized homogeneous coordinates, these are preferably normalized again by the function hom(a). The right camera is shifted here by the baseline correction b_k so that the correction counteracts the motion and the asynchrony of the camera setup. Alternatively, the left camera can be shifted instead of the right camera, or both cameras proportionally. Since no baseline correction is required in the case of monocular images, b_k is not used there and is, for example, set to the value 0.

$h(x_{k|k-1}) = h\left(\begin{pmatrix} p_{k} \\ \phi_{k} \\ v_{k} \\ \omega_{k} \\ b_{k} \\ y_{k,1} \\ \vdots \\ y_{k,n} \end{pmatrix}\right) = \begin{pmatrix} \mathrm{hom}(I_{C_{l}} R_{\phi}^{-1} (y_{k,1} - p)) \\ \vdots \\ \mathrm{hom}(I_{C_{l}} R_{\phi}^{-1} (y_{k,n} - p)) \\ \mathrm{hom}(I_{C_{r}} R_{\phi}^{-1} (E_{rot}^{-1} (y_{k,1} - e_{t} - b_{k}) - p)) \\ \vdots \\ \mathrm{hom}(I_{C_{r}} R_{\phi}^{-1} (E_{rot}^{-1} (y_{k,n} - e_{t} - b_{k}) - p)) \end{pmatrix}$

$\mathrm{hom}(a) = \begin{pmatrix} \frac{a_{x}}{a_{z}} \\ \frac{a_{y}}{a_{z}} \end{pmatrix}$

Herein, h(x_{k|k-1}) is the measurement model of the Kalman filter. This function maps the model values (the 3D positions of the keypoints) in the Kalman filter to the image points in the stereo camera. y_{k,i} is the 3D position of the i-th keypoint at time step k. p is the position of the camera. R_Φ is the rotation of the stereo camera. I_Cl and I_Cr are the intrinsics of the left and right cameras. E_rot is the rotation between the two cameras of the stereo pair and e_t is the translation between them. b_k denotes the baseline correction performed by the Kalman filter. hom() is an auxiliary function that normalizes the homogeneous vector a: dividing a by its last component a_z converts it back into an ordinary vector.
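The measurement model can be written down almost literally in Python with NumPy. The following is a hedged sketch rather than the implementation of the embodiment: the function name `measure` and the parameter names are hypothetical, the intrinsics are assumed to be 3x3 pinhole matrices, and the rotations are assumed to be passed as 3x3 rotation matrices, so that the inverse equals the transpose. How R_Φ is built from the angles in the state vector is not specified here and is left to the caller.

```python
import numpy as np

def hom(a):
    """Normalize a homogeneous 3-vector by dividing by its last component."""
    return a[:2] / a[2]

def measure(p, R, keypoints, K_left, K_right, E_rot, e_t, b=None):
    """Project global keypoints into the left and right image, as in h(x)."""
    b = np.zeros(3) if b is None else np.asarray(b, dtype=float)  # monocular: b = 0
    R_inv, E_inv = R.T, E_rot.T       # inverse of a rotation matrix is its transpose
    left = [hom(K_left @ R_inv @ (y - p)) for y in keypoints]
    right = [hom(K_right @ R_inv @ (E_inv @ (y - e_t - b) - p)) for y in keypoints]
    # Output order follows the equation: all left projections, then all right ones.
    return np.concatenate(left + right)
```

For a camera at the origin with identity rotation, a keypoint on the optical axis projects to the principal point of the left image, which gives a quick sanity check of the intrinsics passed in.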

It has already been mentioned above that, in the course of the reconstruction, a camera position and camera orientation in a three-dimensional global coordinate system are calculated for the individual images. This calculation constitutes a form of spatial tracking of the camera recording the image data. In a preferred embodiment of the invention, relative or absolute recording time information is known in addition to the images used for the surface reconstruction. In this case, the relative or absolute temporal course of movement of the camera is reconstructed by collecting all calculated camera positions and camera orientations, sorting them by time, and associating them with the corresponding recording time information of the respective underlying images. The reconstruction can be performed for image data received in real time at the time of image acquisition or on the basis of archived or otherwise stored image data, and it constitutes a form of spatial tracking history of the camera recording the image data.
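The collection, temporal sorting and association described here can be sketched in a few lines of Python. The names `TimedPose` and `build_motion_history` are hypothetical, and the sketch assumes that each pose is given as a pair of position and orientation triples with one timestamp per image.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class TimedPose:
    timestamp: float      # relative or absolute recording time of the image
    position: Vec3        # camera position in the global coordinate system
    orientation: Vec3     # camera orientation (assumed: three angles)

def build_motion_history(poses: Sequence[Tuple[Vec3, Vec3]],
                         timestamps: Sequence[float]) -> List[TimedPose]:
    """Associate each calculated pose with its recording time and sort by time."""
    if len(poses) != len(timestamps):
        raise ValueError("one timestamp per pose is required")
    history = [TimedPose(t, tuple(pos), tuple(ori))
               for t, (pos, ori) in zip(timestamps, poses)]
    return sorted(history, key=lambda tp: tp.timestamp)
```

Because the function only needs poses and timestamps, the same sketch applies to images processed in real time and to archived image data.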

Referring again to FIG. 1 and the associated description above, the Z-drift correction 155 already mentioned is explained in more detail below. Reference is made in particular to FIG. 5, which shows a corresponding method according to an embodiment of the invention, and to FIGS. 6A and 6B, which illustrate the Z-drift correction using a simplified example.

The estimation 220 of the camera trajectory by means of the Kalman filter 130 yields the camera position and orientation for each image (e.g., for each frame of a video). Accordingly, the reconstructed point cloud can be transformed directly into its global coordinates, so that an overall model is formed from the individual reconstructions.
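The transformation of a per-frame point cloud into global coordinates can be sketched as follows. The sketch assumes the convention of the measurement model, in which a global point y has camera coordinates R^{-1}(y - p), so that the inverse mapping is y = R x + p. The function name `to_global` is hypothetical.

```python
import numpy as np

def to_global(points_cam, R, p):
    """Map an (n, 3) array of camera-frame points to global coordinates: y = R x + p."""
    points_cam = np.asarray(points_cam, dtype=float)
    # Row-vector form of R @ x for every point, followed by the translation.
    return points_cam @ np.asarray(R, dtype=float).T + np.asarray(p, dtype=float)
```

Applying this to every frame with its own estimated R and p places all individual reconstructions in one common coordinate system.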

The Z-drift correction according to the invention is based on the insight that the position estimates of the Kalman filter can have errors of different magnitude in different directions. Along the projection plane of the camera (right, left, up, down), the estimate may, for example, be pixel-accurate, whereas in the viewing direction (forward, backward) the estimate may depend on the accuracy of the keypoint matches. As a result, the estimate of the depth position relative to the camera may have a larger error. If the individual images or frames are registered onto one another step by step, this uncertainty in the depth estimate can nevertheless cause the registration to degenerate, so that the point clouds no longer lie on top of one another.

Before the point cloud 610 resulting from a single image or frame is embedded in the world model reconstructed up to that point in time, a global post-correction is preferably performed. For this purpose, the additional point cloud 610 to be registered is transformed into the position and orientation calculated by the Kalman filter. Then, for each point of the point cloud 610 to be registered, raycasting is performed from the camera (step 500). The rays pass through the point cloud 600 of the previously registered model (FIG. 6A). All points lying on or around the ray are then collected (step 510). The average of these collected points (step 520) and the 3D point of the point cloud 610 to be registered form a point pair whose members are to be registered onto each other. Next, in step 530, a rigid transformation is sought that maps the point clouds onto each other. In step 540, the transformation is applied to the point cloud 610 to be registered, which is then embedded in the world model. In addition, in step 550, the transformation can be applied as a correction to the model of the Kalman filter. This correction preferably affects the position of the camera and of the keypoints as well as the orientation of the camera. All of these values are preferably updated in the state vector of the Kalman filter.
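Steps 500 to 540 can be sketched in Python with NumPy. This is an illustrative sketch under several assumptions and not the method of the embodiment: points count as lying "around" a ray if they are within a hypothetical `radius` of it, and the rigid transformation is estimated with the Kabsch algorithm (an SVD-based least-squares fit), which is swapped in here because the text does not name a specific technique. All function names are hypothetical.

```python
import numpy as np

def ray_averages(camera, new_points, model_points, radius):
    """Steps 500-520: for each new point, average the model points near its ray."""
    sources, targets = [], []
    rel = model_points - camera
    for q in new_points:
        d = (q - camera) / np.linalg.norm(q - camera)          # ray direction
        t = rel @ d                                            # position along the ray
        dist = np.linalg.norm(rel - np.outer(t, d), axis=1)    # distance to the ray
        near = (t > 0) & (dist <= radius)                      # in front of the camera
        if near.any():
            sources.append(q)
            targets.append(model_points[near].mean(axis=0))
    return np.array(sources), np.array(targets)

def rigid_transform(src, dst):
    """Step 530: rotation R and translation t minimizing |R @ src_i + t - dst_i|."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)                        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def apply_rigid(points, R, t):
    """Step 540: apply the transformation to the point cloud to be registered."""
    return points @ R.T + t
```

A typical call sequence would be `src, dst = ray_averages(...)`, then `R, t = rigid_transform(src, dst)`, then `apply_rigid(new_points, R, t)`. The same R and t could then be used for the correction of the filter state in step 550, which is not shown here.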

FIG. 6A shows the already registered point cloud 600 and the point cloud 610 to be registered before or during the Z-drift correction. FIG. 6B, by contrast, shows the point clouds after adjustment by the Z-drift correction.

Referring finally to FIG. 7, an exemplary computer system is shown that can be used in conjunction with any of the embodiments of the invention described or claimed above. An image data receiving unit 700 is provided, which receives the monocular and/or stereoscopic images. The received images are then forwarded to a data processing unit 705, which performs the calculations described above. The received image data can also be stored in a memory 760. The memory can be part of the computer system, but an external memory can also be used. In addition to the stored image data 765, the (internal and/or external) memory can also store registered point clouds 770.

The data processing unit 705 contains a number of subunits 710-755 that each perform individual ones of the tasks described above, for example a rectification 115, a 3D estimation 120, a keypoint tracking 125, a Kalman filter 130, a baseline correction 135, a depth reconstruction 140, an outlier removal 145, a Z-drift correction 155, a meshing 160 and a texturing 165. Not all of these subunits need to be present in every embodiment, and further subunits may also be present. In other embodiments, the subunits may also be combined with one another in various ways. For example, there may be subunits that perform several of the functions. Each of the subunits can be implemented as a software module, but the subunits can also be realized at least partly by dedicated hardware.

CITATIONS INCLUDED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the better information of the reader. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited non-patent literature

Zhengyou Zhang, "A flexible new technique for camera calibration", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (11): 1330-1334, 2000 [0031]
Richard I. Hartley, "Theory and practice of projective rectification", International Journal of Computer Vision, 35 (2): 115-127, 1999 [0032]
Jianbo Shi et al., "Good features to track", Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pages 593-600, IEEE, 1994 [0033]
Jean-Yves Bouguet, "Pyramidal implementation of the affine Lucas Kanade feature tracker description of the algorithm", Intel Corporation, 5 (1-10): 4, 2001 [0033]
Oscar G. Grasa et al., "Visual SLAM for handheld monocular endoscope", IEEE Transactions on Medical Imaging, 33 (1): 135-146, 2014 [0034]
Peter Hansen et al., "Online continuous stereo extrinsic parameter estimation", 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1059-1066, IEEE, 2012 [0039]

Claims

A computer-implemented method for computing data representing a three-dimensional surface using images of the surface, the method comprising: receiving (200) a series of images (100, 105) of the surface, wherein the images were taken in temporal succession and wherein temporally adjacent images each depict an overlapping surface area; recognizing (125, 210) image features in at least a portion of the received images and tracking (125, 210) the recognized image features from image to image; calculating (130, 220) a camera position and camera orientation in a three-dimensional global coordinate system for each image of the at least a portion of the received images using the tracked image features; and transforming (130, 140, 230) pixels from images of the at least a portion of the received images into coordinates of the three-dimensional global coordinate system using the calculated camera positions and camera orientations.

The method according to claim 1, wherein the series of images of the surface is received in real time at the time of image acquisition.

The method according to claim 1, wherein the step of receiving the series of images of the surface comprises reading archived or otherwise stored image data.

The method according to one of claims 1 to 3, wherein the images of the series are monocular images and the method further comprises calculating (140) a depth map using a bundle adjustment technique.

The method according to one of claims 1 to 3, wherein the images of the series are stereoscopic images and the method further comprises calculating (140) a depth map using a block matching technique.

The method according to one of claims 1 to 5, wherein the received images are preprocessed prior to the recognition of image features, and wherein the preprocessing comprises a calibration (110).

The method according to one of claims 1 to 6, wherein the images of the series are stereoscopic images, and calculating the camera position and camera orientation further comprises dynamically shortening or extending (135) the virtual distance between the left and right images to compensate for asynchrony of the individual images.

The method according to one of claims 1 to 7, wherein calculating the camera position and camera orientation comprises the use of a Kalman filter.

The method according to claims 7 and 8, wherein the images of the series are stereoscopic images and the Kalman filter uses a state vector which, for asynchrony correction, contains an additional variable used to shift the position of the left or right image.

The method according to claim 9, wherein the additional variable of the state vector has an additive noise component.

The method according to one of claims 8 to 10, wherein the Kalman filter uses a state vector having a velocity component and a rotational velocity component, each of which has an additive noise component.

The method according to one of claims 1 to 11, wherein calculating the camera position and camera orientation comprises integrating variables indicative of a camera velocity and a camera rotational velocity.

The method according to one of claims 1 to 12, further comprising performing a Z-drift correction (155) for pixels to be transformed.

The method according to claim 13, wherein performing the Z-drift correction comprises: performing (500) raycasting with rays that, starting from pixels newly to be transformed, penetrate the point cloud of previously transformed pixels; collecting (510) pixels located on or near the rays; forming (520) an average of the pixels collected per ray; forming (530) a transformation using the formed averages and the global coordinates of the point cloud; and applying (540) the formed transformation to the pixels newly to be transformed.

The method according to claim 14, wherein calculating the camera position and camera orientation comprises applying a Kalman filter using a multi-variable state vector, and wherein the formed transformation is further applied (550) to correct one or more of the variables of the state vector.

The method according to one of claims 1 to 15, wherein, in the step of recognizing image features, image features are recognized that have a predefined minimum distance from one another.

The method according to one of claims 1 to 16, wherein the images of the series are assigned relative or absolute recording time information, and the step of calculating the camera positions and camera orientations in the three-dimensional global coordinate system uses this recording time information to calculate a camera movement history.

The method according to one of claims 1 to 17, wherein the recognition of image features comprises the application of a Harris corner detector and/or wherein the tracking of the recognized image features comprises the application of a Lucas-Kanade tracker.

A computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors of a computer, cause the one or more processors to perform a method of computing data representing a three-dimensional surface using images of the surface, the method comprising: receiving a series of images of the surface, wherein the images were taken in temporal succession, and wherein temporally adjacent images each depict an overlapping surface area; recognizing image features in at least a portion of the received images and tracking the recognized image features from image to image; calculating a camera position and camera orientation in a three-dimensional global coordinate system for each image of the at least a portion of the received images using the tracked image features; and transforming pixels from images of the at least a portion of the received images into coordinates of the three-dimensional global coordinate system using the calculated camera positions and camera orientations.

A computer system for computing data representing a three-dimensional surface using images of the surface, the computer system comprising: an image data receiving unit (700) for receiving a series of images of the surface, wherein the images were taken in temporal succession, and wherein temporally adjacent images each depict an overlapping surface area; and a data processing unit (705) for recognizing image features in at least a portion of the received images and tracking the recognized image features from image to image, calculating a camera position and camera orientation in a three-dimensional global coordinate system for each image of the at least a portion of the received images using the tracked image features, and transforming pixels from images of the at least a portion of the received images into coordinates of the three-dimensional global coordinate system using the calculated camera positions and camera orientations.