CN116327103B - Large-visual-angle laryngoscope based on deep learning - Google Patents
Large-visual-angle laryngoscope based on deep learning
- Publication number
- CN116327103B CN116327103B CN202310618436.6A CN202310618436A CN116327103B CN 116327103 B CN116327103 B CN 116327103B CN 202310618436 A CN202310618436 A CN 202310618436A CN 116327103 B CN116327103 B CN 116327103B
- Authority
- CN
- China
- Prior art keywords
- image
- micro
- training
- images
- lens imaging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000013135 deep learning Methods 0.000 title claims abstract description 17
- 238000003384 imaging method Methods 0.000 claims abstract description 42
- 230000005540 biological transmission Effects 0.000 claims abstract description 30
- 230000004927 fusion Effects 0.000 claims abstract description 24
- 238000013528 artificial neural network Methods 0.000 claims abstract description 23
- 230000000007 visual effect Effects 0.000 claims abstract description 13
- 238000000034 method Methods 0.000 claims description 21
- 230000003042 antagonistic effect Effects 0.000 claims description 10
- 230000006870 function Effects 0.000 claims description 9
- 230000008569 process Effects 0.000 claims description 7
- 239000011159 matrix material Substances 0.000 claims description 3
- 230000003068 static effect Effects 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000005516 engineering process Methods 0.000 abstract description 7
- 238000001514 detection method Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 2
- 238000003062 neural network model Methods 0.000 description 2
- 230000036961 partial effect Effects 0.000 description 2
- 230000002829 reductive effect Effects 0.000 description 2
- 238000002627 tracheal intubation Methods 0.000 description 2
- 210000003484 anatomy Anatomy 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 230000004907 flux Effects 0.000 description 1
- 239000003365 glass fiber Substances 0.000 description 1
- 210000004704 glottis Anatomy 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000000338 in vitro Methods 0.000 description 1
- 238000001727 in vivo Methods 0.000 description 1
- 238000002576 laryngoscopy Methods 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 239000005304 optical glass Substances 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/267—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor for the respiratory tract, e.g. laryngoscopes, bronchoscopes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00002—Operational features of endoscopes
- A61B1/00004—Operational features of endoscopes characterised by electronic signal processing
- A61B1/00009—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00163—Optical arrangements
- A61B1/00174—Optical arrangements characterised by the viewing angles
- A61B1/00183—Optical arrangements characterised by the viewing angles for variable viewing angles
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/04—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances
- A61B1/05—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances characterised by the image sensor, e.g. camera, being in the distal end portion
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Surgery (AREA)
- Engineering & Computer Science (AREA)
- Radiology & Medical Imaging (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Optics & Photonics (AREA)
- Pathology (AREA)
- Veterinary Medicine (AREA)
- Public Health (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Heart & Thoracic Surgery (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Animal Behavior & Ethology (AREA)
- Otolaryngology (AREA)
- Physiology (AREA)
- Pulmonology (AREA)
- Signal Processing (AREA)
- Endoscopes (AREA)
Abstract
The invention discloses a large-view-angle laryngoscope based on deep learning, which comprises a micro-lens imaging assembly for acquiring image data; the micro-lens imaging assembly comprises at least two groups of small-caliber micro-lenses with non-overlapping fields of view; a flexible image transmission assembly connected with the micro-lens imaging assembly and used for transmitting the image data; an image processing module connected to the other end of the flexible image transmission assembly, which trains a generative adversarial neural network on the image data obtained by the micro-lens imaging assembly to obtain an image fusion model, the image fusion model re-stitching the training output into a complete image in which the gaps between the non-overlapping field-of-view images are eliminated; and a display assembly for displaying the complete image. Based on flexible image transmission and artificial-intelligence image fusion technology, the technical scheme of the invention achieves high-resolution imaging of an ultra-large-view-angle scene in a single shot, and the ultra-large field-of-view image reconstructed by the intelligent image fusion algorithm is observed on a display screen.
Description
Technical Field
The invention relates to the technical field of laryngoscopes, in particular to a large-viewing-angle laryngoscope based on deep learning.
Background
Examination of the throat is an important problem in medical diagnosis. Because the throat is deep and structurally complex, direct visual inspection is generally difficult, and high-dynamic, high-spatial-resolution imaging and observation with video detection equipment is usually required. The video laryngoscope is a widely used representative of this class of laryngeal examination devices. For difficult tracheal intubation, the video laryngoscope collects laryngeal images through an image sensor and displays them on a screen in real time through a video transmission line. On the one hand, its field of view exceeds the range visible to the human eye through a common laryngoscope; on the other hand, the lens of the video laryngoscope does not need to displace the laryngeal structures, so the patient suffers less pain. Because the lens conforms to the anatomy of the human larynx, the difficulty of tracheal intubation is also reduced, and the device offers advantages such as fast operation, convenient acquisition, a high success rate, and improved glottis exposure.
The imaging module of a common video laryngoscope generally works through the cooperation of a camera handle and a host: the camera handle contains a set of lenses, an image sensor unit for high-speed imaging of the field of view corresponding to the lenses, and a light source for illuminating the dark environment inside the throat. With this approach, the observation effect of the video laryngoscope depends mainly on factors such as the field-of-view size of the adopted lens and sensor, the spatial resolution, and the imaging signal-to-noise ratio, and the approach also carries certain operational limitations. On the one hand, since the field angle of the camera system is fixed, the camera must be moved mechanically to find the desired structure whenever the field of view needs to be changed or shifted, which increases both the difficulty of operation and the time cost of imaging. On the other hand, because the throat is narrow, the adopted lens and image sensor are generally small and of low spatial resolution, and cannot provide the higher imaging precision needed for fine-scale observation of the throat. Methods that realize endoscopic observation with multiple lenses have therefore been proposed, but conventional image fusion methods need to reserve a partial overlapping area between adjacent fields of view and thus cannot maximize the imaging field angle.
Therefore, current common video laryngoscope technology relies only on traditional imaging, and the endoscopic methods that combine multiple lenses do not exploit the advantages of artificial-intelligence methods such as deep learning in learning and generating new features. As a result, existing technology cannot simultaneously retain advantages such as small volume, high real-time performance and convenient operation while acquiring, without mechanical movement during a single shot, large-scale laryngeal images with an ultra-large viewing angle and high spatial resolution.
Disclosure of Invention
The invention aims to provide a deep-learning-based large-view-angle laryngoscope, which automatically transfers the partial images captured by a micro-lens sequence into a computer and fuses them for display, requires no expensive additional equipment, and retains all the inherent advantages of a general video laryngoscope.
The above object of the invention is achieved by the following technical scheme:
a deep learning-based large viewing angle laryngoscope comprising:
the micro-lens imaging assembly is used for acquiring image data; the micro-lens imaging assembly comprises at least two groups of small-caliber micro-lenses with non-overlapping visual fields;
the flexible image transmission assembly is connected with the micro-lens imaging assembly and used for transmitting image data; the flexible image transmission assembly comprises one to six flexible image transmission bundles, each with a distal outer-ring diameter smaller than 1.6 mm;
the image processing module is connected with the other end of the flexible image transmission assembly, trains a generative adversarial neural network on the image data obtained by the micro-lens imaging assembly to obtain an image fusion model, and the image fusion model re-stitches the training output into a complete image in which the gaps between the non-overlapping field-of-view images are eliminated;
the display assembly is connected with the image processing module and is used for displaying the complete image re-stitched by the image processing module;
the generative adversarial neural network comprises a generator G and a discriminator D, and the training process is as follows: S1, acquiring training data; S2, building the generator neural network and the discriminator neural network; and S3, performing adversarial training with the generator and the discriminator to obtain the image fusion model.
Further, in step S1, the step of obtaining the training data includes:
s11, aiming at throat static samples under the same conditions, a micro-lens imaging assembly moves along a lens distribution plane, at least 10 series of images containing gaps are continuously collected, and the overlapping area of adjacent series of images is not less than 50% of the image area of Yu Shanzhang;
s12, cutting a series of images containing gaps into 256 multiplied by 256 pixel image small blocks without overlapping by taking one series of images as a reference, adopting scale-invariant feature transformation for each image small block, respectively carrying out feature matching with other images in the series of images, and acquiring an image small block area matched with each image small block;
s13, analyzing the image small block and the image small block area matched with the image small block: discarding if both contain gaps; if no gap exists, any one of the two is used as the input of the condition x for generating the antagonistic neural network, and the other is used as the true value y; otherwise, inputting the image small blocks containing the gaps as a condition x, and inputting the image small blocks without the gaps as a true value y; thus obtaining pairs of image block inputs as training data.
Further, the criterion for judging whether the small image block contains a gap is as follows: an image patch having more than 50 points with luminance values of 0 is a slit-containing image; otherwise, judging the image to be the image without the gap.
Further, in step S2, the generator G adopts a U-shaped symmetric network structure with an encoder/decoder structure, the size of each layer of feature map of the encoder portion is halved, and the size of each layer of the decoder is doubled; the number of filters per layer is doubled layer by layer in the encoder section and halved in the decoder section.
Further, the number of first layer filters of the generator G is 64, and each of the encoder and the decoder has 7 layers.
Further, in step S2, the discriminator D takes a fully convolutional form with 4 layers in total; the size of each layer's feature maps is halved and their number is doubled, and finally the output matrix is averaged, the obtained value representing a probability judgment of whether the image result is a true value.
Further, in the training process of step S3, for the generator G and the corresponding discriminator D, the loss function of the adversarial network is expressed as:

$$L_{cGAN}(G,D)=\mathbb{E}_{x,y}\left[\log D(x,y)\right]+\mathbb{E}_{x,z}\left[\log\left(1-D(x,G(x,z))\right)\right] \quad (1)$$

where $\mathbb{E}_{x,y}[\cdot]$ and $\mathbb{E}_{x,z}[\cdot]$ denote the mean over the whole data, z is the noise input of the generator G, x is the condition input of the adversarial network, and y is the true-value input of the adversarial network training;

the loss function for the L1-norm error between the predicted value and the true value is:

$$L_{L1}(G)=\mathbb{E}_{x,y,z}\left[\left\lVert y-G(x,z)\right\rVert_{1}\right] \quad (2)$$

the objective function of the adversarial network training is:

$$G^{*}=\arg\min_{G}\max_{D}\,L_{cGAN}(G,D)+\lambda L_{L1}(G) \quad (3)$$

In formula (3), λ is the weight between the two losses, and λ is set to 100.
According to the technical scheme, based on flexible image transmission and artificial-intelligence image fusion technology, an external image sensor is connected to the distal end of a flexible image guide to relay the partial images acquired by the micro-lens sequence while conventional laryngoscopy is performed; high-resolution imaging of an ultra-large-view-angle scene is achieved in a single shot, and the ultra-large field-of-view image reconstructed by the intelligent image fusion algorithm is observed on a display screen. This improves the detection range and imaging precision of the video laryngoscope and is of great significance for the field of video-laryngoscope diagnosis.
Advantages of the present technology also include: (1) only a low-cost retrofit of a traditional video laryngoscope is needed, and advantages such as convenient operation and light weight are retained;
(2) The throat image obtained by this technology has a markedly improved field-of-view range and spatial resolution; because an extracorporeal image sensor with high spatial resolution and pixel count is adopted, multi-view-angle throat images can be recorded synchronously in different areas, which suits real-time video imaging; moreover, unlike traditional image fusion, which must reserve an overlapping area between adjacent fields of view, the processing here uses a fast deep-learning algorithm and needs no reserved overlap, further improving the imaging throughput of the system and avoiding any sacrifice of temporal or spatial resolution;
(3) The invention can also be applied to in-vivo detection fields other than laryngoscopy; by adjusting the adopted micro-lens sequence, image sensor and artificial-intelligence algorithm parameters according to the actual situation, it can be applied to a wide range of clinical scenarios.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the present invention;
FIG. 2 is a flow chart of the generation of an image fusion model in the present invention;
FIG. 3 is a schematic diagram of the generative adversarial neural network in the present invention.
In the figure, a 1-micro lens imaging assembly; 2-a flexible image transmission assembly; 3-an image processing module; 4-display assembly.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the large-view-angle laryngoscope based on deep learning of the invention comprises a micro-lens imaging assembly 1 for acquiring image data; the micro-lens imaging assembly comprises at least two groups of small-caliber micro-lenses with non-overlapping fields of view;
a flexible image transmission assembly 2 connected with the micro-lens imaging assembly 1 and used for transmitting image data; the flexible image transmission assembly comprises one to six flexible image transmission bundles, each with a distal outer-ring diameter smaller than 1.6 mm;
an image processing module 3 connected to the other end of the flexible image transmission assembly 2; the image processing module 3 trains a generative adversarial neural network on the image data obtained by the micro-lens imaging assembly 1 to obtain an image fusion model, and the image fusion model re-stitches the training output into a complete image in which the gaps between the non-overlapping field-of-view images are eliminated;
and a display assembly 4 connected with the image processing module 3 and used for displaying the complete image re-stitched by the image processing module 3.
The invention expands the imaging field angle by replacing the single lens inside a conventional video laryngoscope with a micro-lens group whose members have different orientations. In addition to a lens that keeps the same orientation as the original lens, several lenses are arranged symmetrically in front of and behind it with the original direction as the symmetry axis; the included angles between these lenses and the original lens direction form an arithmetic progression, while it is ensured that the fields of view of adjacent lenses do not overlap, so that the field-of-view range is enlarged as much as possible, and the relative positions of the lenses are fixed.
As shown in FIG. 1, in a preferred embodiment, the micro-lens imaging assembly 1 adopts three groups of small-caliber micro-lenses; the small-caliber lenses use the small-caliber lens module 1/18 SEL120-015 manufactured by Sumita Optical Glass of Japan, which is suitable for medical endoscopes and has a horizontal angle of view of about 62.8°. When the fields of view of adjacent lenses do not overlap, the equivalent imaging field angle generated by the 3 adjacent lenses is at least 188.4°. The front end of the laryngoscope lens is covered with a corresponding transparent lens. The number and positional relationship of the lenses can be increased or decreased according to actual requirements, but the lens size should be chosen mainly so that the front end of the laryngoscope does not become too large. In one possible embodiment, for patients with a mouth opening larger than 10 mm, micro-lenses longer than 3 mm are not preferred.
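Purely as an illustration of the field-angle arithmetic above, and not as part of the claimed apparatus, the following minimal Python sketch sums abutting, non-overlapping fields of view; the function name is hypothetical.

```python
# Illustrative only: with non-overlapping, abutting fields of view,
# the equivalent horizontal field angle is simply the per-lens angle
# multiplied by the number of lenses.
def equivalent_fov(per_lens_fov_deg: float, num_lenses: int) -> float:
    """Equivalent horizontal field angle for abutting, non-overlapping lenses."""
    return per_lens_fov_deg * num_lenses

print(equivalent_fov(62.8, 3))  # 188.4 degrees, matching the 3-lens example
```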
The arrangement of the micro-lens group can be adjusted according to the actual situation, including whether the fields of view overlap, the symmetry, and other requirements. The purpose of the micro-lens group is to generate a larger field angle, and its model can be adjusted as needed. For example, an image fusion algorithm with better performance can lower the requirements on lens quality; if a larger-scale lens is used, larger edge distortion may occur, in which case adjacent areas may be imaged with a certain amount of overlap to obtain better image fusion performance.
For the flexible image transmission assembly 2, the invention uses bendable, light, long-life glass fiber bundles to lead the imaging results of the micro-lens group out of the body, thereby avoiding the difficulty of integrating a large-scale sensor at the front end of the laryngoscope. In a specific embodiment corresponding to the above, three flexible image transmission bundles with an outer diameter of 1.2 mm manufactured by Schott of Germany are used and connected respectively to the 3 micro-lenses of the micro-lens imaging assembly 1. The symmetry of the 3 micro-lenses ensures that the image transmission distances of the bundles are theoretically identical. The 3 flexible image transmission bundles are tightly fixed and integrated into the laryngoscope and led out of it; the other ends of the bundles are fixed in different areas of a high-resolution sensor, and the sensor image is displayed on a screen in real time by matching software after subsequent processing. The number of lenses matches the number of flexible image transmission bundles used, and its upper limit is mainly determined by keeping the front end of the laryngoscope from becoming too large. In one possible example, for patients with a mouth opening larger than 10 mm, at most 5-6 image transmission bundles with a corresponding number of micro-lenses are used, since the proximal ferrule diameter of a flexible image transmission bundle with an outer diameter of 1.2 mm is about 1.6 mm.
The image transmission device used in the flexible image transmission assembly 2 can in principle be replaced by any other image transmission device, such as an optical fiber panel (faceplate), provided it has the necessary properties of similar size, flexibility, easy coupling to the lens, and low imaging loss.
For the large-view-angle image fusion part based on deep learning, the image processing method of the invention first places the acquired sub-images of the non-overlapping areas at different positions of a blank image according to the spatial positional relationship of the micro-lens group, so that the resulting image in theory contains several gaps produced by the non-overlapping images. A gap area belongs to no sub-image and has a brightness value of 0. In this situation, eliminating the gaps completes the fusion of the images. Specifically, the invention draws on the learning ability and high robustness to the target signal of the generative adversarial network (GAN) itself and therefore proposes a deep neural network based on a conditional GAN. The algorithm adopts the idea of supervised learning, and training of the deep neural network model is carried out before execution.
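The following Python fragment is a minimal sketch of the composition step described above, in which the non-overlapping sub-images are placed on a blank canvas so that uncovered pixels keep brightness 0; the offsets, canvas size and sub-image contents are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def compose_with_gaps(sub_images, offsets, canvas_shape):
    """Place each grayscale sub-image at its (row, col) offset on a black canvas.

    Pixels not covered by any sub-image keep brightness 0, so they form the
    gaps that the fusion network is later trained to eliminate.
    """
    canvas = np.zeros(canvas_shape, dtype=np.uint8)
    for img, (r, c) in zip(sub_images, offsets):
        h, w = img.shape[:2]
        canvas[r:r + h, c:c + w] = img
    return canvas

# Hypothetical example: three 256x300 sub-images separated by 20-pixel gaps.
subs = [np.full((256, 300), 128, np.uint8) for _ in range(3)]
offsets = [(0, 0), (0, 320), (0, 640)]
canvas = compose_with_gaps(subs, offsets, (256, 940))
```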
In the large-view-angle image fusion part, the artificial-intelligence algorithm adopted can in principle be replaced by other neural network structures, and the parameters can be adjusted according to the actual situation.
As shown in fig. 2 and 3, the generative adversarial neural network in the present invention comprises a generator G and a discriminator D, and the training process is as follows: S1, acquiring training data; S2, building the generator neural network and the discriminator neural network; and S3, performing adversarial training with the generator and the discriminator to obtain the image fusion model.
In a specific embodiment, in step S1 of the training process of the generative adversarial neural network, the step of acquiring training data includes:
S11, for a static throat sample under the same conditions, the micro-lens imaging assembly 1 is moved along the lens distribution plane and a series of at least 10 gap-containing images is collected continuously, with adjacent images in the series overlapping by not less than 50% of the area of a single image. One image of the series is produced by one shot of the 3 micro-lenses of the micro-lens imaging assembly 1; since the fields of view of adjacent lenses do not overlap, each image necessarily contains gaps. Requiring adjacent images in the series to overlap by not less than 50% of the area of a single image ensures that enough paired training data can be obtained by matching during the image-patch matching step; to ensure sufficient training data, a series of at least 20 gap-containing images is more preferable;
S12, taking one image of the series as the reference, the reference image is cut into non-overlapping 256×256-pixel image patches; the scale-invariant feature transform is applied to each image patch, feature matching is performed with the other images in the series, and the image patch area matched with each image patch is obtained;
S13, each image patch and its matched image patch area are analyzed: if both contain gaps, they are discarded; if neither contains a gap, either one is used as the condition input x of the generative adversarial neural network and the other as the true value y; otherwise, the gap-containing image patch is used as the condition input x and the gap-free image patch as the true value y; pairs of image patches obtained in this way serve as the training data.
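A minimal Python sketch of the patch extraction and feature matching in S12 is given below, using OpenCV's SIFT implementation; the matching thresholds, the ratio test, and the way the matched region is localized are illustrative assumptions and are not fixed by the text.

```python
import cv2
import numpy as np

def split_into_patches(image, patch=256):
    """Cut a grayscale uint8 image into non-overlapping patch x patch tiles (S12)."""
    h, w = image.shape[:2]
    return [(r, c, image[r:r + patch, c:c + patch])
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]

def match_patch(patch_img, other_image, min_matches=10):
    """Locate the region of another series image that matches a reference patch
    via SIFT features; returns the matched region, or None if matching fails."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(patch_img, None)
    kp2, des2 = sift.detectAndCompute(other_image, None)
    if des1 is None or des2 is None:
        return None
    good = []
    for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
        # Lowe's ratio test (the 0.75 threshold is an assumption).
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_matches:
        return None
    # Estimate where the patch sits in the other image from matched keypoints.
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good])
    cx, cy = dst_pts.mean(axis=0)
    h, w = patch_img.shape[:2]
    x0, y0 = int(cx - w / 2), int(cy - h / 2)
    return other_image[max(y0, 0):y0 + h, max(x0, 0):x0 + w]
```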
Preferably, the criterion for judging whether an image patch contains a gap is: an image patch with more than 50 pixels of brightness value 0 is a gap-containing image; otherwise, it is judged to be a gap-free image.
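The gap criterion and the pairing rule of S13 can be sketched as follows; the helper names are hypothetical, and only the 50-zero-pixel threshold comes from the text.

```python
import numpy as np

def contains_gap(patch, zero_threshold=50):
    """Criterion from the text: a patch with more than 50 zero-brightness
    pixels is treated as gap-containing."""
    return int(np.count_nonzero(patch == 0)) > zero_threshold

def make_training_pair(patch_a, patch_b):
    """Pairing rule of S13: returns (condition_x, true_value_y), or None to discard."""
    gap_a, gap_b = contains_gap(patch_a), contains_gap(patch_b)
    if gap_a and gap_b:
        return None                      # both contain gaps: discard
    if not gap_a and not gap_b:
        return patch_a, patch_b          # neither contains a gap: either may serve as x
    return (patch_a, patch_b) if gap_a else (patch_b, patch_a)
```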
Preferably, in step S2, the generator G adopts a U-shaped symmetrical network structure with an encoder/decoder structure, the size of each layer of the feature map of the encoder section is halved, and the size of each layer of the decoder is doubled; the number of filters per layer is doubled layer by layer in the encoder section and halved in the decoder section.
Preferably, the number of first layer filters of generator G is 64, and the encoder and decoder each have 7 layers.
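A minimal TensorFlow/Keras sketch of a generator with the stated shape (7 encoder layers and 7 decoder layers, 64 first-layer filters, feature maps halved and filters doubled per encoder layer, the reverse in the decoder) is given below; kernel sizes, activations, skip connections and the output non-linearity are assumptions in the spirit of pix2pix-style generators and are not specified by the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(input_shape=(256, 256, 1), base_filters=64, depth=7):
    """U-shaped encoder/decoder generator as described in the text.

    Taken literally, filter counts grow to 64 * 2**6 = 4096 in the deepest
    layer; in practice they are often capped, but the literal rule is kept here.
    """
    inp = layers.Input(shape=input_shape)
    x, skips = inp, []
    for i in range(depth):                               # encoder: size halved, filters doubled
        x = layers.Conv2D(base_filters * 2 ** i, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        skips.append(x)
    for i in reversed(range(depth - 1)):                 # decoder: size doubled, filters halved
        x = layers.Conv2DTranspose(base_filters * 2 ** i, 4, strides=2,
                                   padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, skips[i]])          # U-Net skip connection (assumed)
    out = layers.Conv2DTranspose(input_shape[-1], 4, strides=2,
                                 padding="same", activation="tanh")(x)
    return tf.keras.Model(inp, out, name="generator_G")
```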
Preferably, in step S2, the discriminator D takes a fully convolutional form with 4 layers in total; the size of each layer's feature maps is halved and their number is doubled, and finally the output matrix is averaged, the obtained value representing a probability judgment of whether the image result is a true value.
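A corresponding sketch of the discriminator follows; the 4-layer fully convolutional structure with halved feature-map size and doubled filter count follows the text, while stacking the condition x and the candidate image along the channel axis and the final single-channel projection before averaging are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(256, 256, 2), base_filters=64):
    """Fully convolutional discriminator: 4 strided conv layers, then the
    score map is averaged to a single probability-like value."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for i in range(4):                                   # size halved, filters doubled per layer
        x = layers.Conv2D(base_filters * 2 ** i, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(1, 4, padding="same", activation="sigmoid")(x)  # assumed projection
    out = layers.GlobalAveragePooling2D()(x)             # average the score matrix
    return tf.keras.Model(inp, out, name="discriminator_D")
```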
Preferably, in the training process of step S3, for the generator G and the corresponding discriminator D, the loss function of the adversarial network is expressed as:

$$L_{cGAN}(G,D)=\mathbb{E}_{x,y}\left[\log D(x,y)\right]+\mathbb{E}_{x,z}\left[\log\left(1-D(x,G(x,z))\right)\right] \quad (1)$$

where $\mathbb{E}_{x,y}[\cdot]$ and $\mathbb{E}_{x,z}[\cdot]$ denote the mean over the whole data, z is the noise input of the generator G, x is the condition input of the adversarial network, and y is the true-value input of the adversarial network training;

the loss function for the L1-norm error between the predicted value and the true value is:

$$L_{L1}(G)=\mathbb{E}_{x,y,z}\left[\left\lVert y-G(x,z)\right\rVert_{1}\right] \quad (2)$$

the objective function of the adversarial network training is:

$$G^{*}=\arg\min_{G}\max_{D}\,L_{cGAN}(G,D)+\lambda L_{L1}(G) \quad (3)$$

In formula (3), λ is the weight between the two losses, and λ is set to 100.
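The loss terms of equations (1)-(3) can be sketched in TensorFlow as follows, written in the binary cross-entropy form commonly used to implement this objective; the function names are hypothetical.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(d_fake, g_output, y_true, lam=100.0):
    """Generator side of equations (1)-(3): adversarial term plus the
    lambda-weighted L1 error between the prediction G(x, z) and the true value y."""
    adv = bce(tf.ones_like(d_fake), d_fake)             # push D(x, G(x, z)) toward 1
    l1 = tf.reduce_mean(tf.abs(y_true - g_output))      # L1 term of equation (2)
    return adv + lam * l1                               # equation (3) with lambda = 100

def discriminator_loss(d_real, d_fake):
    """Discriminator side of equation (1): real pairs toward 1, generated pairs toward 0."""
    return bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
```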
The λ parameter regulates the relative magnitude of the two losses and is set to 100 in the present invention. The deep learning procedure of the invention is implemented in TensorFlow with an Adam optimizer (momentum 0.5) and an initial learning rate of 0.0002. For each sample, the training set size is 10000. The trained model was obtained after 400 rounds of training on a single NVIDIA GeForce RTX 2080 Ti GPU.
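A single adversarial training step with the stated optimizer settings (Adam, momentum 0.5, initial learning rate 0.0002, λ = 100) might look as follows; treating the generator's internal dropout as the noise input z is an assumption, as is concatenating the condition and the image along the channel axis for the discriminator.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)   # learning rate and momentum from the text
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(generator, discriminator, x_cond, y_true, lam=100.0):
    """One adversarial training step of step S3 (a sketch, not the exact procedure)."""
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        g_out = generator(x_cond, training=True)
        d_real = discriminator(tf.concat([x_cond, y_true], axis=-1), training=True)
        d_fake = discriminator(tf.concat([x_cond, g_out], axis=-1), training=True)
        g_loss = bce(tf.ones_like(d_fake), d_fake) \
                 + lam * tf.reduce_mean(tf.abs(y_true - g_out))      # objective (3)
        d_loss = bce(tf.ones_like(d_real), d_real) \
                 + bce(tf.zeros_like(d_fake), d_fake)                # objective (1)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss
```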
In this embodiment, the real-time image acquired by the micro-lens group of the micro-lens imaging assembly 1 is divided into non-overlapping 256×256-pixel image patches, which are fed into the trained deep neural network model proposed by this method; the results G(x, z) output by the generator G are re-stitched into a complete image and displayed on the screen. The execution time of the fusion algorithm is less than 1 second, meeting the real-time requirement.
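A minimal sketch of the deployment path described above (cut the composed real-time image into non-overlapping 256×256 patches, run each through the trained generator, re-stitch the outputs) is given below; the pixel scaling and the assumption that the image dimensions are multiples of the patch size are illustrative simplifications.

```python
import numpy as np

def fuse_image(generator, composed_image, patch=256):
    """Split a composed grayscale image (sub-images plus gaps) into patches,
    pass each patch through the trained generator G, and re-stitch the outputs.

    Assumes the image dimensions are multiples of the patch size and that
    pixel values are already scaled to the generator's input range.
    """
    h, w = composed_image.shape[:2]
    fused = np.zeros_like(composed_image, dtype=np.float32)
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tile = composed_image[r:r + patch, c:c + patch].astype(np.float32)
            pred = generator(tile[np.newaxis, ..., np.newaxis], training=False)
            fused[r:r + patch, c:c + patch] = np.squeeze(np.array(pred))
    return fused
```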
The specific embodiments of the present invention are intended to be illustrative rather than limiting. After reading this specification, persons skilled in the art may make modifications to the embodiments as required without inventive contribution, and such modifications remain protected by patent law within the scope of the appended claims.
Claims (6)
1. A deep learning-based large viewing angle laryngoscope comprising:
a micro-lens imaging component (1) for acquiring image data; the micro-lens imaging assembly comprises at least two groups of small-caliber micro-lenses with non-overlapping visual fields;
the flexible image transmission assembly (2) is connected with the micro-lens imaging assembly and is used for transmitting image data; the flexible image transmission assembly comprises one to six flexible image transmission bundles, each with a distal outer-ring diameter smaller than 1.6 mm;
the image processing module (3) is connected with the other end of the flexible image transmission assembly, trains a generative adversarial neural network on the image data obtained by the micro-lens imaging assembly to obtain an image fusion model, and the image fusion model re-stitches the training output into a complete image in which the gaps between the non-overlapping field-of-view images are eliminated;
the display assembly (4) is connected with the image processing module and is used for displaying the complete image re-stitched by the image processing module;
the generative adversarial neural network comprises a generator G and a discriminator D, and the training process is as follows: S1, acquiring training data; S2, building the generator neural network and the discriminator neural network; S3, performing adversarial training with the generator and the discriminator to obtain the image fusion model;
in step S1, the training data obtaining step includes:
S11, for a static throat sample under the same conditions, the micro-lens imaging assembly (1) is moved along the lens distribution plane and a series of at least 10 gap-containing images is collected continuously, with adjacent images in the series overlapping by not less than 50% of the area of a single image;
S12, taking one image of the series as the reference, the reference image is cut into non-overlapping 256×256-pixel image patches; the scale-invariant feature transform is applied to each image patch, feature matching is performed with the other images in the series, and the image patch area matched with each image patch is obtained;
S13, each image patch and its matched image patch area are analyzed: if both contain gaps, they are discarded; if neither contains a gap, either one is used as the condition input x of the generative adversarial neural network and the other as the true value y; otherwise, the gap-containing image patch is used as the condition input x and the gap-free image patch as the true value y; pairs of image patches obtained in this way serve as the training data.
2. The deep learning-based large viewing angle laryngoscope according to claim 1, wherein the criterion for judging whether an image patch contains a gap is: an image patch with more than 50 pixels of brightness value 0 is a gap-containing image; otherwise, it is judged to be a gap-free image.
3. The deep learning-based large viewing angle laryngoscope according to claim 1, wherein in step S2, the generator G adopts a U-shaped symmetric network structure with an encoder/decoder structure; the size of each layer of feature maps of the encoder section is halved, and each layer of the decoder is doubled in size; the number of filters per layer is doubled layer by layer in the encoder section and halved in the decoder section.
4. The deep learning-based large viewing angle laryngoscope according to claim 3, wherein the number of first-layer filters of the generator G is 64, and the encoder and the decoder each have 7 layers.
5. The deep learning-based large viewing angle laryngoscope according to claim 1, wherein in step S2, the discriminator D takes a fully convolutional form with 4 layers in total; the size of each layer's feature maps is halved and their number is doubled, and finally the output matrix is averaged, the obtained value representing a probability judgment of whether the image result is a true value.
6. The deep learning-based large viewing angle laryngoscope according to claim 1, wherein in the training process of step S3, for the generator G and the corresponding discriminator D, the loss function of the adversarial network is expressed as:

$$L_{cGAN}(G,D)=\mathbb{E}_{x,y}\left[\log D(x,y)\right]+\mathbb{E}_{x,z}\left[\log\left(1-D(x,G(x,z))\right)\right] \quad (1)$$

where $\mathbb{E}_{x,y}[\cdot]$ and $\mathbb{E}_{x,z}[\cdot]$ denote the mean over the whole data, z is the noise input of the generator G, x is the condition input of the adversarial network, and y is the true-value input of the adversarial network training;

the loss function for the L1-norm error between the predicted value and the true value is:

$$L_{L1}(G)=\mathbb{E}_{x,y,z}\left[\left\lVert y-G(x,z)\right\rVert_{1}\right] \quad (2)$$

the objective function of the adversarial network training is:

$$G^{*}=\arg\min_{G}\max_{D}\,L_{cGAN}(G,D)+\lambda L_{L1}(G) \quad (3)$$

In formula (3), λ is the weight between the two losses, and λ is set to 100.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310618436.6A CN116327103B (en) | 2023-05-30 | 2023-05-30 | Large-visual-angle laryngoscope based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310618436.6A CN116327103B (en) | 2023-05-30 | 2023-05-30 | Large-visual-angle laryngoscope based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116327103A CN116327103A (en) | 2023-06-27 |
CN116327103B true CN116327103B (en) | 2023-07-21 |
Family
ID=86879089
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310618436.6A Active CN116327103B (en) | 2023-05-30 | 2023-05-30 | Large-visual-angle laryngoscope based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116327103B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5547455A (en) * | 1994-03-30 | 1996-08-20 | Medical Media Systems | Electronically steerable endoscope |
WO2022002399A1 (en) * | 2020-07-02 | 2022-01-06 | Ecole Polytechnique Federale De Lausanne (Epfl) | Multicore fiber endoscope for phase imaging based on intensity recording using deep neural networks |
KR20220157833A (en) * | 2021-05-21 | 2022-11-29 | (주)엔도아이 | Method and apparatus for learning image for detecting lesions |
CN115720506A (en) * | 2020-10-02 | 2023-02-28 | 豪雅株式会社 | Program, information processing method, and endoscope system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA3074106A1 (en) * | 2016-05-19 | 2017-11-23 | Psip, Llc | Methods for polyp detection |
US11011275B2 (en) * | 2018-02-12 | 2021-05-18 | Ai.Skopy, Inc. | System and method for diagnosing gastrointestinal neoplasm |
US11563929B2 (en) * | 2019-06-24 | 2023-01-24 | Align Technology, Inc. | Intraoral 3D scanner employing multiple miniature cameras and multiple miniature pattern projectors |
US20220287545A1 (en) * | 2021-03-11 | 2022-09-15 | Medtronic Advanced Energy Llc | Enhanced vision in electrosurgery |
CN118284907A (en) * | 2021-09-16 | 2024-07-02 | 史赛克公司 | Method and system for generating intraoperative imaging data of an object |
-
2023
- 2023-05-30 CN CN202310618436.6A patent/CN116327103B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5547455A (en) * | 1994-03-30 | 1996-08-20 | Medical Media Systems | Electronically steerable endoscope |
WO2022002399A1 (en) * | 2020-07-02 | 2022-01-06 | Ecole Polytechnique Federale De Lausanne (Epfl) | Multicore fiber endoscope for phase imaging based on intensity recording using deep neural networks |
CN115720506A (en) * | 2020-10-02 | 2023-02-28 | 豪雅株式会社 | Program, information processing method, and endoscope system |
KR20220157833A (en) * | 2021-05-21 | 2022-11-29 | (주)엔도아이 | Method and apparatus for learning image for detecting lesions |
Non-Patent Citations (2)
Title |
---|
Mitigating fluorescence spectral overlap in wide-field endoscopic imaging; Chenying Yang, Vivian Hou, Leonard Y. Nelson, Eric J. Seibel; Journal of Biomedical Optics; Vol. 18, No. 8; full text *
Research on a 3D reconstruction method for the cardiac surface based on StyleGAN; Xu Siyuan; China Master's Theses Full-text Database, Medicine & Health Sciences (No. 1); full text *
Also Published As
Publication number | Publication date |
---|---|
CN116327103A (en) | 2023-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10203493B2 (en) | Optical systems for multi-sensor endoscopes | |
JP4982358B2 (en) | An imaging device that looks in all and forward directions | |
KR100852310B1 (en) | Endoscope and endoscope system | |
CN111308690B (en) | Optical field electronic endoscopic device and imaging method thereof | |
CN103460685B (en) | Image pickup apparatus and image pickup method | |
CN110463174A (en) | For the optical system of surgical probe, the system and method for forming it, and the method for executing surgical operation | |
JP2020006176A (en) | Camera scope electronic variable prism | |
CN109068035B (en) | Intelligent micro-camera array endoscopic imaging system | |
JPH03165732A (en) | Detecting method for insertion direction of endoscope | |
CN116327103B (en) | Large-visual-angle laryngoscope based on deep learning | |
WO2009150653A1 (en) | Optical system for use in an endoscope | |
WO2012103767A1 (en) | New 3d electronic choledochoscopy system and method for use thereof | |
CN210902962U (en) | Laparoscope external view mirror device capable of scanning inside of abdominal cavity | |
Roulet et al. | 360 endoscopy using panomorph lens technology | |
CN110680264A (en) | 3D optical endoscope system based on dual-optical-path design | |
RU2337606C1 (en) | Optical system of endoscope | |
CN210962341U (en) | Robot outer sight glass with confocal laser scanning function | |
JP2014054499A (en) | Full field small-sized imaging device using cylinder type image pickup element | |
CN205411128U (en) | Micro - endoscope system | |
CN216526523U (en) | Multipurpose hard endoscope with large visual field and large field depth | |
WO2024122323A1 (en) | Imaging device, operating method of imaging device, and program | |
CN107040745B (en) | method for improving frame frequency of probe type confocal micro endoscope | |
CN218074940U (en) | Compatible three-dimensional hard tube endoscope | |
CN106444004A (en) | Front and rear field-of-view electronic endoscope | |
CN119279465A (en) | 8K stereoscopic display endoscope |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||