
CN118279734A - Underwater particulate matter and biological image in-situ acquisition method, medium and system - Google Patents


Info

Publication number
CN118279734A
Authority
CN
China
Prior art keywords
image
biological
underwater
model
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410421143.3A
Other languages
Chinese (zh)
Other versions
CN118279734B (en)
Inventor
张国豪
谭畅
符巧生
李庆林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Daowan Technology Co ltd
Original Assignee
Qingdao Daowan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Daowan Technology Co ltd filed Critical Qingdao Daowan Technology Co ltd
Priority to CN202410421143.3A priority Critical patent/CN118279734B/en
Publication of CN118279734A publication Critical patent/CN118279734A/en
Application granted granted Critical
Publication of CN118279734B publication Critical patent/CN118279734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06V20/05: Scenes; scene-specific elements; underwater scenes
    • G06T7/10: Image analysis; segmentation; edge detection
    • G06V10/40: Extraction of image or video features
    • G06V10/764: Recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/766: Recognition using pattern recognition or machine learning; regression, e.g. by projecting features on hyperplanes
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06T2207/10016: Indexing scheme for image acquisition modality; video; image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an underwater particulate matter and biological image in-situ acquisition method, medium and system, belonging to the technical field of in-situ acquisition of underwater particulate matter and biological images. An underwater module on an underwater terminal identifies targets in images shot by an underwater camera, extracts a two-dimensional data table representing each image, and uploads the table to an on-water server; an on-water module arranged in the on-water server restores the two-dimensional data table into an image. The underwater module detects each particle or organism using a pre-trained particle or organism tracking model and a particle or organism full pose recognition model. The invention solves the prior-art technical problems that, during in-situ acquisition of underwater particles and biological images, severe target movement and pose deformation make targets difficult to identify, and poor communication makes images difficult to transmit.

Description

Underwater particulate matter and biological image in-situ acquisition method, medium and system
Technical Field
The invention belongs to the technical field of underwater particulate matter and biological image in-situ collection, and particularly relates to an underwater particulate matter and biological image in-situ collection method, medium and system.
Background
The underwater environment is complex and variable, which poses great challenges for underwater detection and observation. Because water absorbs and scatters light, underwater images commonly suffer from low contrast, color distortion and a limited field of view, which severely hinders target detection, identification and tracking. Conventional underwater detection usually relies on acoustic or electromagnetic means, but these methods are limited in resolution, penetration depth and data dimensionality, and can hardly meet the requirements of fine observation of particulate matter and organisms. As ocean resource development, ecological environment protection and related fields continue to advance, the demand for real-time, accurate monitoring of underwater particulate matter and organisms is increasingly urgent.
Currently, in the field of in-situ observation of underwater particulate matter and organisms, visual acquisition technology is receiving increasing attention. By deploying a high-resolution camera and a powerful light source underwater, image or video data of the underwater scene can be acquired in real time, laying a foundation for subsequent target detection, tracking and analysis. However, because the underwater environment is complex and variable, purely manual visual judgment and manual labeling are limited: targets are easily missed or misjudged, and the requirements of fine, large-scale observation are difficult to meet.
To solve this problem, researchers have applied artificial intelligence techniques such as computer vision and deep learning to underwater target detection and recognition. By training deep neural network models, particulate matter and biological targets can be automatically detected and segmented from complex underwater images, enabling accurate target tracking and identification. However, existing deep learning methods face several challenges in underwater scenes:
1. Severe target movement and pose deformation: under the influence of water flow, underwater particles and biological targets tend to move or present complex poses, making target detection and identification more difficult.
2. Poor communication makes images difficult to transmit: data transmission between underwater and surface equipment is limited in bandwidth and reliability, which makes real-time presentation challenging.
Disclosure of Invention
In view of the above, the invention provides an in-situ collection method, medium and system for underwater particles and biological images, which can solve the prior-art technical problems that, during in-situ collection of underwater particles and biological images, severe target movement and pose deformation make targets difficult to recognize, and poor communication makes images difficult to transmit.
The invention is realized in the following way:
The invention provides an underwater particulate matter and biological image in-situ collection method that uses an underwater module and an on-water module. The underwater module is arranged on an underwater terminal and, based on the images shot by an underwater camera, identifies targets and extracts a two-dimensional data table representing each image, which it uploads to an on-water server; the on-water module is arranged in the on-water server and restores the two-dimensional data table into an image.
Wherein, the underwater module is used for executing the following steps:
S11, acquiring a plurality of continuous images shot underwater, and recording the continuous images as a first image set;
s12, preprocessing the first image set to obtain a second image set;
S13, performing model calculation on each image in the second image set by utilizing a pre-trained particle or organism tracking model to obtain labeling information of the particle and organism corresponding to each image, wherein the labeling information is used for uniquely coding and marking the particle or organism in the image;
S14, segmenting each image in the second image set based on the labeling information to obtain a plurality of segmented images, and grouping the segmented images into a plurality of segmented image sets according to the labeling information, wherein the segmented images in each segmented image set have the same labeling information;
s15, recognizing each segmented image in the segmented image set by utilizing a pre-trained particulate matter or biological full pose recognition model to obtain a recognition vector of each segmented image, wherein the recognition vector comprises the type, the size and the recognition rate of the recognized particulate matter or biological, and the recognition vector with the highest recognition rate of the segmented image set is used as a recognition result of the segmented image set;
S16, for each image in the second image set, establishing a two-dimensional data table based on the identification result of the segmented image set, wherein the two-dimensional data table comprises coordinates and the identification result of each segmented image; and transmitting the two-dimensional data table to an on-water module.
Further, the on-water module is configured to perform the following steps:
s21, acquiring the two-dimensional data table;
S22, establishing an empty image, wherein the size of the empty image is the size of an image shot by the underwater camera;
S23, dividing the empty image based on the two-dimensional data table, and in each divided area replacing pixels with the particle or organism image corresponding to the identification result in the two-dimensional data table, to generate an image containing the particles or organisms.
Further, the particulate matter or biological tracking model includes:
The characteristic extraction network is used for extracting characteristics of the input image group by using the convolutional neural network to obtain multi-scale and multi-level characteristic mapping;
The target detection head predicts the classification and the position of the target object by using a target detection algorithm based on the feature mapping and outputs a candidate target frame;
An instance segmentation head for generating a segmentation mask of the target object by an instance segmentation algorithm for each candidate target frame;
and the target association module associates and marks detection results belonging to the same target object in different frames of the input image group through a target tracking algorithm, so as to realize cross-frame tracking.
Further, the training step of the particulate matter or biological tracking model includes:
Preprocessing and data enhancement are carried out on a training data set, wherein the training data set is specifically an image set obtained by manually labeling particles or organisms on a large number of real images collected in different water areas in the field;
Dividing the training data set into a training set, a verification set and a test set;
Defining a loss function comprising classification loss, regression loss, segmentation mask loss and other components;
Inputting training data into a model in a small batch mode for forward calculation and backward propagation, and updating model parameters;
Evaluating the performance of the model on the verification set after each training period is completed and adjusting the super-parameters; the optimal model is evaluated and selected on the test set.
Further, the feature extraction network of the particulate matter or biological tracking model adopts ResNet, Inception or EfficientNet.
Further, the feature extraction network of the particulate matter or biological full pose recognition model adopts the same feature extraction network as the particulate matter or biological tracking model.
Further, the training step of the particulate matter or biological full pose recognition model comprises the following steps:
Acquiring a full pose training data set: acquiring images of all known particles or organisms, performing three-dimensional reconstruction on them with a computer three-dimensional model to obtain a three-dimensional model of each particle or organism, exporting images of the full pose of the particle or organism by rotating about the three-dimensional X, Y and Z axes, exporting one image for every 1° of rotation, and taking all exported images together with their specific particle or organism names as the full pose training data set;
and training a convolutional neural network by adopting the full pose training data set to obtain a particulate matter or biological full pose recognition model.
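The following is a minimal sketch of one reading of the 1° export procedure above, enumerating rotations about each of the three axes separately. The mesh representation (vertices, faces) and the render_view function are illustrative assumptions, not part of the claimed method; any offscreen renderer could stand in for render_view.

```python
# Sketch of full-pose training-set export: rotate the reconstructed 3D model
# about the X, Y and Z axes in 1-degree increments and export one image per step.
import numpy as np
from scipy.spatial.transform import Rotation

def export_full_pose_images(vertices, faces, name, render_view):
    """Yield (image, label) pairs covering the full pose space of one model."""
    for axis in ("x", "y", "z"):
        for deg in range(360):                          # one image per 1 degree
            R = Rotation.from_euler(axis, deg, degrees=True).as_matrix()
            rotated = vertices @ R.T                    # rotate the 3D model
            image = render_view(rotated, faces)         # hypothetical renderer
            yield image, name                           # label = particle/organism name
```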
A second aspect of the present invention provides a computer readable storage medium having stored therein program instructions that, when executed, are configured to perform an underwater particulate matter and biological image in situ collection method as described above.
A third aspect of the present invention provides an underwater particulate matter and biological image in situ collection system comprising the computer readable storage medium described above.
Compared with the prior art, the underwater particulate matter and biological image in-situ acquisition method, medium and system provided by the invention have the beneficial effects that:
1) The method adopts a two-stage recognition pipeline comprising a pre-trained tracking model and a full-pose recognition model: specific particles and organisms are first tracked, and full-pose recognition is then performed on the tracked images. This effectively addresses the problem that, under the influence of water flow, underwater particles and biological targets tend to move or present complex poses, which makes them harder to detect and recognize; the method has strong generalization capability and can cope with the varied challenges of underwater scenes.
2) Compared with the original image data, the two-dimensional data table has a smaller data volume and a more compact structure, which reduces the data transmission pressure between underwater and surface equipment and improves real-time performance and reliability. The target image is accurately reconstructed from coordinate, size and pose information, restoring the real state of the underwater target.
In summary, the invention solves the prior-art technical problems that, during in-situ acquisition of underwater particles and biological images, severe target movement and pose deformation make recognition difficult, and poor communication makes images difficult to transmit.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps performed by an underwater module of the method of the present invention;
FIG. 2 is a flow chart of the steps performed by the on-water module of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the invention provides a flowchart of an in-situ collection method of underwater particulate matters and biological images, which comprises the following steps: the underwater terminal comprises an underwater module and an on-water module, wherein the underwater module is arranged at the underwater terminal and is used for identifying and extracting a two-dimensional data table representing an image based on the image shot by an underwater camera, uploading the two-dimensional data table to an on-water server, and the on-water module is arranged in the on-water server and is used for restoring the image based on the two-dimensional data table.
Wherein, the underwater module is used for executing the following steps:
S11, acquiring a plurality of continuous images shot underwater, and recording the continuous images as a first image set;
s12, preprocessing the first image set to obtain a second image set;
S13, performing model calculation on each image in the second image set by utilizing a pre-trained particle or organism tracking model to obtain labeling information of the particle and organism corresponding to each image, wherein the labeling information is used for uniquely coding and marking the particle or organism in the image;
S14, segmenting each image in the second image set based on the labeling information to obtain a plurality of segmented images, and grouping the segmented images into a plurality of segmented image sets according to the labeling information, wherein the segmented images in each segmented image set have the same labeling information;
s15, recognizing each segmented image in the segmented image set by utilizing a pre-trained particulate matter or biological full pose recognition model to obtain a recognition vector of each segmented image, wherein the recognition vector comprises the type, the size and the recognition rate of the recognized particulate matter or biological, and the recognition vector with the highest recognition rate of the segmented image set is used as a recognition result of the segmented image set;
S16, for each image in the second image set, establishing a two-dimensional data table based on the identification result of the segmented image set, wherein the two-dimensional data table comprises coordinates and the identification result of each segmented image; and transmitting the two-dimensional data table to an on-water module.
Further, the on-water module is configured to perform the following steps:
s21, acquiring the two-dimensional data table;
S22, establishing an empty image, wherein the size of the empty image is the size of an image shot by the underwater camera;
S23, dividing the empty image based on the two-dimensional data table, and in each divided area replacing pixels with the particle or organism image corresponding to the identification result in the two-dimensional data table, to generate an image containing the particles or organisms.
A second aspect of the present invention provides a computer readable storage medium having stored therein program instructions that, when executed, are configured to perform an underwater particulate matter and biological image in situ collection method as described above.
The following describes in detail the specific embodiments of the above steps:
Step S11: acquiring a plurality of continuous images photographed underwater, and recording the continuous images as a first image set
The purpose of this step is to obtain a continuous sequence of images from the underwater camera as input data for subsequent processing. The specific implementation mode is as follows:
1) And configuring an underwater camera, and setting proper resolution, frame rate and exposure parameters so as to adapt to an underwater shooting environment.
2) And starting the underwater camera, and continuously acquiring an image sequence of the underwater scene. The acquisition time may depend on the requirements of the application scenario, and it is often necessary to ensure that a sequence of images of sufficient length is acquired to cover the whole region of interest.
3) The acquired image sequence is stored as a first image set. An appropriate image format (e.g., JPEG, PNG, etc.) may be selected for storage for subsequent reading and processing.
4) Metadata information of the first image set, such as acquisition time, location, camera parameters, etc., is recorded for later use.
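The following is a minimal sketch of items 1)-4) above using OpenCV. The device index, resolution, frame count and output paths are illustrative assumptions, not values fixed by the method.

```python
# Sketch of step S11: configure the camera, acquire a continuous image
# sequence, store it as the first image set, and record metadata.
import json, pathlib, time
import cv2

def acquire_first_image_set(out_dir="first_image_set", n_frames=300):
    out = pathlib.Path(out_dir); out.mkdir(exist_ok=True)
    cap = cv2.VideoCapture(0)                       # underwater camera device
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)         # resolution suited to the scene
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
    meta = {"start_time": time.time(),
            "width": cap.get(cv2.CAP_PROP_FRAME_WIDTH),
            "height": cap.get(cv2.CAP_PROP_FRAME_HEIGHT)}
    for i in range(n_frames):                       # continuous image sequence
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(str(out / f"{i:06d}.png"), frame)     # store as PNG
    cap.release()
    (out / "metadata.json").write_text(json.dumps(meta))  # record metadata
```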
Step S12: preprocessing the first image set to obtain a second image set
The aim of this step is to pre-process the originally acquired image to improve the efficiency and accuracy of the subsequent processing. The specific implementation mode is as follows:
1) Denoising an image: due to the complexity and uncertainty of the underwater environment, the acquired images may have varying degrees of noise and interference. The image may be denoised using classical image denoising algorithms (e.g., gaussian filtering, median filtering, etc.) or deep learning based denoising networks (e.g., FFDNet, CBDNet, etc.).
2) Image enhancement: underwater images often suffer from low contrast, color distortion, etc. The image can be contrast enhanced and color corrected using conventional image enhancement methods such as histogram equalization, retinex algorithm, CLAHE, etc., or image enhancement networks based on deep learning (e.g., UWCNN, UWAN, etc.).
3) Image clipping: if the original image has a large number of background areas or areas not of interest, the image can be adaptively cut by adopting a method based on depth estimation or semantic segmentation, the areas of interest are reserved, and the efficiency of subsequent processing is improved.
4) Image size adjustment: and uniformly scaling the preprocessed images to a specified resolution to adapt to the input requirements of the subsequent model. Care is taken to maintain the aspect ratio during the scaling process to avoid distortion.
5) Data enhancement: and performing data enhancement operations on the preprocessed image, such as random clipping, overturning, rotating, color dithering and the like, so as to increase the diversity of training data and improve the generalization capability of the model.
6) And storing the preprocessed image sequence as a second image set as input data of a subsequent step.
It should be noted that the parameter settings of the preprocessing operations (e.g., filter kernel size, enhancement strength, etc.) need to be adjusted and optimized for the specific data set and application scenario. In addition, the order of the preprocessing steps may be adjusted according to actual requirements; a minimal sketch of a typical pipeline follows.
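The sketch below combines items 1), 2) and 4) above, assuming OpenCV. The median kernel size, CLAHE parameters and the 640-pixel target are illustrative values to be tuned per deployment.

```python
# Sketch of step S12 preprocessing: classical denoising, CLAHE contrast
# enhancement on the L channel, and aspect-preserving resizing.
import cv2

def preprocess(img, target=640):
    img = cv2.medianBlur(img, 3)                        # 1) denoise
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)          # 2) enhance contrast
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    img = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    h, w = img.shape[:2]                                # 4) resize, keep aspect ratio
    scale = target / max(h, w)
    return cv2.resize(img, (int(w * scale), int(h * scale)))
```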
Step S13: performing model calculation on each image in the second image set by using a pre-trained particulate matter or biological tracking model to obtain labeling information of particulate matters and organisms corresponding to each image
The aim of the step is to use the trained particle or biological tracking model to perform reasoning calculation on each image in the second image set, so as to obtain detection, segmentation and tracking information of the particle or biological target in each image. The specific implementation mode is as follows:
1) Loading a pre-trained particulate matter or biological tracking model. The model should be capable of target detection, instance segmentation, and cross-frame target correlation.
2) And sequentially inputting each image in the second image set into the model for forward reasoning calculation.
3) The model firstly utilizes a target detection head to process an input image, generates a series of candidate target frames, and predicts the object category of each candidate frame.
(4) For each candidate target frame, the model's instance segmentation head will generate a corresponding segmentation mask describing the fine contour of the target object.
In the whole underwater particulate matter and biological image in-situ acquisition method, the establishment and training of a particulate matter or biological tracking model are key links. The model is used for carrying out model calculation on each image in the second image set by utilizing a pre-trained model, so as to obtain the labeling information of the particulate matters and the living beings corresponding to each image. The labeling information is used for uniquely coding and marking particles or organisms in the images, and provides a basis for image segmentation and identification in subsequent steps.
The process of building and training a particulate matter or biological tracking model mainly involves the following aspects:
(I) Acquisition of the training data set
Acquisition of the training dataset is the basis for model training. Aiming at the particulate matter or biological tracking model, the training data set should cover various underwater environments, different types of particulate matter and organisms, different illumination conditions and the like so as to ensure that the model has better universality and robustness.
Specifically, the training data set may be obtained by the following ways:
1. Field collection: organize dedicated underwater sampling operations and collect video or image data in different water areas using diving equipment, remotely operated submersibles and the like. This approach acquires the most realistic underwater scene data, but at a higher cost and over a longer period of time.
2. Synthetic data: by utilizing computer graphics and image synthesis technology, different types of particulate matters or biological models are superimposed on the existing real underwater background image to generate a large amount of synthetic image data rich in diversity. This approach is flexible to operate, but there is a gap between the synthesized data and the real data.
3. Internet collection: collect underwater image or video resources from the Internet and form a training data set after manual screening and labeling. This approach is time-efficient, but the diversity and quality of the data may be limited.
4. A combination of the above: in general, the three modes are combined, so that training data sets with sufficient quantity, rich variety and higher quality can be obtained.
In addition to the original video or image data, the training dataset needs to include manually annotated tag information. For particulate matter or biological tracking models, the tag information mainly includes:
1) Bounding box: the positions for locating the target objects (particles or organisms) in the image are usually marked in the form of rectangular boxes.
2) Segmentation mask: the actual outline of the target object is subjected to pixel-level fine labeling, so that the shape and the boundary of the target object can be described more accurately.
3) Object class: the specific type of target object, such as different types of particulate matter or organisms, is noted.
4) Object number: if there are multiple target objects of the same category in one image, each object needs to be assigned a unique number to achieve tracking of the same target object in different frames.
Labeling is typically done by trained human annotators, or with the help of semi-automated labeling tools. The quality of the labeling directly influences the training effect of the model.
(II) Model structure
The structure of a particle or biological tracking model typically employs a design that combines target detection and example segmentation. Mainly comprises the following parts:
1. Feature extraction network: extracts features from the input image using a convolutional neural network to obtain multi-scale, multi-level feature maps. Common feature extraction networks include ResNet, Inception, EfficientNet, and the like.
2. Target detection head: based on the feature map, a target detection algorithm (such as Faster R-CNN, YOLO and the like) is utilized to predict the classification and the position of a target object, and a series of candidate target frames are output.
3. Example split header: for each candidate target frame, a fine segmentation Mask of the target object is further generated using an instance segmentation algorithm (e.g., mask R-CNN, etc.).
4. Target association module: detection results belonging to the same target object in different frames are associated and identified by a target tracking algorithm (such as DeepSORT, ByteTrack and the like), realizing cross-frame target tracking and unique coding.
5. And a post-processing module: and further filtering, denoising and optimizing the result output by the model, thereby improving the accuracy and stability of detection and segmentation.
The modules may be combined and designed in series or in parallel to form an end-to-end particulate matter or biological tracking model. The specific structure and parameter setting of the model need to be adjusted and optimized according to the characteristics of the training data set, the difficulty level of target detection and segmentation and other factors.
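Below is a structural sketch of how the five parts above might be wired together for inference. The class itself and its injected components are illustrative; the concrete choices named in the comments (ResNet, Faster R-CNN, Mask R-CNN, DeepSORT) are merely the examples listed above, not a fixed implementation.

```python
# Structural sketch of the particle/organism tracking model: backbone,
# detection head, instance segmentation head and cross-frame associator
# composed in the order the text describes.
class ParticleOrOrganismTracker:
    def __init__(self, backbone, det_head, seg_head, associator):
        self.backbone = backbone      # feature extraction network (e.g. ResNet)
        self.det_head = det_head      # target detection head (e.g. Faster R-CNN)
        self.seg_head = seg_head      # instance segmentation head (e.g. Mask R-CNN)
        self.associator = associator  # cross-frame association (e.g. DeepSORT)

    def process(self, frames):
        tracks = []
        for t, frame in enumerate(frames):
            feats = self.backbone(frame)          # multi-scale feature maps
            boxes = self.det_head(feats)          # candidate boxes + classes
            masks = self.seg_head(feats, boxes)   # per-box segmentation masks
            # assign a unique track id to each detection across frames
            tracks.append(self.associator.update(t, boxes, masks))
        return tracks
```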
(III) Model training
After a sufficient training data set is obtained, training of the particulate matter or biological tracking model may begin. The model training aims at enabling the model to accurately detect and divide particles or biological targets in an image under the condition of given input images through continuous iterative optimization, and unique coding identification is assigned to each target.
The main steps of model training are as follows:
1. Data preprocessing: the training data set is subjected to necessary preprocessing, including operations such as image clipping, scaling, data enhancement (such as rotation, flipping, noise addition and the like), so as to increase the diversity of the data and improve the generalization capability of the model.
2. Dividing the data set: the preprocessed data set is divided into a training set, a validation set and a test set. The training set is used for optimizing model parameters, the verification set is used for adjusting model super parameters, and the test set is used for evaluating the final performance of the model. Typically the training set is the largest, and the validation set and the test set each account for a small fraction.
3. Defining a loss function: a suitable loss function is designed based on the difference between the model's output and ground truth tags. The penalty function typically includes multiple parts of classification penalty, bounding box regression penalty, segmentation mask penalty, etc., which need to be weighted summed.
4. Selecting an optimizer: and selecting a proper optimization algorithm and a learning rate strategy for continuously adjusting model parameters in the training process and minimizing the value of the loss function. Common optimizers include SGD, adam, RMSProp, etc.
5. Training and iterating: and (3) inputting training data into the model batch by batch in a small batch (Mini-batch) mode to perform forward calculation and backward propagation, and updating model parameters. After each iteration period (Epoch) is completed, the performance of the model is evaluated on the verification set, and the super-parameters or the optimization strategy is adjusted according to the evaluation index (such as average precision mAP).
6. Model evaluation: during the training process, the performance of the model may be periodically evaluated on the test set to see if the model is over-fitted or under-fitted. The model with the best performance on the test set is finally selected as the final particulate matter or biological tracking model.
7. Model deployment: and (3) performing model compression, quantization and other optimization on the trained model, and deploying the model on a target hardware platform (such as embedded equipment, a cloud server and the like) for actual particulate matter or biological image acquisition application.
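The condensed PyTorch-style loop below illustrates steps 3-6 above. The equal loss weights, learning rate, the dict of partial losses returned by the model, and the evaluate_map helper are all illustrative assumptions, not prescribed by the method.

```python
# Sketch of the training loop: weighted multi-part loss (step 3), optimizer
# (step 4), mini-batch forward/backward iteration (step 5), and per-epoch
# validation (step 6).
import torch

def train(model, train_loader, val_loader, evaluate_map, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)    # step 4: optimizer
    for epoch in range(epochs):                          # step 5: iterate
        model.train()
        for images, targets in train_loader:             # mini-batches
            losses = model(images, targets)              # assumed dict of partial losses
            # step 3: weighted sum of classification, box regression, mask losses
            loss = 1.0 * losses["cls"] + 1.0 * losses["box"] + 1.0 * losses["mask"]
            opt.zero_grad()
            loss.backward()                              # backward propagation
            opt.step()                                   # update parameters
        evaluate_map(model, val_loader)                  # step 6: validate, tune (hypothetical helper)
```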
In the model training process, the following key points need to be noted:
1) Quality of the dataset: the high quality, diverse training data set is the basis for the model to achieve good performance. Training data can be expanded and enriched by means of data enhancement, semi-supervised learning, etc.
2) Design of a loss function: reasonable loss function design is critical to model convergence and performance. The losses of the different parts need to be weighted according to the characteristics of the specific task.
3) Positive and negative sample balance: since most of an image does not contain target objects, positive and negative samples are unbalanced. This problem can be alleviated by techniques such as hard example mining and online hard example mining (OHEM).
4) Super parameter tuning: the setting of super parameters such as learning rate, batch size, regularization strength and the like has great influence on the performance of the model, and the model needs to be optimized by grid search, random search and other methods.
5) Hardware acceleration: the speed of model training can be greatly improved by using a GPU or a special accelerator (such as TPU, FPGA and the like).
6) Transfer learning: models pre-trained on large public data sets (e.g., COCO, ImageNet, etc.) can serve as a starting point for transfer learning, achieving faster convergence and better performance.
The following is a specific implementation of the establishment and training of the full pose recognition model:
In the underwater particulate matter and biological image in-situ acquisition method, the full pose identification of the particulate matter or the living beings is a key link. The aim of the step is to identify each segmented image in the segmented image set by utilizing a pre-trained particulate matter or biological full pose identification model, so as to obtain an identification vector of each segmented image, wherein the identification vector comprises the type, the size and the identification rate of the identified particulate matter or biological. This information will provide basis for subsequent steps to ensure that the actual morphology and status of the particulate matter or organisms are revealed in the final generated image.
The establishment and training process of the full pose recognition model mainly relates to the following aspects:
(I) Acquisition of the training data set
The training dataset is the basis for model training, whose quality and diversity directly affect the performance of the model. For a particulate matter or biological full pose recognition model, the following aspects need to be considered for acquiring a training data set:
(1) Diversity of kinds of target objects
The training dataset should contain various types of particulates and organisms, such as different kinds of sand, silt, algae, plankton, shells, etc., to ensure that the model has a broad recognition capability.
(2) Morphological diversity of target objects
The same type of particulate matter or organism may have a certain variation in its morphology due to individual differences or different growth environments. Therefore, the training data set needs to contain various forms of the same kind of target object under different visual angles and different postures so as to enhance the robustness of the model to deformation.
(3) Diversity of background environments
The complexity of the underwater environment results in variability of the background, such as color, turbidity, lighting conditions, etc., of different bodies of water. The training dataset needs to cover various background environments to improve the generalization ability of the model.
(4) Diversity of image quality
The image quality in the training dataset should have some variability, including different resolution, noise level, contrast, etc., to enhance the model's adaptability to changes in image quality.
The acquisition of training data sets meeting the diversity requirements described above may take several forms:
A) Field collection: real image and video data are collected in different water areas using diving equipment or a remotely operated submersible, and labeled by professionals. This approach acquires the most realistic data, but is costly and time-consuming.
B) Data synthesis: based on computer graphics and three-dimensional modeling technology, particle or organism models of different types, forms and poses are synthesized onto real background images, generating a large amount of high-quality synthetic image data.
C) Internet collection: underwater image and video resources are collected from the Internet and formed into a training data set after manual screening and labeling.
D) A combination of the above: in general, the three modes are combined, so that training data sets with sufficient quantity, rich variety and higher quality can be obtained.
In addition to the original image data, the training dataset needs to include detailed annotation information, mainly comprising:
(1) Fine segmentation mask of target object: the actual outline of the target object is subjected to pixel-level fine labeling, and the precise shape and boundaries of the target object are described.
(2) 3D model parameters of the target object: parameters such as the type, the size, the gesture (position, direction and the like) and the like of the target object are included and are used for representing the full pose state of the target object.
(3) Semantic attributes of the target object: such as the particle state (circular, angular, etc.), the active state (motion, rest, etc.) of the living being, etc.
Labeling is typically done by trained human annotators, or with the help of semi-automated labeling tools. The quality of the labeling directly influences the training effect of the model.
(II) Model structure
The structure of the model for recognizing the full pose of the particulate matter or the organism usually adopts an end-to-end design based on deep learning, and the functions of feature extraction, pose estimation, attribute prediction and the like are integrated in a unified network frame. The model mainly comprises the following parts:
(1) Feature extraction network
A Convolutional Neural Network (CNN) extracts features from the input image, producing multi-scale, multi-level feature maps. Common feature extraction networks include ResNet, Inception, EfficientNet, and the like.
(2) Pose estimation head
Based on the feature mapping, 3D pose parameters of the target object, including size, position, direction and the like, are predicted by using a regression algorithm. Common pose estimation algorithms include PoseNet, DOPE, etc.
(3) Attribute prediction head
Also based on the feature map, semantic attributes of the target object, such as category, shape, activity status, etc., are predicted using a classification algorithm.
(4) Loss function
And comparing the pose estimation and attribute prediction output with ground truth labels, and calculating corresponding regression loss and classification loss to serve as an objective function of model optimization.
(5) Post-processing module
And further filtering, optimizing and fusing the result output by the model, and improving the accuracy and stability of pose estimation and attribute prediction.
The modules can be combined and designed in a cascading or parallel mode to form an end-to-end particulate matter or biological full pose recognition model. The specific structure and parameter setting of the model need to be adjusted and optimized according to the characteristics of the training data set, the complexity of the task and other factors.
(III) Model training
After a sufficient training data set is obtained, training of the particulate matter or biological full pose recognition model can be started. The model training aims at accurately predicting 3D pose parameters and semantic attributes of a target object in an image under the condition of given input images through continuous iterative optimization.
The main steps of model training are as follows:
(1) Data preprocessing
The training data set is subjected to necessary preprocessing, including operations such as image clipping, scaling, data enhancement (such as rotation, overturning, color dithering and the like), so as to increase the diversity of data and improve the generalization capability of the model.
(2) Partitioning data sets
The preprocessed data set is divided into a training set, a validation set and a test set. The training set is used for optimizing model parameters, the verification set is used for adjusting model super parameters, and the test set is used for evaluating the final performance of the model.
(3) Defining a loss function
A suitable loss function is designed according to the difference between the model output (pose estimation result and attribute prediction result) and ground truth labels. The loss function typically includes pose regression loss, attribute classification loss and other parts, which are weighted and summed.
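As a sketch of this weighted sum, assuming a smooth L1 pose regression loss and a cross-entropy attribute classification loss; the weights w_pose and w_attr are illustrative and must be tuned per task:

```python
# Combined loss for the full pose recognition model: pose regression plus
# attribute classification, weighted and summed.
import torch
import torch.nn.functional as F

def full_pose_loss(pose_pred, pose_gt, attr_logits, attr_gt,
                   w_pose=1.0, w_attr=1.0):
    pose_loss = F.smooth_l1_loss(pose_pred, pose_gt)      # size/position/direction
    attr_loss = F.cross_entropy(attr_logits, attr_gt)     # category, shape, state
    return w_pose * pose_loss + w_attr * attr_loss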
(4) Selection optimizer
And selecting a proper optimization algorithm and a learning rate strategy for continuously adjusting model parameters in the training process and minimizing the value of the loss function. Common optimizers include SGD, adam, RMSProp, etc.
(5) Training iterations
And (3) inputting training data into the model batch by batch in a small batch (Mini-batch) mode to perform forward calculation and backward propagation, and updating model parameters. After each iteration period (Epoch) is completed, the performance of the model is evaluated on the verification set, and the super-parameters or optimization strategies are adjusted according to the evaluation indexes (such as average precision AP, average attitude error APE and the like).
(6) Model evaluation
During the training process, the performance of the model may be periodically evaluated on the test set to see if the model is over-fitted or under-fitted. And finally, selecting the model with the optimal performance on the test set as a final full pose recognition model.
(7) Model deployment
And (3) carrying out model compression, quantization and other optimization on the trained model, and deploying the model on a target hardware platform for actual particulate matter or biological image acquisition application.
In the model training process, the following key points need to be noted:
a) Design of a loss function: reasonable loss function design is critical to model convergence and performance. The pose regression loss and the attribute classification loss are weighted and balanced according to the characteristics of specific tasks.
B) Positive and negative sample balance: since most of an image does not contain target objects, positive and negative samples are unbalanced. This problem can be alleviated by techniques such as hard example mining and online hard example mining (OHEM).
C) Super parameter tuning: the setting of super parameters such as learning rate, batch size, regularization strength and the like has great influence on the performance of the model, and the model needs to be optimized by grid search, random search and other methods.
D) Hardware acceleration: the speed of model training can be greatly improved by using a GPU or a special accelerator.
E) Transfer learning: models pre-trained on large public data sets can serve as a starting point for transfer learning, achieving faster convergence and better performance.
Detailed description of step S15
Step S15: and identifying each segmented image in the segmented image set by utilizing a pre-trained particulate matter or biological full pose identification model to obtain an identification vector of each segmented image, wherein the identification vector comprises the type, the size and the identification rate of the identified particulate matter or biological, and the identification vector with the highest identification rate is used as the identification result of the segmented image set.
The aim of the step is to utilize a trained full pose recognition model to perform inference calculation on each segmented image in the segmented image set obtained in the step S14, obtain the type, size and recognition confidence of the particulate matters or organisms corresponding to each image, and select the result with the highest confidence as the final recognition result of the segmented image set.
The specific implementation mode is as follows:
1. loading a pre-trained particulate matter or biological full pose recognition model.
2. Traversing the segmented image set obtained in the step S14, and executing the following operations on each segmented image:
(1) And inputting the current segmentation image into the full pose recognition model for forward reasoning calculation.
(2) The model firstly utilizes a feature extraction network to extract features of an input image to obtain multi-scale feature mapping.
(3) Based on the feature map, the pose estimation head of the model predicts the 3D pose parameters of the target object, including dimensions (sizes), positions, directions, etc.
(4) Meanwhile, the attribute predicting head of the model predicts semantic attributes of the target object, such as types (specific types of particulate matters or living things), shapes and the like.
(5) And fusing the pose estimation result and the attribute prediction result to form an identification vector of the current segmented image, wherein the identification vector comprises types, sizes and identification confidence (which can be obtained by weighted summation of pose regression loss and attribute classification loss).
(6) For the same segmented image set, the model will repeatedly perform the above operation on each of the segmented images, resulting in a plurality of recognition vectors.
3. And comparing confidence values of all the recognition vectors in the same segmented image set, and selecting the one with the highest confidence as the final recognition result of the segmented image set.
4. The above process is repeated for all the segmented image sets obtained in step S14 to obtain the optimal recognition result for each segmented image set.
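A minimal sketch of the selection logic in steps 2-3 above, assuming the recognition model is a callable returning a (category, size, confidence) triple for one segmented image:

```python
# Run the full pose recognition model on every segmented image in a set and
# keep the recognition vector with the highest confidence (step S15).
def recognize_segment_set(model, segment_images):
    best = None
    for img in segment_images:
        category, size, conf = model(img)          # forward inference
        vector = {"category": category, "size": size, "confidence": conf}
        if best is None or vector["confidence"] > best["confidence"]:
            best = vector                          # keep the most confident result
    return best                                    # recognition result of the set
```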
In carrying out this step, attention is paid to the following key points:
(1) Calculation efficiency in model reasoning process: since a large number of segmented images need to be identified, the speed of reasoning of the model has an important impact on the real-time of the whole process. The reasoning efficiency can be improved through optimization means such as model compression and quantization.
(2) Reliability evaluation of the identification result: the confidence value output by the model reflects the credibility of the recognition result, and the setting of the confidence threshold value directly influences the accuracy and stability of recognition. Generally, a reasonable confidence threshold range, such as 0.6-0.9, can be set according to the specific application scenario and model performance.
(3) Consistency processing of the same kind of target object: for multiple target objects of the same kind within the same segmented image set, the recognition results of the model should remain consistent, i.e. the size and attribute information should be close. If a large discrepancy occurs, it is necessary to make the optimization and adjustment by the post-processing module.
Detailed description of step S16
Step S16: for each image in the second image set, establishing a two-dimensional data table based on the identification result of the segmented image set, wherein the two-dimensional data table comprises coordinates and the identification result of each segmented image; and sends the two-dimensional data table to the water module.
The purpose of this step is to correlate the recognition result of each segmented image set obtained in step S15 with the position information in the original image, construct a two-dimensional data table, and provide necessary data support for the image reconstruction of the subsequent water module.
The specific implementation mode is as follows:
1. initializing an empty two-dimensional data table;
2. traversing each image in the second image set, and executing the following operations:
(1) Index number or time stamp information of the current image in the whole image sequence is acquired and used for uniquely identifying the image.
(2) Based on the result of step S14, information of all the divided image sets corresponding to the image is acquired.
(3) The following is performed for each segmented image set:
a) The best recognition result of the segmented image set is obtained from the result of step S15, including information such as category, size and confidence.
B) The coordinate range of the segmented image set in the original image is calculated, and the boundary coordinates of the circumscribed rectangular frame or the fine segmentation mask of the segmented image can be taken.
C) The coordinate range of the divided image set and the recognition result are added to a two-dimensional data table as one line of data.
3. After traversing all the images in the second image set, the construction of the two-dimensional data table is completed.
4. The two-dimensional data table is encoded and compressed as necessary to reduce the amount of data transmission.
5. And transmitting the coded two-dimensional data table to the water module in a wireless transmission or wired connection mode.
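The following sketch illustrates one possible row layout and encoding for the table built in steps 1-5 above. JSON plus zlib compression stand in for whatever coding scheme the deployment actually adopts (see the key points below).

```python
# Sketch of step S16: one row per segmented image set, holding the bounding
# coordinates and the best recognition result, then compressed for the
# underwater-to-surface link.
import json, zlib

def build_table(image_id, segment_sets):
    rows = []
    for seg in segment_sets:                    # one segmented image set each
        x0, y0, x1, y1 = seg["bbox"]            # coordinate range in the original image
        r = seg["result"]                       # best recognition vector (step S15)
        rows.append({"image": image_id, "bbox": [x0, y0, x1, y1],
                     "category": r["category"], "size": r["size"],
                     "confidence": r["confidence"]})
    return rows

def encode_table(rows):
    return zlib.compress(json.dumps(rows).encode("utf-8"))  # reduce transmitted bytes
```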
In carrying out this step, attention is paid to the following key points:
(1) Unification of a coordinate system: when calculating the coordinate range of the segmented image set, a coordinate system consistent with the water module needs to be ensured to be adopted, so that subsequent coordinate conversion errors are avoided.
(2) Data encoding and compression: the reasonable data coding and compression mode can greatly reduce the data transmission quantity and improve the transmission efficiency. Common coding modes include run length coding, entropy coding, etc., and compression algorithms include entropy coding algorithms (e.g., huffman coding, arithmetic coding, etc.), dictionary coding algorithms (e.g., LZW algorithm), etc.
(3) Reliability of data transmission: the data transmission environment between underwater and water is bad, and error detection and retransmission mechanisms are needed to be adopted, so that the integrity and the correctness of data are ensured.
(4) Real-time requirements: depending on the specific application scenario, the real-time requirements for data transmission are also different. If real-time display is required, optimization is required in each link of image acquisition, processing and transmission, and delay time is shortened.
Detailed description of step S21
The purpose of this step is to receive and obtain the two-dimensional data table sent in step S16 at the water module end, and provide necessary input data for the subsequent image reconstruction process.
The specific implementation mode is as follows:
1. on the hardware platform (such as server, industrial computer, etc.) of the above-water module, a proper wired or wireless data receiving module is configured.
2. And according to the data transmission protocol and the coding mode adopted in the step S16, the corresponding configuration and initialization of the receiving module are carried out.
3. Starting the receiving module, and continuously monitoring the data transmission from the underwater module.
4. Upon receipt of the data packet, the data packet is immediately checked for integrity and correctness. If the check is passed, the data packet is decoded and decompressed to restore the original two-dimensional data table.
5. And carrying out necessary pretreatment, such as coordinate conversion, data format conversion and the like, on the restored two-dimensional data table so as to adapt to the subsequent image reconstruction process.
6. And storing the preprocessed two-dimensional data table into a storage medium of the water module to be used as input data for image reconstruction.
In carrying out this step, attention is paid to the following key points:
(1) Configuration of the data receiving module: according to the practical application scene, a proper wired or wireless data receiving module is selected, and correct parameter configuration is carried out on the data receiving module, so that stable data transmission from underwater can be received.
(2) Data integrity and correctness checking: due to the complexity of the underwater data transmission environment, the received data may have the problems of packet loss, error code and the like. Therefore, it is necessary to implement integrity check (such as checksum, cyclic redundancy check, etc.) and correctness check (such as adding a data packet sequence number, etc.) at the data receiving end to ensure the reliability of the data.
(3) Data decoding and decompression: and (3) according to the specific coding and compression algorithm adopted in the step S16, corresponding decoding and decompression operations are realized at the water module end, and the original two-dimensional data table is restored.
(4) Data preprocessing: because the underwater and above-water modules may employ different coordinate systems and data formats, the received two-dimensional data table needs to be transformed and reconstructed as necessary to meet the requirements of subsequent image reconstruction.
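A minimal sketch of the check-and-decode path in steps 4-5 above, assuming the sender appends a 4-byte big-endian CRC32 trailer (an illustrative framing convention, not part of the method) and uses the zlib/JSON encoding sketched for step S16:

```python
# Sketch of step S21: verify packet integrity, then decompress and decode
# the two-dimensional data table; a failed check triggers retransmission.
import json, zlib

def decode_packet(packet: bytes):
    payload, trailer = packet[:-4], packet[-4:]
    if zlib.crc32(payload) != int.from_bytes(trailer, "big"):
        return None                               # failed check: request retransmission
    return json.loads(zlib.decompress(payload))   # restore the two-dimensional data table
```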
Specific embodiment of step S22: and establishing an empty image, wherein the size of the empty image is the size of an image shot by the underwater camera.
The purpose of this step is to create a blank image at the water module end as canvas for the subsequent image reconstruction. The size of the blank image needs to be consistent with the original image acquired by the underwater camera to ensure that the reconstructed image can be matched with the original scene.
The specific implementation is as follows:
1. Acquire the image resolution parameters of the underwater camera, including width and height. These parameters may be obtained from the metadata recorded in step S11 or transmitted by the underwater module along with the data.
2. According to the acquired resolution parameters, allocate a contiguous space in the memory of the above-water module to store the pixel data of the blank image.
3. Initialize this space, setting all pixel values to the background color (typically black or transparent).
4. If desired, add other metadata to the blank image, such as image format, color space, and bit depth.
5. Store the initialized blank image in a storage medium of the above-water module as the starting point of the subsequent image reconstruction.
The following key points require attention when carrying out this step:
(1) Acquisition of resolution parameters: accurately acquiring the resolution parameters of the underwater camera is a precondition for creating a blank image of the correct size; if these parameters are wrong, the reconstructed image will not match the original scene.
(2) Memory allocation and initialization: calculate the required memory size from the resolution parameters and perform efficient allocation and initialization to ensure the integrity and consistency of the image data.
(3) Addition of metadata: different image formats and color spaces carry different metadata; setting this information properly facilitates the display and processing of subsequent images.
(4) Selection of a storage medium: the medium holding the blank image needs sufficient capacity and read/write speed for the subsequent image reconstruction; a high-speed hard disk, solid-state drive, or memory-mapped file may be used. An allocation sketch is given below.
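As a minimal sketch of this step, the canvas allocation might look as follows in Python with NumPy; the metadata key names "width" and "height" are assumptions of this example.

import numpy as np

def create_blank_image(meta: dict) -> np.ndarray:
    # Allocate one contiguous buffer sized to the underwater camera's
    # resolution and initialize every pixel to black; a fourth channel
    # could be added if a transparent background is preferred.
    return np.zeros((meta["height"], meta["width"], 3), dtype=np.uint8)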
Specific embodiment of step S23: dividing the empty image based on the two-dimensional data table and, in each divided region, substituting the particulate-matter or biological image corresponding to the identification result in the table, thereby generating an image containing the particulates or organisms.
The purpose of this step is to perform segmentation and pixel-replacement operations on the blank image created in step S22 according to the two-dimensional data table acquired in step S21, "injecting" the recognition results for the particulates or organisms into the blank image and finally generating a reconstructed image that contains them.
The specific implementation is as follows:
1. Load the blank image created in step S22 and the two-dimensional data table acquired in step S21.
2. Traverse each row of the two-dimensional data table and perform the following operations:
(1) Acquire the coordinate range of the segmented image from the current row.
(2) Divide the blank image according to this coordinate range to obtain a rectangular divided region.
(3) Acquire the recognition result of the segmented image from the current row, including category, size, confidence, and similar information.
(4) Based on the recognition result, look up a matching target image template in a pre-established particulate-matter or biological image library. The library may be generated from real data or from three-dimensional modeling and contains particulate or biological images of different categories, sizes, and poses.
(5) Apply appropriate scaling, rotation, and translation transformations to the retrieved template so that its size and pose match the recognition result.
(6) Render the transformed template into the current divided region of the blank image, replacing the pixel values in that region.
3. After all rows of the two-dimensional data table have been traversed, every divided region on the blank image has been replaced with the corresponding particulate or biological image, yielding a reconstructed image containing the particulates or organisms.
4. Apply necessary post-processing to the reconstructed image, such as edge smoothing and background overlay, to enhance the visual effect.
5. Display the final reconstructed image on a display device of the above-water module, or save it as an image file for further analysis and processing. A sketch of the traversal loop is given below.
The following key points require attention when carrying out this step:
(1) Consistency of the coordinate system: the coordinate system of the blank image must be consistent with the coordinate information in the two-dimensional data table; otherwise the rendered targets will be misplaced.
(2) Preparation of target image templates: to obtain the desired rendering effect, the pre-prepared particulate-matter or biological image library needs to meet the following requirements:
a) Rich variety: the library must contain enough categories of particulates and organisms, consistent with the training data set, so that recognition results can be matched.
b) Varied sizes: the same category of target needs templates of different sizes to match recognition results of different sizes.
c) Varied poses: templates should cover the target in different poses (positions, orientations, etc.) to match different pose recognition results.
d) High quality: templates need high image quality, with guaranteed sharpness and realism, to avoid degrading visual quality during rendering.
(3) Image transformation and rendering: rendering a target image template into a divided region of the blank image requires the following transformation operations:
a) Size scaling: scale the template according to the size information in the recognition result so that it matches the size of the actual target.
b) Rotation: rotate the template according to the orientation information in the recognition result so that it is aligned with the actual target.
c) Translation: translate the transformed template according to the position of the divided region so that its position on the blank image matches that of the actual target.
d) Rendering algorithm: render the transformed template into the divided region using direct pixel replacement, alpha blending, or another rendering algorithm; a sketch of this chain is given below.
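The following Python/OpenCV sketch chains operations a) through d), assuming a BGRA template whose alpha channel drives the blending and an in-plane rotation angle (in degrees) taken from the recognition result; both assumptions are illustrative.

import cv2
import numpy as np

def render_template(canvas, template_bgra, bbox, angle_deg):
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0

    # a) size scaling to the divided region
    t = cv2.resize(template_bgra, (w, h))

    # b) rotation about the template center
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    t = cv2.warpAffine(t, m, (w, h))

    # c) translation is implicit in the destination slice below;
    # d) alpha blending rather than direct pixel replacement
    alpha = t[:, :, 3:4].astype(np.float32) / 255.0
    region = canvas[y0:y1, x0:x1].astype(np.float32)
    blended = alpha * t[:, :, :3].astype(np.float32) + (1.0 - alpha) * region
    canvas[y0:y1, x0:x1] = blended.astype(np.uint8)

Alpha blending avoids the hard rectangular seams that direct replacement leaves around each target, at the cost of preparing templates with an alpha channel.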
(4) Post-processing of reconstructed images: after all divided regions have been rendered, the following post-processing operations may be applied to the reconstructed image to improve visual quality:
a) Edge smoothing: smooth the pixel values along the edges of divided regions to remove aliasing and color-block artifacts; common algorithms include bilinear interpolation and Gaussian smoothing.
b) Background rendering: if necessary, superimpose an underwater background image on the reconstructed image so that it better matches the real scene.
c) Color correction: owing to underwater illumination, the colors of the reconstructed image may differ from the original scene; global or local color correction can improve color fidelity.
d) Contrast enhancement: apply contrast enhancement to the reconstructed image to improve sharpness and depth.
e) Noise removal: apply an image-denoising algorithm to remove noise and speckles from the reconstructed image and improve clarity. A post-processing sketch is given below.
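A post-processing sketch covering items a), d), and e) with OpenCV; the kernel size, CLAHE parameters, and denoising strengths are illustrative starting values rather than tuned settings.

import cv2

def post_process(img):
    # a) edge smoothing: a light Gaussian blur suppresses jagged seams
    img = cv2.GaussianBlur(img, (3, 3), 0)

    # d) contrast enhancement: CLAHE applied to the luminance channel only
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    img = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

    # e) noise removal with non-local means denoising
    return cv2.fastNlMeansDenoisingColored(img, None, 3, 3, 7, 21)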
(5) Image display and storage: after post-processing, the reconstructed image can be displayed in real time on a display device of the above-water module (such as a monitor or projector) for real-time monitoring and analysis, and can also be saved as an image file for subsequent offline processing and analysis.
During image display and storage, attention should be paid to the following points:
a) Display-device parameters: scale the reconstructed image and convert its color space appropriately according to the resolution, color space, and other parameters of the display device.
b) Image encoding and compression: when saving the reconstructed image as a file, choose a suitable encoding format (e.g., JPEG or PNG) and compression settings, balancing image quality against file size.
c) Metadata storage: metadata of the reconstructed image (such as capture time, location, and camera parameters) should be stored along with it to facilitate subsequent data management and analysis. A saving sketch is given below.
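A brief saving sketch; the JPEG quality value of 90 and the JSON sidecar-file convention for metadata are assumptions of this example.

import cv2
import json

def save_frame(img, path, metadata):
    # b) encode with an explicit quality setting to balance size and fidelity
    cv2.imwrite(path, img, [cv2.IMWRITE_JPEG_QUALITY, 90])
    # c) keep capture time, location, camera parameters, etc. alongside the image
    with open(path + ".json", "w", encoding="utf-8") as f:
        json.dump(metadata, f)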
Through the above steps, an image containing particulates or organisms is successfully reconstructed at the above-water module, realizing in-situ visual display of the underwater scene.
A third aspect of the present invention provides an underwater particulate matter and biological image in-situ collection system comprising the computer-readable storage medium described above.
Specifically, the principle of the invention is as follows: the invention combines deep-learning technology with a dedicated data processing and transmission mechanism to achieve efficient detection, recognition, and real-time visual display of underwater particulates and organisms.
The key technical scheme of the invention comprises the following steps:
1. A pre-trained particulate matter or biological tracking model accurately detects and segments target particulates or organisms from the underwater image sequence, and associates and uniquely identifies the same target across frames. The model integrates target detection, instance segmentation, and target tracking, and can effectively cope with the complex challenges of the underwater environment.
2. A pre-trained particulate matter or biological full-pose recognition model performs full-pose estimation and semantic attribute prediction on the segmented target images, covering key feature information such as the target's category, size, and 3D pose. The model integrates 3D pose estimation and attribute prediction in a unified network framework for end-to-end prediction.
3. For the original image and its segmentation results at each time step, a two-dimensional data table containing target coordinates and recognition results is established, and this compact data is transmitted to the above-water module (a sketch of one table row is given after this list). This design overcomes the bandwidth and reliability challenges of underwater-to-surface data transmission.
4. The above-water module uses the received two-dimensional data table to reconstruct, on a blank canvas, a real-time image containing the particulate or biological targets, accurately rendering each target's size and pose. Combined with post-processing techniques, high-quality image reconstruction and display is achieved.
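As referenced in point 3, a sketch of one row of the per-frame two-dimensional data table follows; the field names and types are illustrative assumptions, since the scheme fixes only that coordinates and the identification result are carried.

from dataclasses import dataclass

@dataclass
class TableRow:
    track_id: int      # unique cross-frame identifier from the tracking model
    bbox: tuple        # (x0, y0, x1, y1) coordinates in image pixels
    category: str      # recognized particulate or organism class
    size: float        # estimated target size
    pose: tuple        # full-pose estimate, e.g. (rx, ry, rz) rotations
    confidence: float  # recognition rate of the winning recognition vector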
The foregoing is merely illustrative of the present invention and does not limit it; variations or substitutions that would readily occur to any person skilled in the art fall within the scope of the present invention.

Claims (10)

1. An underwater particulate matter and biological image in-situ collection method, characterized by comprising an underwater module and an above-water module, wherein the underwater module is arranged at an underwater terminal and is used for identifying and extracting, from images captured by an underwater camera, a two-dimensional data table representing each image and uploading the two-dimensional data table to an above-water server, and the above-water module is arranged in the above-water server and is used for restoring the two-dimensional data table into an image.
2. The underwater particulate matter and biological image in-situ collection method of claim 1, wherein the underwater module is configured to perform the steps of:
S11, acquiring a plurality of continuous images captured underwater, recorded as a first image set;
S12, preprocessing the first image set to obtain a second image set;
S13, performing model calculation on each image in the second image set using a pre-trained particulate matter or biological tracking model to obtain labeling information for the particulates and organisms in each image, the labeling information uniquely encoding and marking each particulate or organism in the image;
S14, segmenting each image in the second image set based on the labeling information to obtain a plurality of segmented images, and grouping the segmented images into a plurality of segmented image sets according to the labeling information, the segmented images in each set sharing the same labeling information;
S15, recognizing each segmented image in each segmented image set using a pre-trained particulate matter or biological full-pose recognition model to obtain a recognition vector for each segmented image, the recognition vector comprising the type, size, and recognition rate of the recognized particulate or organism, the recognition vector with the highest recognition rate in the set being taken as the recognition result of that segmented image set;
S16, for each image in the second image set, establishing a two-dimensional data table based on the recognition results of the segmented image sets, the two-dimensional data table comprising the coordinates and recognition result of each segmented image; and transmitting the two-dimensional data table to the above-water module.
3. The underwater particulate matter and biological image in-situ collection method of claim 2, wherein the above-water module is configured to perform the steps of:
S21, acquiring the two-dimensional data table;
S22, establishing an empty image whose size is the size of an image captured by the underwater camera;
S23, dividing the empty image based on the two-dimensional data table and, in each divided region, substituting the particulate-matter or biological image corresponding to the identification result in the two-dimensional data table, to generate an image containing the particulates or organisms.
4. The underwater particulate matter and biological image in-situ collection method of claim 2, wherein the particulate matter or biological tracking model comprises:
a feature extraction network, which extracts features from the input image group using a convolutional neural network to obtain multi-scale, multi-level feature maps;
a target detection head, which, based on the feature maps, predicts the classification and position of target objects using a target detection algorithm and outputs candidate target boxes;
an instance segmentation head, which generates a segmentation mask of the target object for each candidate target box via an instance segmentation algorithm;
and a target association module, which associates and labels detection results belonging to the same target object across different frames of the input image group via a target tracking algorithm, realizing cross-frame tracking.
5. The underwater particulate matter and biological image in-situ collection method of claim 2, wherein the training step of the particulate matter or biological tracking model comprises:
preprocessing and augmenting a training data set, the training data set being an image set obtained by manually labeling particulates or organisms in a large number of real images collected in the field from different water areas;
dividing the training data set into a training set, a validation set, and a test set;
defining a loss function comprising classification loss, regression loss, segmentation-mask loss, and other components;
feeding training data into the model in mini-batches for forward computation and back-propagation, and updating the model parameters;
evaluating the model on the validation set after each training epoch and adjusting the hyperparameters; and evaluating and selecting the optimal model on the test set.
6. The underwater particulate matter and biological image in-situ collection method of claim 4, wherein the feature extraction network of the particulate matter or biological tracking model employs ResNet, Inception, or EfficientNet.
7. The underwater particulate matter and biological image in-situ collection method of claim 6, wherein the feature extraction network of the particulate matter or biological full-pose recognition model is the same as that of the particulate matter or biological tracking model.
8. The underwater particulate matter and biological image in-situ collection method of claim 7, wherein the training step of the particulate matter or biological full-pose recognition model comprises:
acquiring a full-pose training data set: for all known particulate or biological images, performing three-dimensional reconstruction using a computer three-dimensional model to obtain a three-dimensional image of each particulate or organism, exporting images of the full pose of the particulate or organism by rotating about the three-dimensional X, Y, and Z axes, one image per 1° of rotation, and taking all exported images together with their specific particulate or organism names as the full-pose training data set;
and training a convolutional neural network on the full-pose training data set to obtain the particulate matter or biological full-pose recognition model.
9. A computer-readable storage medium having stored therein program instructions which, when executed, carry out the underwater particulate matter and biological image in-situ collection method of any one of claims 1 to 8.
10. An underwater particulate matter and biological image in-situ collection system comprising the computer-readable storage medium of claim 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410421143.3A CN118279734B (en) 2024-04-09 2024-04-09 Underwater particulate matter and biological image in-situ acquisition method, medium and system

Publications (2)

Publication Number Publication Date
CN118279734A true CN118279734A (en) 2024-07-02
CN118279734B CN118279734B (en) 2024-09-24

Family

ID=91639468

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118607905A (en) * 2024-08-08 2024-09-06 山东汇富建设集团有限公司 Deep learning-based building construction progress supervision and management method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020125839A1 (en) * 2018-12-18 2020-06-25 GRID INVENT gGmbH Electronic element and electrically controlled display element
CN112149612A (en) * 2020-10-12 2020-12-29 海略(连云港)科技有限公司 Marine organism recognition system and recognition method based on deep neural network
CN114897816A (en) * 2022-05-09 2022-08-12 安徽工业大学 Mask R-CNN mineral particle identification and particle size detection method based on improved Mask
CN116030463A (en) * 2023-01-16 2023-04-28 中国科学技术大学 Dendritic spine labeling method and tracking method of neuron fluorescence microscopic image
CN116489516A (en) * 2023-04-14 2023-07-25 浪潮软件集团有限公司 Specific object tracking shooting method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Tingyin; LIN Minggui; CHEN Da; WU Yunping: "Emergency Communication Method for Nuclear Radiation Monitoring Based on BeiDou RDSS", Computer Systems & Applications, no. 12, 15 December 2019 (2019-12-15) *

Also Published As

Publication number Publication date
CN118279734B (en) 2024-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant