
WO2024016812A1 - Microscopic image processing method and apparatus, computer device, and storage medium - Google Patents

Microscopic image processing method and apparatus, computer device, and storage medium

Info

Publication number
WO2024016812A1
Authority
WO
WIPO (PCT)
Prior art keywords
skeleton
image
target object
feature
endpoint
Prior art date
Application number
PCT/CN2023/094954
Other languages
English (en)
French (fr)
Inventor
蔡德
韩骁
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2024016812A1
Priority to US18/603,081 (US20240221400A1)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/12 Edge-based segmentation
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G06T2207/10061 Microscopic image from scanning electron microscope
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/695 Preprocessing, e.g. image segmentation
    • G06V20/698 Matching; Classification

Definitions

  • the present application relates to the field of image processing technology, and in particular to a microscopic image processing method, device, computer equipment and storage medium.
  • Nematodes are classic multi-cellular model organisms with a short life cycle. Because they are small, easy to culture, can be handled in bulk like microorganisms, and are composed of a small number of cells, there is a research need for studying the morphology and cell lineage of nematodes.
  • Embodiments of the present application provide a microscopic image processing method, device, computer equipment and storage medium, which can save the labor cost of microscopic image analysis and improve the efficiency of microscopic image analysis.
  • the technical solution is as follows:
  • In one aspect, a microscopic image processing method is provided, the method including:
  • performing instance segmentation on a microscopic image to obtain an instance image, the instance image containing a target object in the microscopic image;
  • performing skeleton extraction on the target object in the instance image to obtain skeleton morphology information of the target object, the skeleton morphology information representing the skeleton shape of the target object;
  • based on the skeleton morphology information, performing motion analysis on the target object to obtain multiple feature values, the multiple feature values being used to represent the weighting coefficients of multiple preset motion states when synthesizing the skeleton shape; and
  • determining the feature value sequence composed of the multiple feature values as the motion component information of the target object.
  • In one aspect, a microscopic image processing device is provided, the device including:
  • An instance segmentation module is used to perform instance segmentation on the microscopic image to obtain an instance image, where the instance image contains the target object in the microscopic image;
  • a skeleton extraction module used to extract the skeleton of the target object in the example image to obtain skeleton morphological information of the target object, where the skeleton morphological information represents the shape of the skeleton of the target object;
  • a motion analysis module, configured to perform motion analysis on the target object based on the skeleton morphology information to obtain multiple feature values, and to determine the feature value sequence composed of the multiple feature values as the motion component information of the target object; the multiple feature values are used to represent the weighting coefficients of multiple preset motion states when synthesizing the skeleton shape.
  • In one aspect, a computer device is provided, including one or more processors and one or more memories; at least one computer program is stored in the one or more memories, and the at least one computer program is loaded and executed by the one or more processors to implement the above microscopic image processing method.
  • a storage medium is provided, and at least one computer program is stored in the storage medium.
  • the at least one computer program is loaded and executed by a processor to implement the above-mentioned microscopic image processing method.
  • a computer program product or computer program includes one or more program codes, and the one or more program codes are stored in a computer-readable storage medium.
  • One or more processors of the computer device read the one or more program codes from the computer-readable storage medium, and execute the one or more program codes so that the computer device performs the above microscopic image processing method.
  • Figure 1 is a principle flow chart of a circular target segmentation method provided by an embodiment of the present application
  • Figure 2 is a principle flow chart of a traditional skeleton extraction method provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of analysis of nematode swimming frequency provided by the embodiment of the present application.
  • Figure 4 is a schematic diagram of the implementation environment of a microscopic image processing method provided by an embodiment of the present application.
  • Figure 5 is a flow chart of a microscopic image processing method provided by an embodiment of the present application.
  • Figure 6 is a flow chart of a microscopic image processing method provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the segmentation principle of a dual-layer instance segmentation model provided by an embodiment of the present application.
  • Figure 8 is a flow chart of an instance segmentation method for two target objects provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of the principle of the dual-layer instance segmentation model provided by the embodiment of the present application.
  • Figure 10 is a synthesis flow chart for synthesizing sample images provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of the training and prediction stages of a skeleton extraction model provided by an embodiment of the present application.
  • Figure 12 is a flow chart of a method for identifying head endpoints and tail endpoints according to an embodiment of the present application
  • Figure 13 is a schematic diagram of intercepting a local area of an endpoint provided by an embodiment of the present application.
  • Figure 14 is a flow chart for motion analysis of a target object provided by an embodiment of the present application.
  • Figure 15 is a schematic diagram of motion analysis of a target object provided by an embodiment of the present application.
  • Figure 16 is a principle flow chart of a microscopic image processing method provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of a microscopic image processing device provided by an embodiment of the present application.
  • Figure 18 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • Figure 19 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • C. elegans: an example of the target object involved in this application.
  • C. elegans is a classic model organism. As a multi-cellular organism with a short life cycle, it is small, easy to cultivate, and can be operated in large quantities like microorganisms. The relatively small number of cells that make up the worm's body allows for exhaustive study of the morphology and lineage of the constituent cells.
  • A cuticle layer composed mainly of collagen, lipids, and glycoproteins forms above the epithelial layer of the nematode. This cuticle is the protective exoskeleton (Exoskeleton) of the nematode and is a structure necessary for maintaining its shape.
  • OSTU (Otsu) algorithm: an automatic thresholding method suited to bimodal gray-level histograms, proposed by the Japanese scholar Nobuyuki Otsu in 1979. It is also called the Otsu method, the maximum between-class variance method, the maximum-variance automatic thresholding method, etc.
  • The OSTU algorithm divides the image into two parts, background and target, according to the grayscale characteristics of the image. The greater the between-class variance between the background and the target, the greater the difference between the two parts that make up the image. When part of the target is misclassified as background, or part of the background is misclassified as target, the difference between the two parts becomes smaller. Therefore, the segmentation that maximizes the between-class variance corresponds to the minimum probability of misclassification.
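As a concrete illustration of the idea above, here is a minimal sketch of Otsu thresholding with OpenCV; the image path and variable names are placeholders, not part of the application.

```python
import cv2

# Grayscale microscopic image (placeholder path).
gray = cv2.imread("microscope_frame.png", cv2.IMREAD_GRAYSCALE)

# With THRESH_OTSU the threshold argument (0 here) is ignored and the
# threshold maximizing the between-class variance is chosen automatically.
otsu_thresh, foreground = cv2.threshold(
    gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
)
print("automatically selected threshold:", otsu_thresh)
```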
  • Watershed algorithm: also called the watershed segmentation method, it is a mathematical-morphology segmentation method based on topology theory. Its basic idea is to regard the image as a topographic surface in geodesy, where the gray value of each pixel represents the altitude of that point. Each local minimum and its region of influence is called a catchment basin, and the boundaries between catchment basins form the watershed. The concept and formation of watersheds can be illustrated by simulating an immersion process: a small hole is pierced at each local minimum, and the whole model is slowly immersed in water; as the immersion deepens, the region of influence of each local minimum slowly expands outward, and constructing dams where basins would merge forms the watershed.
  • Distance transform: for a binary image, the value of each foreground pixel is converted into the distance from that pixel to the nearest background pixel, or the value of each background pixel is converted into the distance from that pixel to the nearest foreground pixel.
  • Skeleton extraction algorithm: in the image field, skeleton extraction extracts the center-line pixels of a target in the image.
  • The target is thinned based on the center of the target.
  • The thinned target is a single pixel wide.
  • Current skeleton extraction algorithms can be divided into iterative and non-iterative algorithms. Taking iterative algorithms as an example, they usually operate on binary images (such as masks): moving from the periphery of the target toward its center, they examine a 3×3 pixel window centered on the pixel under test and repeatedly erode and thin the target until it can be eroded no further (a single pixel width), yielding the skeleton of the image.
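As a hedged illustration of thinning-based skeleton extraction (not the application's specific algorithm), scikit-image's skeletonize reduces a binary mask to a single-pixel-wide centerline; the toy mask below is a placeholder.

```python
import numpy as np
from skimage.morphology import skeletonize

# Toy binary mask of an elongated target (True = target, False = background).
mask = np.zeros((64, 64), dtype=bool)
mask[30:34, 5:60] = True  # a 4-pixel-thick horizontal band

# Thinning erodes the mask from the periphery toward the center until only
# a single-pixel-wide skeleton remains.
skeleton = skeletonize(mask)
print("skeleton pixel count:", int(skeleton.sum()))
```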
  • ROI: Region of Interest.
  • The area to be processed, outlined in the image to be processed in the form of a box, circle, ellipse, irregular polygon, etc., is called the ROI.
  • ROI refers to the area in the microscopic image that contains the target object to be observed.
  • For example, the area containing the target object is framed with a rectangular box; a circular box, an elliptical box, or another irregular polygonal box can also be used to delineate the ROI.
  • The ROI is the focus of the image analysis (that is, only the foreground area containing the target object is of interest; the remaining background areas are not).
  • HOG: Histogram of Oriented Gradients.
  • the HOG feature is a feature descriptor used for object detection in computer vision and image processing.
  • The HOG feature is formed by computing and accumulating histograms of gradient orientations over local areas of the image.
  • The main idea of the HOG feature is that, in an image, the appearance and shape of a local target can be well described by the distribution of gradients or edge directions. In essence, it is statistical information about gradients, and gradients mainly exist at edges.
  • In the embodiments of this application, the HOG features of the skeleton endpoints are used to perform head-tail identification of the endpoints (also called head-tail classification or head-tail detection), that is, to determine whether the current endpoint is the head endpoint or the tail endpoint of the nematode.
  • Support Vector Machine SVM is a generalized linear classifier that performs binary classification of data in a supervised learning manner. Its decision boundary is the maximum margin hyperplane that solves the learning sample. SVM uses the hinge loss function (Hinge Loss) to calculate the empirical risk (Empirical Risk), and adds a regularization term to the solution system to optimize the structural risk (Structural Risk). It is a classifier with sparsity and robustness. In the embodiment of this application, it involves using SVM to perform binary classification recognition of HOG features of skeleton endpoints to determine whether the current endpoint is the head endpoint or the tail endpoint of the nematode.
  • Hinge Loss: hinge loss function.
  • Empirical Risk: empirical risk.
  • Structural Risk: structural risk.
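To make the HOG-plus-SVM endpoint classification concrete, here is a hedged sketch: HOG descriptors are computed on small patches around skeleton endpoints and a linear max-margin classifier separates head patches from tail patches. The patch size, HOG parameters, and training data below are placeholders, not the application's actual settings.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def endpoint_hog(patch):
    """HOG descriptor of a small grayscale patch around a skeleton endpoint."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Placeholder training data: random 32x32 patches, label 1 = head, 0 = tail.
rng = np.random.default_rng(0)
patches = rng.random((40, 32, 32))
labels = rng.integers(0, 2, size=40)
features = np.stack([endpoint_hog(p) for p in patches])

# A linear max-margin classifier in the spirit of the SVM described above.
clf = SVC(kernel="linear", C=1.0).fit(features, labels)
print("predicted endpoint classes:", clf.predict(features[:2]))
```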
  • the scale space of the signal refers to filtering the original signal through a series of single-parameter, increasing-width Gaussian filters to obtain a set of low-frequency signals
  • The scale space of an image feature refers to taking the image features extracted from the image as the above-mentioned original signal.
  • Pyramidization of image features can efficiently express multi-scale image features: usually, upsampling is performed starting from the lowest-level feature (i.e., the original-scale feature), and the series of upsampled features is fused with the bottom-level features, yielding high-resolution, semantically strong features (that is, enhanced feature extraction).
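For intuition only, here is a minimal sketch of a Gaussian scale space for a 1-D signal: filtering with single-parameter Gaussians of increasing width yields a family of progressively lower-frequency signals. The toy signal is a placeholder.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Toy "original signal": a sine wave plus noise.
rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.3 * rng.standard_normal(512)

# Gaussians of increasing width produce the scale space of the signal.
scale_space = [gaussian_filter1d(signal, sigma) for sigma in (1, 2, 4, 8)]
print([round(float(s.std()), 3) for s in scale_space])  # variation shrinks with scale
```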
  • nematodes in a petri dish are usually observed under a microscope, and the nematodes are imaged and analyzed through a CCD image sensor on the microscope to output a microscopic image of the nematodes.
  • Traditional research methods mainly rely on manual analysis of microscopic images. For example, nematodes appearing in microscopic images are manually counted, segmented, morphologically measured and kinematically analyzed. The above-mentioned manual analysis of microscopic images obviously consumes high labor costs and has low analysis efficiency.
  • Figure 1 is a principle flow chart of a circular target segmentation method provided by an embodiment of the present application.
  • the OSTU algorithm is usually used to extract the target foreground to obtain the foreground segmentation map 102.
  • distance transformation is performed on the foreground segmentation map 102 to obtain the center point of each target, and a distance-transformed image 103 is formed.
  • a watershed algorithm is performed using these center points as seeds to achieve a multi-objective segmentation task and obtain an instance segmentation result 104.
  • Each target in the original image 101 is a circular target.
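The following is a hedged, condensed sketch of the Figure 1 pipeline (Otsu foreground extraction, distance transform, marker-based watershed) using OpenCV; the image path, threshold fraction, and mask size are illustrative choices, not the ones used in the application.

```python
import cv2
import numpy as np

img = cv2.imread("circular_targets.png")            # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 1. Foreground segmentation map via Otsu thresholding.
_, fg = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 2. Distance transform: each foreground pixel becomes its distance to the
#    nearest background pixel, so target centers appear as peaks.
dist = cv2.distanceTransform(fg, cv2.DIST_L2, 5)
_, centers = cv2.threshold(dist, 0.6 * dist.max(), 255, cv2.THRESH_BINARY)
centers = centers.astype(np.uint8)

# 3. Watershed with the detected centers as seeds (markers).
n_markers, markers = cv2.connectedComponents(centers)
markers = markers + 1                    # keep 0 free for "unknown" pixels
unknown = cv2.subtract(fg, centers)
markers[unknown == 255] = 0
labels = cv2.watershed(img, markers)     # per-pixel instance labels
print("segmented targets:", n_markers - 1)
```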
  • The above-mentioned traditional nematode segmentation method has relatively high requirements for image quality: it requires that there be no interfering impurities in the image, and it also places relatively high demands on the signal-to-noise ratio of the CCD image collected by the microscope. When the signal-to-noise ratio is low or there are many impurities, the segmentation accuracy drops sharply. Moreover, both the OSTU algorithm and the distance transform have many optional parameters that require manual tuning by technicians, so the labor costs are high and the analysis efficiency is low.
  • this nematode segmentation method cannot handle complex nematode targets such as overlap (that is, there is an overlap between two nematode targets) or curling (a single nematode target overlaps itself due to curling to form different body parts).
  • nematodes can easily form self-overlapping or curled shapes during observation.
  • traditional nematode segmentation methods cannot handle overlapping areas.
  • FIG. 2 is a principle flow chart of a traditional skeleton extraction method provided by an embodiment of the present application.
  • a single instance original image 201 containing only a single nematode target can be obtained by cropping the original image collected by the microscope.
  • the skeleton extraction algorithm is executed on the single instance original image 201 to obtain the skeleton image 202 of a single nematode target.
  • the skeleton diagram 202 is then subjected to post-processing such as pruning, and some small bifurcations are pruned to obtain a skeleton diagram 203 representing the skeleton between the head and tail of the nematode.
  • A nematode is an elongated, soft target, and the corresponding skeleton extraction needs to incorporate this prior knowledge.
  • However, the traditional skeleton extraction algorithm tends to produce noisy skeleton artifacts such as burrs, which require post-processing; this lowers processing efficiency and consumes more processing resources.
  • the traditional kinematic parameter analysis method usually analyzes the swimming frequency and body bending frequency of nematodes, and mainly relies on technical personnel's naked eye counting for statistics.
  • The swimming frequency of the nematode refers to the number of head swings within 1 minute (the head swinging from one side to the other and back again is defined as one head swing), and the body bending frequency is defined such that one wavelength of movement relative to the long axis of the body counts as one body bend.
  • Figure 3 is a schematic diagram of analysis of nematode swimming frequency provided by an embodiment of the present application.
  • the above-mentioned traditional kinematic parameter analysis method is prone to errors in counting due to the fast movement speed of nematodes.
  • the labor cost is extremely high and the analysis efficiency is low.
  • Moreover, the swimming frequency and body bending frequency only allow a simple assessment of nematode movement; they cannot support in-depth morphological measurement or kinematic analysis, and the analysis accuracy is relatively poor.
  • In view of this, embodiments of the present application provide a deep learning-based microscopic image analysis method. For target objects such as nematodes, a complete deep learning-based image analysis pipeline is designed, covering instance segmentation of multiple targets (e.g., multiple nematodes), skeleton extraction, and principal component analysis.
  • the intermediate steps do not require manual operations by technicians, thus providing fast and efficient basic results for subsequent counting, segmentation, morphological measurement and kinematic analysis.
  • the embodiment of the present application proposes a complete set of nematode image analysis framework based on deep learning.
  • the intermediate steps do not require manual operations by technicians, which greatly reduces labor costs and improves analysis efficiency.
  • the overall image analysis framework involves an instance segmentation method that can handle overlapping nematodes, which can optimize the instance segmentation effect when multiple nematodes overlap, and can also count nematodes simultaneously after instance segmentation is completed.
  • the overall image analysis framework involves a skeleton extraction method based on deep learning, which can directly output skeleton images without noise such as burrs and bifurcations, and can also handle skeleton extraction in situations such as nematode curling. And based on the extracted skeleton diagram, the machine can automatically distinguish the head and tail of the nematode.
  • In addition, the overall image analysis framework involves a method based on principal component analysis. Principal component analysis can decompose the extracted nematode skeleton into principal components, and the kinematic parameters of the nematode can then be analyzed quickly and conveniently through the principal component coefficients (i.e., the eigenvalues).
  • In summary, by optimizing each traditional processing step for nematode microscopic images, the methods provided by the embodiments of the present application can process nematode microscopic images automatically, thereby providing fast and efficient basic results for downstream tasks such as counting, segmentation, morphological measurement, and kinematic analysis.
  • FIG. 4 is a schematic diagram of the implementation environment of a microscopic image processing method provided by an embodiment of the present application.
  • the implementation environment includes a microscope 401, an image acquisition device 402 and a computer device 403, which will be described below.
  • the microscope 401 can be a digital microscope, that is, a video microscope, which can convert the physical image observed by the microscope 401 through digital-to-analog conversion, and then image it on the screen of the microscope 401 or on the computer device 403 external to the microscope 401.
  • the digital microscope is a product successfully developed by perfectly combining cutting-edge optical microscope technology, advanced photoelectric conversion technology, and LCD screen technology.
  • the image acquisition device 402 is used to acquire images of the physical objects observed by the microscope 401 .
  • the image acquisition device 402 will acquire a microscopic image containing the target object.
  • Taking the target object as nematodes as an example, since the nematodes in the petri dish are cultured in batches, when observing the nematodes through the microscope 401 there are likely to be multiple nematodes in the field of view of the eyepiece; there may be overlapping parts between these nematodes, and individual nematodes may also overlap themselves due to curling.
  • the image acquisition device 402 usually includes a CCD image sensor connected to the microscope 401.
  • the CCD image sensor is also called a CCD photosensitive element.
  • CCD is a semiconductor device that can convert the optical image observed by the microscope 401 into a digital signal.
  • A light beam emitted from the subject (i.e., the target object) passes through the optical system (such as the objective and eyepiece lenses of the microscope) and is imaged onto the CCD.
  • the tiny photosensitive substances implanted on the CCD are called pixels. The more pixels a CCD contains, the higher the picture resolution it provides.
  • a CCD works just like film, but it converts image pixels into digital signals.
  • There are capacitors on the CCD that can sense light and convert images into digital signals; through the control of an external circuit, each small capacitor can transfer the charge it carries to an adjacent capacitor.
  • the computer device 403 is connected to the image acquisition device 402 or the microscope 401 carrying the image acquisition device 402.
  • Computer device 403 may be a terminal. Applications supporting the display of microscopic images are installed and run on the terminal.
  • The terminal receives the microscopic image collected by the image acquisition device 402 and displays the microscopic image on the display screen of the terminal.
  • After the terminal receives and displays the microscopic image, the terminal locally supports the microscopic image processing method involved in the embodiments of the present application, so that the terminal can process the microscopic image locally and display the processing results.
  • the terminal may be directly or indirectly connected to the server through wired or wireless communication. The embodiments of the present application do not limit the connection method here.
  • the terminal sends the microscopic image sent by the image acquisition device 402 to the server, and the server processes the microscopic image and returns the processing result to the terminal, and the terminal displays the received processing result on the display screen.
  • The server is responsible for the main image processing work and the terminal for the secondary image processing work; or the server is responsible for the secondary image processing work and the terminal for the main image processing work; or the server and the terminal adopt a distributed computing architecture for collaborative image processing.
  • the server trains the dual-layer instance segmentation model, skeleton extraction model, head and tail recognition model, etc. required for the microscopic image processing method in the embodiment of the present application. Then, the server delivers the trained dual-layer instance segmentation model, skeleton extraction model, and head-to-tail recognition model to the terminal, so that the terminal can locally support the above-mentioned microscopic image processing method.
  • the above-mentioned server includes at least one of one server, multiple servers, a cloud computing platform, or a virtualization center.
  • The server is an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
  • The above-mentioned terminal is a smartphone, a tablet, a laptop, a desktop computer, a smart speaker, a smart watch, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, an e-book reader, or the like, but is not limited to these.
  • the number of the above terminals may be more or less. For example, there may be only one terminal, or there may be dozens, hundreds, or more terminals. The embodiments of this application do not limit the number of terminals and device types.
  • FIG. 5 is a flow chart of a microscopic image processing method provided by an embodiment of the present application.
  • the microscopic image processing method is executed by a computer device. Taking the computer device as a terminal as an example, this embodiment includes the following steps:
  • the terminal performs instance segmentation on the microscopic image to obtain an instance image, which includes the target object in the microscopic image.
  • a terminal is a computer device used to store and process microscopic images.
  • the embodiment of this application is explained by taking the computer device being a terminal as an example.
  • the computer device may also be provided as a server, which is not specifically limited in the embodiments of the present application.
  • the microscopic image refers to the optical image collected by the CCD image sensor of the microscope to observe the object to be observed.
  • a microscope carries a CCD image sensor.
  • a CCD image sensor can convert the optical image observed by a microscope into an electrical signal and form a microscopic image that can be read and displayed by a terminal. After the CCD image sensor generates the microscopic image, the microscopic image is sent to the terminal.
  • the terminal receives microscopic images sent by a CCD image sensor of the microscope.
  • This microscopic image may refer to a single microscopic image sent by the CCD image sensor, or may refer to any image frame in the continuous observation video stream sent by the CCD image sensor.
  • The embodiments of this application do not specifically limit the type of microscopic image.
  • The CCD image sensor may collect continuous image frames (constituting an observation video stream) and send each collected image frame to the terminal. When sending, the frames may be transmitted one by one in sequence, or divided into multiple video segments for segmented transmission. The embodiment of the present application does not specifically limit the transmission method.
  • In addition to directly acquiring the microscopic image sent from the CCD image sensor of the microscope, the terminal can also process a locally stored microscopic image or a microscopic image downloaded from the server.
  • the embodiments of this application do not specifically limit the source of the microscopic images.
  • Since the microscopic image is an optical image of a target object observed with a microscope, the microscopic image contains one or more target objects.
  • The terminal performs instance segmentation on the microscopic image to segment each target object contained in the microscopic image, obtaining an instance image containing a single target object.
  • the instance image of each target object includes a contour image and a mask image of the target object.
  • the contour map of the target object is used to indicate the edges and shapes of a single target object in the microscopic image.
  • The mask image of the target object is used to indicate the position and area occupied by a single target object in the microscopic image.
  • Instance segmentation (Instance Segmentation) is a further refinement of semantic segmentation (Semantic Segmentation): it separates the foreground and background of objects, achieves pixel-level object separation, and can distinguish different instances of objects of the same class. Instances may be organs, tissues, cells, etc.
  • instance segmentation is used to segment the target object, that is, nematodes.
  • the embodiment of the present application can achieve good instance segmentation effects for complex scenes in which multiple target objects overlap or a single target object curls.
  • The specific instance segmentation method will be described in detail in the next embodiment and will not be elaborated here.
  • the terminal performs skeleton extraction on the target object in the example image to obtain skeleton morphology information of the target object.
  • the skeleton morphology information represents the skeleton morphology of the target object.
  • the terminal can output an instance image for each target object.
  • the instance image of each target object includes a contour image and a mask image.
  • Skeleton extraction can be performed on the mask image of each target object through a skeleton extraction algorithm to obtain the skeleton morphology information of that target object.
  • When multiple target objects are involved, the respective skeleton morphology information needs to be extracted for each target object.
  • contour images and mask images are both binary images. In the contour map, contour pixels and non-contour pixels have different values. In the mask image, pixels belonging to the target object and pixels not belonging to the target object have different values.
  • For example, pixels with a value of 1 are contour pixels and pixels with a value of 0 are non-contour pixels; or pixels with a value of 0 are contour pixels and pixels with a value of 1 are non-contour pixels. This is not specifically limited in the embodiments of this application.
  • Contour pixels refer to pixels used to represent the outline (i.e., edge) of a target object, while non-contour pixels refer to pixels that do not represent the outline of the target object.
  • pixels with a value of 1 are pixels that belong to the target object, and pixels with a value of 0 are pixels that do not belong to the target object (maybe background pixels, or pixels of other target objects); or, pixels with a value of 0 are pixels belonging to this target object, and pixels with a value of 1 are pixels that do not belong to this target object.
  • This is not specifically limited in the embodiments of the present application.
  • the terminal runs a skeleton extraction algorithm on the mask map of the target object to output the skeleton morphological information of the target object.
  • the skeleton morphology information includes at least one skeleton morphology image generated based on the mask image, and the skeleton of the target object in the skeleton morphology image has a single layer of pixel width.
  • the skeleton morphology image is also a binary image. In the skeleton morphology image, skeleton pixels and non-skeleton pixels have different specific values.
  • For example, pixels with a value of 1 are skeleton pixels and pixels with a value of 0 are non-skeleton pixels; or pixels with a value of 0 are skeleton pixels and pixels with a value of 1 are non-skeleton pixels.
  • Skeleton pixels refer to pixels used to represent the skeleton of the target object.
  • Non-skeleton pixels refer to pixels that do not represent the skeleton of the target object.
  • the skeleton pixels in the skeleton morphology image can form a skeleton of the target object with a single layer of pixel width, and the shape of this skeleton represents the skeleton shape of the target object in the microscopic image.
  • In the above process, a skeleton extraction algorithm is applied to the instance image of each segmented target object, which facilitates extracting the skeleton morphology image of each target object.
  • Based on the skeleton morphology information, the terminal performs motion analysis on the target object, obtains multiple feature values, and determines the feature value sequence composed of the multiple feature values as the motion component information of the target object.
  • The skeleton morphology information of each target object obtained by the terminal in the above step 502 includes at least the skeleton morphology image of each target object. The terminal then performs motion analysis on the skeleton morphology image, so that the current skeleton shape of the target object can be decomposed into a combination of multiple preset motion states, and the eigenvalue contributed by each preset motion state in this decomposition can be determined.
  • This eigenvalue represents the weighting coefficient that needs to be applied to the corresponding motion state in order to synthesize the current skeleton shape of the target object.
  • the motion state with a relatively large eigenvalue can be used as the main component of the current skeleton form.
  • The eigenvalues of the motion states serving as principal components can form an eigenvalue sequence, and this eigenvalue sequence can be used as the motion component information of the target object.
  • the detailed kinematic analysis method will be explained in the next embodiment and will not be described again here.
  • Through the above processing, the current skeleton shape of the target object can be decomposed into a combination of multiple preset motion states, so that a skeleton of any shape can be expressed as a motion synthesized by weighting the multiple preset motion states with the multiple feature values.
  • The above processing method enables a more in-depth and detailed kinematic analysis of the target object, especially when the target object is a nematode: the analysis is not limited to counting the swimming frequency or body bending frequency by naked eye, and nematode kinematic analysis becomes more accurate, more efficient, and free of human intervention.
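A hedged numpy sketch of the decomposition idea follows: the skeleton shape is represented by tangent angles sampled along the body, and its coordinates with respect to a small set of preset motion states (basis shapes, e.g. principal components of many observed skeletons) are the feature values, i.e. the weighting coefficients. The basis and the skeleton below are random placeholders, not the application's actual preset motion states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Skeleton shape described by tangent angles at 50 points along the body (placeholder).
theta = rng.standard_normal(50)

# K preset motion states as an orthonormal basis of shapes (placeholder basis).
K = 4
basis, _ = np.linalg.qr(rng.standard_normal((50, K)))  # columns = basis shapes
mean_shape = np.zeros(50)                              # placeholder mean shape

# Feature values: weighting coefficient of each preset motion state.
coeffs = basis.T @ (theta - mean_shape)

# The same coefficients (approximately) re-synthesize the skeleton shape.
reconstruction = mean_shape + basis @ coeffs
print("feature value sequence:", np.round(coeffs, 3))
```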
  • The method provided by the embodiments of the present application performs instance segmentation on the target objects contained in the microscopic image to determine the instance image of each target object, that is, the single-instance segmentation result, and extracts skeleton morphology information from the single-instance segmentation result so as to perform motion analysis and motion component decomposition based on that information. It can decompose the current, possibly complex, skeleton shape of each target object into a combination of multiple preset motion states.
  • the overall processing process does not require manual intervention, and the machine can be automated, which greatly reduces labor costs and improves analysis efficiency.
  • Based on the output motion component information, in-depth morphological measurement and kinematic analysis can also be performed, thus improving the accuracy of analyzing target objects.
  • FIG. 6 is a flow chart of a microscopic image processing method provided by an embodiment of the present application.
  • the microscopic image processing method is executed by a computer device.
  • The computer device is taken as a terminal as an example for illustration. This embodiment includes the following steps:
  • the terminal determines the ROI where the target object contained in the microscopic image is located.
  • the ROI contains multiple overlapping target objects.
  • the terminal acquires the microscopic image in the manner described in step 501 above.
  • The terminal can also input the microscopic image into an object detection model and use the object detection model to perform object detection (also known as target detection) on each target object in the microscopic image; the object detection model then outputs the position information of the ROI candidate box in the microscopic image.
  • the position information includes: the coordinates (x, y) of the upper left corner vertex of the candidate box and the width w and height h of the candidate box.
  • The position information is a four-tuple of the form (x, y, w, h); alternatively, the candidate box can also be located using the lower-left, upper-right, or lower-right corner vertex coordinates.
  • The embodiments of this application do not specifically limit this.
  • the above-mentioned object detection model can be any machine learning model that supports object detection.
  • the object detection model can be: R-CNN (Region with CNN features, CNN-based regional object detection, where CNN refers to Convolutional Neural Network, that is, convolutional neural network), Fast R-CNN (fast R-CNN), Faster R-CNN (faster R-CNN) or FCOS (Fully Convolutional One-Stage, full convolutional one-stage), etc.
  • The above processing method can not only avoid noise interference from non-ROI areas and improve the processing accuracy of the ROI, but also save the processing resources that would be spent on non-ROI areas, shorten the processing time of microscopic images, and improve the processing efficiency of microscopic images.
  • the ROI may contain one or more target objects.
  • the processing flow of a single target object is relatively simple, and a single instance target object can be segmented directly through some traditional instance segmentation algorithms (such as running the OSTU algorithm, distance transform and watershed algorithm successively).
  • the processing flow of multiple target objects is relatively complicated, because multiple target objects are likely to overlap each other. Therefore, the embodiment of this application takes multiple target objects in the ROI as an example for explanation.
  • In addition, even if the ROI contains only a single target object, that target object may be curled, resulting in self-overlap.
  • the traditional instance segmentation algorithm has poor instance segmentation accuracy in the case of self-overlap.
  • the processing flow provided by the embodiments of the present application can not only improve the accuracy of instance segmentation in scenarios where multiple target objects overlap each other, but can also improve the accuracy of instance segmentation in scenarios where single target objects overlap themselves.
  • the terminal extracts local image features of the ROI.
  • the terminal can input the microscopic image into a feature extraction model, and use the feature extraction model to extract global image features of the microscopic image.
  • Then, the terminal can determine the local image features of the ROI from the global image features using the position information of the ROI obtained in step 601. For example, when the position information of the ROI is (x, y, w, h) and (x, y) is the coordinate of the upper-left corner vertex, the global image features only need to be scaled to the same size as the microscopic image (if the feature extraction model directly outputs global image features of the same size, this scaling step is not needed), and the feature point with coordinates (x, y) is then found in the global image features.
  • each feature point included in the area selected by the ROI candidate frame in the global image feature is determined as the local image feature of the ROI.
  • the terminal can crop the local image features of the area covered by the ROI from the global image features.
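A hedged sketch of this cropping step: given a global feature map and an ROI box (x, y, w, h) in image coordinates, the feature points covered by the box are selected. The channel count, spatial size, and stride below are illustrative assumptions.

```python
import numpy as np

# Global image features: C channels over an H x W grid (placeholder values).
C, H, W = 256, 128, 128
global_feat = np.zeros((C, H, W), dtype=np.float32)

def crop_roi_features(feat, box, stride=1):
    """Select the feature points covered by an (x, y, w, h) box.

    stride covers the case where the feature map is smaller than the image;
    stride=1 corresponds to features already scaled to the image size.
    """
    x, y, w, h = box
    x0, y0 = x // stride, y // stride
    x1, y1 = (x + w) // stride, (y + h) // stride
    return feat[:, y0:y1, x0:x1]

local_feat = crop_roi_features(global_feat, box=(32, 40, 48, 36))
print(local_feat.shape)  # (256, 36, 48)
```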
  • the above feature extraction model includes a residual network (Residual Networks, Resnet) and a feature pyramid network (Feature Pyramid Networks, FPN).
  • the residual sub-network is used to extract pixel-related features of the input image.
  • the feature pyramid subnetwork is used to extract the image pyramid features of the input image in different scale spaces.
  • the residual network includes multiple hidden layers, and residual connections are adopted between the multiple hidden layers.
  • the output of the current hidden layer will be input to the next hidden layer together with the input of the current hidden layer after splicing.
  • the output of the second hidden layer will be input into the third hidden layer together with the input of the second hidden layer (that is, the output of the first hidden layer) after splicing.
  • the output of the current hidden layer will be input to the next hidden layer together with the input of the previous hidden layer after splicing.
  • the output of the third hidden layer will be input into the fourth hidden layer together with the input of the second hidden layer (ie, the output of the first hidden layer) after splicing.
  • the structure is not specifically limited.
  • the residual subnetwork can be a deep residual network such as Resnet-34 network, Resnet-50 network, Resnet-101 network, etc.
  • Then, the original image features are input into the feature pyramid network, which upsamples the original image features step by step to obtain a feature pyramid consisting of features in a series of different scale spaces. The features of different scales contained in the feature pyramid are then fused to obtain the final global image features.
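A minimal PyTorch sketch of one upsample-and-fuse step of a feature pyramid follows: a coarser, semantically stronger map is upsampled to the size of a finer map, added to it after a 1x1 lateral convolution, and smoothed. The channel counts and spatial sizes are illustrative, not the application's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    """Fuse a coarse (top-level) feature map into a finer (lateral) one."""
    def __init__(self, lateral_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(lateral_channels, out_channels, kernel_size=1)
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, top, lateral):
        top_up = F.interpolate(top, size=lateral.shape[-2:], mode="nearest")
        fused = self.lateral(lateral) + top_up   # element-wise fusion
        return self.smooth(fused)

# Illustrative inputs: a coarse 256-channel map and a finer 512-channel map.
coarse = torch.randn(1, 256, 16, 16)
finer = torch.randn(1, 512, 32, 32)
print(FPNFuse(lateral_channels=512)(coarse, finer).shape)  # [1, 256, 32, 32]
```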
  • only the residual network can also be used to extract global image features.
  • the feature extraction model is the residual network itself.
  • the terminal directly uses the original image features extracted by the residual network as global image features, and then cuts the local image features from the global image features. This can simplify the global image feature extraction process and save the terminal's processing resources.
  • the global image features are extracted through the feature extraction model, and then the local image features are cropped from the global image features, which can better retain the image features of the edge parts of the ROI. Because the image features in the edge part are closely related to the adjacent non-ROI pixels, local image features with better expressive ability can be extracted.
  • the ROI may also be cropped from the microscopic image first. Then, only the ROI is input into the feature extraction model, and the local image features of the ROI are directly extracted through the feature extraction model.
  • the above processing method can only perform feature extraction on the ROI, thereby eliminating the need to extract global image features of the entire microscopic image, which can greatly save the processing resources of the terminal.
  • The terminal inputs the local image features into the dual-layer instance segmentation model, processes the local image features through the dual-layer instance segmentation model, and outputs the respective contour maps and mask maps of the multiple target objects in the ROI.
  • the dual-layer instance segmentation model is used to establish separate layers for different objects to obtain the instance segmentation results of each object.
  • the dual-layer instance segmentation model is used to establish separate layers for different objects to obtain instance images of each object.
  • the instance image of each object includes the contour map and mask map of the object.
  • After extracting the local image features of the ROI through the above step 602, the terminal inputs the local image features into the dual-layer instance segmentation model. If the ROI contains multiple target objects, the dual-layer instance segmentation model establishes separate layers for the different target objects and outputs a respective instance image (i.e., instance segmentation result) for each target object.
  • the instance image of each target object includes a contour map and a mask map of each target object, thereby characterizing the contours and masks occupied by each target object.
  • the above-mentioned contour map and mask map are binary images.
  • contour pixels and non-contour pixels have different values.
  • In the mask map, pixels belonging to the target object and pixels not belonging to the target object have different values.
  • For example, pixels with a value of 1 are contour pixels and pixels with a value of 0 are non-contour pixels; or pixels with a value of 0 are contour pixels and pixels with a value of 1 are non-contour pixels. This is not specifically limited in the embodiments of this application.
  • Similarly, pixels with a value of 1 may be pixels that belong to the target object and pixels with a value of 0 pixels that do not belong to it (possibly background pixels or pixels of other target objects); or pixels with a value of 0 may be pixels belonging to the target object and pixels with a value of 1 pixels that do not belong to it.
  • This is not specifically limited in the embodiments of the present application.
  • the instance segmentation process is introduced by taking two overlapping target objects in the ROI as an example.
  • The target object located on the top layer is called the occluder object, and the target object located on the bottom layer is called the occluded object.
  • the occluder object is located on the top layer and covers part of the body of the occluded object located on the bottom layer.
  • The dual-layer instance segmentation model includes an occluder object layer network and an occluded object layer network.
  • the occlusion object layer network is used to extract the outlines and masks of the occlusion objects located on the top layer.
  • the occluded object layer network is used to extract the outlines and masks of the underlying occluded objects.
  • The occluder object layer network and the occluded object layer network are deployed in the dual-layer instance segmentation model in a cascade manner; in this case, the output of the occluder object layer network serves as an input of the occluded object layer network.
  • Figure 7 is a schematic diagram of the segmentation principle of a dual-layer instance segmentation model provided by an embodiment of the present application.
  • The dual-layer instance segmentation model creates separate layers for the occluder objects on the top layer and the occluded objects on the bottom layer to perform instance segmentation: for example, the outline map and mask map of the occluder object (Occluder) are extracted in the top layer (Top Layer) 7021, and the outline map and mask map of the occluded object (Occludee) are extracted in the bottom layer (Bottom Layer) 7022.
  • The top layer 7021 and the bottom layer 7022 achieve bilayer decoupling of the occluder object and the occluded object, so that respective instance segmentation results, that is, instance images 703, are finally output for the different target objects (i.e., different instances).
  • For example, outline maps and mask maps are output for occluder objects, and outline maps and mask maps are also output for occluded objects.
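To make the cascade concrete, here is a hedged PyTorch sketch of a two-branch head: an occluder branch predicts the top-layer contour and mask from the ROI features, and its intermediate features are concatenated with the ROI features and fed to the occludee branch. Layer sizes and the concatenation scheme are illustrative assumptions, not the application's exact network.

```python
import torch
import torch.nn as nn

class LayerBranch(nn.Module):
    """One layer-specific branch: ROI features -> branch features -> contour and mask logits."""
    def __init__(self, in_channels, mid_channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.contour_head = nn.Conv2d(mid_channels, 1, 1)  # contour (boundary) logits
        self.mask_head = nn.Conv2d(mid_channels, 1, 1)     # mask logits

    def forward(self, x):
        feat = self.body(x)
        return feat, self.contour_head(feat), self.mask_head(feat)

class BilayerHead(nn.Module):
    def __init__(self, roi_channels=256, mid_channels=256):
        super().__init__()
        self.occluder = LayerBranch(roi_channels, mid_channels)
        # The occludee branch sees the ROI features plus the occluder branch features.
        self.occludee = LayerBranch(roi_channels + mid_channels, mid_channels)

    def forward(self, roi_feat):
        occ_feat, occ_contour, occ_mask = self.occluder(roi_feat)
        _, occd_contour, occd_mask = self.occludee(torch.cat([roi_feat, occ_feat], dim=1))
        return (occ_contour, occ_mask), (occd_contour, occd_mask)

outputs = BilayerHead()(torch.randn(1, 256, 28, 28))
```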
  • Figure 8 is a flow chart of an instance segmentation method for two target objects provided by an embodiment of the present application. As shown in Figure 8, this example segmentation method includes the following steps 6031-6034.
  • the terminal inputs the local image features into the occluder object layer network, and extracts the first perceptual feature of the top-level occluder object in the ROI through the occluder object layer network.
  • the first perceptual feature represents the image feature of the occluder object in the instance segmentation task.
  • the occluder object layer network is used to explicitly model the outlines and masks of occluder objects within the ROI.
  • the occluder object layer network includes at least one first convolution layer, at least one first graph convolutional network (GCN) layer, and at least one second convolution layer.
  • the adjacent layers of the above-mentioned first convolution layer, first graph convolution layer and second convolution layer are connected in series. Series connection means that the features output by the previous layer are used as the input signals of the current layer.
  • the first graph convolution layer is simplified based on the non-local attention mechanism (non-local attention), which can also be called a non-local layer (non-local layer).
  • To reduce the number of model parameters, the graph convolution layer is implemented with a non-local operator: each pixel is a graph node, and the attention weights form the connections between nodes. Based on the occluder object layer network with the above structure, the above step 6031 can be implemented through the following steps A1-A3.
  • the terminal inputs the local image features into the first convolution layer of the occluder object layer network, and performs a convolution operation on the local image features through the first convolution layer to obtain the initial perceptual features.
  • the local image features of the ROI extracted in the above step 602 are input into the first convolution layer of the occluder object layer network of the dual-layer instance segmentation model, and a convolution operation is performed on the local image features through the first convolution layer.
  • For example, a convolution kernel of size 3×3 is used to perform a convolution operation on the local image features and output the initial perceptual features of the occluder object.
  • the terminal inputs the initial perceptual features into the first graph convolution layer of the occluder object layer network, and performs a convolution operation on the initial perceptual features through the non-local operator in the first graph convolution layer to obtain the graph convolution features.
  • the initial perceptual features are then input into the first graph convolution layer of the occluder object layer network, and the graph convolution is implemented in the first graph convolution layer through a non-local operator (Non-Local Operator).
  • the first graph convolution layer involves three convolution layers with a convolution kernel size of 1×1 and a Softmax (exponential normalization) operator.
  • the above three convolution layers are called the θ convolution layer, the φ convolution layer and the γ convolution layer respectively.
  • the terminal inputs the initial perceptual features into the θ convolution layer, the φ convolution layer and the γ convolution layer respectively.
  • Each convolution layer uses a convolution kernel of size 1×1 to perform a convolution operation on the initial perceptual features. Then, the terminal multiplies the feature map output by the θ convolution layer and the feature map output by the φ convolution layer element by element to obtain a fused feature map. The terminal then uses the Softmax operator to perform exponential normalization on the fused feature map to obtain a normalized feature map. The terminal then multiplies the normalized feature map and the feature map output by the γ convolution layer element by element to obtain the target feature map. Finally, the terminal adds the target feature map and the initial perceptual features element by element to obtain the output result of the first graph convolution layer, that is, the graph convolution features.
  • the graph convolution operation is implemented through a non-local operator in the first graph convolution layer, which reduces the amount of model parameters in the graph convolution part.
  • in the graph convolution layer based on the non-local operator, pixels in image space can be effectively associated according to the similarity of their corresponding feature vectors, re-aggregating the input target-region features, which better resolves the discontinuity problem caused when pixels of the same object are occluded and truncated in space.
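  • As an illustration only, the following is a minimal PyTorch sketch of a non-local attention block of this kind, where every pixel is a graph node and softmax-normalized similarities act as edge weights. It follows the common non-local formulation (matrix products between the flattened θ/φ/γ outputs); the exact multiplication scheme, channel counts and layer names of the model described above are not specified here and are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalGCNBlock(nn.Module):
    """Sketch of a non-local graph convolution layer: every pixel is a graph
    node and softmax-normalized attention weights act as edge weights."""

    def __init__(self, channels: int):
        super().__init__()
        # Three 1x1 convolutions (the theta / phi / gamma layers mentioned above).
        self.theta = nn.Conv2d(channels, channels, kernel_size=1)
        self.phi = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        # 1x1 convolutions, then flatten each map so every pixel is one node.
        t = self.theta(x).reshape(b, c, n).permute(0, 2, 1)   # (B, N, C)
        p = self.phi(x).reshape(b, c, n)                      # (B, C, N)
        g = self.gamma(x).reshape(b, c, n).permute(0, 2, 1)   # (B, N, C)
        # Pairwise pixel similarities, normalized with Softmax.
        attn = F.softmax(torch.bmm(t, p), dim=-1)             # (B, N, N)
        # Re-aggregate the features with the attention weights.
        out = torch.bmm(attn, g).permute(0, 2, 1).reshape(b, c, h, w)
        # Residual connection: add the input features element by element.
        return out + x
```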
  • the terminal inputs the graph convolution features into the second convolution layer of the occluder object layer network, and performs a convolution operation on the graph convolution features through the second convolution layer to obtain the first perceptual feature.
  • the graph convolution features are input into one or more second convolution layers connected in series, and the graph convolution features are further convolved through the second convolution layers.
  • For example, a convolution kernel of size 3×3 is used to perform a convolution operation on the graph convolution features, and the first perceptual feature of the occluder object is output.
  • Figure 9 is a schematic diagram of the principle of the dual-layer instance segmentation model provided by the embodiment of the present application.
  • an occluder object layer network 910 and an occluded object layer network 920 are involved.
  • the occluder object layer network 910 includes a first convolution layer 911, a first graph convolution layer 912 and two second convolution layers 913-914.
  • the first convolution layer 911, the first graph convolution layer 912 and the second convolution layers 913-914 are connected in series.
  • Assuming that the local image features of the ROI extracted in step 602 are represented by the symbol x,
  • the local image features x are first input into the occluder object layer network 910, and the initial perceptual features are extracted through the first convolution layer 911.
  • the graph convolution features are then extracted through the first graph convolution layer 912,
  • the first perceptual feature is extracted through the second convolution layers 913-914,
  • and the second convolution layer 914 outputs the first perceptual feature.
  • the above-mentioned dual-layer instance segmentation model 900 is based on Mask RCNN and adds a dual-layer (Overlapping Bi-Layers) module for processing overlapping targets.
  • the local image feature x of the ROI is extracted (equivalent to performing ROI pooling on the original microscopic image).
  • the dual-layer module is then used to model the relationship between the occluder object and the occluded object.
  • the first perceptual feature of the occluder object is introduced into the calculation process of the second perceptual feature of the occluded object.
  • the above processing method can better learn the relationship between occluder objects and occluded objects, and ultimately outputs better segmentation results when multiple targets overlap.
  • the terminal obtains the contour map and mask map of the occluder object based on the first perceptual feature.
  • the terminal may perform an upsampling operation on the first perceptual feature to obtain the contour map and mask map of the occluder object.
  • the first perceptual feature is upsampled to obtain a contour map with the same size as the ROI and a mask map with the same size as the ROI.
  • the first perceptual feature is upsampled to obtain a contour map with the same size as the microscopic image and a mask map with the same size as the microscopic image.
  • the embodiments of the present application do not specifically limit this.
  • the occluder object layer network also includes a first deconvolution layer.
  • the terminal inputs the first perceptual feature into the first deconvolution layer, and a deconvolution operation is performed on the first perceptual feature in the first deconvolution layer to obtain the contour map and mask map of the occluder object.
  • Here, upsampling through a deconvolution operation is taken as an example for explanation. Upsampling can also be implemented in other ways, and this is not specifically limited in the embodiments of the present application.
  • the occluder object layer network 910 also includes a first deconvolution layer 915, and the first perceptual feature output by the second convolution layer 914 is input to the first deconvolution layer 915, which outputs a contour image 916 of the occluder object and a mask image 917 of the occluder object.
  • the terminal inputs the fusion feature obtained by fusing the local image feature and the first perceptual feature into the occluded object layer network, and extracts the second perceptual feature of the occluded object located at the bottom of the ROI.
  • the second perceptual feature represents the image feature of the occluded object in the instance segmentation task.
  • the occluded object layer network explicitly models the contours and masks of occluded objects within the ROI.
  • the occluded object layer network includes at least one third convolution layer, at least one second graph convolution layer and at least one fourth convolution layer. The adjacent layers of the third convolution layer, the second graph convolution layer and the fourth convolution layer are connected in series. Based on the occluded object layer network with the above structure, the above step 6033 can be implemented through the following steps B1-B4.
  • the terminal fuses the local image features and the first perceptual features to obtain fusion features.
  • the terminal adds the local image features and the first perceptual features element by element to obtain the fusion feature. Still taking Figure 9 as an example for explanation, the terminal adds the local image feature x of the ROI and the first perceptual feature output by the second convolution layer 914 element by element to obtain the fusion feature.
  • fusion methods such as element-wise multiplication, splicing (concatenation), and bilinear pooling can also be used. The embodiments of this application do not specifically limit the fusion method.
  • the terminal inputs the fused features into the third convolution layer of the occluded object layer network, and performs a convolution operation on the fused features through the third convolution layer to obtain perceptual interaction features.
  • the fusion features obtained in the above step B1 are input into the third convolution layer of the occluded object layer network of the dual-layer instance segmentation model, and a convolution operation is performed on the fusion features through the third convolution layer.
  • For example, a convolution kernel of size 3×3 is used to perform a convolution operation on the fused features and output the perceptual interaction features of the occluded object.
  • Since the input signal of the occluded object layer network not only contains the local image features but also contains the first perceptual feature of the occluder object, perceptual interaction between the occluder object and the occluded object can be achieved. That is, the extracted information of the occluder object and the original local image features are combined to jointly model the outline and mask of the occluded object.
  • In this way, the adjacent instance boundaries of occluder objects and occluded objects can be effectively distinguished, improving the accuracy of instance segmentation of occluded objects.
  • the terminal inputs the perceptual interaction features into the second graph convolution layer of the occluded object layer network, and performs a convolution operation on the perceptual interaction features through the non-local operator in the second graph convolution layer to obtain the graph convolution interaction features.
  • the perceptual interaction features are then input into the second graph convolution layer of the occluded object layer network, and the graph convolution is implemented in the second graph convolution layer through a non-local operator.
  • the second graph convolution layer involves three convolution layers with a convolution kernel size of 1×1 and a Softmax (exponential normalization) operator.
  • the above three convolution layers are called the θ convolution layer, the φ convolution layer and the γ convolution layer respectively.
  • the perceptual interaction features are input into the θ convolution layer, the φ convolution layer and the γ convolution layer respectively.
  • Each convolution layer uses a convolution kernel of size 1×1 to perform a convolution operation on the perceptual interaction features. Then, the terminal multiplies the feature map output by the θ convolution layer and the feature map output by the φ convolution layer element by element to obtain a fused feature map. The terminal then uses the Softmax operator to perform exponential normalization on the fused feature map to obtain a normalized feature map. The terminal then multiplies the normalized feature map and the feature map output by the γ convolution layer element by element to obtain the target feature map. Finally, the terminal adds the target feature map and the perceptual interaction features element by element to obtain the output result of the second graph convolution layer, that is, the graph convolution interaction features.
  • the graph convolution operation is implemented through a non-local operator in the second graph convolution layer, which reduces the amount of model parameters in the graph convolution part.
  • in the graph convolution layer based on the non-local operator, pixels in image space can be effectively associated according to the similarity of their corresponding feature vectors, re-aggregating the input target-region features.
  • the above processing method can better solve the discontinuity problem caused when pixels of the same object are occluded and truncated in space.
  • the terminal inputs the graph convolution interaction features into the fourth convolution layer of the occluded object layer network, and performs a convolution operation on the graph convolution interaction features through the fourth convolution layer to obtain the second perceptual feature.
  • the graph convolution interaction features are input into one or more fourth convolution layers connected in series, and the graph convolution interaction features are further convolved through the fourth convolution layers.
  • For example, a convolution kernel of size 3×3 is used to perform a convolution operation on the graph convolution interaction features, and the second perceptual feature of the occluded object is output.
  • the occluded object layer network 920 includes a third convolution layer 921, a second graph convolution layer 922, and two fourth convolution layers 923-924.
  • the third convolution layer 921, the second graph convolution layer 922, and the fourth convolution layers 923-924 are connected in series. Assuming that the local image features of the ROI extracted in step 602 are represented by the symbol x, the local image features x and the first perceptual feature output by the second convolution layer 914 in the occluder object layer network 910 are added element by element to obtain the fusion features.
  • the fusion features are then input into the occluded object layer network 920: the perceptual interaction features are extracted through the third convolution layer 921, the graph convolution interaction features are extracted through the second graph convolution layer 922, the second perceptual feature is extracted through the fourth convolution layers 923-924, and the fourth convolution layer 924 outputs the second perceptual feature.
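  • As a concrete illustration of the cascade described above, the following minimal PyTorch sketch wires an occluder branch (911-915) and an occluded branch (921-925) together, fusing by element-wise addition. It reuses the NonLocalGCNBlock sketch given earlier; the channel count (256), the ReLU activations and the two-channel deconvolution heads for the contour/mask maps are illustrative assumptions rather than the exact architecture of the model.

```python
import torch
import torch.nn as nn

class OccluderBranch(nn.Module):
    """Occluder (top-layer) branch: conv -> non-local GCN -> two convs,
    plus a deconvolution head that upsamples to contour/mask logits."""

    def __init__(self, c: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)              # first convolution layer
        self.gcn = NonLocalGCNBlock(c)                           # first graph convolution layer
        self.conv2 = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
        )
        self.upsample = nn.ConvTranspose2d(c, 2, 2, stride=2)    # contour + mask channels

    def forward(self, x):
        feat = self.conv2(self.gcn(self.conv1(x)))               # first perceptual feature
        return feat, self.upsample(feat)

class OccludedBranch(nn.Module):
    """Occluded (bottom-layer) branch, fed with x + the occluder feature."""

    def __init__(self, c: int = 256):
        super().__init__()
        self.conv3 = nn.Conv2d(c, c, 3, padding=1)
        self.gcn = NonLocalGCNBlock(c)
        self.conv4 = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
        )
        self.upsample = nn.ConvTranspose2d(c, 2, 2, stride=2)

    def forward(self, x, occluder_feat):
        fused = x + occluder_feat                                # element-wise fusion
        feat = self.conv4(self.gcn(self.conv3(fused)))           # second perceptual feature
        return feat, self.upsample(feat)
```

  • In this sketch, for ROI features x, the occluder branch runs first; its output feature is added to x before the occluded branch runs, matching the cascade order of steps 6031-6034.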
  • the terminal obtains the contour map and mask map of the occluded object based on the second perceptual feature.
  • the terminal can perform an upsampling operation on the second perceptual feature to obtain the contour map and mask map of the occluded object.
  • the second perceptual feature is upsampled to obtain a contour map with the same size as the ROI and a mask map with the same size as the ROI.
  • the second perceptual feature is upsampled to obtain a contour map with the same size as the microscopic image and a mask map with the same size as the microscopic image.
  • the embodiments of the present application do not specifically limit this.
  • the occluded object layer network also includes a second deconvolution layer.
  • the terminal inputs the second perceptual feature into the second deconvolution layer, and performs a deconvolution operation on the second perceptual feature in the second deconvolution layer to obtain a contour map and a mask map of the occluded object.
  • upsampling through deconvolution operation is used as an example for explanation. Upsampling can also be implemented in other ways, and this is not specifically limited in the embodiments of the present application.
  • the occluded object layer network 920 also includes a second deconvolution layer 925, and the second perceptual feature output by the fourth convolution layer 924 is input to the second deconvolution layer 925, which outputs a contour image 926 of the occluded object and a mask image 927 of the occluded object.
  • the terminal performs instance segmentation on the ROI to determine the respective contour map and mask map of at least one target object contained in the microscopic image. That is, the pre-trained dual-layer instance segmentation model is used to perform instance segmentation on the ROI to distinguish different target object instances in the microscopic image, taking nematodes as the target object as an example.
  • the embodiment of the present application only takes the case where the ROI contains multiple overlapping target objects as an example. However, during the observation process, the ROI may only contain a single target object.
  • In this case, the instance segmentation method involved in the above steps 602-603 can still be used, or some traditional image segmentation algorithms can be used for instance segmentation, which is not specifically limited in the embodiments of this application.
  • the dual-layer instance segmentation model is trained based on multiple synthetic sample images.
  • the synthetic sample image contains multiple target objects.
  • the synthetic sample image is synthesized based on multiple original images containing only a single target object.
  • synthetic sample images containing multiple target objects are used for training during the training phase, which can greatly improve the segmentation accuracy of the dual-layer instance segmentation model for overlapping target objects.
  • the model trained in the above method can handle various complex situations such as multiple instances overlapping each other, or a single instance curling up on itself causing self-overlapping, without the need for manual operation by technicians.
  • manual collection of sample images containing multiple target objects is time-consuming and labor-intensive, and technicians are also required to manually add some label information. Therefore, multiple original images containing only a single target object can be directly used to synthesize a synthetic sample image containing multiple target objects.
  • synthetic sample images with any overlapping shapes and any number of instances can be synthesized. In this way, data augmentation on the training set composed of original images can produce composite sample images with better training quality and better training effect.
  • the above processing method can help train a dual-layer instance segmentation model with better instance segmentation effect and higher accuracy.
  • when using multiple original images to synthesize a synthetic sample image, the following synthesis method can be adopted: when the target object is darker than the background in the original images, the lowest pixel value among the pixels at the same position in the multiple original images is assigned to the pixel at the same position in the synthetic sample image. In other words, the pixel value of each pixel in the synthetic sample image is equal to the lowest pixel value among the pixels at the same position in the multiple original images used to synthesize it. For example, when the nematode is darker than the background (closer to black), assuming that the original images include image 1 and image 2, a synthetic sample image is synthesized by taking min(image 1, image 2) pixel by pixel.
  • Figure 10 is a synthesis flow chart for synthesizing sample images provided by an embodiment of the present application.
  • Taking C. elegans as the target object as an example, assuming that there are two original images 1001 and 1002, a synthetic sample image 1003 can be synthesized by taking min(original image 1001, original image 1002) pixel by pixel, ensuring that the pixel value of each pixel in the synthetic sample image 1003 is equal to the lowest pixel value among the pixels at the same position in the original images 1001 and 1002.
  • when using multiple original images to synthesize a synthetic sample image, the following synthesis method can also be adopted: when the target object is brighter than the background in the original images, the highest pixel value among the pixels at the same position in the multiple original images is assigned to the pixel at the same position in the synthetic sample image. In other words, the pixel value of each pixel in the synthetic sample image is equal to the highest pixel value among the pixels at the same position in the multiple original images. For example, when the nematode is brighter than the background (closer to white), assuming that the original images include image 1 and image 2, a synthetic sample image is synthesized by taking max(image 1, image 2) pixel by pixel.
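  • A minimal NumPy/OpenCV sketch of this pixel-wise composition, assuming grayscale original images of equal size (the file names in the usage comment are hypothetical):

```python
import cv2
import numpy as np

def synthesize_sample(paths, object_darker_than_background=True):
    """Compose a synthetic sample image from several single-object originals
    by taking the pixel-wise minimum (dark objects) or maximum (bright objects)."""
    images = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in paths]
    stack = np.stack(images, axis=0)          # shape: (num_images, H, W)
    if object_darker_than_background:
        return stack.min(axis=0)              # keep the darkest pixel at each position
    return stack.max(axis=0)                  # keep the brightest pixel at each position

# Example usage with hypothetical file names:
# sample = synthesize_sample(["worm_1.png", "worm_2.png"], object_darker_than_background=True)
# cv2.imwrite("synthetic_sample.png", sample)
```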
  • the instance image contains the contour map and mask map of the target object. That is, the description above takes the case of first extracting the ROI from the microscopic image and then performing instance segmentation on the ROI as an example. This eliminates the need to run the instance segmentation algorithm on the entire microscopic image and can save the computing resources of the terminal.
  • the terminal can also perform instance segmentation on the entire microscopic image, which can avoid missing some smaller target objects when extracting the ROI.
  • the terminal inputs the mask image of the target object into the skeleton extraction model, extracts the skeleton of the target object through the skeleton extraction model, and outputs the skeleton morphological image of the target object.
  • the terminal inputs the mask image in the instance image of each target object into the skeleton extraction model, performs skeleton extraction on the target object through the skeleton extraction model, and outputs the skeleton morphological image of the target object.
  • the skeleton extraction model is used to predict the skeleton morphology of the target object based on the mask image in the instance image of the target object.
  • this step 604 can be used to extract the skeletal morphological image of the target object.
  • the skeleton extraction model is a CNN model containing multiple convolutional layers.
  • the terminal inputs the mask image of the target object into the multiple convolutional layers of the skeleton extraction model, and a convolution operation is performed on the mask image of the target object through the multiple convolutional layers to output the skeleton morphological image.
  • the terminal inputs the mask image of the occluder object or the occluded object into the skeleton extraction model, and performs a convolution operation on the mask image through the multiple convolution layers connected in series in the skeleton extraction model.
  • the last convolution layer outputs a skeleton morphological image in which the skeleton of the target object is a single pixel wide.
  • the skeleton morphology image is a binary image, and the skeleton pixels and non-skeleton pixels in the skeleton morphology image have different specific values.
  • the pixels with a value of 1 are skeleton pixels, and the pixels with a value of 0 are non-skeleton pixels.
  • the pixels with a value of 0 are skeleton pixels, and the pixels with a value of 1 are non-skeleton pixels.
  • the embodiments of the present application do not specifically limit this.
  • the skeleton pixels in the skeleton morphology image form a skeleton of the target object that is a single pixel wide, and the shape of this skeleton represents the skeleton shape of the target object in the microscopic image.
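  • For illustration only, a fully convolutional skeleton-extraction network of this kind could look like the sketch below; the number of layers, channel counts and the sigmoid/thresholding step are assumptions, not the exact architecture described in this application.

```python
import torch
import torch.nn as nn

class SkeletonExtractionCNN(nn.Module):
    """Sketch of a fully convolutional model that maps a binary mask image
    to a skeleton probability map of the same size."""

    def __init__(self, channels: int = 32, num_layers: int = 5):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU()]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]   # last conv outputs skeleton logits
        self.net = nn.Sequential(*layers)

    def forward(self, mask: torch.Tensor) -> torch.Tensor:
        # mask: (B, 1, H, W) binary mask image; output: skeleton probability map.
        return torch.sigmoid(self.net(mask))

# Thresholding the probability map (e.g. > 0.5) would yield the binary
# skeleton morphology image described above.
```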
  • a skeleton morphological image is extracted by applying a skeleton extraction algorithm to each instance of the segmented target object based on the instance segmentation result, that is, the instance image.
  • the above processing method facilitates kinematic analysis on the skeletal morphological image of the target object, and can improve the analysis efficiency of kinematic analysis without manual participation or naked eye counting.
  • the skeleton extraction model is trained based on sample images containing the target object and skeleton morphological label information annotating the target object.
  • the skeleton morphology label information includes the skeleton tangential angles of each of the multiple sampling points that sample the skeleton morphology of the target object in the sample image.
  • the skeleton tangential angle represents the angle between the tangent line and the horizontal line corresponding to the sampling point as the tangent point on the directed skeleton shape from the head endpoint to the tail endpoint.
  • For any sample image containing a target object, technicians can mark the skeleton shape of the target object, as well as the head endpoint and tail endpoint on the skeleton shape, so that a directed skeleton shape from the head endpoint to the tail endpoint can be formed.
  • the marked directed skeleton form is then sampled: multiple sampling points on the directed skeleton form are first determined; then, for each sampling point, a tangent line is generated on the directed skeleton form with the sampling point as the tangent point, and the angle between the tangent line and the horizontal line is determined as the skeleton tangential angle of the sampling point. The above operations are repeated to obtain the skeleton tangential angles of the multiple sampling points, and the skeleton tangential angles of the multiple sampling points are determined as the skeleton morphology label information of the sample image. The above operations are performed on each sample image to obtain the skeleton morphology label information of each sample image.
  • the loss function value of the skeleton extraction model in the training phase is determined based on the error between the skeleton tangential angle and the predicted tangential angle of each of the multiple sampling points.
  • the predicted tangential angle is obtained by sampling the skeleton morphology image predicted by the skeleton extraction model for the sample image.
  • the current sample image is input into the skeleton extraction model, the sample image is convolved through the multiple convolution layers connected in series in the skeleton extraction model, and the last convolution layer outputs the predicted skeleton image of the sample image.
  • multiple sampling points are also determined from the predicted skeleton image, and the predicted tangential angle of each sampling point is obtained.
  • the method of obtaining the predicted tangential angle is similar to the method of obtaining the skeleton tangential angle, and will not be described in detail here.
  • the prediction error of the current sample image can be obtained based on the error between the skeleton tangential angle and the predicted tangential angle of each of the multiple sampling points.
  • the prediction error can be the sum, arithmetic mean or weighted average of the errors of the skeleton tangential angle and the predicted tangential angle of each sampling point, and is not specifically limited here.
  • the above operations will be performed on each sample image, and the respective prediction errors of all sample images can be obtained.
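  • A minimal sketch of such a tangential-angle loss, assuming both the annotated and predicted skeletons have already been sampled into equal-length angle sequences; the mean-squared form and the angle wrapping are illustrative assumptions rather than the exact loss used here:

```python
import numpy as np

def tangential_angle_loss(gt_angles: np.ndarray, pred_angles: np.ndarray) -> float:
    """Mean error between annotated and predicted skeleton tangential angles.

    gt_angles, pred_angles: arrays of shape (num_samples, num_sampling_points),
    one row of tangential angles (radians) per sample image.
    """
    diff = pred_angles - gt_angles
    # Wrap angle differences into (-pi, pi] so 359 deg vs 1 deg counts as a small error.
    diff = (diff + np.pi) % (2 * np.pi) - np.pi
    per_sample_error = np.mean(diff ** 2, axis=1)   # prediction error of each sample image
    return float(np.mean(per_sample_error))         # loss value for this iteration
```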
  • the loss function value of the skeleton extraction model in this round of iteration can then be determined based on these prediction errors. Next, it is determined whether the number of iterations or the value of the loss function meets the condition for stopping training, for example, whether the number of iterations reaches a count threshold or the loss function value falls below a loss threshold.
  • the count threshold is any integer greater than or equal to 1, and the loss threshold is any value greater than 0.
  • FIG 11 is a schematic diagram illustrating the training and prediction stages of a skeleton extraction model provided by an embodiment of the present application.
  • In the training stage, technicians first mark the skeleton shape on the sample image and obtain the skeleton morphology label information (i.e., the skeleton tangential angles of multiple sampling points).
  • data enhancement methods such as deformation, flipping, and size transformation can be performed to synthesize more and richer training data sets.
  • train the skeleton extraction model on the training data set and evaluate the skeleton extraction performance of the trained skeleton extraction model on real sample images.
  • In the prediction stage, for a single instance (that is, a single target object) obtained by instance segmentation, a mask image of the single target object can be generated.
  • the mask image is input into the trained skeleton extraction model, and the skeleton of the target object is extracted through the skeleton extraction model to obtain the skeleton morphological image of the target object.
  • Because the labels are defined by skeleton tangential angles, the skeleton morphology predicted by the skeleton extraction model and the skeleton morphology actually annotated in the sample image can be accurately analyzed and quantified, making it easy to compare the error between the predicted skeleton and the annotated skeleton, so that a skeleton extraction model with an accurate skeleton extraction function for the target object can be trained. Even when the instance segmentation results contain complex situations such as self-curling, the model still has a good skeleton extraction effect.
  • Based on the skeleton shape image, the terminal identifies the head endpoint and tail endpoint in the skeleton shape of the target object.
  • the terminal can directly identify the head endpoint and tail endpoint from the skeleton morphology image. That is, an endpoint recognition model is trained.
  • the input of the endpoint recognition model is the skeleton shape image, and the output is the endpoint coordinates of the head endpoint and the tail endpoint.
  • the terminal can also first intercept the endpoint local areas at one end and the other end of the skeleton from the skeleton shape image, and then perform binary classification on each intercepted endpoint local area. That is, a head-tail recognition model for binary classification is trained to determine whether the input endpoint local area corresponds to a head endpoint or a tail endpoint.
  • the above processing method can reduce the calculation amount of the head and tail recognition process and improve the recognition efficiency of the head and tail recognition process. Taking this situation as an example for explanation, FIG. 12 is a flow chart of a method for identifying head endpoints and tail endpoints related to an embodiment of the present application. As shown in Figure 12, the above step 605 can be implemented through the following steps 6051-6054.
  • the terminal intercepts and obtains the first endpoint local area and the second endpoint local area.
  • the first endpoint local area and the second endpoint local area are located at both ends of the skeleton.
  • Since the skeleton pixels in the skeleton morphological image form a skeleton with a single pixel width, it is easy to find the two endpoints of the skeleton. Then, using each endpoint as the interception center, an endpoint candidate box with the endpoint as its center point is determined in the skeleton shape image. The endpoint local area circled by the endpoint candidate box can then be taken directly from the skeleton morphology image.
  • the intercepted area located at one end of the skeleton is called the first endpoint local area
  • the intercepted area located at the other end of the skeleton is called the second endpoint local area.
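  • A minimal sketch of locating the two skeleton endpoints in a single-pixel-wide binary skeleton (an endpoint is a skeleton pixel with exactly one skeleton neighbor) and cropping a fixed-size candidate box around each; the 64-pixel box size is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import convolve

def find_skeleton_endpoints(skeleton: np.ndarray) -> list[tuple[int, int]]:
    """Return (row, col) coordinates of skeleton pixels with exactly one neighbor."""
    sk = (skeleton > 0).astype(np.uint8)
    kernel = np.ones((3, 3), dtype=np.uint8)
    kernel[1, 1] = 0                                    # count the 8 neighbors only
    neighbor_count = convolve(sk, kernel, mode="constant", cval=0)
    rows, cols = np.where((sk == 1) & (neighbor_count == 1))
    return list(zip(rows.tolist(), cols.tolist()))

def crop_endpoint_region(image: np.ndarray, center: tuple[int, int], box: int = 64) -> np.ndarray:
    """Crop a box-sized endpoint local area centered on the given endpoint."""
    r, c = center
    half = box // 2
    r0, c0 = max(r - half, 0), max(c - half, 0)
    return image[r0:r0 + box, c0:c0 + box]
```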
  • the terminal extracts the first HOG feature of one end of the skeleton based on the local area of the first endpoint.
  • after the terminal intercepts the first endpoint local area from the skeleton morphology image, it can find the original partial image at the same position as the first endpoint local area in the original microscopic image. Then, the first HOG feature can be extracted from this original partial image.
  • the original partial image is divided into multiple cell units, where a cell unit refers to a smaller connected area in the image. Next, the gradient or edge direction histogram of each pixel in each cell unit is collected. These histograms are then combined to form a feature descriptor for the cell unit. Repeat the above operations until the first HOG feature of the entire original partial image is obtained.
  • the terminal extracts the second HOG feature of the other end of the skeleton based on the local area of the second endpoint.
  • the above step 6053 is similar to the above step 6052 and will not be described again here.
  • the terminal identifies one end of the skeleton and the other end of the skeleton based on the first HOG feature and the second HOG feature respectively, and obtains the head endpoint and the tail endpoint.
  • the terminal may utilize a head-to-tail recognition model to identify/classify head endpoints and tail endpoints.
  • the head-to-tail recognition model is used to determine whether the endpoints in the skeleton of the target object belong to the head endpoints or tail endpoints based on the HOG features of the local area of the endpoints.
  • the above step 6054 can be implemented through the following steps C1-C3.
  • the terminal inputs the first HOG feature into the head-to-tail recognition model, performs two classifications on the first HOG feature through the head-to-tail recognition model, and obtains the first recognition result of one end of the skeleton.
  • the first identification result is used to characterize whether one end of the skeleton is a head endpoint or a tail endpoint.
  • the head and tail recognition model includes two binary classification models: a head recognition model and a tail recognition model.
  • a head recognition model for binary classification of head endpoints is obtained by using HOG features of some endpoint local areas pre-labeled with head endpoints.
  • a tail recognition model for binary classification of tail endpoints is obtained by training with the HOG features of some endpoint local areas pre-labeled with tail endpoints.
  • the first HOG feature is input into the trained head recognition model.
  • the head recognition model performs a binary classification process on whether one end of the skeleton is a head endpoint, and outputs a first recognition result of whether one end of the skeleton is a head endpoint.
  • Taking the head recognition model being an SVM binary classification model as an example, after the first HOG feature is input into the SVM binary classification model, the SVM binary classification model performs binary classification on the first HOG feature, thereby outputting the first recognition result of whether one end of the skeleton is the head endpoint.
  • the SVM binary classification model predicts the recognition probability that one end of the skeleton is the head endpoint based on the first HOG feature.
  • If the recognition probability is greater than the classification threshold, the first recognition result is set to "Y (Yes)", which means that one end of the skeleton is the head endpoint; otherwise, the first recognition result is set to "N (No)", which means that one end of the skeleton is not the head endpoint.
  • the classification threshold is any value greater than or equal to 0 and less than or equal to 1.
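  • A minimal sketch of this HOG + SVM head/tail classification using scikit-image and scikit-learn; the HOG parameters, the training-data variable names and the probability threshold are illustrative assumptions:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def endpoint_hog(patch: np.ndarray) -> np.ndarray:
    """Extract a HOG descriptor for an endpoint local area (grayscale patch)."""
    return hog(patch, orientations=8, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), feature_vector=True)

def train_head_svm(head_patches, non_head_patches) -> SVC:
    """Train a binary head-recognition SVM on pre-labeled endpoint patches.
    head_patches / non_head_patches are hypothetical lists of grayscale crops."""
    X = np.array([endpoint_hog(p) for p in head_patches + non_head_patches])
    y = np.array([1] * len(head_patches) + [0] * len(non_head_patches))
    model = SVC(kernel="linear", probability=True)
    model.fit(X, y)
    return model

def is_head_endpoint(model: SVC, patch: np.ndarray, threshold: float = 0.5) -> bool:
    prob = model.predict_proba([endpoint_hog(patch)])[0, 1]   # recognition probability
    return prob > threshold                                   # "Y" if above the threshold
```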
  • the head and tail recognition model is an overall multi-classification model used to determine whether a skeleton endpoint is a head endpoint or a tail endpoint.
  • a head and tail recognition model for multi-classification of head endpoints/tail endpoints is obtained by training with the HOG features of some endpoint local areas pre-labeled with head endpoints and tail endpoints. Then, the first HOG feature is input into the trained head and tail recognition model, multi-classification processing is performed on one end of the skeleton through the head and tail recognition model, and the first recognition result that one end of the skeleton is a head endpoint, a tail endpoint, or neither a head endpoint nor a tail endpoint is output.
  • Taking the head and tail recognition model being an SVM multi-classification model as an example, after the first HOG feature is input into the SVM multi-classification model, the SVM multi-classification model performs multi-classification on the first HOG feature, thereby outputting the first recognition result that one end of the skeleton is a head endpoint, a tail endpoint, or neither a head endpoint nor a tail endpoint. That is, the SVM multi-classification model can be configured with 3 category labels: "head endpoint", "tail endpoint" and "neither head endpoint nor tail endpoint".
  • the SVM multi-classification model predicts the classification probability of each category label for one end of the skeleton based on the first HOG feature. Then, the category label with the highest classification probability is determined as the first recognition result of one end of the skeleton.
  • the terminal inputs the second HOG feature into the head-to-tail recognition model, performs two classifications on the second HOG feature through the head-to-tail recognition model, and obtains a second recognition result for the other end of the skeleton.
  • the second recognition result is used to characterize whether the other end of the skeleton is a head endpoint or a tail endpoint.
  • If the head and tail recognition model includes two binary classification models, a head recognition model and a tail recognition model,
  • and the first recognition result obtained in the above step C1 indicates that one end of the skeleton is the head endpoint,
  • then the tail recognition model can be called to perform binary classification processing on the second HOG feature to output the second recognition result of whether the other end of the skeleton is the tail endpoint.
  • Taking the tail recognition model being an SVM binary classification model as an example, the SVM binary classification model performs binary classification on the second HOG feature, thereby outputting the second recognition result of whether the other end of the skeleton is the tail endpoint.
  • the SVM binary classification model predicts the recognition probability that the other end of the skeleton is the tail endpoint based on the second HOG feature.
  • If the recognition probability is greater than the classification threshold, the second recognition result is set to "Y (Yes)", which means that the other end of the skeleton is the tail endpoint. Otherwise, the second recognition result is set to "N (No)", which means that the other end of the skeleton is not the tail endpoint.
  • If the head and tail recognition model includes two binary classification models, a head recognition model and a tail recognition model, and the first recognition result obtained in step C1 above indicates that one end of the skeleton is not the head endpoint, then the head recognition model can continue to be called to perform binary classification processing on the second HOG feature to output the second recognition result of whether the other end of the skeleton is the head endpoint,
  • and the tail recognition model is called to perform binary classification processing on the first HOG feature to determine whether one end of the skeleton is a tail endpoint.
  • Alternatively, the second HOG feature can also be input into the trained head and tail recognition model, and multi-classification processing is performed on the other end of the skeleton through the head and tail recognition model to output the second recognition result that the other end of the skeleton is a head endpoint, a tail endpoint, or neither a head endpoint nor a tail endpoint.
  • Taking the head and tail recognition model being an SVM multi-classification model as an example, the SVM multi-classification model performs multi-classification on the second HOG feature, thereby outputting the second recognition result that the other end of the skeleton is a head endpoint, a tail endpoint, or neither a head endpoint nor a tail endpoint. That is, the SVM multi-classification model can be configured with 3 category labels: "head endpoint", "tail endpoint" and "neither head endpoint nor tail endpoint".
  • the SVM multi-classification model predicts the classification probability of each category label for the other end of the skeleton based on the second HOG feature. Then, the category label with the highest classification probability is determined as the second recognition result of the other end of the skeleton.
  • the terminal determines to obtain the head endpoint and the tail endpoint based on the first recognition result and the second recognition result.
  • the first recognition result and the second recognition result indicate that one end of the skeleton is the head endpoint and the other end of the skeleton is the tail endpoint, or indicate that one end of the skeleton is the tail endpoint and the other end of the skeleton is the head endpoint. That is, one endpoint is the head endpoint and the other endpoint is the tail endpoint. Then it means that the recognition result is normal and the subsequent process continues.
  • If the recognition result is abnormal, for example both endpoints are determined to be head endpoints, both are determined to be tail endpoints, or both are determined to be "neither head endpoint nor tail endpoint",
  • this can be automatically corrected to a certain extent. For example, if the head recognition model classifies both endpoints as head endpoints, the endpoint with the highest recognition probability is selected as the head endpoint, and the remaining endpoint is used as the tail endpoint. At this time, the tail recognition model is used for verification: if the probability that the remaining endpoint is identified as a tail endpoint is greater than the probability that the selected head endpoint is identified as a tail endpoint, the verification is passed. Alternatively, if the tail recognition model classifies both endpoints as tail endpoints, the correction can be made in an analogous manner. Alternatively, the abnormal result can be reported directly to technical personnel for manual investigation, which is not specifically limited in the embodiments of this application.
  • Figure 13 is a schematic diagram of intercepting a local area of an endpoint provided by an embodiment of the present application.
  • Taking the target object being a nematode as an example, for the two endpoints of the nematode skeleton, the corresponding two original partial images 1311 and 1312 can be sampled in the microscopic image 1301.
  • the head and tail recognition model can be used to determine whether the skeleton endpoints contained in the original partial images 1311 and 1312 are head endpoints or tail endpoints respectively.
  • the morphological differences between the head and tail of the nematode are relatively obvious.
  • several exemplary partial images of nematode heads are given.
  • the edge of the nematode's head is relatively rounded.
  • several exemplary partial images of the nematode tail are given.
  • the tail edge of the nematode is relatively sharp.
  • a local image area is intercepted for the two endpoints of the skeleton.
  • 128-dimensional HOG features are extracted for each skeleton endpoint.
  • the recognition accuracy for the head and tail of the nematode is as high as 98%, which can well balance the head and tail recognition speed and recognition accuracy.
  • the terminal determines the skeleton shape image, the head endpoint and the tail endpoint as the skeleton shape information of the target object.
  • the skeleton shape information represents the shape of the target object's skeleton.
  • the current skeleton shape of a target object can be determined through the skeleton shape image, and the direction of the skeleton shape can be determined through the recognized head endpoint and tail endpoint. That is, a complete directed skeleton shape (from the head endpoint to the tail endpoint) can be formed.
  • This directed skeleton shape is the skeleton shape information of the target object.
  • In the above process, a possible implementation of extracting the skeleton of the target object in the instance image and obtaining the skeleton morphological information of the target object is provided: the skeleton shape image is first extracted through the skeleton extraction model, and the head endpoint and tail endpoint are then identified through the head and tail recognition model, so that a directed skeleton shape from the head endpoint to the tail endpoint can be obtained. Based on the directed skeleton shape, a richer and deeper kinematic analysis of the target object can be performed. In other embodiments, only the undirected skeleton shape in the skeleton shape image may be used as the skeleton shape information, which is not specifically limited in the embodiments of the present application.
  • Based on the skeleton shape information, the terminal samples the skeleton shape of the target object and obtains a feature vector composed of the skeleton tangential angles of multiple sampling points.
  • the skeleton tangential angle represents the angle between the tangent line and the horizontal line corresponding to the sampling point as the tangent point on the directed skeleton shape from the head endpoint to the tail endpoint.
  • the terminal selects multiple sampling points on the directed skeleton form from the head endpoint to the tail endpoint. Then, for each sampling point, a tangent line is generated on the directed skeleton shape with the sampling point as the tangent point (because the skeleton shape is directional, the tangent line is a ray along the direction of the skeleton shape, not an undirected straight line). Next, the angle between the tangent line and the horizontal line is determined as the skeleton tangential angle of the sampling point. The above operations are repeated to obtain the skeleton tangential angles of the multiple sampling points.
  • the skeleton tangential angles of the multiple sampling points mentioned above can form a feature vector. The dimension of the feature vector is equal to the number of sampling points. Each element in the feature vector is the skeleton tangential angle of a sampling point.
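  • A minimal sketch of building this feature vector from an ordered list of skeleton points (head to tail); the uniform arc-length resampling and the finite-difference tangents are illustrative assumptions:

```python
import numpy as np

def skeleton_tangential_angles(points: np.ndarray, num_samples: int = 48) -> np.ndarray:
    """points: (M, 2) ordered skeleton coordinates from head endpoint to tail endpoint.
    Returns a feature vector of num_samples skeleton tangential angles (radians)."""
    # Resample the directed skeleton uniformly by arc length.
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, arc[-1], num_samples)
    xs = np.interp(targets, arc, points[:, 0])
    ys = np.interp(targets, arc, points[:, 1])
    # Tangent direction at each sampling point (finite differences along the skeleton).
    dx = np.gradient(xs)
    dy = np.gradient(ys)
    # Angle between the tangent ray and the horizontal line.
    return np.arctan2(dy, dx)
```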
  • the terminal decomposes motion components of the directed skeleton form represented by the feature vector to obtain motion component information of the target object.
  • the motion component information refers to the respective characteristic values of various preset motion states obtained by motion decomposition of the skeleton morphological information.
  • each preset motion state actually represents a preset skeleton form.
  • the preset skeleton form is sampled in a manner similar to the above-mentioned step 607, so the preset feature vector corresponding to each preset skeleton form can also be obtained, and the feature vector obtained in the above step 607 can then be decomposed into a weighted sum of multiple preset feature vectors.
  • the above processing method can obtain the motion component information of the target object based on the weight coefficient (i.e., eigenvalue) occupied by each preset feature vector during decomposition, so that any skeleton shape can be decomposed into a combination of a variety of preset motion states, which greatly facilitates the kinematic analysis of the target object.
  • the motion component information includes multiple characteristic values of each of the plurality of preset motion states. This characteristic value represents the weight coefficient of the corresponding preset motion state when performing motion component decomposition.
  • Figure 14 is a flow chart for motion analysis of a target object provided by an embodiment of the present application. As shown in Figure 14, the above step 608 can be implemented through the following steps 6081-6083.
  • the terminal samples the preset skeleton shapes indicated by the various preset motion states to obtain the preset feature vectors of each preset motion state.
  • multiple sampling points are selected from the directed preset skeleton shape indicated by the preset motion state.
  • the method of selecting sampling points from the preset skeleton shape needs to be consistent with the method of selecting sampling points in step 607.
  • a tangent line is generated on the preset skeleton shape with the sampling point as the tangent point (because the preset skeleton shape also has a direction, the tangent line is a ray along the direction of the preset skeleton shape rather than an undirected straight line).
  • the angle between the tangent line and the horizontal line is determined as the skeleton tangential angle of the sampling point.
  • the skeleton tangential angles of the multiple sampling points mentioned above can form a preset feature vector.
  • the dimension of the preset feature vector is equal to the number of sampling points.
  • Each element in the preset feature vector is the skeleton tangential angle of a sampling point. It should be noted that since the sampling methods in step 607 and step 6081 are consistent, which means that the number of sampling points is the same, the feature vector and the preset feature vector have the same dimensions.
  • the terminal decomposes the feature vector into the sum of the products of multiple preset feature vectors and multiple feature values.
  • For example, assuming that the feature vector obtained in step 607 is a K-dimensional vector, it is decomposed into a sum of the products of multiple preset feature vectors and multiple feature values.
  • each preset feature vector is also a K-dimensional vector.
  • For N preset motion states, N preset feature vectors can be obtained (each preset feature vector is a K-dimensional vector). That is, the preset feature vectors of all preset motion states form an N × K matrix. Extracting the covariance of the N × K matrix yields a K × K matrix.
  • Based on the K × K covariance matrix and the K-dimensional feature vector, the eigen-decomposition is performed, and N eigenvalues corresponding to the N preset feature vectors can be obtained.
  • the N eigenvalues satisfy the following condition: multiplying the N preset feature vectors by the corresponding N eigenvalues gives N products, and the sum of these N products is exactly equal to the K-dimensional feature vector.
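  • The numerical recipe above is condensed; one plausible reading is that the decomposition finds the coefficients whose weighted sum of the preset feature vectors reproduces the K-dimensional feature vector, which can be computed as a least-squares projection. A hedged NumPy sketch of that reading (function names are illustrative):

```python
import numpy as np

def decompose_motion_components(feature_vec: np.ndarray, preset_vecs: np.ndarray) -> np.ndarray:
    """feature_vec: (K,) skeleton tangential-angle feature vector.
    preset_vecs: (N, K) matrix whose rows are the preset feature vectors.
    Returns the N coefficients (eigenvalues/weights) whose weighted sum of the
    preset feature vectors best reproduces the input feature vector."""
    # Least-squares solution of preset_vecs.T @ coeffs ≈ feature_vec.
    coeffs, *_ = np.linalg.lstsq(preset_vecs.T, feature_vec, rcond=None)
    return coeffs

def motion_principal_components(coeffs: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Indices of the top-k coefficients by magnitude (the motion principal components)."""
    return np.argsort(-np.abs(coeffs))[:top_k]
```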
  • the terminal determines the feature value sequence composed of the multiple feature values as the motion component information.
  • For example, N eigenvalues are obtained by the decomposition in the above step 6082.
  • Based on the N eigenvalues, an eigenvalue sequence can be determined. For example, assuming that 5 preset motion states are included, 5 eigenvalues are obtained from the solution: a1, a2, a3, a4 and a5. Then the eigenvalue sequence {a1, a2, a3, a4, a5} can be used as the motion component information of the target object.
  • Alternatively, the multiple feature values included in the feature value sequence can also be sorted from large to small, and the preset motion states corresponding to the feature values located at the front target positions in the sorting are determined as the motion principal components. Then, only the eigenvalues of the motion principal components are used as the motion component information of the target object; that is, only the principal components that play a decisive role in the current skeleton shape of the target object are considered, and some minor side components are ignored.
  • In this case, the feature value subsequence composed of the eigenvalues of the motion principal components serves as the motion component information of the target object.
  • Of course, the top 3 or top 10 feature values can also be selected as the motion principal components, which is not specifically limited in the embodiments of the present application.
  • the motion of the target object within the observation time period can also be analyzed based on the motion principal components, and the kinematic characteristics of the target object within the observation time period can be obtained.
  • For example, for the skeleton morphology of a certain nematode, 10 eigenvalues were obtained using the above analysis method, and the eigenvalues {a1, a2, a3, a4, a5} ranked in the top 5 in descending order were selected; the five preset motion states corresponding to these eigenvalues are taken as the motion principal components.
  • Figure 15 is a schematic diagram of motion analysis of a target object provided by an embodiment of the present application.
  • Here, the target object being nematodes is taken as an example for explanation.
  • Although the morphology of nematodes varies and seems unpredictable, it usually follows its own inherent rules.
  • Through the motion analysis method of the above steps 6081-6083, the principal components of the nematode's motion can be obtained by decomposition. For example, a total of 5 preset motion states were analyzed, and the top 2 preset motion states with the largest eigenvalues were selected as the motion principal components.
  • the original eigenvalue sequence is {a1, a2, a3, a4, a5},
  • and the eigenvalue subsequence of the motion principal components is {a1, a2}; that is, the two preset motion states corresponding to the eigenvalues a1 and a2 are the motion principal components.
  • the above motion analysis process is executed for each microscopic image frame, and a motion analysis probability map 1501 composed of the characteristic values a1 and a2 of the same nematode during the observation period can be drawn.
  • the abscissa is the value of the characteristic value a1
  • the ordinate is the value of the characteristic value a2.
  • the color depth of each coordinate point in the diagram represents the probability that the nematode is in the skeleton form synthesized by the eigenvalues a1 and a2 at this coordinate point.
  • the angle phase value formed by the coordinate values of the feature values a1 and a2 can also be analyzed.
  • This angular phase value is obtained by treating the coordinate value composed of the characteristic values a1 and a2 as trigonometric function components and converting it into an angle through an inverse trigonometric function.
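  • As an illustration of this conversion (the exact formula is an assumption), the phase can be computed with the two-argument arctangent:

```python
import numpy as np

def angular_phase(a1: float, a2: float) -> float:
    """Angular phase (radians) of the point (a1, a2) in the plane of the two
    motion principal components, obtained via an inverse trigonometric function."""
    return float(np.arctan2(a2, a1))
```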
  • the terminal performs motion analysis on the target object based on the skeleton morphology information to obtain motion component information of the target object. That is, by sampling the directed skeleton shape, the feature vector is constructed based on the skeleton tangential angle of the sampling point.
  • the above processing method can quantitatively decompose the feature vector, thereby automatically decomposing the extracted skeleton shape into motion principal components; various kinematic parameters of the target object can then be conveniently analyzed through the motion principal components, which greatly improves the analysis efficiency of the target object.
  • Figure 16 is a principle flow chart of a microscopic image processing method provided by an embodiment of the present application. As shown in Figure 16, taking the target object as nematodes as an example, this processing method can be applied to various fields of nematode analysis, such as counting, segmentation, morphological measurement and kinematic analysis, and has an extremely wide range of application scenarios.
  • the original nematode image 1601 collected by the microscope CCD image sensor includes two situations: a single nematode and multiple nematodes. A single nematode may curl onto itself, while multiple nematodes may overlap each other.
  • the nematode image 1601 is input into an instance segmentation model that can handle overlapping targets (ie, a dual-layer instance segmentation model), and the result 1602 of the nematode instance segmentation is obtained. Then, each single nematode target instance obtained by instance segmentation is input into the skeleton extraction model for skeleton extraction, and the skeleton extraction result 1603 is obtained. In addition, it is necessary to identify the head and tail endpoints of single nematodes. Next, after skeleton extraction and head-to-tail identification, the skeleton tangential angle is used to describe the movement state of the nematode. As shown in the skeleton extraction result 1603, 5 sampling points are set on the enlarged segment of the skeleton arc.
  • the angle ⁇ i between the tangent line t i of the third sampling point and the horizontal line is the skeleton of this sampling point. Tangential angle.
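  • The following is a minimal sketch of sampling skeleton tangential angles from an ordered skeleton polyline (head endpoint to tail endpoint). The evenly spaced sampling and the finite-difference tangent estimate are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def skeleton_tangential_angles(skeleton_xy, num_samples=5):
    """skeleton_xy: (M, 2) array of (x, y) points ordered from the head
    endpoint to the tail endpoint.
    Returns `num_samples` skeleton tangential angles in radians, measured
    against the horizontal line."""
    pts = np.asarray(skeleton_xy, dtype=float)
    # Evenly spaced sampling-point indices along the directed skeleton.
    idx = np.linspace(0, len(pts) - 2, num_samples).astype(int)
    # Finite-difference tangent at each sampling point (points toward the tail).
    d = pts[idx + 1] - pts[idx]
    return np.arctan2(d[:, 1], d[:, 0])
```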
  • Then, through motion principal component analysis, the feature vector composed of the skeleton tangential angles of the multiple sampling points is decomposed into motion principal components.
  • The eigenvalues of the decomposed motion principal components are fed into the subsequent kinematic parameter analysis, which can automatically output the nematode's movement speed, angular velocity, axial velocity, and so on.
  • As shown in the state description diagram 1604, after the serial numbers of the multiple sampling points are normalized, the relationship between the normalized sampling point serial number and the skeleton tangential angle is plotted: the abscissa represents the normalized sampling point sequence, and the ordinate represents the value of the skeleton tangential angle corresponding to that sampling point.
  • As shown in the principal component analysis module 1605, the skeleton form of a given nematode can be decomposed into the weighted sum of 4 preset movement forms, and the weights (i.e., eigenvalues) of the 4 preset movement forms are {a1, a2, a3, a4} respectively.
  • Finally, as shown in 1606, deeper kinematic analysis can be performed based on the Eigenworm (peristalsis eigenmode) representation, such as analyzing the nematode's movement speed, angular velocity, axial velocity, and so on; a sketch of deriving such an eigen-basis follows.
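  • Eigenworm-style bases are commonly obtained by principal component analysis of tangential-angle vectors collected over many frames. The sketch below assumes that convention; it is an illustrative derivation, not a verbatim implementation of the patent.

```python
import numpy as np

def eigenworm_basis(theta_frames, num_components=4):
    """theta_frames: (T, K) array, one K-dimensional tangential-angle
    vector per microscopic image frame.
    Returns (basis, weights): `basis` is (num_components, K) holding the
    preset movement forms, and `weights` is (T, num_components), i.e. the
    eigenvalue sequence {a1..a4} for every frame."""
    X = np.asarray(theta_frames, dtype=float)
    X = X - X.mean(axis=0)                   # remove the mean posture
    cov = np.cov(X, rowvar=False)            # K x K covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    basis = eigvecs[:, order].T              # dominant movement forms
    weights = X @ basis.T                    # per-frame weighting coefficients
    return basis, weights
```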
  • The method provided by the embodiments of the present application performs instance segmentation on the target objects contained in a microscopic image to determine the instance image of each target object, i.e., the single-instance segmentation result, and extracts skeleton morphology information from the single-instance segmentation result.
  • By performing motion analysis and motion component decomposition on the basis of the skeleton morphology information, the current complex skeleton form of each target object can be decomposed into a combination of multiple preset motion states, and the overall processing flow requires no manual intervention and can be automated by machines, which greatly reduces labor costs and improves analysis efficiency.
  • In addition, in-depth morphological measurement and kinematic analysis can be performed based on the output motion component information, which also improves the accuracy of analyzing the target objects.
  • Figure 17 is a schematic structural diagram of a microscopic image processing device provided by an embodiment of the present application. As shown in Figure 17, the device includes:
  • the instance segmentation module 1701 is used to perform instance segmentation on the microscopic image to obtain an instance image, where the instance image contains the target object in the microscopic image;
  • the skeleton extraction module 1702 is used to extract the skeleton of the target object in the instance image to obtain the skeleton morphology information of the target object, and the skeleton morphology information represents the skeleton morphology of the target object;
  • the motion analysis module 1703 is used to perform motion analysis on the target object based on the skeleton morphology information to obtain multiple eigenvalues, and to determine the eigenvalue sequence composed of the multiple eigenvalues as the motion component information of the target object, where the multiple eigenvalues are used to represent the weighting coefficients of multiple preset motion states when synthesizing the skeleton form.
  • The device provided by the embodiments of the present application performs instance segmentation on the target objects contained in a microscopic image to determine the instance image of each target object, i.e., the single-instance segmentation result, and extracts skeleton morphology information from the single-instance segmentation result.
  • By performing motion analysis and motion component decomposition on the basis of the skeleton morphology information, the current complex skeleton form of each target object can be decomposed into a combination of multiple preset motion states, and the overall processing flow requires no manual intervention and can be automated by machines, which greatly reduces labor costs and improves analysis efficiency.
  • In some embodiments, the instance image includes a contour map and a mask map of the target object, and the instance segmentation module 1701 includes:
  • a determination submodule, used to determine the region of interest (ROI) containing the target object from the microscopic image;
  • a segmentation submodule, used to perform instance segmentation on the ROI to determine the contour map and mask map of the target object.
  • In the case where the ROI contains multiple overlapping target objects, the segmentation submodule includes:
  • an extraction unit, used to determine an ROI candidate box based on the position information of the ROI, where the area selected by the ROI candidate box includes the ROI, and to determine the local image features of the ROI from the global image features of the microscopic image, where the local image features are used to characterize the features of the area selected by the ROI candidate box within the global image features (a sketch of this cropping step is given below);
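  • One common way to realize the step of taking the ROI's local image features out of the global feature map is ROI pooling / ROIAlign. The sketch below uses torchvision's `roi_align` purely as an illustration; the output size and spatial scale are assumptions, and the patent does not prescribe this particular operator.

```python
import torch
from torchvision.ops import roi_align

def roi_local_features(global_features, roi_xywh, spatial_scale, output_size=14):
    """global_features: (1, C, H, W) feature map of the whole microscopic image.
    roi_xywh: (x, y, w, h) of the ROI candidate box in image coordinates.
    spatial_scale: ratio of feature-map resolution to image resolution (e.g. 1/16)."""
    x, y, w, h = roi_xywh
    # roi_align expects boxes in the form (batch_index, x1, y1, x2, y2).
    boxes = torch.tensor([[0.0, x, y, x + w, y + h]], dtype=torch.float32)
    return roi_align(global_features, boxes, output_size=output_size,
                     spatial_scale=spatial_scale, aligned=True)
```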
  • a processing unit, used to input the local image features into the dual-layer instance segmentation model, process the local image features through the dual-layer instance segmentation model, and output the respective contour maps and mask maps of the multiple target objects in the ROI, where the dual-layer instance segmentation model is used to establish separate layers for different objects to obtain the instance segmentation result of each object.
  • In some embodiments, the ROI contains an occluding object and an occluded object that overlap each other;
  • the dual-layer instance segmentation model includes an occluding-object layer network and an occluded-object layer network,
  • where the occluding-object layer network is used to extract the contour and mask of the occluding object located on the top layer, and the occluded-object layer network is used to extract the contour and mask of the occluded object located on the bottom layer.
  • In this case, the processing unit includes:
  • a first extraction subunit, used to input the local image features into the occluding-object layer network and extract, through the occluding-object layer network, the first perceptual feature of the occluding object located on the top layer of the ROI, where the first perceptual feature represents the image features of the occluding object for the instance segmentation task;
  • an acquisition subunit, used to perform an upsampling operation on the first perceptual feature to obtain the contour map and mask map of the occluding object;
  • a second extraction subunit, used to input the fused feature obtained by fusing the local image features and the first perceptual feature into the occluded-object layer network, and extract the second perceptual feature of the occluded object located on the bottom layer of the ROI, where the second perceptual feature represents the image features of the occluded object for the instance segmentation task;
  • the acquisition subunit is also used to perform an upsampling operation on the second perceptual feature to obtain the contour map and mask map of the occluded object.
  • In some embodiments, the occluding-object layer network includes a first convolutional layer, a first graph convolutional layer and a second convolutional layer, where the first graph convolutional layer includes a non-local operator, and the non-local operator is used to associate pixel points in the image space according to the similarity of their corresponding feature vectors. The first extraction subunit is used to:
  • input the local image features into the first convolutional layer of the occluding-object layer network, and perform a convolution operation on the local image features through the first convolutional layer to obtain initial perceptual features;
  • input the initial perceptual features into the first graph convolutional layer of the occluding-object layer network, and perform a convolution operation on the initial perceptual features through the non-local operator in the first graph convolutional layer to obtain graph convolution features;
  • input the graph convolution features into the second convolutional layer of the occluding-object layer network, and perform a convolution operation on the graph convolution features through the second convolutional layer to obtain the first perceptual feature.
  • In some embodiments, the second extraction subunit is used to: input the fused feature into the third convolutional layer of the occluded-object layer network, and perform a convolution operation on the fused feature through the third convolutional layer to obtain perceptual interaction features; input the perceptual interaction features into the second graph convolutional layer of the occluded-object layer network, and perform a convolution operation on the perceptual interaction features through the non-local operator in the second graph convolutional layer to obtain graph convolution interaction features; and input the graph convolution interaction features into the fourth convolutional layer of the occluded-object layer network, and perform a convolution operation on the graph convolution interaction features through the fourth convolutional layer to obtain the second perceptual feature. A sketch of this dual-layer structure is given below.
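  • The following PyTorch sketch illustrates the dual-layer structure described above: a non-local operator used as a graph convolution, an occluding-object branch, and an occluded-object branch fed with the element-wise fusion of the ROI features and the occluder's perceptual feature. Channel counts, activation functions, the transposed-convolution upsampling and the output heads are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalGCN(nn.Module):
    """Graph convolution realized with a non-local operator: every pixel is a
    graph node and the attention weights act as the node connections."""
    def __init__(self, channels):
        super().__init__()
        self.phi = nn.Conv2d(channels, channels, kernel_size=1)
        self.theta = nn.Conv2d(channels, channels, kernel_size=1)
        self.beta = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.phi(x).flatten(2)                  # (b, c, h*w)
        k = self.theta(x).flatten(2)                # (b, c, h*w)
        v = self.beta(x).flatten(2)                 # (b, c, h*w)
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, hw, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return out + x                              # residual, element-wise add

class LayerBranch(nn.Module):
    """conv -> non-local graph conv -> convs, then upsampling to contour/mask."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.gcn = NonLocalGCN(channels)
        self.conv2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(channels, channels, 2, stride=2)
        self.contour_head = nn.Conv2d(channels, 1, 1)
        self.mask_head = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        feat = self.conv2(self.gcn(F.relu(self.conv1(x))))  # perceptual feature
        up = F.relu(self.up(feat))
        return feat, self.contour_head(up), self.mask_head(up)

class DualLayerHead(nn.Module):
    """Occluding-object branch first; its perceptual feature is fused with the
    ROI features before the occluded-object branch is applied."""
    def __init__(self, channels=256):
        super().__init__()
        self.occluder = LayerBranch(channels)
        self.occludee = LayerBranch(channels)       # same structure, separate weights

    def forward(self, roi_features):
        f1, contour1, mask1 = self.occluder(roi_features)
        fused = roi_features + f1                    # element-wise fusion
        _, contour2, mask2 = self.occludee(fused)
        return (contour1, mask1), (contour2, mask2)
```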
  • In some embodiments, the dual-layer instance segmentation model is trained on multiple synthetic sample images, each containing multiple target objects, and each synthetic sample image is synthesized from multiple original images that each contain only a single target object.
  • In the case where the target object is darker than the background in the original images, the pixel value of each pixel in the synthetic sample image is equal to the lowest pixel value among the pixels at the same position in the multiple original images used to synthesize it; or, in the case where the target object is lighter than the background in the original images, the pixel value of each pixel in the synthetic sample image is equal to the highest pixel value among the pixels at the same position in those original images. A sketch of this composition follows.
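  • The per-pixel min/max composition described above can be written in a few lines; the sketch below assumes the original images share the same shape and grayscale range.

```python
import numpy as np

def synthesize_sample(original_images, target_darker_than_background=True):
    """Compose one multi-object training image from several single-object
    original images of identical shape."""
    stack = np.stack([np.asarray(img) for img in original_images], axis=0)
    # Dark worms on a bright background: keep the darkest value at each pixel;
    # bright worms on a dark background: keep the brightest value instead.
    return stack.min(axis=0) if target_darker_than_background else stack.max(axis=0)
```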
  • the skeleton extraction module 1702 includes:
  • a skeleton extraction submodule, used to, for any target object in the ROI, input the instance image into the skeleton extraction model and perform skeleton extraction on the target object through the skeleton extraction model to obtain a skeleton morphology image, where the skeleton extraction model is used to predict the skeleton form of the target object based on the instance image of the target object;
  • a recognition submodule, used to recognize the skeleton morphology image and obtain the head endpoint and tail endpoint in the skeleton form of the target object;
  • an information determination submodule, used to determine the skeleton morphology image, the head endpoint and the tail endpoint as the skeleton morphology information.
  • In some embodiments, the skeleton extraction model includes multiple cascaded convolutional layers, and the skeleton extraction submodule is used to: input the instance image into the multiple convolutional layers of the skeleton extraction model, and perform convolution operations on the instance image through the multiple convolutional layers to obtain the skeleton morphology image;
  • where the skeleton extraction model is trained based on sample images containing the target object and the skeleton morphology label information annotated for the target object. A sketch of such a cascaded convolutional model is given below.
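  • The sketch below shows one way such a cascade of convolutional layers could be assembled; the number of layers, the channel width and the single-channel logit output are assumptions, not values stated in the patent.

```python
import torch.nn as nn

def build_skeleton_extractor(in_channels=1, width=32, num_layers=6):
    """Cascaded convolutional layers mapping a single-instance mask map to a
    single-channel skeleton map of the same spatial resolution."""
    layers, c = [], in_channels
    for _ in range(num_layers - 1):
        layers += [nn.Conv2d(c, width, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        c = width
    layers += [nn.Conv2d(c, 1, kernel_size=3, padding=1)]  # skeleton logits
    return nn.Sequential(*layers)
```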
  • In some embodiments, the skeleton morphology label information includes the skeleton tangential angles of multiple sampling points that sample the skeleton form of the target object in the sample image, where the skeleton tangential angle represents the angle between the horizontal line and the tangent line whose tangent point is the sampling point on the directed skeleton form pointing from the head endpoint to the tail endpoint;
  • the loss function value of the skeleton extraction model in the training phase is determined based on the errors between the skeleton tangential angles of the multiple sampling points and the corresponding predicted tangential angles, where the predicted tangential angles are obtained by sampling the skeleton morphology image predicted by the skeleton extraction model for the sample image. A sketch of such an angle-based loss follows.
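  • The patent only states that the training loss is determined from the errors between labeled and predicted tangential angles; the wrapped-angle mean-squared error below is one illustrative way to realize that, so that angles such as +pi and -pi are not penalized as far apart.

```python
import torch

def tangential_angle_loss(pred_angles, label_angles):
    """pred_angles, label_angles: tensors of sampled tangential angles (radians)."""
    diff = pred_angles - label_angles
    wrapped = torch.atan2(torch.sin(diff), torch.cos(diff))  # wrap to (-pi, pi]
    return (wrapped ** 2).mean()
```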
  • In some embodiments, the recognition submodule includes:
  • a cropping unit, used to crop a first endpoint local region and a second endpoint local region from the skeleton morphology image, where the first endpoint local region and the second endpoint local region are located at the two ends of the skeleton respectively;
  • a feature extraction unit, used to extract the first histogram of oriented gradients (HOG) feature of one end of the skeleton based on the first endpoint local region;
  • the feature extraction unit is also used to extract the second HOG feature of the other end of the skeleton based on the second endpoint local region;
  • a recognition unit, used to recognize the one end of the skeleton and the other end of the skeleton respectively based on the first HOG feature and the second HOG feature, to obtain the head endpoint and the tail endpoint.
  • In some embodiments, the recognition unit is used to:
  • input the first HOG feature into the head-tail recognition model, and perform binary classification on the first HOG feature through the head-tail recognition model to obtain a first recognition result, where the first recognition result is used to characterize whether the one end of the skeleton is the head endpoint or the tail endpoint;
  • input the second HOG feature into the head-tail recognition model, and perform binary classification on the second HOG feature through the head-tail recognition model to obtain a second recognition result, where the second recognition result is used to characterize whether the other end of the skeleton is the head endpoint or the tail endpoint;
  • determine the head endpoint and the tail endpoint based on the first recognition result and the second recognition result;
  • where the head-tail recognition model is used to determine, based on the HOG features of an endpoint local region, whether an endpoint in the skeleton of the target object is the head endpoint or the tail endpoint. A sketch of such a HOG + SVM classifier is given below.
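  • The description elsewhere in this document mentions HOG descriptors combined with an SVM classifier for head/tail recognition. The sketch below follows that combination with scikit-image and scikit-learn; the HOG parameters, patch handling and probability-based tie-breaking are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def endpoint_hog(patch):
    """HOG descriptor of a grayscale endpoint local region
    (descriptor parameters here are illustrative)."""
    return hog(patch, orientations=8, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Training (labels: 1 = head endpoint, 0 = tail endpoint):
# X = np.stack([endpoint_hog(p) for p in training_patches])
# clf = SVC(kernel="linear", probability=True).fit(X, np.array(labels))

def classify_endpoints(clf, patch_a, patch_b):
    """The patch with the higher predicted head probability is taken as the
    head endpoint, and the other one as the tail endpoint."""
    feats = np.stack([endpoint_hog(patch_a), endpoint_hog(patch_b)])
    p_head = clf.predict_proba(feats)[:, 1]
    return (patch_a, patch_b) if p_head[0] >= p_head[1] else (patch_b, patch_a)
```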
  • the motion analysis module 1703 includes:
  • a sampling submodule, used to sample the skeleton form of the target object based on the skeleton morphology information to obtain a feature vector composed of the skeleton tangential angles of multiple sampling points, where the skeleton tangential angle represents the angle between the horizontal line and the tangent line whose tangent point is the sampling point on the directed skeleton form pointing from the head endpoint to the tail endpoint;
  • a decomposition submodule, used to sample the preset skeleton forms indicated by the multiple preset motion states to obtain the respective preset feature vectors of the multiple preset motion states, decompose the feature vector into the sum of the products of the multiple preset feature vectors and the multiple eigenvalues, and determine the eigenvalue sequence composed of the multiple eigenvalues as the motion component information.
  • In some embodiments, the motion analysis module 1703 is also used to:
  • sort the multiple eigenvalues contained in the eigenvalue sequence in descending order, and determine the preset motion states corresponding to the eigenvalues ranked in the first target positions as the motion principal components;
  • analyze, based on the motion principal components, the movement of the target object within the observation time period to obtain the kinematic characteristics of the target object during the observation time period. A sketch of the principal-component selection follows.
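  • Selecting the motion principal components from the eigenvalue sequence amounts to a descending sort and a top-k cut; the short sketch below assumes the values output by the decomposition are compared directly, as stated above.

```python
import numpy as np

def motion_principal_components(eigenvalue_sequence, top_k=2):
    """Return the indices and values of the `top_k` largest eigenvalues;
    their preset motion states are the motion principal components."""
    a = np.asarray(eigenvalue_sequence, dtype=float)
    order = np.argsort(a)[::-1][:top_k]
    return order, a[order]

# e.g. motion_principal_components([a1, a2, a3, a4, a5], top_k=2)
```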
  • Figure 18 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the terminal 1800 is an exemplary illustration of a computer device.
  • the terminal 1800 includes: a processor 1801 and a memory 1802.
  • Optionally, the processor 1801 includes one or more processing cores, such as a 4-core processor or an 8-core processor.
  • Optionally, the processor 1801 is implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) or PLA (Programmable Logic Array).
  • In some embodiments, the processor 1801 includes a main processor and a coprocessor.
  • The main processor is a processor used to process data in the awake state, also called the CPU (Central Processing Unit);
  • the coprocessor is a low-power processor used to process data in the standby state.
  • In some embodiments, the processor 1801 is integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen.
  • In some embodiments, the processor 1801 also includes an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • memory 1802 includes one or more computer-readable storage media, which optionally are non-transitory.
  • the memory 1802 also includes high-speed random access memory, and non-volatile memory, such as one or more disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 1802 is used to store at least one program code, and the at least one program code is to be executed by the processor 1801 to implement the microscopic image processing methods provided by the various embodiments of this application.
  • the terminal 1800 optionally further includes: a peripheral device interface 1803 and at least one peripheral device.
  • the processor 1801, the memory 1802 and the peripheral device interface 1803 can be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 1803 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a display screen 1805 and a power supply 1808.
  • the peripheral device interface 1803 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 1801 and the memory 1802.
  • In some embodiments, the processor 1801, the memory 1802 and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1801, the memory 1802 and the peripheral device interface 1803 are implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the display screen 1805 is used to display UI (User Interface, user interface).
  • the UI includes graphics, text, icons, videos, and any combination thereof.
  • when the display screen 1805 is a touch display screen, the display screen 1805 also has the ability to collect touch signals on or above its surface.
  • the touch signal can be input to the processor 1801 as a control signal for processing.
  • the display screen 1805 is also used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • In some embodiments, there is one display screen 1805, which is provided on the front panel of the terminal 1800; in other embodiments, there are at least two display screens 1805, which are respectively provided on different surfaces of the terminal 1800 or adopt a folding design; in still other embodiments, the display screen 1805 is a flexible display screen, which is disposed on a curved surface or folding surface of the terminal 1800.
  • the display screen 1805 is set in a non-rectangular irregular shape, that is, a special-shaped screen.
  • the display screen 1805 is made of LCD (Liquid Crystal Display, liquid crystal display), OLED (Organic Light-Emitting Diode, organic light-emitting diode) and other materials.
  • the power supply 1808 is used to power various components in the terminal 1800.
  • power source 1808 is AC, DC, disposable or rechargeable batteries.
  • the rechargeable battery supports wired charging or wireless charging.
  • the rechargeable battery is also used to support fast charging technology.
  • FIG 19 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1900 may vary greatly due to different configurations or performance.
  • the computer device 1900 includes one or more processors (Central Processing Units, CPU) 1901 and one or more memories 1902, where at least one computer program is stored in the memory 1902, and the at least one computer program is loaded and executed by the one or more processors 1901 to implement the microscopic image processing methods provided by the above embodiments.
  • the computer device 1900 also has components such as a wired or wireless network interface, a keyboard, and an input and output interface to facilitate input and output.
  • the computer device 1900 also includes other components for realizing device functions, which will not be described again here.
  • In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including at least one computer program, and the at least one computer program can be executed by a processor in a terminal to complete the microscopic image processing methods in each of the above embodiments.
  • For example, the computer-readable storage medium includes ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tapes, floppy disks, optical data storage devices, and so on.
  • In an exemplary embodiment, a computer program product or computer program is also provided, including one or more program codes, and the one or more program codes are stored in a computer-readable storage medium.
  • One or more processors of the computer device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the computer device can execute the microscopic image processing methods in the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a microscopic image processing method and apparatus, a computer device, and a storage medium. The method includes: performing instance segmentation on a microscopic image to obtain an instance image; performing skeleton extraction on a target object in the instance image to obtain skeleton morphology information of the target object; performing motion analysis on the target object based on the skeleton morphology information to obtain multiple eigenvalues; and determining an eigenvalue sequence composed of the multiple eigenvalues as motion component information of the target object.

Description

显微图像的处理方法、装置、计算机设备及存储介质
本申请要求于2022年07月19日提交的申请号为202210849205.1,发明名称为“显微图像的处理方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,特别涉及一种显微图像的处理方法、装置、计算机设备及存储介质。
背景技术
线虫是一种经典的、生命周期较短的多细胞生物,由于具有个体小、易培养、可如微生物一样进行大批量操作、构成身体的细胞数量较少等特点,因此存在对线虫的形态和谱系的研究需求。
发明内容
本申请实施例提供了一种显微图像的处理方法、装置、计算机设备及存储介质,能够节约显微图像分析的人力成本、提升显微图像的分析效率。该技术方案如下:
一方面,提供了一种显微图像的处理方法,该方法包括:
对显微图像进行实例分割,得到实例图像,所述实例图像包含所述显微图像中的目标对象;
对所述实例图像中的目标对象进行骨架提取,得到所述目标对象的骨架形态信息,所述骨架形态信息表征所述目标对象的骨架形态;
基于所述骨架形态信息,对所述目标对象进行运动分析,得到多个特征值,所述多个特征值用于表征合成所述骨架形态时多种预设运动状态的加权系数;
将所述多个特征值构成的特征值序列,确定为所述目标对象的运动成分信息。
一方面,提供了一种显微图像的处理装置,该装置包括:
实例分割模块,用于对显微图像进行实例分割,得到实例图像,所述实例图像包含所述显微图像中的目标对象;
骨架提取模块,用于对所述实例图像中的目标对象进行骨架提取,得到所述目标对象的骨架形态信息,所述骨架形态信息表征所述目标对象的骨架所处的形态;
运动分析模块,用于基于所述骨架形态信息,对所述目标对象进行运动分析,得到多个特征值,将所述多个特征值构成的特征值序列,确定为所述目标对象的运动成分信息,所述多个特征值用于表征合成所述骨架形态时多种预设运动状态的加权系数。
一方面,提供了一种计算机设备,该计算机设备包括一个或多个处理器和一个或多个存储器,该一个或多个存储器中存储有至少一条计算机程序,该至少一条计算机程序由该一个或多个处理器加载并执行以实现如上述显微图像的处理方法。
一方面,提供了一种存储介质,该存储介质中存储有至少一条计算机程序,该至少一条计算机程序由处理器加载并执行以实现如上述显微图像的处理方法。
一方面,提供一种计算机程序产品或计算机程序,所述计算机程序产品或所述计算机程序包括一条或多条程序代码,所述一条或多条程序代码存储在计算机可读存储介质中。计算机设备的一个或多个处理器能够从计算机可读存储介质中读取所述一条或多条程序代码,所述一个或多个处理器执行所述一条或多条程序代码,使得计算机设备能够执行上述显微图像的处理方法。
附图说明
图1是本申请实施例提供的一种圆形目标的分割方法的原理性流程图;
图2是本申请实施例提供的一种传统骨架提取方法的原理性流程图;
图3是本申请实施例提供的一种线虫泳动频率的分析示意图;
图4是本申请实施例提供的一种显微图像的处理方法的实施环境示意图;
图5是本申请实施例提供的一种显微图像的处理方法的流程图;
图6是本申请实施例提供的一种显微图像的处理方法的流程图;
图7是本申请实施例提供的一种双图层实例分割模型的分割原理示意图;
图8是本申请实施例提供的一种对两个目标对象的实例分割方式的流程图;
图9是本申请实施例提供的双图层实例分割模型的原理性示意图;
图10是本申请实施例提供的一种合成样本图像的合成流程图;
图11是本申请实施例提供的一种骨架提取模型的训练和预测阶段的原理性示意图;
图12是本申请实施例涉及的一种识别头部端点和尾部端点的方法流程图;
图13是本申请实施例提供的一种截取端点局部区域的示意图;
图14是本申请实施例提供的一种对目标对象进行运动分析的流程图;
图15是本申请实施例提供的一种目标对象的运动分析原理图;
图16是本申请实施例提供的一种显微图像的处理方法的原理性流程图;
图17是本申请实施例提供的一种显微图像的处理装置的结构示意图;
图18是本申请实施例提供的一种终端的结构示意图;
图19是本申请实施例提供的一种计算机设备的结构示意图。
具体实施方式
以下,对本申请实施例涉及的术语进行解释说明:
线虫:本申请涉及的目标对象的一种示例,线虫是一种经典的模式生物,作为一种生命周期短的多细胞生物,其个体小、容易培养,可以如同微生物一样进行大批量操作,且构成线虫身体的细胞数量相对较少,可以对构成细胞的形态和谱系进行穷尽性研究。线虫的上皮层的上方可形成一层主要由胶原、脂质、糖蛋白组成的表皮(Cuticle),这一层表皮是线虫具有保护作用的外骨架(Exoskeleton),是其维持形态所必需的结构。
OSTU算法:是由日本学者大津(Nobuyuki Otsu)于1979年提出的一种自适合于双峰情况的自动求取阈值的方法,又叫大津法、最大类间方差法、最大方差自动取阈值法等。OSTU算法按图像的灰度特性,将图像分成背景和目标两部分,背景和目标之间的类间方差越大,说明构成图像的两部分的差别越大,当部分目标错分为背景或部分背景错分为目标时,都会导致两部分差别变小,因此,使类间方差最大的分割意味着错分概率最小。
分水岭算法(Watershed Algorithm):也称分水岭分割方法,是一种基于拓扑理论的数学形态学的分割方法,其基本思想是把图像看作是测地学上的拓扑地貌,图像中每一点像素的灰度值表示该点的海拔高度,每一个局部极小值及其影响区域称为集水盆,而集水盆的边界则形成分水岭。分水岭的概念和形成可以通过模拟浸入过程来说明。在每一个局部极小值表面,刺穿一个小孔,然后把整个模型慢慢浸入水中,随着浸入的加深,每一个局部极小值的影响域慢慢向外扩展,在两个集水盆汇合处构筑大坝,即形成分水岭。
距离变换(Distance Transform):对于二值图像,将前景像素的值转化为该点到最近的背景点的距离,或将背景像素的值转化为该点到最近的前景点的距离。
骨架提取算法:图像领域的骨架提取算法,实际上就是提取目标在图像上的中心像素轮廓。换言之,以目标中心为准,对目标进行细化,一般细化后的目标都是单层像素宽度。目前的骨架提取算法可划分为迭代算法和非迭代算法,以迭代算法为例,通常针对二值图像(如掩膜)进行操作,通过从目标外围往目标中心,利用以待检测像素为中心的3×3尺寸的像素窗口的特征,对目标不断腐蚀细化,直至腐蚀到不能再腐蚀(单层像素宽度),就得到了图像的骨架。
感兴趣区域(Region Of Interest,ROI):在机器视觉、图像处理领域中,从被处理的图像以方框、圆、椭圆、不规则多边形等方式勾勒出需要处理的区域,称为ROI。在对显微 图像的处理场景下,ROI是指显微图像中包含待观测的目标对象的区域,比如,以矩形框的方式框出的包含目标对象的矩形区域,或者还可以以圆形框、椭圆形框或者其他不规则多边形框来圈定ROI,ROI是本次图像分析所关注的重点(即仅关心包含目标对象的前景区域,对其余背景区域不关心)。
方向梯度直方图(Histogram of Oriented Gradient,HOG)特征:HOG特征是一种在计算机视觉和图像处理中用来进行物体检测的特征描述子,HOG特征通过计算和统计图像局部区域的梯度方向直方图来构成特征。HOG特征的主要思想是:在一幅图像中,局部目标的表象和形状能够被梯度或边缘的方向密度分布很好地描述,其本质为:梯度的统计信息,而梯度主要存在于边缘的地方。在本申请实施例中,涉及到运用骨架端点的HOG特征来进行端点的头尾识别(或头尾分类、头尾检测),即判断当前端点是线虫的头部端点还是尾部端点。
支持向量机(Support Vector Machine,SVM):SVM是一类按监督学习方式对数据进行二元分类的广义线性分类器,其决策边界是对学习样本求解的最大边距超平面。SVM使用铰链损失函数(Hinge Loss)来计算经验风险(Empirical Risk),并在求解系统中加入了正则化项以优化结构风险(Structural Risk),是一个具有稀疏性和稳健性的分类器。在本申请实施例中,涉及到利用SVM来对骨架端点的HOG特征进行二分类识别,以判断当前端点是线虫的头部端点还是尾部端点。
尺度:信号的尺度空间是指通过一系列单参数、宽度递增的高斯滤波器将原始信号滤波得到一组低频信号,而图像特征的尺度空间则是指以针对图像提取到的图像特征作为上述原始信号。图像特征的金字塔化能高效地对图像特征进行多尺度的表达,通常,会从一个最底层的特征(即原始尺度特征)进行向上采样,将采样得到的一系列特征与该底层特征进行融合,能够得到高分辨率、强语义的特征(即加强了特征的提取)。
在相关技术中,对于培养皿中的线虫,通常会在显微镜下进行观察,并通过显微镜上的CCD图像传感器来对线虫进行成像分析,以输出线虫的显微图像。传统研究方式下,主要依赖于人工对显微图像进行分析。比如,人工对显微图像中出现的线虫进行计数、分割、形态学测量和运动学分析,上述对显微图像的人工分析方式显然耗费的人力成本高、分析效率低。
传统的线虫分割方法如图1所示,图1是本申请实施例提供的一种圆形目标的分割方法的原理性流程图。在通过显微镜CCD图像传感器采集到原图101之后,通常利用OSTU算法来提取目标前景,以得到前景分割图102。接着,在前景分割图102中做距离变换,以得到每个目标的中心点,形成经过距离变换的图像103。接着,在经过距离变换的图像103的基础上,以这些中心点为种子执行分水岭算法,实现多目标分割任务,得到实例分割结果104。这里以原始101中的每个目标均为圆形目标为例进行说明。
上述传统的线虫分割方法,对图像质量的要求比较高,要求图像中没有什么干扰的杂质,并且对显微镜采集的CCD图像的图像信噪比的要求也比较高,在信噪比低或者杂质较多的情况下,分割准确率会大大降低。并且,OSTU算法和距离变换都有较多的可选参数需要技术人员进行人工调试,因此所需人力成本较高、分析效率较低。此外,这一线虫分割方法无法处理存在重叠(即两个线虫目标之间存在交叠部分)或者卷曲(单个线虫目标由于卷曲形成不同身体部分的自我重叠)等复杂线虫目标。而线虫作为一种软体生物,本身在观测过程中又很容易形成自我重叠或卷曲的形状,但传统线虫分割方法在重叠区域是无法处理的。
传统的骨架提取算法如图2所示,图2是本申请实施例提供的一种传统骨架提取方法的原理性流程图。在实例分割结果的基础上,可以从显微镜采集到的原图中裁剪得到仅包含单个线虫目标的单实例原图201。接着,对单实例原图201执行骨架提取算法,得到单个线虫目标的骨架图202。再对骨架图202进行剪枝等后处理,剪枝掉一些细小分叉,即可得到表征线虫头尾之间的骨架的骨架图203。
上述传统的骨架提取算法,强烈依赖于目标的先验知识应用。如线虫是一种长条形的软体目标,相应的骨架提取就需要把这一先验知识融合进去。并且,传统的骨架提取算法容易 产生较多毛刺等噪音骨架,需要进行后处理,处理效率较低,耗费处理资源较多。
传统的运动学参数分析方式,通常会分析线虫泳动频率和身体弯曲频率,主要依赖于技术人员肉眼计数进行统计。线虫泳动频率是指线虫在1min(分钟)的时间内头部摆动的次数(指线虫头部从一侧摆向另一侧后再摆回来,定义为1次头部摆动),身体弯曲频率是指相对于身体长轴方向上的1个波长移动定义为1次身体弯曲。如图3所示,图3是本申请实施例提供的一种线虫泳动频率的分析示意图。技术人员在显微镜下观测线虫运动时,可以从观测图像301中寻找线虫的头尾等关键节点。接着,将线虫从图像302中的形态A运动到图像303中的形态B的过程,视为线虫实现1次头部摆动,从而由技术人员在一段时间内人工对线虫的头部摆动次数进行统计。再将头部摆动次数除以统计耗费的分钟数即可得到线虫泳动频率。
上述传统的运动学参数分析方式,由于线虫运动速度较快,因此人工判断计数时容易产生误计数等问题,所需人力成本极高,分析效率较低。并且线虫泳动频率和身体弯曲频率仅能够对线虫进行简单的运动评估,而不能分析深层次的形态学测量和运动学分析,分析精度也比较差。
有鉴于此,本申请实施例提供一种基于深度学习的显微图像分析方法。尤其针对通过显微镜CCD获取到的包含目标对象(如线虫)的显微图像,设计了从多目标(如多线虫)的实例分割,到骨架提取,再到基于主成分分析的一整套基于深度学习的图像分析方式。其中间步骤不需要技术人员进行人工操作,从而为后续的计数、分割、形态学测量和运动学分析提供了快速高效的基础结果。
以目标对象为线虫为例进行说明:一方面,本申请实施例提出一整套基于深度学习的线虫图像分析框架,其中间步骤不需要技术人员手工操作,极大降低了人力成本,提升了分析效率。另一方面,在整体图像分析框架中,涉及一种能够处理重叠多线虫的实例分割方法,能够优化多线虫重叠情况下的实例分割效果,且实例分割完毕后还能够同步进行线虫计数。另一方面,在整体图像分析框架中,涉及一种基于深度学习的骨架提取方法,能够直接输出不带毛刺、分叉等噪音的骨架图,同时能够处理线虫卷曲等情况下的骨架提取。并且基于提取到的骨架图还能够由机器自动化区分线虫的头部和尾部。另一方面,在整体图像分析框架中,涉及一种基于主成分分析的方法,主成分分析能够对提取的线虫骨架进行主成分分解,并通过主成分系数即特征值来快速方便的分析线虫的运动学参数等。总体而言,本申请实施例提供的方法,通过优化了传统针对线虫显微图像的各个流程,能够自动化处理线虫显微图像,从而为后续的计数、分割、形态学测量和运动学分析等下游任务提供快速高效的基础结果。
图4是本申请实施例提供的一种显微图像的处理方法的实施环境示意图。参见图4,该实施环境包括显微镜401、图像采集设备402和计算机设备403,下面进行说明。
显微镜401可以是数码显微镜即视频显微镜,能够将显微镜401观测到的实物图像通过数模转换,使其成像在显微镜401自带的屏幕上或显微镜401外接的计算机设备403上。数码显微镜是将精锐的光学显微镜技术、先进的光电转换技术、液晶屏幕技术完美地结合在一起而开发研制成功的一项产品。
图像采集设备402用于采集显微镜401所观测到的实物图像。比如,当待观测对象是目标对象时,图像采集设备402会采集到包含目标对象的显微图像。以目标对象为线虫为例进行说明,由于培养皿中的线虫是批量培养的,因此通过显微镜401观测线虫时,很可能在目镜视野下包含多个线虫,且这些线虫之间可能会存在重叠的部分,并且单个线虫也可能会由于卷曲而导致自我重叠。
图像采集设备402通常会包含与显微镜401相连接的CCD图像传感器,CCD图像传感器也称为CCD感光元件。CCD是一种半导体器件,能够将显微镜401观测到的光学影像转化成为数字信号。换言之,通过光学系统(例如显微镜物镜和目镜的镜头)在CCD的光接收面上形成从被摄体(即目标对象)发出的光束,将图像的明暗程度光电转换为电荷量,然后顺序 将其读取为电信号。CCD上植入的微小光敏物质称作像素(Pixel)。一块CCD上包含的像素数越多,其提供的画面分辨率也就越高。CCD的作用就像胶片一样,但它是把图像像素转换成数字信号。CCD上有许多排列整齐的电容,能感应光线,并将影像转变成数字信号。经由外部电路的控制,每个小电容能将其所带的电荷转给它相邻的电容。
计算机设备403与图像采集设备402或携带图像采集设备402的显微镜401相连接。计算机设备403可以是终端。终端上安装和运行有支持显示显微图像的应用程序。终端接收图像采集设备401所采集到的显微图像,并将显微图像显示在终端的显示屏上。
在一些实施例中,终端接收并显示显微图像之后,终端本地就支持本申请实施例涉及的显微图像的处理方法,从而终端能够在本地对显微图像进行处理,并显示对显微图像的处理结果。在另一些实施例中,终端可以通过有线或无线通信方式与服务器进行直接或间接地连接,本申请实施例在此不对连接方式进行限制。终端将图像采集设备402发送来的显微图像发送到服务器,由服务器来对显微图像进行处理,并将处理结果返回给终端,终端在显示屏上显示接收到的处理结果。在一些实施例中,服务器承担主要图像处理工作,终端承担次要图像处理工作;或者,服务器承担次要图像处理工作,终端承担主要图像处理工作;或者,服务器和终端两者之间采用分布式计算架构进行协同图像处理。
在一个示例性场景中,服务器训练本申请实施例的显微图像的处理方法所需要使用的双图层实例分割模型、骨架提取模型、头尾识别模型等。接着,服务器将训练完毕的双图层实例分割模型、骨架提取模型、头尾识别模型下发到终端上,以使得终端能够本地支持上述显微图像的处理方法。
在一些实施例中,上述服务器包括一台服务器、多台服务器、云计算平台或者虚拟化中心中的至少一种。比如,服务器是独立的物理服务器,或者是多个物理服务器构成的服务器集群或者分布式系统,或者是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN(Content Delivery Network,内容分发网络)以及大数据和人工智能平台等基础云计算服务的云服务器。
在一些实施例中,上述终端是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表、MP3(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)播放器、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、电子书阅读器等,但并不局限于此。
本领域技术人员可以知晓,上述终端的数量可以更多或更少。比如上述终端可以仅为一个,或者上述终端为几十个或几百个,或者更多数量。本申请实施例对终端的数量和设备类型不加以限定。
以下,对本申请实施例涉及的显微图像的处理方法的基本流程进行介绍。
图5是本申请实施例提供的一种显微图像的处理方法的流程图。参见图5,该显微图像的处理方法由计算机设备执行,以计算机设备为终端为例进行说明,该实施例包括下述步骤:
501、终端对显微图像进行实例分割,得到实例图像,该实例图像包含该显微图像中的目标对象。
终端是用于存储和处理显微图像的计算机设备,本申请实施例以计算机设备是终端为例进行说明。可选地,计算机设备也可以被提供为服务器,本申请实施例对此不进行具体限定。
显微图像是指显微镜的CCD图像传感器采集到的对待观测对象进行观测的光学影像。比如,显微镜携带CCD图像传感器。CCD图像传感器作为图像采集设备的一种示例性说明,能够将显微镜观测到的光学影像转化成电信号,并形成可被终端读取并显示的显微图像。CCD图像传感器生成显微图像之后,将显微图像发送至终端。
在一些实施例中,终端接收显微镜的CCD图像传感器发送的显微图像。这一显微图像可以是指CCD图像传感器发送来的单张显微图像,也可以是指CCD图像传感器发送来的连续的观测视频流中的任一图像帧,本申请实施例对显微图像的类型不进行具体限定。
需要说明的是,由于显微镜对目标对象的观测可能是一个连续的动作,因此CCD图像传感器可能会采集到一段连续的图像帧(构成一段观测视频流)。接着,CCD图像传感器将采集到的各个图像帧发送到终端。发送时可以是逐帧依次传输,也可以是分成多个视频片段分段传输,本申请实施例对传输方式不进行具体限定。
在一些实施例中,终端除了直接获取显微镜的CCD图像传感器发送来的显微图像之外,还可以对本地存储的显微图像进行处理,或者,对从服务器中下载的显微图像进行处理,本申请实施例对显微图像的来源不进行具体限定。
在一些实施例中,终端获取到显微图像之后,由于显微图像是指使用显微镜对待观测的目标对象进行观测的光学影像,因此显微图像中必然包含一个或多个目标对象。为了对目标对象进行深入分析,终端在显微图像的基础上,对显微图像进行实例分割,以将显微图像中包含的每个目标对象都分割出来,得到包含单个目标对象的实例图像,作为对显微图像的单实例分割结果。比如,每个目标对象的实例图像包括该目标对象的一张轮廓图和一张掩膜图。其中,目标对象的轮廓图用于指示单个目标对象在显微图像中所具有的边缘和形状。目标对象的掩膜图是指用于指示单个目标对象的显微图像中所处位置和所占区域。其中,上述实例分割是指图像实例分割(Instance Segmentation),是在语义检测(Semantic Segmentation)的基础上进一步细化,分离对象的前景与背景,实现像素级别的对象分离,能够分割出同一个类中的不同实例的物体。其中,实例可以是器官、组织或者细胞等。在本申请实施例中,实例分割用于分割目标对象,也即线虫。
以目标对象为线虫为例进行说明,由于培养皿中的线虫是批量培养的,因此通过显微镜观测培养皿中的线虫时,很可能在目镜视野下会观测到多个线虫,且这些线虫之间可能会存在重叠的部分,并且单个线虫也可能会由于卷曲而导致自我重叠。而本申请实施例在进行实例分割时,能够针对多目标对象重叠或单目标对象卷曲的复杂场景,都具有很好的实例分割效果,具体实例分割方式将在下一实施例中详细说明,这里不做赘述。
502、终端对该实例图像中的目标对象进行骨架提取,得到该目标对象的骨架形态信息,该骨架形态信息表征该目标对象的骨架形态。
在一些实施例中,终端针对显微图像中的每个目标对象都进行实例分割之后,可以对每个目标对象都输出实例图像。可选地,每个目标对象的实例图像包括一张轮廓图和一张掩膜图,针对每个目标对象的掩膜图可以进行骨架提取,以通过骨架提取算法,来获取到当前的目标对象的骨架形态信息。在涉及多个目标对象的情况下,需要对每个目标对象都分别提取到各自的骨架形态信息。比如,轮廓图和掩膜图都是二值图像。在轮廓图中轮廓像素点和非轮廓像素点具有不同的取值。在掩膜图中属于本目标对象的像素点和不属于本目标对象的像素点具有不同的取值。
示意性地,在轮廓图中,取值为1的像素点是轮廓像素点、取值为0的像素点是非轮廓像素点;又或者取值为0的像素点是轮廓像素点、取值为1的像素点是非轮廓像素点,本申请实施例对此不进行具体限定。轮廓像素点是指用于表征目标对象的轮廓(即边缘)的像素点,而非轮廓像素点即用于表示不是目标对象的轮廓的像素点。
示意性地,在掩膜图中,取值为1的像素点是属于本目标对象的像素点,取值为0的像素点是不属于本目标对象的像素点(可能是背景像素点,或者其他目标对象的像素点);又或者,取值为0的像素点是属于本目标对象的像素点,取值为1的像素点是不属于本目标对象的像素点,本申请实施例对此不进行具体限定。
在一些实施例中,终端对每个目标对象,都在目标对象的掩膜图上运行骨架提取算法,以输出目标对象的骨架形态信息。可选地,骨架形态信息至少包括一张基于掩膜图生成的骨架形态图像,在骨架形态图像中的目标对象的骨架具有单层像素宽度。例如,骨架形态图像也是一张二值图像,在骨架形态图像中骨架像素点和非骨架像素点具体不同的取值。
示意性地,在骨架形态图像中,取值为1的像素点是骨架像素点、取值为0的像素点是 非骨架像素点;又或者取值为0的像素点是骨架像素点、取值为1的像素点是非骨架像素点,本申请实施例对此不进行具体限定。骨架像素点是指用于表征目标对象的骨架的像素点,非骨架像素点即用于表征不是目标对象的骨架的像素点。
在这种情况下,骨架形态图像中的骨架像素点能够形成一条单层像素宽度的目标对象的骨架,这一骨架所具有的形态代表了目标对象在该显微图像中所呈现的骨架形态。
在本申请实施例中,通过在实例分割结果的基础上,对每个分割所得的目标对象的实例图像,都应用骨架提取算法来提取到骨架形态图像,这样方便了在目标对象的骨架形态图像上进行运动学分析,能够提升运动学分析的分析效率,而无需人工参与或肉眼计数。
503、终端基于该骨架形态信息,对该目标对象进行运动分析,得到多个特征值,将该多个特征值构成的特征值序列,确定为目标对象的运动成分信息。
在一些实施例中,终端在上述步骤502获取到的每个目标对象的骨架形态信息,至少包括每个目标对象的骨架形态图像。接着,终端在该骨架形态图像上来进行运动分析,这样能够将目标对象当前的骨架形态,分解成为多种预设的运动状态的组合,并对每种预设运动状态可以确定用于表征这种运动状态在分解时所贡献的特征值。这一特征值代表了要合成目标对象当前的骨架形态,需要对这种运动状态施加多大的加权系数,特征值比较大的运动状态可以作为当前的骨架形态的主成分。而作为主成分的每种运动状态的特征值能够形成一个特征值序列,这个特征值序列即可作为目标对象的运动成分信息。详细运动学分析方式将在下一实施例中进行说明,这里不做赘述。
在本申请实施例中,通过在每个目标对象的骨架形态图像上,进行运动学分析,能够将目标对象当前的骨架形态,分解成为多种预设的运动状态的组合,从而能够将任意形状的骨架形态,都表示为使用多个特征值来分别对多种预设运动状态进行加权后合成的运动。上述处理方式能够对目标对象进行更加深层次和细致的运动学分析,尤其是针对目标对象是线虫的情况,并不局限于人工肉眼计数来分析线虫泳动频率或身体弯曲频率,而是能够对线虫进行更加准确、效率更高且无需人工干预的运动学分析。
上述所有可选技术方案,能够采用任意结合形成本公开的可选实施例,在此不再赘述。
本申请实施例提供的方法,通过对显微图像中包含的目标对象进行实例分割,以确定出来每个目标对象的实例图像即单实例分割结果。并在单实例分割结果上提取出来骨架形态信息,以在骨架形态信息的基础上进行运动分析和运动成分分解,能够将每个目标对象的当前所具有的复杂骨架形态,分解成多个预设的运动状态之间的组合。整体处理流程无需人工干预,机器能够自动化实现,极大了降低了人力成本、提升了分析效率。此外,基于输出的运动成分信息还能够进行深层的形态学测量和运动学分析,因此也提升了对目标对象进行分析的精准程度。
以下,对本申请实施例涉及的显微图像的处理方法的详细流程进行介绍。
图6是本申请实施例提供的一种显微图像的处理方法的流程图,参见图6,该显微图像的处理方法由计算机设备执行,以计算机设备为终端为例进行说明,该实施例包括下述步骤:
601、终端从显微图像中,确定该显微图像中所包含的目标对象所在的ROI,该ROI中包含互相重叠的多个目标对象。
在一些实施例中,终端通过上述步骤501中描述的方式来获取显微图像。在获取到显微图像之后,对任一显微图像,终端可以将该显微图像还输入到一个对象检测模型中,通过对象检测模型来对该显微图像中的各个目标对象进行对象检测(也称为物体检测或目标检测),对象检测模型输出该显微图像中ROI的候选框的位置信息。比如,该位置信息包括:候选框的左上角顶点坐标(x,y)以及候选框的宽度w和高度h。即,该位置信息是一个形如(x,y,w,h)的四元组数据,或者,还可以使用左下角顶点坐标、右上角顶点坐标、右下角顶点坐标来定位候选框,本申请实施例对此不进行具体限定。
在一些实施例中,上述对象检测模型可以是任一支持对象检测的机器学习模型。比如, 该对象检测模型可以是:R-CNN(Region with CNN features,基于CNN的区域对象检测,其中,CNN是指Convolutional Neural Network,即卷积神经网络)、Fast R-CNN(快速R-CNN)、Faster R-CNN(更快速R-CNN)或者FCOS(Fully Convolutional One-Stage,全卷积一阶段)等,本申请实施例对该对象检测模型的结构不进行具体限定。
在上述过程中,通过从显微图像中提取出来包含目标对象的ROI,能够仅针对包含目标对象的ROI执行后续的实例分割、骨架提取和运动分析,这样对于一些背景区域,或者不包含目标对象的区域,则无需执行上述实例分割、骨架提取和运动分析。上述处理方式不但能够避免非ROI所带来的噪声干扰,提升对ROI的处理精度,而且还能够节约对非ROI的处理操作所占用的处理资源,且能够缩短对显微图像的处理时长,提升对显微图像的处理效率。
需要说明的是,由于目标对象很可能是微生物或线虫等活体研究对象,ROI中有可能会包含一个或多个目标对象。单个目标对象的处理流程是比较简单的,可以直接通过一些传统的实例分割算法(如先后运行OSTU算法、距离变换和分水岭算法)就能够分割出来单实例的目标对象。而多个目标对象的处理流程则是比较复杂的,因为多个目标对象很可能会存在互相重叠的情况,所以,本申请实施例以ROI中包含多个目标对象为例进行说明。此外,即使ROI中仅包含单个目标对象,也可能会由于单个目标对象发生卷曲,导致产生自我重叠的情况,而传统的实例分割算法针对自我重叠的情况实例分割准确度较差。本申请实施例提供的处理流程不但能够提升在多目标对象互相重叠场景下的实例分割准确度,而且还能够提升在单目标对象自我重叠场景下的实例分割准确度。
602、终端提取该ROI的局部图像特征。
在一些实施例中,终端可以将显微图像输入到一个特征提取模型中,通过特征提取模型来提取显微图像的全局图像特征。接着,终端利用上述步骤601中获取到的ROI的位置信息,能够从该全局图像特征中确定出来ROI的局部图像特征。比如,ROI的位置信息为(x,y,w,h)时,假设(x,y)是左上角顶点坐标,这时只需要将全局图像特征先缩放到与显微图像相同的尺寸(如果特征提取模型直接输出的是相同尺寸的全局图像特征,那么无需执行这一缩放步骤)。然后从全局图像特征中找到坐标为(x,y)的特征点。再以该特征点作为左上角顶点,确定一个宽度为w、高度为h的ROI候选框。接着,从该全局图像特征中被ROI候选框所选中的区域包含的各个特征点确定为ROI的局部图像特征。换言之,终端基于该ROI候选框,能够从全局图像特征中裁剪得到ROI所覆盖区域的局部图像特征。
示意性地,上述特征提取模型包括一个残差网络(Residual Networks,Resnet)和一个特征金字塔网络(Feature Pyramid Networks,FPN)。残差子网络用于提取输入图像的像素相关特征。特征金字塔子网络则用于提取输入图像在不同尺度空间下的图像金字塔特征。
可选地,在残差网络中包括多个隐藏层,该多个隐藏层之间采取残差连接。比如,在所有相邻隐藏层均残差连接时,当前隐藏层的输出将会与当前隐藏层的输入在拼接后一起输入到下一个隐藏层中。例如第二个隐藏层的输出会与第二个隐藏层的输入(即第一个隐藏层的输出)在拼接后一起输入到第三个隐藏层中。又比如,在每间隔一个隐藏层进行一次残差连接时,当前隐藏层的输出会与上一个隐藏层的输入在拼接后一起输入到下一个隐藏层中。例如第三个隐藏层的输出会与第二个隐藏层的输入(即第一个隐藏层的输出)在拼接后一起输入到第四个隐藏层中,本申请实施例对残差子网络的结构不进行具体限定。在一个示例中,残差子网络可以是Resnet-34网络、Resnet-50网络、Resnet-101网络等深度残差网络。
可选地,在使用残差网络提取得到原始图像特征之后,将该原始图像特征输入到特征金字塔网络中,通过该特征金字塔网络来从该原始图像特征开始逐级进行向上采样,得到一系列不同尺度空间下的特征金字塔。再将特征金字塔中包含的不同尺度的特征进行融合,即可得到最终的全局图像特征。
在上述过程中,通过对残差网络提取到的原始图像特征进行金字塔化,能够得到一系列不同尺度的特征。再将不同尺度的特征进行融合,得到的全局图像特征能够将高层(即尺度 最小)的特征传递到原始图像特征中,补充底层(即尺度最大)的原始图像特征的语义。上述处理方式能够得到高分辨率、强语义的全局图像特征,有利于进行线虫等小目标的检测。
在一些实施例中,也可以仅使用残差网络来提取全局图像特征。换言之,特征提取模型就是残差网络本身。终端将残差网络提取到的原始图像特征直接作为全局图像特征,再从全局图像特征中裁剪得到局部图像特征,这样能够简化全局图像特征提取流程,节约终端的处理资源。在上述过程中,通过特征提取模型来提取到全局图像特征,再从全局图像特征中裁剪得到局部图像特征,能够较好的保留ROI中边缘部分的图像特征。因为边缘部分的图像特征是与临近的非ROI的像素点息息相关的,所以能够提取到表达能力较好的局部图像特征。
在另一些实施例中,在通过上述步骤601获取到ROI的位置信息,还可以先从显微图像中裁剪得到ROI。接着,仅将ROI输入到特征提取模型中,通过特征提取模型来直接提取ROI的局部图像特征。上述处理方式可以仅对ROI来进行特征提取,从而无需提取整张显微图像的全局图像特征,能够极大节约终端的处理资源。
603、终端将该局部图像特征输入到双图层实例分割模型中,通过该双图层实例分割模型对该局部图像特征进行处理,输出该ROI中多个目标对象各自的轮廓图和掩膜图。
其中,双图层实例分割模型用于对不同对象分别建立图层来获取每个对象各自的实例分割结果。换言之,双图层实例分割模型用于对不同对象分别建立图层来获取每个对象的实例图像。其中,每个对象的实例图像包括该对象的轮廓图和掩膜图。
在一些实施例中,终端在通过上述步骤602提取到ROI的局部图像特征之后,将局部图像特征输入到双图层实例分割模型中。若ROI中包含多个目标对象,双图层实例分割模型会对不同的目标对象分别建立图层,以对每个目标对象分别输出各自的实例图像(即实例分割结果)。可选地,每个目标对象的实例图像包括每个目标对象的轮廓图和掩膜图,从而来表征每个目标对象所具有的轮廓和所占据的掩膜。
在一些实施例中,上述轮廓图和掩膜图均为二值图像,在轮廓图中轮廓像素点和非轮廓像素点具有不同的取值,在掩膜图中属于本目标对象的像素点和不属于本目标对象的像素点具有不同的取值。示意性地,在轮廓图中,取值为1的像素点是轮廓像素点、取值为0的像素点是非轮廓像素点;又或者取值为0的像素点是轮廓像素点、取值为1的像素点是非轮廓像素点,本申请实施例对此不进行具体限定。示意性地,在掩膜图中,取值为1的像素点是属于本目标对象的像素点,取值为0的像素点是不属于本目标对象的像素点(可能是背景像素点,或者其他目标对象的像素点);又或者,取值为0的像素点是属于本目标对象的像素点,取值为1的像素点是不属于本目标对象的像素点,本申请实施例对此不进行具体限定。
在本申请实施例中,为了方便说明,以ROI中包含互相重叠的两个目标对象为例来介绍实例分割流程。为了区分ROI中涉及的两个目标对象,将位于顶层的目标对象称为遮挡对象,将位于底层的目标对象称为被遮挡对象。显然,在两个目标对象的重叠区域,遮挡对象位于顶层并遮盖住了位于底层的被遮挡对象的躯体的一部分。
在一些实施例中,双图层实例分割模型包括遮挡对象图层网络和被遮挡对象图层网络。遮挡对象图层网络用于提取位于顶层的遮挡对象的轮廓和掩膜。被遮挡对象图层网络用于提取位于底层的被遮挡对象的轮廓和掩膜。其中,遮挡对象图层网络和被遮挡对象图层网络以级联的方式部署于双图层实例分割模型中,此时遮挡对象图层网络的输出为被遮挡对象图层网络的输入。
图7是本申请实施例提供的一种双图层实例分割模型的分割原理示意图。如图7所示,针对任一输入图像701(例如,输入图像是指:显微图像,或者显微图像的ROI),在ROI中包含互相重叠的两个目标对象的情况下,双图层实例分割模型会针对位于顶层的遮挡对象和位于底层的被遮挡对象分别建立图层来进行实例分割。比如,在顶层(Top Layer)图层7021中提取遮挡对象(Occluder)的轮廓图和掩膜图。在底层(Bottom Layer)图层7022中提取被遮挡对象(Occludee)的轮廓图和掩膜图。顶层图层7021和底层图层7022能够实现对遮 挡对象和被遮挡对象的双图层分离(Bilayer Decoupling),从而最终分别对不同的目标对象(即不同实例)来输出各自的实例分割结果即实例图像703。比如,对遮挡对象输出轮廓图和掩膜图,以及对被遮挡对象也输出轮廓图和掩膜图。
在具有上述结构的双图层实例分割模型的基础上,对两个目标对象的实例分割方式进行如下说明。图8是本申请实施例提供的一种对两个目标对象的实例分割方式的流程图。如图8所示,这一实例分割方式包括下述步骤6031-6034。
6031、终端将局部图像特征输入遮挡对象图层网络,通过遮挡对象图层网络提取ROI中位于顶层的遮挡对象的第一感知特征。
其中,第一感知特征表征遮挡对象在实例分割任务上的图像特征。
在一些实施例中,遮挡对象图层网络用于对ROI内的遮挡对象的轮廓和掩膜进行显式建模。可选地,遮挡对象图层网络中包括至少一个第一卷积层、至少一个第一图卷积(Graph Convolutional Network,GCN)层和至少一个第二卷积层。上述第一卷积层、第一图卷积层和第二卷积层的相邻层之间串联连接。串联连接是指上一层输出的特征作为当前层的输入信号。其中,第一图卷积层基于非局部注意力机制(non-local attention)进行了简化,也可以称为非局部层(non-local layer),为了减少模型的参数量,使用非局部算子(Non-local operator)操作进行图卷积层的实现。每个像素(pixel)就是一个图形节点(graph node),注意力权重(attention weight)构成了节点之间的连接关系(node connection)。基于上述结构的遮挡对象图层网络,上述步骤6031可通过下述步骤A1-A3来实现。
A1、终端将局部图像特征输入到遮挡对象图层网络的第一卷积层中,通过第一卷积层对局部图像特征进行卷积操作,得到初始感知特征。
在一些实施例中,针对将上述步骤602提取到的ROI的局部图像特征,输入到双图层实例分割模型的遮挡对象图层网络的第一卷积层中,通过第一卷积层对局部图像特征进行卷积操作。比如,以尺寸为3×3的卷积核来对局部图像特征进行卷积操作,输出遮挡对象的初始感知特征。
A2、终端将初始感知特征输入到遮挡对象图层网络的第一图卷积层中,通过第一图卷积层中非局部算子对初始感知特征进行卷积操作,得到图卷积特征。
在一些实施例中,针对上述步骤A1输出的初始感知特征,将该初始感知特征再输入到遮挡对象图层网络的第一图卷积层中,在该第一图卷积层中通过非局部算子(Non-Local Operator)来实现图卷积层。比如,在第一图卷积层中涉及到3个卷积核大小为1×1的卷积层以及1个Softmax(指数归一化)算子。为了便于说明,将上述3个卷积层分别称为φ卷积层、θ卷积层和β卷积层。终端将初始感知特征分别输入到φ卷积层、θ卷积层和β卷积层中,每个卷积层中都使用尺寸为1×1的卷积核对初始感知特征进行卷积操作。接着,终端将φ卷积层输出的特征图和θ卷积层输出的特征图进行按元素相乘,得到融合特征图。终端再使用Softmax算子对融合特征图进行指数归一化处理,得到归一化特征图。终端再将归一化特征图与β卷积层输出的特征图进行按元素相乘,得到目标特征图。接着,终端将目标特征图和该初始感知特征再进行按元素相加,即可得到第一图卷积层的输出结果即图卷积特征。
在上述过程中,在第一图卷积层中通过非局部算子来实现图卷积操作,能够减少图卷积部分的模型参数量。在基于非局部算子的图卷积层中,能够将图像空间中的像素点根据对应特征向量的相似度有效关联起来,实现输入目标区域特征的重新聚合,能够较好地解决同一个对象的像素点在空间上被遮挡截断导致不连续的问题。
A3、终端将图卷积特征输入到遮挡对象图层网络的第二卷积层中,通过第二卷积层对图卷积特征进行卷积操作,得到第一感知特征。
在一些实施例中,针对上述步骤A2输出的图卷积特征,将该图卷积特征输入到一个或多个串连的第二卷积层中,通过第二卷积层来对该图卷积特征进行进一步地卷积操作。比如, 以尺寸为3×3的卷积核来对该图卷积特征进行卷积操作,输出该遮挡对象的第一感知特征。
图9是本申请实施例提供的双图层实例分割模型的原理性示意图。如图9所示,在双图层实例分割模型900中,涉及到遮挡对象图层网络910和被遮挡对象图层网络920。其中,遮挡对象图层网络910中包括1个第一卷积层911、1个第一图卷积层912和2个第二卷积层913-914,第一卷积层911、第一图卷积层912、第二卷积层913-914之间串联连接。假设上述步骤602提取到的ROI的局部图像特征使用符号x来表征,那么先将局部图像特征x输入到遮挡对象图层网络910中,依次通过第一卷积层911提取到初始感知特征,通过第一图卷积层912提取到图卷积特征,通过第二卷积层913-914提取到第一感知特征,由第二卷积层914输出该第一感知特征。
上述双图层实例分割模型900,在Mask RCNN的基础上,添加了一个处理重叠目标的双图层(Overlapping Bi-Layers)模块,针对任一实例分割所得到的单个目标对象,提取到ROI的局部图像特征x(相当于对原始的显微图像进行了ROI池化)。再通过双图层模块来建模遮挡对象和被遮挡对象之间的关系。并将遮挡对象的第一感知特征引入到对被遮挡对象的第二感知特征的计算过程中。上述处理方式能够更好地学习遮挡对象和被遮挡对象之间的相互关系,最终输出较好的在多目标重叠情况下分割结果。
6032、终端基于第一感知特征,获取遮挡对象的轮廓图和掩膜图。
在一些实施例中,终端在通过遮挡对象图层网络提取到遮挡对象的第一感知特征之后,可以对该第一感知特征进行上采样操作,得到该遮挡对象的轮廓图和掩膜图。比如,对该第一感知特征进行上采样,以得到一张与ROI尺寸相同的轮廓图和一张与ROI尺寸相同的掩膜图。又或者,对该第一感知特征进行上采样,以得到一张与显微图像尺寸相同的轮廓图和一张与显微图像尺寸相同的掩膜图。本申请实施例对此不进行具体限定。
在一些实施例中,遮挡对象图层网络还包括一个第一反卷积层,终端将第一感知特征输入到第一反卷积层中,在第一反卷积层中对第一感知特征进行反卷积操作,以得到遮挡对象的轮廓图和掩膜图。这里仅以通过反卷积操作来进行上采样为例进行说明,还可以通过其他方式来实现上采样,本申请实施例对此不进行具体限定。
仍以图9为例继续进行说明,在遮挡对象图层网络910中还包括第一反卷积层915,将第二卷积层914输出的第一感知特征输入到第一反卷积层915中,将会输出一张遮挡对象的轮廓图916和一张遮挡对象的掩膜图917。
6033、终端将局部图像特征和第一感知特征融合所得的融合特征输入到被遮挡对象图层网络,提取得到ROI中位于底层的被遮挡对象的第二感知特征。
其中,第二感知特征表征被遮挡对象在实例分割任务上的图像特征。
在一些实施例中,被遮挡对象图层网络用对ROI内的被遮挡对象的轮廓和掩膜进行显式建模可选地,被遮挡对象图层网络中包括至少一个第三卷积层、至少一个第二图卷积层和至少一个第四卷积层。上述第三卷积层、第二图卷积层和第四卷积层的相邻层之间串联连接,基于上述结构的被遮挡对象图层网络,上述步骤6033可通过下述步骤B1-B4来实现。
B1、终端将局部图像特征和第一感知特征进行融合,得到融合特征。
在一些实施例中,终端将局部图像特征和第一感知特征进行按元素相加,得到融合特征。仍以图9为例进行说明,终端将ROI的局部图像特征x和第二卷积层914所输出的第一感知特征进行按元素相加,得到融合特征。在另一些实施例中,除了按元素相加以外,还可以采用按元素相乘、拼接、双线性汇合等融合方式,本申请实施例对融合方式不进行具体限定。
B2、终端将融合特征输入到被遮挡对象图层网络的第三卷积层中,通过第三卷积层对融合特征进行卷积操作,得到感知交互特征。
在一些实施例中,针对将上述步骤B1获取到的融合特征,输入到双图层实例分割模型的被遮挡对象图层网络的第三卷积层中,通过第三卷积层对融合特征进行卷积操作。比如,以尺寸为3×3的卷积核来对融合特征进行卷积操作,输出被遮挡对象的感知交互特征。
在上述过程中,由于被遮挡对象图层网络的输入信号,不但包含了局部图像特征,还包含了遮挡对象的第一感知特征,因此能够实现对遮挡对象和被遮挡对象的感知交互。即,结合已提取到的遮挡对象的信息和原始的局部图像特征,来共同作用于对被遮挡对象的轮廓和掩膜的建模。通过将遮挡与被遮挡关系同时考虑进来,以进行交互感知,能够有效地区分遮挡对象和被遮挡对象的相邻实例边界,以提升对被遮挡对象的实例分割准确度。
B3、终端将感知交互特征输入到被遮挡对象图层网络的第二图卷积层中,通过第二图卷积层中非局部算子对感知交互特征进行卷积操作,得到图卷积交互特征。
在一些实施例中,针对上述步骤B2输出的感知交互特征,将该感知交互特征再输入到被遮挡对象图层网络的第二图卷积层中,在该第二图卷积层中通过非局部算子来实现图卷积层。比如,在第二图卷积层中涉及到3个卷积核大小为1×1的卷积层以及1个Softmax(指数归一化)算子。为了便于说明,将上述3个卷积层分别称为φ卷积层、θ卷积层和β卷积层。将感知交互特征分别输入到φ卷积层、θ卷积层和β卷积层中,每个卷积层中都使用尺寸为1×1的卷积核对感知交互特征进行卷积操作。接着,终端将φ卷积层输出的特征图和θ卷积层输出的特征图进行按元素相乘,得到融合特征图。终端再使用Softmax算子对融合特征图进行指数归一化处理,得到归一化特征图。终端再将归一化特征图与β卷积层输出的特征图进行按元素相乘,得到目标特征图。接着,终端将目标特征图和该感知交互特征再进行按元素相加,即可得到第二图卷积层的输出结果即图卷积交互特征。
在上述过程中,在第二图卷积层中通过非局部算子来实现图卷积操作,能够减少图卷积部分的模型参数量。在基于非局部算子的图卷积层中,能够将图像空间中的像素点根据对应特征向量的相似度有效关联起来,实现输入目标区域特征的重新聚合。上述处理方式能够较好地解决同一个对象的像素点在空间上被遮挡截断导致不连续的问题。
B4、终端将图卷积交互特征输入到被遮挡对象图层网络的第四卷积层中,通过第四卷积层对图卷积交互特征进行卷积操作,得到第二感知特征。
在一些实施例中,针对上述步骤B3输出的图卷积交互特征,将该图卷积交互特征输入到一个或多个串连的第四卷积层中,通过第四卷积层来对该图卷积交互特征进行进一步地卷积操作。比如,以尺寸为3×3的卷积核来对该图卷积交互特征进行卷积操作,输出该被遮挡对象的第二感知特征。
仍以图9为例继续进行说明,被遮挡对象图层网络920中包括1个第三卷积层921、1个第二图卷积层922和2个第四卷积层923-924。第三卷积层921、第二图卷积层922、第四卷积层923-924之间串联连接。假设上述步骤602提取到的ROI的局部图像特征使用符号x来表征,那么先将局部图像特征x和遮挡对象图层网络910中第二卷积层914输出的第一感知特征进行按元素相加,得到融合特征。再将该融合特征输入到被遮挡对象图层网络920中,依次通过第三卷积层921提取到感知交互特征,通过第二图卷积层922提取到图卷积交互特征,通过第四卷积层923-924提取到第二感知特征,由第四卷积层924输出该第二感知特征。
6034、终端基于第二感知特征,获取被遮挡对象的轮廓图和掩膜图。
在一些实施例中,终端在通过被遮挡对象图层网络提取到被遮挡对象的第二感知特征之后,可以对该第二感知特征进行上采样操作,得到该被遮挡对象的轮廓图和掩膜图。比如,对该第二感知特征进行上采样,以得到一张与ROI尺寸相同的轮廓图和一张与ROI尺寸相同的掩膜图。又或者,对该第二感知特征进行上采样,以得到一张与显微图像尺寸相同的轮廓图和一张与显微图像尺寸相同的掩膜图。本申请实施例对此不进行具体限定。
在一些实施例中,被遮挡对象图层网络还包括一个第二反卷积层。终端将第二感知特征输入到第二反卷积层中,在第二反卷积层中对第二感知特征进行反卷积操作,以得到被遮挡对象的轮廓图和掩膜图。这里仅以通过反卷积操作来进行上采样为例进行说明,还可以通过其他方式来实现上采样,本申请实施例对此不进行具体限定。
仍以图9为例继续进行说明,在被遮挡对象图层网络920中还包括第二反卷积层925,将第四卷积层924输出的该第二感知特征输入到第二反卷积层925中,将会输出一张被遮挡对象的轮廓图926和一张被遮挡对象的掩膜图927。
在上述步骤602-603中,提供了终端对ROI进行实例分割,以确定出该显微图像中包含的至少一个该目标对象各自的轮廓图和掩膜图的一种可能实施方式。即,利用预先训练得到的双图层实例分割模型,来对ROI进行实例分割,以在显微图像中区分出来不同的目标对象实例,以目标对象为线虫为例。需要说明的是,本申请实施例仅以ROI中包含互相重叠的多个目标对象的情况为例进行说明。但在观测过程中也可能ROI中仅包含单个目标对象,此时可以用上述步骤602-603涉及的实例分割方式,也可以采用一些传统图像分割算法进行实例分割,本申请实施例对此不进行具体限定。
在一些实施例中,双图层实例分割模型基于多个合成样本图像训练得到。合成样本图像中包含多个该目标对象。合成样本图像基于多个仅包含单目标对象的原始图像合成得到。换言之,为了提升双图层实例分割模型的分割准确度,在训练阶段就通过包含多个目标对象的合成样本图像来进行训练,这样能够极大提升双图层实例分割模型针对目标对象这类目标的分割准确度。并且,上述方式训练得到的模型能够处理多实例互相重叠,或者单实例自身卷曲导致自我重叠等各类复杂情况,而无需技术人员手工操作。
在一些实施例中,由于手动采集包含多个目标对象的样本图像是比较耗时耗力的,而且还需要技术人员人工添加一些标签信息。因此,可以直接利用多个仅包含单目标对象的原始图像,来合成一张包含多个目标对象的合成样本图像。并且,通过不同的排列组合方式,还可以合成任意形态重叠、包含任意数量个实例的合成样本图像,这样能够在原始图像构成的训练集上,以数据增强方式来合成训练质量更好、训练效果更佳的合成样本图像。上述处理方式能够有利于训练得到实例分割效果更好、精度更高的双图层实例分割模型。
在一些实施例中,在使用多个原始图像合成一个合成样本图像时,可以采取如下合成方式:在目标对象比原始图像中的背景暗的情况下,将该多个原始图像中相同位置像素中的最低像素值,赋值给该合成样本图像中的相同位置像素。换言之,该合成样本图像中每个像素的像素值等于用于合成该合成样本图像的多个原始图像中相同位置像素中的最低像素值。例如,在线虫比背景暗(更靠近黑色)的情况下,假设原始图像包括图像1和图像2,则通过逐像素取min(图像1,图像2)的方式来合成一张合成样本图像。
图10是本申请实施例提供的一种合成样本图像的合成流程图。如图10所示,以目标对象为线虫为例进行说明,假设已有2张原始图像1001和1002,通过逐像素取min(原始图像1001,原始图像1002)的方式,能够合成一张合成样本图像1003,且保证合成样本图像1003中每个像素的像素值,均等于原始图像1001和1002中相同位置像素中的最低像素值。比如,原始图像1001中坐标(10,21)的像素的像素值为245,原始图像1002中坐标(10,21)的像素的像素值为200,则在合成样本图像1003中将坐标(10,21)的像素的像素值赋值为min(245,200)=200。即,在合成样本图像1003中将坐标(10,21)的像素的像素值赋值为原始图像1001和原始图像1002中坐标(10,21)的像素的像素值中的最小值(即最低像素值)。
在另一些实施例中,在使用多个原始图像合成一个合成样本图像时,还可以采取如下合成方式:在目标对象比原始图像中的背景亮的情况下,将多个原始图像中相同位置像素中的最高像素值,赋值给该合成样本图像中的相同位置像素。换言之,合成样本图像中每个像素的像素值等于多个原始图像中相同位置像素中的最高像素值。例如,在线虫比背景亮(更靠近白色)的情况下,假设原始图像包括图像1和图像2,则通过逐像素取max(图像1,图像2)的方式来合成一张合成样本图像。
通过上述合成样本图像的获取方式,能够通过有限的仅包含单个目标对象的原始图像,来合成大量包含重叠的多个目标对象的训练数据即合成样本图像。上述处理方式在合成样本 图像构成的增强训练集上训练双图层实例分割模型时,有利于训练得到实例分割效果更好、精度更高的双图层实例分割模型。
在上述步骤601-603中,提供了对显微图像进行实例分割,得到包含该显微图像中目标对象的实例图像的一种可能实施方式。其中,实例图像包含目标对象的轮廓图和掩膜图。即,以先从显微图像中提取ROI,再针对ROI进行实例分割为例进行说明,这样无需对整张显微图像都运行实例分割算法,能够节约终端的计算资源。在另一些实施例中,终端也可以对整张显微图像进行实例分割,能够避免在提取ROI时遗漏掉部分体积较小的目标对象。
604、对该ROI中的任一目标对象,终端将该目标对象的掩膜图输入到骨架提取模型中,通过该骨架提取模型来对该目标对象进行骨架提取,输出该目标对象的骨架形态图像。
上述步骤604中,由于目标对象的实例图像包括轮廓图和掩膜图,终端将每个目标对象的实例图像中的掩膜图输入到骨架提取模型中,通过该骨架提取模型来对目标对象进行骨架提取,输出目标对象的骨架形态图像。其中,该骨架提取模型用于基于目标对象的实例图像中的掩膜图来预测目标对象的骨架形态。
在一些实施例中,对ROI中包含的每个目标对象,都可以通过本步骤604来提取到目标对象的骨架形态图像。可选地,骨架提取模型是一个包含多个卷积层的CNN模型,终端将目标对象的掩膜图输入到骨架提取模型的多个卷积层中,通过多个卷积层对目标对象的掩膜图进行卷积操作,输出骨架形态图像。比如,终端将遮挡对象或被遮挡对象的掩膜图输入到骨架提取模型中,通过骨架提取模型中串连的多个卷积层,来对掩膜图进行卷积操作,由最后一个卷积层输出骨架形态图像,在骨架形态图像中的目标对象的骨架具有单层像素宽度。
可选地,骨架形态图像是一张二值图像,在骨架形态图像中骨架像素点和非骨架像素点具体不同的取值。示意性地,在骨架形态图像中,取值为1的像素点是骨架像素点、取值为0的像素点是非骨架像素点。又或者取值为0的像素点是骨架像素点、取值为1的像素点是非骨架像素点。本申请实施例对此不进行具体限定。在这种情况下,骨架形态图像中的骨架像素点能够形成一条单层像素宽度的目标对象的骨架,这一骨架所具有的形态代表了目标对象在该显微图像中所呈现的骨架形态。
在本申请实施例中,通过在实例分割结果即实例图像的基础上,对每个分割所得的目标对象的实例,都应用骨架提取算法来提取到骨架形态图像。上述处理方式方便了在目标对象的骨架形态图像上进行运动学分析,能够提升运动学分析的分析效率,而无需人工参与或肉眼计数。
在一些实施例中,骨架提取模型基于包含目标对象的样本图像和对目标对象标注的骨架形态标签信息训练得到。可选地,骨架形态标签信息包括对样本图像中目标对象的骨架形态进行采样的多个采样点各自的骨架切向角。骨架切向角表征在从头部端点指向尾部端点的有向骨架形态上以采样点作为切点所对应切线与水平线之间的夹角。换言之,对任一包含目标对象的样本图像,可由技术人员标注好目标图像的骨架形态以及骨架形态上的头部端点和尾部端点,这样能够形成一条从头部端点指向尾部端点的有向骨架形态。接着,针对已标注的有向骨架形态进行采样,先确定有向骨架形态上的多个采样点,再对每个采样点,都在有向骨架形态上都生成一条以采样点作为切点的一条切线,将切线与水平线之间的夹角确定为采样点的骨架切向角。重复执行上述操作,即可得到多个采样点各自的骨架切向角。将多个采样点各自的骨架切向角确定为对样本图像的骨架形态标签信息。对每个样本图像都执行上述操作,即可获取到每个样本图像的骨架形态标签信息。
在上述过程中,通过使用多个采样点各自的骨架切向角来作为每个样本图像的骨架形态标签信息,能够方便对骨架形态的预测准确度进行量化,方便比较骨架提取模型预测出来的骨架形态和样本图像中实际的骨架形态之间的误差,从而便于训练出来针对目标对象更适用的骨架提取模型。
在一些实施例中,骨架提取模型在训练阶段的损失函数值基于多个采样点各自的骨架切 向角和预测切向角之间的误差确定得到。其中,预测切向角基于骨架提取模型对样本图像预测得到的骨架形态图像采样得到。
在训练阶段,在任一次迭代中,会将当前的样本图像输入到该骨架提取模型中,通过该骨架提取模型中串连的该多个卷积层,来对该样本图像进行卷积操作,由最后一个卷积层输出该样本图像的预测骨架图像。接着,按照与样本图像相同的采样方式,从预测骨架图像中也确定多个采样点,并获取到每个采样点各自的预测切向角。预测切向角的获取方式与骨架切向角的获取方式类似,这里不做赘述。接着,可以基于该多个采样点各自的骨架切向角和预测切向角之间的误差,获取当前的样本图像的预测误差。预测误差可以是各个采样点各自的骨架切向角和预测切向角的误差的和值、算术平均值或加权平均值,这里不做具体限定。在本轮迭代中会对每个样本图像都执行上述操作,能够获取到所有样本图像各自的预测误差。基于所有样本图像各自的预测误差,即可确定得到骨架提取模型在本轮迭代中的损失函数值。接着,判断迭代次数或损失函数值是否满足停止训练条件。比如在迭代次数大于次数阈值或损失函数值小于损失阈值时,认为满足停止训练条件,对骨架提取模型停止训练(即调整参数),得到训练完毕的骨架提取模型;否则,在迭代次数小于或等于次数阈值且损失函数值大于或等于损失阈值时,认为不满足停止训练条件,继续对骨架提取模型进行迭代调参。其中,次数阈值是任一大于或等于1的整数,损失阈值是任一大于0的数值。
图11是本申请实施例提供的一种骨架提取模型的训练和预测阶段的原理性示意图。如图11所示,在训练阶段1101中,先由技术人员针对样本图像标注好骨架形态。再基于已标注的骨架形态生成骨架形态标签信息(即多个采样点的骨架切向角)。接着,可在标签数据的基础上,进行形变、翻转、尺寸变换等数据增强方式,来合成更多更丰富的训练数据集。最后,在训练数据集上训练骨架提取模型,并在真实的样本图像上来评估训练完毕的骨架提取模型的骨架提取性能。在预测阶段1102中,针对包含多个目标对象的显微图像,通过实例分割方式,能够从显微图像中定位到单实例即单个目标对象,并生成单个目标对象的掩膜图。接着,将该掩膜图输入到训练完毕的骨架提取模型中,通过骨架提取模型来对目标对象进行骨架提取,得到目标对象的骨架形态图像。
在上述过程中,通过基于标注好的目标对象,确定多个采样点并获取其骨架切向角作为骨架形态标签信息,能够对骨架提取模型预测的骨架形态和样本图像实际标注的骨架形态进行精准量化,便于比较预测骨架和标注骨架之间的误差,从而能够训练得到对目标对象具有精准骨架提取功能的骨架提取模型。并且在实例分割结果中仍然存在自我卷曲等复杂情况,仍然具有良好的骨架提取效果。
605、终端基于该骨架形态图像,识别得到该目标对象的骨架形态中的头部端点和尾部端点。
在一些实施例中,终端可以直接从骨架形态图像中识别出来头部端点和尾部端点。即,训练一个端点识别模型,端点识别模型的输入是骨架形态图像,输出则是头部端点和尾部端点各自的端点坐标。
在另一些实施例中,终端还可以先从骨架形态图像中,截取出来骨架一端和骨架另一端各自的端点局部区域,进而再对截取到的每个端点局部区域分别进行二分类。即,训练一个用于二分类的头尾识别模型,用于判断输入的端点局部区域是头部端点还是尾部端点。上述处理方式能够减少头尾识别过程的计算量,提升头尾识别过程的识别效率。以这种情况为例进行说明,图12是本申请实施例涉及的一种识别头部端点和尾部端点的方法流程图。如图12所示,上述步骤605可以通过下述步骤6051-6054来实现。
6051、终端在骨架形态图像中,截取得到第一端点局部区域和第二端点局部区域,第一端点局部区域和第二端点局部区域位于骨架的两端。
在一些实施例中,由于骨架形态图像中的骨架像素点能够形成一条单像素宽度的骨架,因此很容易找到骨架的两个端点。接着,可以使用每个端点作为截取中心,在骨架形态图像 中确定出来一个以该端点为中心点的端点候选框。接着,可以直接从骨架形态图像中找到端点候选框所圈定的端点局部区域。这里为例方便区分两个端点各自的端点局部区域,将位于骨架一端的截取区域称为第一端点局部区域,将位于骨架另一端的截取区域称为第二端点局部区域。
6052、终端基于该第一端点局部区域,提取该骨架一端的第一HOG特征。
在一些实施例中,终端从骨架形态图像中截取得到第一端点局部区域之后,可以从原始的显微图像中找到与该第一端点局部区域位置相同的原始局部图像。接着,可以针对该原始局部图像提取到该第一HOG特征。可选地,将该原始局部图像划分成多个细胞单元,细胞单元是指图像中较小的连通区域。接着,采集每个细胞单元中各个像素点的梯度或边缘的方向直方图。再将这些直方图组合起来构成细胞单元的特征描述符。重复上述操作直到得到整个原始局部图像的第一HOG特征。
6053、终端基于该第二端点局部区域,提取该骨架另一端的第二HOG特征。
上述步骤6053与上述步骤6052类似,这里不做赘述。
6054、终端基于该第一HOG特征和该第二HOG特征,分别对该骨架一端和该骨架另一端进行识别,得到该头部端点和该尾部端点。
在一些实施例中,终端可以利用头尾识别模型来进行头部端点和尾部端点的识别/分类。其中,头尾识别模型用于根据端点局部区域的HOG特征来判断目标对象的骨架中的端点是属于头部端点还是尾部端点。在此基础上,上述步骤6054可以通过下述步骤C1-C3实现。
C1、终端将该第一HOG特征输入到头尾识别模型中,通过该头尾识别模型对该第一HOG特征进行二分类,得到对该骨架一端的第一识别结果。
其中,第一识别结果用于表征该骨架一端是头部端点还是尾部端点。
在一些实施例中,头尾识别模型包括头部识别模型和尾部识别模型两个二分类模型。利用一些预先标注了头部端点的端点局部区域的HOG特征训练得到一个针对头部端点进行二分类的头部识别模型。同时,利用一些预先标注了尾部端点的端点局部区域的HOG特征训练得到一个针对尾部端点进行二分类的头部识别模型。接着,将该第一HOG特征输入到训练得到的头部识别模型中。通过头部识别模型对骨架一端进行是否为头部端点的二分类处理,输出骨架一端是否为头部端点的第一识别结果。
示意性地,以头部识别模型是SVM二分类模型为例进行说明。将第一HOG特征输入到SVM二分类模型之后,SVM二分类模型会对该第一HOG特征进行二分类,从而输出该骨架一端是否为头部端点的第一识别结果。例如,SVM二分类模型会基于第一HOG特征预测得到骨架一端属于头部端点的识别概率。在该识别概率大于分类阈值的情况下,将第一识别结果设置为“Y(Yes,是)”,代表该骨架一端是头部端点。否则,将第一识别结果设置为“N(No,否)”,代表该骨架一端不是头部端点。其中,该分类阈值是任一大于或等于0且小于或等于1的数值。
在另一些实施例中,头尾识别模型就是一个整体的用于判断骨架端点是头部端点还是尾部端点的多分类模型。这样利用一些预先标注了头部端点及尾部端点的端点局部区域的HOG特征训练得到一个针对头部端点/尾部端点进行多分类的头尾识别模型。接着,将该第一HOG特征输入到训练得到的头尾识别模型中,通过该头尾识别模型对该骨架一端进行多分类处理,输出该骨架一端是头部端点/尾部端点/既不是头部端点也不是尾部端点的第一识别结果。
示意性地,以头尾识别模型是SVM多分类模型为例进行说明,将该第一HOG特征输入到SVM多分类模型之后,SVM多分类模型会对该第一HOG特征进行多分类,从而输出该骨架一端是头部端点/尾部端点/既不是头部端点也不是尾部端点的第一识别结果。即,SVM多分类模型可以配置有3个类别标签:“头部端点”、“尾部端点”和“既不是头部端点也不是尾部端点”。SVM多分类模型会基于第一HOG特征预测得到该骨架一端属于每种类别标签的分类概率。接着,在该分类概率最高的类别标签确定为对该骨架一端的第一识别结果。
C2、终端将该第二HOG特征输入到该头尾识别模型中,通过该头尾识别模型对该第二HOG特征进行二分类,得到对该骨架另一端的第二识别结果。
其中,第二识别结果用于表征骨架另一端是头部端点还是尾部端点。
在一些实施例中,若头尾识别模型包括头部识别模型和尾部识别模型两个二分类模型,在上述步骤C1获取到的第一识别结果指示该骨架一端是头部端点的情况下,在本步骤C2中可以调用尾部识别模型来对第二HOG特征进行二分类处理,以输出该骨架另一端是否为尾部端点的第二识别结果。示意性地,以尾部识别模型是SVM二分类模型为例进行说明,将第二HOG特征输入到SVM二分类模型之后,SVM二分类模型会对该第二HOG特征进行二分类,从而输出该骨架另一端是否为尾部端点的第二识别结果。例如,SVM二分类模型会基于第二HOG特征预测得到该骨架另一端属于尾部端点的识别概率。在该识别概率大于分类阈值的情况下,将第二识别结果设置为“Y(Yes,是)”,代表该骨架另一端是尾部端点。否则,将第二识别结果设置为“N(No,否)”,代表该骨架另一端不是尾部端点。
在一些实施例中,若头尾识别模型包括头部识别模型和尾部识别模型两个二分类模型,在上述步骤C1获取到的第一识别结果指示该骨架一端不是头部端点的情况下,那么可以继续调用头部识别模型来对第二HOG特征进行二分类处理,以输出该骨架另一端是否为头部端点的第二识别结果。同时,再调用尾部识别模型来对第一HOG特征进行二分类处理,以判断该骨架一端是否为尾部端点。
在另一些实施例中,若头尾识别模型是一个多分类模型,那么可以将该第二HOG特征也输入到训练得到的头尾识别模型中,通过该头尾识别模型对该骨架另一端进行多分类处理,输出骨架另一端是头部端点/尾部端点/既不是头部端点也不是尾部端点的第二识别结果。
示意性地,以头尾识别模型是SVM多分类模型为例进行说明,将该第二HOG特征输入到SVM多分类模型之后,SVM多分类模型会对该第二HOG特征进行多分类,从而输出该骨架另一端是头部端点/尾部端点/既不是头部端点也不是尾部端点的第二识别结果。即,SVM多分类模型可以配置有3个类别标签:“头部端点”、“尾部端点”和“既不是头部端点也不是尾部端点”。SVM多分类模型会基于第二HOG特征预测得到该骨架另一端属于每种类别标签的分类概率。接着,在该分类概率最高的类别标签确定为对该骨架另一端的第二识别结果。
C3、终端基于该第一识别结果和该第二识别结果,确定得到该头部端点和该尾部端点。
在一些实施例中,如果第一识别结果和第二识别结果指示了骨架一端是头部端点、骨架另一端是尾部端点,或者指示了骨架一端是尾部端点、骨架另一端是头部端点。即,有一个端点是头部端点,另一个端点是尾部端点。那么代表识别结果无异常,继续后续流程。
在一些实施例中,如果第一识别结果和第二识别结果指示了两个端点都是头部端点,或者两个端点都是尾部端点,或者两个端点判断为“既不是头部端点也不是尾部端点”,这时可以仅一定程度上的自动修正。比如,如果头部识别模型将两个端点都分类成头部端点,则选择识别概率最大的端点作为头部端点,并将剩余端点作为尾部端点,此时再利用尾部识别模型进行验证。如果剩余端点被识别成尾部端点的概率大于选定的头部端点被识别为尾部端点的概率,代表验证通过。或,如果尾部识别模型将两个端点都分类成尾部端点,也可以以这一修正方式进行类推。或者,还可以直接上报给技术人员进行人工排查,本申请实施例对此不进行具体限定。
图13是本申请实施例提供的一种截取端点局部区域的示意图。如图13所示,以目标对象是线虫为例说明,根据第一端点局部区域和第二端点局部区域,可以在显微图像1301中采样到对应的两个原始局部图像1311和1312。接着通过头尾识别模型能够分别判断原始局部图像1311和1312中各自包含的骨架端点是头部端点还是尾部端点。对于线虫来说,线虫头部和尾部在形态学上的差异比较明显。如1302所示,给出了几种示例性的线虫头部的局部图像。可以看出来,线虫的头部边缘比较圆润。如1303所示,给出了几种示例性的线虫尾部的局部图像。可以看出来,线虫的尾部边缘比较尖锐。通过针对两个原始局部图像分别提取HOG 特征,HOG特征能够很好描述不同方向统计后的梯度特征,从而能够明显区分出来圆润的边缘和尖锐的边缘,这样再使用HOG特征和SVM分类器结合之后,能够对线虫的头部和尾部具有很高的识别精度。
在上述过程中,通过在提取到的骨架形态图像的基础上,针对骨架的两个端点分别截取一个局部图像区域。对这一局部图像区域中的骨架端点进行分类,要么是头部端点要么是尾部端点。在一个示例中,对每个骨架端点都提取128维度的HOG特征,这种情况下针对线虫头尾的识别准确率高达98%,能够很好地平衡头尾识别速度和识别准确率。
606、终端将该骨架形态图像、该头部端点和该尾部端点确定为该目标对象的骨架形态信息。
其中,骨架形态信息表征目标对象的骨架所处的形态。
在一些实施例中,通过骨架形态图像能够确定出一条目标对象当前的骨架形态,通过识别出来的头部端点、尾部端点能够确定出来这条骨架形态的走向。即,能够形成一条完整的有方向的(从头部端点指向尾部端点)的有向骨架形态,这一有向骨架形态就是目标对象的骨架形态信息。
在上述步骤604-606中,提供了对该实例图像中的目标对象进行骨架提取,得到该目标对象的骨架形态信息的一种可能实施方式。即,先通过骨架提取模型来提取到骨架形态图像,再利用头尾识别模型来识别出来头部端点和尾部端点,这样能够得到一条从头部端点指向尾部端点的有向骨架形态,在有向骨架形态的基础上能够对目标对象进行更加丰富深层的运动学分析。在另一些实施例中,也可以仅将骨架形态图像中的无向骨架形态作为骨架形态信息,本申请实施例对此不进行具体限定。
607、终端基于该骨架形态信息,对该目标对象的骨架形态进行采样,得到多个采样点各自的骨架切向角所构成的特征向量。
其中,骨架切向角表征在从头部端点指向尾部端点的有向骨架形态上以采样点作为切点所对应切线与水平线之间的夹角。
在一些实施例中,终端在从头部端点指向尾部端点的有向骨架形态上,选取多个采样点。接着,对每个采样点,都在有向骨架形态上都生成一条以该采样点作为切点的一条切线(因为骨架形态是有方向的,因此切线是顺着骨架形态方向的射线,而非无向的直线)。接着,将切线与水平线之间的夹角确定为该采样点的骨架切向角。重复执行上述操作,即可得到多个采样点各自的骨架切向角。上述多个采样点各自的骨架切向角能够形成一个特征向量,特征向量的维度等于采样点的数量,特征向量中每个元素是一个采样点的骨架切向角。
608、终端基于多种预设运动状态,对由该特征向量所表示的有向骨架形态进行运动成分分解,得到该目标对象的运动成分信息。
其中,运动成分信息是指对骨架形态信息进行运动分解所得的多种预设运动状态各自的特征值。
在一些实施例中,技术人员预先定义多种预设运动状态之后,每种预设运动状态实际表征了一个预设骨架形态。对预设骨架形态运用上述步骤607类似的方式进行采样,也能够得到预设骨架形态所对应的预设特征向量,再将上述步骤607获取到的特征向量分解成多个预设特征向量的加权和。上述处理方式能够根据每种预设特征向量在分解时所占的权值系数(即特征值),来获取目标对象的运动成分信息,这样能够将任意的骨架形态都分解成多种预设运动状态的组合,极大方便了对目标对象的运动学分析。
在一些实施例中,运动成分信息包括该多种预设运动状态各自的多个特征值,这一特征值代表了在进行运动成分分解时对应预设运动状态所具有的权值系数。在这种情况下,图14是本申请实施例提供的一种对目标对象进行运动分析的流程图如图14所示,上述步骤608可以通过如下步骤6081-6083来实现。
6081、终端对该多种预设运动状态所指示的预设骨架形态分别进行采样,得到该多种预 设运动状态各自的预设特征向量。
在一些实施例中,对每种预设运动状态,从该预设运动状态所指示的有向的预设骨架形态上,选取多个采样点。需要说明的是,从预设骨架形态上对采样点的选取方式,需要与步骤607中的采样点的选取方式保持一致。接着,对每个采样点,都在该预设骨架形态上生成一条以该采样点作为切点的一条切线(因为预设骨架形态也是有方向的,因此切线是顺着预设骨架形态方向的射线,而非无向的直线)。接着,将该切线与水平线之间的夹角确定为该采样点的骨架切向角。重复执行上述操作,即可得到多个采样点各自的骨架切向角。上述多个采样点各自的骨架切向角能够形成一个预设特征向量,预设特征向量的维度等于采样点的数量,预设特征向量中每个元素是一个采样点的骨架切向角。需要说明的是,由于步骤607和步骤6081的采样方式保持一致,代表了采样点数量一致,因此特征向量和预设特征向量是具有相同维度的。
6082、终端将该特征向量分解成多个该预设特征向量与多个特征值各自的乘积之间的和值。
在一些实施例中,对特征向量进行分解,分解成多个预设特征向量与多个特征值各自的乘积之间的和值。换言之,假设特征向量是一个K维向量,相当于用K个采样点的骨架切向角来描述目标对象的骨架形态,那么显然预设特征向量也是一个K维向量。假设共设定了K个预设运动状态,那么可得到N个预设特征向量(每个预设特征向量都是K维向量)。即所有预设运动状态的预设特征向量组成了一个N×K矩阵。对N×K矩阵提取协方差,可以得到一个K×K矩阵。再基于K×K的协方差矩阵,进行特征值和K维特征向量的分解,即可得到N个预设特征向量各自对应的N个特征值。N个特征值满足如下条件:将N个预设特征向量分别与对应的N个特征值相乘,得到N个乘积,将这N个乘积相加恰好等于K维特征向量。
6083、终端将该多个特征值所构成的特征值序列确定为该运动成分信息。
上述步骤6083中分解得到了多个特征值。例如,N个特征值。此时能够确定一个特征值序列。比如,假设包含了5个预设运动状态,求解得到了5个特征值:a1,a2,a3,a4和a5。那么特征值序列{a1,a2,a3,a4,a5}即可作为该目标对象的运动成分信息。
在另一些实施例中,还可以将特征值序列中包含的该多个特征值按照从大到小的顺序进行排序,将排序中位于前目标位的特征值所对应的预设运动状态确定为运动主成分。接着,可以仅将运动主成分的特征值作为目标对象的运动成分信息,即仅关注对目标对象当前的骨架形态具有决定性作用的主成分,而忽略掉一些占比较小的副成分。例如,将特征值序列中的N个特征值按照从大到小的顺序进行排序,仅选取top5个特征值对应的5个预设运动状态作为运动主成分,并将这top5个特征值构成的特征值子序列作为目标对象的运动成分信息。可选地,还可以选取top3、top10等特征值作为运动主成分,本申请实施例对此不进行具体限定。
在获取到运动主成分的基础上,还可以基于运动主成分,对目标对象在观测时间段内的运动进行分析,得到目标对象在观测时间段内的运动学特征。比如,对某一线虫的骨架形态,采用上述分析方式得到了10个特征值,选取在从大到小的排序中位于top5(前5位)的特征值{a1,a2,a3,a4,a5}所对应的5种预设运动状态作为运动主成分,这5个运动主成分能够很好地描述线虫的运动状态。
图15是本申请实施例提供的一种目标对象的运动分析原理图。如图15所示,以目标对象为线虫为例进行说明。在主成分分析过程中,虽然线虫形态各异、难以预料,但线虫形态通常会有其固有规律。通过上述步骤6081-6083的运动分析方式,能够分解得到线虫的运动主成分。以总共分析了5个预设运动状态,并选取了特征值最大的top2个预设运动状态作为运动主成分为例进行说明。这时原始的特征值序列为{a1,a2,a3,a4,a5},运动主成分的特征值子序列为{a1,a2},即特征值a1和a2各自对应的两种预设运动状态是运动主成分。这样,通过在一段观测时间段内,对同一条线虫采集得到包含该线虫的连续的多个显微图像 帧,对每个显微图像帧都执行上述运动分析流程,能够绘制出来同一条线虫在观测时间段内的特征值a1和a2所构成的运动分析概率图1501。在运动分析概率图1501中,横坐标是特征值a1的取值,纵坐标是特征值a2的取值,而图中各个坐标点的颜色深浅,则代表了线虫处于这一坐标点所决定的特征值a1和a2所合成的骨架形态的概率。进一步的,在运动分析概率图1501的基础上,还可以分析特征值a1和a2的坐标值所构成的角度相位值。这一角度相位值是将特征值a1和a2构成的坐标值转换成三角函数之后,再经过反三角函数变换取到的角度相位值,这样能够描述线虫摆动前进的运动学特征。如运动相位分析图1502所示,假设对同一条线虫采集到了8个时刻下的显微图像帧,这8个时刻下线虫的骨架形态依次从左至右排开。经过分析可知,t=1时刻时角度相位值φ=-π,t=5时刻时角度相位值φ=0,t=8时刻时角度相位值
在上述步骤607-608中,示出了终端基于该骨架形态信息,对该目标对象进行运动分析,得到该目标对象的运动成分信息的一种可能实施方式。即通过对有向骨架形态进行采样,以采样点的骨架切向角来构建特征向量。上述处理方式能够针对特征向量进行量化分解,从而能够对提取到的骨架形态进行自动化分解运动主成分,进而方便地通过运动主成分来分析目标对象的各类运动学参数,极大提升了对目标对象的分析效率。
图16是本申请实施例提供的一种显微图像的处理方法的原理性流程图。如图16所示,以目标对象为线虫为例,这一处理方法可应用于线虫分析的各个领域,如计数、分割、形态学测量和运动学分析,具有极度广泛的应用场景。针对显微镜CCD图像传感器采集到的原始的线虫图像1601,包括单线虫和多线虫两种情况。单线虫可能会出现自我卷曲,而多线虫可能会发生互相重叠。因此,将线虫图像1601输入到一个能够处理重叠目标的实例分割模型(即双图层实例分割模型)中,得到线虫实例分割后的结果1602。接着,将实例分割所得的每个的单线虫目标实例,都输入到骨架提取模型中进行骨架提取,得到骨架提取结果1603。此外还需要对单线虫的头部端点和尾部端点进行识别。接着,在骨架提取和头尾识别后,应用骨架切向角来描述线虫的运动状态。如骨架提取结果1603所示,针对放大的一段骨架弧形上设置了5个采样点,其中第3个采样点的切线ti与水平线之间的夹角θi即为这一采样点的骨架切向角。接着,再通过运动主成分分析的方式,来将使用多个采样点的骨架切向角构成的特征向量来进行运动主成分的分解。将分解得到的各个运动主成分的特征值投入到后续的运动学参数分析中,可以自动输出线虫的运动速度、角向速度、轴向速度等。如状态描述图1604所示,假设将多个采样点的序号进行归一化之后,绘制出来归一化后的采样点序号与骨架切向角之间的关系图,横坐标表示了归一化之后的采样点序列,纵坐标则表示了这一序号对应采样点的骨架切向角的取值。如主成分分析模块1605所示,将某个线虫的骨架形态可以分解成4个预设运动形态的加权和,4个预设运动形态各自的权值即特征值分别为{a1,a2,a3,a4}。最后,如1606所示,可以基于Eigenworm(蠕动本征)模式来进行更深层次的运动学分析,如分析线虫的运动速度、角向速度、轴向速度等。
上述所有可选技术方案,能够采用任意结合形成本公开的可选实施例,在此不再一一赘述。
本申请实施例提供的方法,通过对显微图像中包含的目标对象进行实例分割,以确定出来每个目标对象的实例图像即单实例分割结果,并在单实例分割结果上提取出来骨架形态信息,以在骨架形态信息的基础上进行运动分析和运动成分分解,能够将每个目标对象的当前所具有的复杂骨架形态,分解成多个预设的运动状态之间的组合,整体处理流程无需人工干预,机器能够自动化实现,极大了降低了人力成本、提升了分析效率。此外,基于输出的运动成分信息还能够进行深层的形态学测量和运动学分析,因此也提升了对目标对象进行分析的精准程度。
图17是本申请实施例提供的一种显微图像的处理装置的结构示意图,如图17所示,装置包括:
实例分割模块1701,用于对显微图像进行实例分割,得到实例图像,实例图像包含显微图像中的目标对象;
骨架提取模块1702,用于对实例图像中的目标对象进行骨架提取,得到目标对象的骨架形态信息,骨架形态信息表征目标对象的骨架形态;
运动分析模块1703,用于基于骨架形态信息,对目标对象进行运动分析,得到多个特征值,将所述多个特征值构成的特征值序列,确定为目标对象的运动成分信息,多个特征值用于表征合成骨架形态时多种预设运动状态的加权系数。
本申请实施例提供的装置,通过对显微图像中包含的目标对象进行实例分割,以确定出来每个目标对象的实例图像即单实例分割结果,并在单实例分割结果上提取出来骨架形态信息,以在骨架形态信息的基础上进行运动分析和运动成分分解,能够将每个目标对象的当前所具有的复杂骨架形态,分解成多个预设的运动状态之间的组合,整体处理流程无需人工干预,机器能够自动化实现,极大了降低了人力成本、提升了分析效率。
在一些实施例中,基于图17的装置组成,实例图像包括目标对象的轮廓图和掩膜图,实例分割模块1701包括:
确定子模块,用于从显微图像中,确定包含目标对象的感兴趣区域ROI;
分割子模块,用于对ROI进行实例分割,以确定出目标对象的轮廓图和掩膜图。
在一些实施例中,在ROI中包含互相重叠的多个目标对象的情况下,基于图17的装置组成,分割子模块包括:
提取单元,用于基于ROI的位置信息,确定ROI候选框,ROI候选框选中的区域包括ROI;从显微图像的全局图像特征中确定ROI的局部图像特征,局部图像特征用于表征全局图像特征中被ROI候选框选中的区域的特征;
处理单元,用于将局部图像特征输入到双图层实例分割模型中,通过双图层实例分割模型对局部图像特征进行处理,输出ROI中多个目标对象各自的轮廓图和掩膜图,双图层实例分割模型用于对不同对象分别建立图层来获取每个对象各自的实例分割结果。
在一些实施例中,ROI中包含互相重叠的遮挡对象和被遮挡对象;
双图层实例分割模型包括遮挡对象图层网络和被遮挡对象图层网络,遮挡对象图层网络用于提取位于顶层的遮挡对象的轮廓和掩膜,被遮挡对象图层网络用于提取位于底层的被遮挡对象的轮廓和掩膜;
基于图17的装置组成,处理单元包括:
第一提取子单元,用于将局部图像特征输入遮挡对象图层网络,通过遮挡对象图层网络提取得到ROI中位于顶层的遮挡对象的第一感知特征,第一感知特征表征遮挡对象在实例分割任务上的图像特征;
获取子单元,用于对第一感知特征进行上采样操作,得到遮挡对象的轮廓图和掩膜图;
第二提取子单元,用于将局部图像特征和第一感知特征融合所得的融合特征输入到被遮挡对象图层网络,提取得到ROI中位于底层的被遮挡对象的第二感知特征,第二感知特征表征被遮挡对象在实例分割任务上的图像特征;
获取子单元,还用于对第二感知特征进行上采样操作,得到被遮挡对象的轮廓图和掩膜图。
在一些实施例中,遮挡对象图层网络包括第一卷积层、第一图卷积层以及第二卷积层,第一图卷积层包括非局部算子,非局部算子用于将图像空间中的像素点根据对应特征向量的相似度关联起来;第一提取子单元用于:
将局部图像特征输入到遮挡对象图层网络的第一卷积层中,通过第一卷积层对局部图像 特征进行卷积操作,得到初始感知特征;
将初始感知特征输入到遮挡对象图层网络的第一图卷积层中,通过第一图卷积层中非局部算子对初始感知特征进行卷积操作,得到图卷积特征;
将图卷积特征输入到遮挡对象图层网络的第二卷积层中,通过第二卷积层对图卷积特征进行卷积操作,得到第一感知特征。
在一些实施例中,第二提取子单元用于:
将融合特征输入到被遮挡对象图层网络的第三卷积层中,通过第三卷积层对融合特征进行卷积操作,得到感知交互特征;
将感知交互特征输入到被遮挡对象图层网络的第二图卷积层中,通过第二图卷积层中非局部算子对感知交互特征进行卷积操作,得到图卷积交互特征;
将图卷积交互特征输入到被遮挡对象图层网络的第四卷积层中,通过第四卷积层对图卷积交互特征进行卷积操作,得到第二感知特征。
在一些实施例中,双图层实例分割模型基于多个合成样本图像训练得到,合成样本图像中包含多个目标对象,合成样本图像基于多个仅包含单目标对象的原始图像合成得到。
在一些实施例中,在目标对象比原始图像中的背景暗的情况下,合成样本图像中每个像素的像素值等于用于合成合成样本图像的多个原始图像中相同位置像素中的最低像素值;或,在目标对象比原始图像中的背景亮的情况下,合成样本图像中每个像素的像素值等于多个原始图像中相同位置像素中的最高像素值。
在一些实施例中,基于图17的装置组成,骨架提取模块1702包括:
骨架提取子模块,用于对ROI中的任一目标对象,将实例图像输入到骨架提取模型中,通过骨架提取模型来对目标对象进行骨架提取,得到骨架形态图像,骨架提取模型用于基于目标对象的实例图像来预测目标对象的骨架形态;
识别子模块,用于对骨架形态图像进行识别,得到目标对象的骨架形态中的头部端点和尾部端点;
信息确定子模块,用于将骨架形态图像、头部端点和尾部端点确定为骨架形态信息。
在一些实施例中,骨架提取模型包括级联的多个卷积层;骨架提取子模块用于:
将实例图像输入到骨架提取模型的多个卷积层中,通过多个卷积层对实例图像进行卷积操作,得到骨架形态图像;
其中,骨架提取模型基于包含目标对象的样本图像和对目标对象标注的骨架形态标签信息训练得到。
In some embodiments, the skeleton morphology label information includes the respective skeleton tangent angles of a plurality of sampling points obtained by sampling the skeleton morphology of the target object in the sample image, where a skeleton tangent angle represents the angle between the horizontal line and the tangent taking the sampling point as the tangent point on the directed skeleton morphology running from the head endpoint to the tail endpoint;
the loss function value of the skeleton extraction model during training is determined based on the errors between the skeleton tangent angles of the plurality of sampling points and the predicted tangent angles, where the predicted tangent angles are obtained by sampling the skeleton morphology image predicted by the skeleton extraction model for the sample image.
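One way to picture this loss term, under the assumption of a simple mean-squared angular error over the sampled tangent angles (the patent does not fix the exact functional form):

```python
import numpy as np

def tangent_angle_loss(pred_angles, label_angles):
    """Mean-squared error between tangent angles sampled from the predicted skeleton
    morphology image and the annotated skeleton tangent angles (the label information)."""
    diff = np.asarray(pred_angles, dtype=float) - np.asarray(label_angles, dtype=float)
    # Wrap differences into (-pi, pi] so angles near ±pi are not over-penalized.
    diff = np.angle(np.exp(1j * diff))
    return float(np.mean(diff ** 2))
```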
In some embodiments, based on the apparatus composition of FIG. 17, the identification submodule includes:
a cropping unit, configured to crop a first endpoint local region and a second endpoint local region from the skeleton morphology image, the first endpoint local region and the second endpoint local region being located at the two ends of the skeleton respectively;
a feature extraction unit, configured to extract a first histogram-of-oriented-gradients (HOG) feature of one end of the skeleton based on the first endpoint local region;
the feature extraction unit being further configured to extract a second HOG feature of the other end of the skeleton based on the second endpoint local region; and
an identification unit, configured to identify the one end of the skeleton and the other end of the skeleton respectively based on the first HOG feature and the second HOG feature, to obtain the head endpoint and the tail endpoint.
In some embodiments, the identification unit is configured to:
input the first HOG feature into a head-tail identification model, and perform binary classification on the first HOG feature through the head-tail identification model to obtain a first identification result, the first identification result representing whether the one end of the skeleton is the head endpoint or the tail endpoint;
input the second HOG feature into the head-tail identification model, and perform binary classification on the second HOG feature through the head-tail identification model to obtain a second identification result, the second identification result representing whether the other end of the skeleton is the head endpoint or the tail endpoint; and
determine the head endpoint and the tail endpoint based on the first identification result and the second identification result;
where the head-tail identification model is used to determine, according to the HOG feature of an endpoint local region, whether an endpoint of the skeleton of the target object is the head endpoint or the tail endpoint.
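Purely as an illustration (the patent does not name a specific classifier; the SVM, the HOG parameters, and the helper names below are assumptions), the head-tail identification step could be sketched as:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def endpoint_hog(patch, orientations=9, cell=(8, 8), block=(2, 2)):
    """HOG descriptor of an equal-sized grayscale patch cropped around one skeleton end."""
    return hog(patch, orientations=orientations,
               pixels_per_cell=cell, cells_per_block=block, feature_vector=True)

def train_head_tail_model(patches, labels):
    """Train on patches cropped around annotated head (label 1) and tail (label 0) endpoints."""
    features = np.stack([endpoint_hog(p) for p in patches])
    return SVC(kernel="rbf", probability=True).fit(features, labels)

def classify_endpoints(model, patch_a, patch_b):
    """Decide which of the two skeleton ends is the head and which is the tail."""
    scores = model.predict_proba(np.stack([endpoint_hog(patch_a),
                                           endpoint_hog(patch_b)]))[:, 1]
    return ("head", "tail") if scores[0] >= scores[1] else ("tail", "head")
```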
In some embodiments, based on the apparatus composition of FIG. 17, the motion analysis module 1703 includes:
a sampling submodule, configured to sample the skeleton morphology of the target object based on the skeleton morphology information to obtain a feature vector formed by the respective skeleton tangent angles of a plurality of sampling points, where a skeleton tangent angle represents the angle between the horizontal line and the tangent taking the sampling point as the tangent point on the directed skeleton morphology running from the head endpoint to the tail endpoint; and
a decomposition submodule, configured to sample the preset skeleton morphologies indicated by the plurality of preset motion states respectively, to obtain the respective preset feature vectors of the plurality of preset motion states; and
decompose the feature vector into a sum of the respective products of the plurality of preset feature vectors and the plurality of eigenvalues, and determine the eigenvalue sequence formed by the plurality of eigenvalues as the motion component information.
In some embodiments, the motion analysis module 1703 is further configured to:
sort the plurality of eigenvalues contained in the eigenvalue sequence in descending order, and determine, as the principal motion components, the preset motion states corresponding to the eigenvalues located in the first target number of positions in the sorted order; and
analyze the motion of the target object within an observation period based on the principal motion components, to obtain the kinematic characteristics of the target object within the observation period.
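A small, assumed sketch of this ranking step: eigenvalue sequences collected over an observation period are ranked by mean magnitude, and the top-ranked preset motion states are kept as principal motion components whose traces feed the subsequent kinematic analysis:

```python
import numpy as np

def principal_motion_components(eigenvalue_series, top_k=2):
    """eigenvalue_series: array of shape (T, K) holding the K eigenvalues of each frame
    in the observation period; returns the indices of the top-k preset motion states
    and their eigenvalue traces over time."""
    series = np.asarray(eigenvalue_series, dtype=float)
    ranking = np.argsort(np.abs(series).mean(axis=0))[::-1]   # descending by mean |a_k|
    top = ranking[:top_k]
    return top, series[:, top]   # traces usable for speed / angular-velocity analysis
```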
FIG. 18 is a schematic structural diagram of a terminal provided by an embodiment of this application. The terminal 1800 is an exemplary illustration of a computer device. Generally, the terminal 1800 includes a processor 1801 and a memory 1802.
Optionally, the processor 1801 includes one or more processing cores, for example, a 4-core processor or an 8-core processor. Optionally, the processor 1801 is implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). In some embodiments, the processor 1801 includes a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1801 is integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1801 further includes an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
In some embodiments, the memory 1802 includes one or more computer-readable storage media, which are optionally non-transitory. Optionally, the memory 1802 further includes a high-speed random access memory and a non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1802 is used to store at least one piece of program code, and the at least one piece of program code is executed by the processor 1801 to implement the microscopic image processing method provided by the embodiments of this application.
In some embodiments, the terminal 1800 optionally further includes a peripheral device interface 1803 and at least one peripheral device. The processor 1801, the memory 1802, and the peripheral device interface 1803 can be connected through a bus or a signal line. Each peripheral device can be connected to the peripheral device interface 1803 through a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a display screen 1805 and a power supply 1808.
The peripheral device interface 1803 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1801 and the memory 1802. In some embodiments, the processor 1801, the memory 1802, and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1801, the memory 1802, and the peripheral device interface 1803 are implemented on a separate chip or circuit board, which is not limited in this embodiment.
The display screen 1805 is used to display a UI (User Interface). Optionally, the UI includes graphics, text, icons, videos, and any combination thereof. When the display screen 1805 is a touch display screen, the display screen 1805 is also capable of collecting touch signals on or above its surface. The touch signal can be input to the processor 1801 as a control signal for processing. Optionally, the display screen 1805 is further used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 1805, disposed on the front panel of the terminal 1800; in some other embodiments, there are at least two display screens 1805, disposed on different surfaces of the terminal 1800 or in a folded design; in some embodiments, the display screen 1805 is a flexible display screen disposed on a curved surface or a folding surface of the terminal 1800. The display screen 1805 can even be set in a non-rectangular irregular shape, that is, a shaped screen. Optionally, the display screen 1805 is made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The power supply 1808 is used to supply power to the components in the terminal 1800. Optionally, the power supply 1808 is an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 1808 includes a rechargeable battery, the rechargeable battery supports wired charging or wireless charging, and also supports fast-charging technology.
FIG. 19 is a schematic structural diagram of a computer device provided by an embodiment of this application. The computer device 1900 may vary considerably depending on configuration or performance. The computer device 1900 includes one or more processors (Central Processing Units, CPUs) 1901 and one or more memories 1902, where the memory 1902 stores at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 1901 to implement the microscopic image processing method provided by the foregoing embodiments. Optionally, the computer device 1900 further has components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the computer device 1900 further includes other components for implementing device functions, which are not described in detail here.
In an exemplary embodiment, a computer-readable storage medium is further provided, for example, a memory including at least one computer program. The at least one computer program can be executed by a processor in a terminal to complete the microscopic image processing method in the foregoing embodiments. For example, the computer-readable storage medium includes a ROM (Read-Only Memory), a RAM (Random-Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is further provided, including one or more pieces of program code stored in a computer-readable storage medium. One or more processors of a computer device can read the one or more pieces of program code from the computer-readable storage medium and execute them, so that the computer device can perform the microscopic image processing method in the foregoing embodiments.

Claims (18)

  1. A microscopic image processing method, applied to a computer device, the method comprising:
    performing instance segmentation on a microscopic image to obtain an instance image, the instance image containing a target object in the microscopic image;
    performing skeleton extraction on the target object in the instance image to obtain skeleton morphology information of the target object, the skeleton morphology information representing a skeleton morphology of the target object;
    performing motion analysis on the target object based on the skeleton morphology information to obtain a plurality of eigenvalues, the plurality of eigenvalues representing weighting coefficients of a plurality of preset motion states when the skeleton morphology is synthesized; and
    determining an eigenvalue sequence formed by the plurality of eigenvalues as motion component information of the target object.
  2. The method according to claim 1, wherein the instance image comprises a contour map and a mask map of the target object;
    the performing instance segmentation on a microscopic image to obtain an instance image containing a target object in the microscopic image comprises:
    determining, from the microscopic image, a region of interest (ROI) containing the target object; and
    performing instance segmentation on the ROI to obtain the contour map and the mask map of the target object.
  3. The method according to claim 2, wherein, in a case that the target object in the microscopic image comprises a plurality of target objects and the ROI contains the plurality of target objects overlapping one another, the performing instance segmentation on the ROI to obtain the contour map and the mask map of the target object comprises:
    determining an ROI candidate box based on position information of the ROI, a region selected by the ROI candidate box comprising the ROI;
    determining a local image feature of the ROI from a global image feature of the microscopic image, the local image feature representing a feature of the region selected by the ROI candidate box within the global image feature; and
    inputting the local image feature into a bilayer instance segmentation model, processing the local image feature through the bilayer instance segmentation model, and outputting respective contour maps and mask maps of the plurality of target objects in the ROI, the bilayer instance segmentation model being used to build separate layers for different objects so as to obtain a respective instance segmentation result of each object.
  4. The method according to claim 3, wherein the ROI contains an occluding object and an occluded object that overlap each other;
    the bilayer instance segmentation model comprises an occluder layer network and an occludee layer network, the occluder layer network being used to extract a contour and a mask of the occluding object located on a top layer, and the occludee layer network being used to extract a contour and a mask of the occluded object located on a bottom layer;
    the inputting the local image feature into a bilayer instance segmentation model, processing the local image feature through the bilayer instance segmentation model, and outputting respective contour maps and mask maps of the plurality of target objects in the ROI comprises:
    inputting the local image feature into the occluder layer network, and extracting, through the occluder layer network, a first perception feature of the occluding object located on the top layer of the ROI, the first perception feature representing an image feature of the occluding object for an instance segmentation task;
    performing an upsampling operation on the first perception feature to obtain the contour map and the mask map of the occluding object;
    inputting a fused feature obtained by fusing the local image feature and the first perception feature into the occludee layer network, and extracting a second perception feature of the occluded object located on the bottom layer of the ROI, the second perception feature representing an image feature of the occluded object for the instance segmentation task; and
    performing an upsampling operation on the second perception feature to obtain the contour map and the mask map of the occluded object.
  5. The method according to claim 4, wherein the occluder layer network comprises a first convolutional layer, a first graph convolutional layer, and a second convolutional layer, the first graph convolutional layer comprising a non-local operator, and the non-local operator being used to associate pixels in an image space according to a similarity of corresponding feature vectors;
    the inputting the local image feature into the occluder layer network, and extracting, through the occluder layer network, a first perception feature of the occluding object located on the top layer of the ROI comprises:
    inputting the local image feature into the first convolutional layer of the occluder layer network, and performing a convolution operation on the local image feature through the first convolutional layer to obtain an initial perception feature;
    inputting the initial perception feature into the first graph convolutional layer of the occluder layer network, and performing a convolution operation on the initial perception feature through the non-local operator in the first graph convolutional layer to obtain a graph convolution feature; and
    inputting the graph convolution feature into the second convolutional layer of the occluder layer network, and performing a convolution operation on the graph convolution feature through the second convolutional layer to obtain the first perception feature.
  6. The method according to claim 4, wherein the inputting a fused feature obtained by fusing the local image feature and the first perception feature into the occludee layer network, and extracting a second perception feature of the occluded object located on the bottom layer of the ROI comprises:
    inputting the fused feature into a third convolutional layer of the occludee layer network, and performing a convolution operation on the fused feature through the third convolutional layer to obtain a perception interaction feature;
    inputting the perception interaction feature into a second graph convolutional layer of the occludee layer network, and performing a convolution operation on the perception interaction feature through the non-local operator in the second graph convolutional layer to obtain a graph convolution interaction feature; and
    inputting the graph convolution interaction feature into a fourth convolutional layer of the occludee layer network, and performing a convolution operation on the graph convolution interaction feature through the fourth convolutional layer to obtain the second perception feature.
  7. The method according to claim 3, wherein the bilayer instance segmentation model is trained on a plurality of synthetic sample images, each synthetic sample image containing a plurality of the target objects and being synthesized from a plurality of original images that each contain only a single target object.
  8. The method according to claim 7, wherein, in a case that the target object is darker than a background in the original images, a pixel value of each pixel in the synthetic sample image equals a lowest pixel value among pixels at a same position in the plurality of original images used to synthesize the synthetic sample image; or,
    in a case that the target object is brighter than the background in the original images, the pixel value of each pixel in the synthetic sample image equals a highest pixel value among the pixels at the same position in the plurality of original images.
  9. The method according to any one of claims 1 to 8, wherein the performing skeleton extraction on the target object in the instance image to obtain skeleton morphology information of the target object comprises:
    for any target object in the ROI, inputting the instance image into a skeleton extraction model, and performing skeleton extraction on the target object through the skeleton extraction model to obtain a skeleton morphology image of the target object, the skeleton extraction model being used to predict the skeleton morphology of the target object based on the instance image of the target object;
    identifying the skeleton morphology image to obtain a head endpoint and a tail endpoint in the skeleton morphology of the target object; and
    determining the skeleton morphology image, the head endpoint, and the tail endpoint as the skeleton morphology information.
  10. The method according to claim 9, wherein the skeleton extraction model comprises a plurality of cascaded convolutional layers;
    the inputting the instance image into a skeleton extraction model, and performing skeleton extraction on the target object through the skeleton extraction model to obtain a skeleton morphology image of the target object comprises:
    inputting the instance image into the plurality of convolutional layers of the skeleton extraction model, and performing convolution operations on the instance image layer by layer through the plurality of convolutional layers to obtain the skeleton morphology image;
    wherein the skeleton extraction model is trained on a sample image containing a target object and skeleton morphology label information annotated for the target object.
  11. The method according to claim 10, wherein the skeleton morphology label information comprises respective skeleton tangent angles of a plurality of sampling points obtained by sampling the skeleton morphology of the target object in the sample image, the skeleton tangent angle representing an angle between a horizontal line and a tangent taking the sampling point as a tangent point on a directed skeleton morphology running from the head endpoint to the tail endpoint;
    a loss function value of the skeleton extraction model in a training stage is determined based on errors between the respective skeleton tangent angles of the plurality of sampling points and predicted tangent angles, wherein the predicted tangent angles are obtained by sampling the skeleton morphology image predicted by the skeleton extraction model for the sample image.
  12. The method according to claim 9, wherein the identifying the skeleton morphology image to obtain a head endpoint and a tail endpoint in the skeleton morphology of the target object comprises:
    cropping a first endpoint local region and a second endpoint local region from the skeleton morphology image, the first endpoint local region and the second endpoint local region being located at two ends of the skeleton respectively;
    extracting a first histogram-of-oriented-gradients (HOG) feature of one end of the skeleton based on the first endpoint local region;
    extracting a second HOG feature of the other end of the skeleton based on the second endpoint local region; and
    identifying the one end of the skeleton and the other end of the skeleton respectively based on the first HOG feature and the second HOG feature, to obtain the head endpoint and the tail endpoint.
  13. The method according to claim 12, wherein the identifying the one end of the skeleton and the other end of the skeleton respectively based on the first HOG feature and the second HOG feature, to obtain the head endpoint and the tail endpoint comprises:
    inputting the first HOG feature into a head-tail identification model, and performing binary classification on the first HOG feature through the head-tail identification model to obtain a first identification result, the first identification result representing whether the one end of the skeleton is the head endpoint or the tail endpoint;
    inputting the second HOG feature into the head-tail identification model, and performing binary classification on the second HOG feature through the head-tail identification model to obtain a second identification result, the second identification result representing whether the other end of the skeleton is the head endpoint or the tail endpoint; and
    determining the head endpoint and the tail endpoint based on the first identification result and the second identification result;
    wherein the head-tail identification model is used to determine, according to an HOG feature of an endpoint local region, whether an endpoint in the skeleton of the target object is the head endpoint or the tail endpoint.
  14. The method according to any one of claims 1 to 13, wherein the performing motion analysis on the target object based on the skeleton morphology information to obtain a plurality of eigenvalues comprises:
    sampling the skeleton morphology of the target object based on the skeleton morphology information to obtain a feature vector formed by respective skeleton tangent angles of a plurality of sampling points, the skeleton tangent angle representing an angle between a horizontal line and a tangent taking the sampling point as a tangent point on a directed skeleton morphology running from the head endpoint to the tail endpoint;
    sampling preset skeleton morphologies indicated by the plurality of preset motion states respectively, to obtain respective preset feature vectors of the plurality of preset motion states; and
    decomposing the feature vector into a sum of respective products of the plurality of preset feature vectors and the plurality of eigenvalues, to obtain the plurality of eigenvalues.
  15. The method according to claim 14, wherein the method further comprises:
    sorting the plurality of eigenvalues contained in the eigenvalue sequence in descending order, and determining, as principal motion components, preset motion states corresponding to eigenvalues located in the first target number of positions in the sorted order; and
    analyzing motion of the target object within an observation period based on the principal motion components, to obtain kinematic characteristics of the target object within the observation period.
  16. A microscopic image processing apparatus, configured in a computer device, the apparatus comprising:
    an instance segmentation module, configured to perform instance segmentation on a microscopic image to obtain an instance image, the instance image containing a target object in the microscopic image;
    a skeleton extraction module, configured to perform skeleton extraction on the target object in the instance image to obtain skeleton morphology information of the target object, the skeleton morphology information representing a skeleton morphology of the target object; and
    a motion analysis module, configured to perform motion analysis on the target object based on the skeleton morphology information to obtain a plurality of eigenvalues, and determine an eigenvalue sequence formed by the plurality of eigenvalues as motion component information of the target object, the plurality of eigenvalues representing weighting coefficients of a plurality of preset motion states when the skeleton morphology is synthesized.
  17. A computer device, wherein the computer device comprises one or more processors and one or more memories, the one or more memories storing at least one computer program, and the at least one computer program being loaded and executed by the one or more processors to implement the microscopic image processing method according to any one of claims 1 to 15.
  18. A storage medium, wherein the storage medium stores at least one computer program, and the at least one computer program is loaded and executed by a processor to implement the microscopic image processing method according to any one of claims 1 to 15.
PCT/CN2023/094954 2022-07-19 2023-05-18 Microscopic image processing method and apparatus, computer device, and storage medium WO2024016812A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/603,081 US20240221400A1 (en) 2022-07-19 2024-03-12 Microscopic image processing method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210849205.1 2022-07-19
CN202210849205.1A CN115205262A (zh) 2022-07-19 2022-07-19 Microscopic image processing method and apparatus, computer device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/603,081 Continuation US20240221400A1 (en) 2022-07-19 2024-03-12 Microscopic image processing method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024016812A1 (zh)

Family

ID=83582466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/094954 WO2024016812A1 (zh) 2022-07-19 2023-05-18 显微图像的处理方法、装置、计算机设备及存储介质

Country Status (3)

Country Link
US (1) US20240221400A1 (zh)
CN (1) CN115205262A (zh)
WO (1) WO2024016812A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118196406A (zh) * 2024-02-05 2024-06-14 南京大学 Method and apparatus for segmenting an image, and computer-readable medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205262A (zh) * 2022-07-19 2022-10-18 腾讯科技(深圳)有限公司 Microscopic image processing method and apparatus, computer device, and storage medium
CN115359412B (zh) * 2022-10-24 2023-03-03 成都西交智汇大数据科技有限公司 Hydrochloric acid neutralization experiment scoring method, apparatus, device, and readable storage medium
CN118379327B (zh) * 2024-04-25 2024-10-29 中国科学院空间应用工程与技术中心 Keypoint extraction and tracking method for space-station nematode motion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3866113A1 (en) * 2020-02-17 2021-08-18 Agile Robots AG Image segmentation methods and apparatus
CN112017189A (zh) * 2020-10-26 2020-12-01 腾讯科技(深圳)有限公司 Image segmentation method and apparatus, computer device, and storage medium
CN113780145A (zh) * 2021-09-06 2021-12-10 苏州贝康智能制造有限公司 Sperm morphology detection method and apparatus, computer device, and storage medium
CN115205262A (zh) * 2022-07-19 2022-10-18 腾讯科技(深圳)有限公司 Microscopic image processing method and apparatus, computer device, and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Geng W., Cosman P., Huang C., Schafer W. R.: "Automated worm tracking and classification", Conference Record of the 37th Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, 9-12 November 2003, IEEE, vol. 2, pp. 2063-2068, XP093130507, ISBN: 978-0-7803-8104-9, DOI: 10.1109/ACSSC.2003.1292343 *
Hebert Laetitia, Ahamed Tosif, Costa Antonio C., O'Shaughnessy Liam, Stephens Greg J.: "WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans", PLOS Computational Biology, vol. 17, no. 4, 27 April 2021, e1008914, XP093130504, ISSN: 1553-7358, DOI: 10.1371/journal.pcbi.1008914 *
Ke Lei, Tai Yu-Wing, Tang Chi-Keung: "Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers", 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 20 June 2021, pp. 4018-4027, XP034007874, DOI: 10.1109/CVPR46437.2021.00401 *
Stephens Greg J., Johnson-Kerner Bethany, Bialek William, Ryu William S.: "Dimensionality and Dynamics in the Behavior of C. elegans", PLOS Computational Biology, vol. 4, no. 4, April 2008, e1000028, XP093130502, ISSN: 1553-7358, DOI: 10.1371/journal.pcbi.1000028 *

Also Published As

Publication number Publication date
US20240221400A1 (en) 2024-07-04
CN115205262A (zh) 2022-10-18

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23841881

Country of ref document: EP

Kind code of ref document: A1