
CN116740705A - Super-bright light spot positioning and identifying method based on random forest classification algorithm

Info

Publication number: CN116740705A
Application number: CN202310359888.7A
Authority: CN
Inventors: 倪洁蕾; 倪燕翔; 曹博; 梁国涛
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2023-04-06
Filing date: 2023-04-06
Publication date: 2023-09-12
Anticipated expiration: 2043-04-06
Also published as: CN116740705B

Abstract

The application provides an ultra-bright light spot positioning and identification method based on a random forest classification algorithm, belonging to the technical field of single-molecule localization and identification. The method comprises the following steps. Establishing a single-molecule library: a single-molecule library is established for simulating single-molecule localization super-resolution microscopic imaging. Simulating ultra-bright light spot data: ultra-bright light spots are simulated from the library. Training a random forest classification model: the single-molecule light spot data and the ultra-bright light spot data are used to train a random forest algorithm, yielding the random forest classification model. Identifying and calibrating: the random forest classification model is used to identify the number of localizations in, and calibrate the localization information of, the ultra-bright light spots in single-molecule localization super-resolution microscopic imaging of biological sub-diffraction-limit structures. The application improves the accuracy of ultra-bright light spot localization and the capability of single-molecule localization super-resolution microscopic imaging for quantitative and comparative analysis of biological samples.

Description

Super-bright light spot positioning and identifying method based on random forest classification algorithm

Technical Field

The application relates to the technical field of single-molecule positioning and identification, in particular to an ultra-bright light spot positioning and identification method based on a random forest classification algorithm.

Background

Single-molecule localization microscopy (SMLM), which includes STORM, PALM, DNA-PAINT and the like, exploits the switching capability of certain fluorescent molecules to image the many molecules within a diffraction-limited region at different times. When the same field of view is imaged sequentially, each frame ideally captures the signal of individual molecules, so the diffraction limit is overcome and ultra-high-resolution imaging is achieved (lateral resolution of about 20 nm and axial resolution of about 50 nm). Super-resolved fluorescence microscopy of this kind was recognized by the 2014 Nobel Prize in Chemistry. As one of the highest-resolution super-resolution fluorescence imaging techniques currently available, SMLM has significant advantages and plays an irreplaceable role in visualizing sub-diffraction-limit ultrastructures, given that most biomolecules are one to a few nanometers in size.

Recent advances in biological research have created an urgent demand for accurate quantitative assessment of biological structures at nanometer resolution. Despite its high resolution, single-molecule localization super-resolution microscopy still faces a significant challenge in providing quantitative information about the imaged biological structure, because the optical switching of any particular molecule is essentially a random event. The uncertainty in the blinking times of individual molecules during image acquisition makes quantitative analysis based on the number of localizations difficult. Nevertheless, little research has assessed or improved the quantitative capability of the technique, and existing reports on quantification focus mainly on localization segmentation and counting.

According to the single-molecule activation principle, the light spot in each camera frame is ideally produced by the emission of a single molecule. This is achieved by using fluorescent dyes with excellent optical switching properties, such as AF647 and Cy5, which exhibit relatively low on-off duty cycles (0.05-0.1%). However, molecules in biological specimens are typically only one to a few nanometers in size and are often highly abundant within diffraction-limited cellular ultrastructures; for example, a 200 nm long microtubule segment contains about 325 tubulin dimers. Therefore, even when single-molecule emission is sufficiently sparse, single-molecule localization super-resolution imaging still produces light spots formed by several molecules within the diffraction limit emitting simultaneously. Such a spot is called an ultra-bright light spot. In image reconstruction these spots are mostly treated as single-molecule spots, which reduces the quantitative capability of single-molecule localization microscopy.

Therefore, a method for identifying and calibrating ultra-bright localizations in quantitative single-molecule localization super-resolution microscopy is needed, in order to improve the quantitative capability of the technique and broaden its application in biomedical research.

Disclosure of Invention

In order to solve the problems in the prior art, the application provides an ultra-bright light spot positioning and identifying method based on a random forest classification algorithm.

The application discloses an ultra-bright light spot positioning and identifying method based on a random forest classification algorithm, which comprises the following steps:

establishing a single-molecule library: a single-molecule library is established for simulating single-molecule localization super-resolution microscopic imaging;

simulating ultra-bright light spot data: a plurality of position coordinates are randomly generated within a diffraction-limited area, a corresponding number of single-molecule light spots are randomly drawn from the single-molecule library, and these spots are superimposed in the simulation area so that the localization coordinates of each spot coincide with one of the generated position coordinates;

training a random forest classification model: the single-molecule light spot data and the ultra-bright light spot data are used to train a random forest algorithm, yielding the random forest classification model;

identifying and calibrating: the random forest classification model is used to identify the number of localizations in, and calibrate the localization information of, the ultra-bright light spots in single-molecule localization super-resolution microscopic imaging of a biological sub-diffraction-limit structure.
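
The superposition step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the synthetic Gaussian spot stands in for a spot drawn from the experimental single-molecule library, and the names `gaussian_spot` and `simulate_ultra_n`, the disk radius and the photon range are all hypothetical.

```python
import math
import random

PATCH = 7  # patch side in pixels (assumed; the embodiment crops 7 x 7 regions)

def gaussian_spot(cx, cy, photons, sigma=1.0, size=PATCH):
    """Stand-in for a library entry: a 2D Gaussian spot centred at (cx, cy)."""
    img = [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for x in range(size)] for y in range(size)]
    total = sum(map(sum, img))
    # Normalise so that the patch sums to the requested photon number.
    return [[photons * v / total for v in row] for row in img]

def simulate_ultra_n(n, rng, radius=1.0, size=PATCH):
    """Overlay n spots whose centres fall at random positions inside a disk.

    Returns (patch, n): the summed photon distribution and its label.
    """
    centre = (size - 1) / 2
    patch = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        # Rejection-sample an offset inside the assumed diffraction-limited disk.
        while True:
            dx, dy = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
            if dx * dx + dy * dy <= radius * radius:
                break
        spot = gaussian_spot(centre + dx, centre + dy, rng.uniform(500, 1500))
        for y in range(size):
            for x in range(size):
                patch[y][x] += spot[y][x]
    return patch, n

patch, label = simulate_ultra_n(3, random.Random(0))  # one simulated Ultra-3 sample
```

In a real pipeline the call to `gaussian_spot` would be replaced by a lookup of a measured spot from the library, shifted so that its fitted center lands on the generated coordinate.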

Further, the method also comprises a super-resolution image reconstruction step: a calibrated single-molecule localization super-resolution image is reconstructed from the identified and calibrated ultra-bright light spot information.

In a further improvement, the step of establishing the single-molecule library comprises the following sub-steps:

S101: preparing a single-molecule sample: a fluorescently labeled single-molecule sample is obtained;

S102: single-molecule localization super-resolution microscopic imaging: the single-molecule samples are imaged in an imaging buffer to obtain an image sequence for each single-molecule sample;

S103: extracting single-molecule data: based on the image sequences, an imaging dataset is acquired for each single-molecule localization imaging run to form the single-molecule library, wherein the imaging dataset comprises image sequences of single molecules, and each image sequence comprises the emission spots and their corresponding localization coordinates, frame numbers and photon numbers.

Further, the single molecule positioning super-resolution microscopic imaging method comprises, but is not limited to, STORM, PALM, DNA-PAINT single molecule positioning super-resolution microscopic imaging technology.

Further, in step S103, the method for extracting single molecule data includes:

fitting each emission spot in the obtained image sequence with a two-dimensional Gaussian function to determine its center position;

performing drift correction on the localizations of each frame;

extracting, for each localized spot, the spatial distribution within a set pixel range centered on the center position;

clustering the localization coordinates to obtain the localization sequence of each single molecule over time;

establishing an imaging dataset of single-molecule localization microscopy comprising image sequences of single molecules, each image sequence containing localization coordinates, frame number and photon number information.
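
Two of the extraction steps listed above, drift correction and clustering, can be sketched as follows. The source does not name a specific algorithm for either, so this is only one plausible reading: drift is removed by subtracting the displacement of a fixed reference bead, and clustering is a simple greedy distance threshold. The function names and the tuple layout `(frame, x, y, photons)` are assumptions made for illustration.

```python
import math

def drift_correct(localizations, fiducial_track):
    """Subtract the fiducial displacement (relative to frame 0) from each localization.

    localizations: list of (frame, x, y, photons)
    fiducial_track: dict mapping frame -> (x, y) of a fixed reference bead
    """
    x0, y0 = fiducial_track[0]
    corrected = []
    for frame, x, y, photons in localizations:
        fx, fy = fiducial_track[frame]
        corrected.append((frame, x - (fx - x0), y - (fy - y0), photons))
    return corrected

def cluster_localizations(localizations, max_dist):
    """Greedy clustering: add each localization to the first cluster whose
    running centroid lies within max_dist, otherwise start a new cluster."""
    clusters = []  # each cluster: {"cx": ..., "cy": ..., "members": [...]}
    for loc in localizations:
        _, x, y, _ = loc
        for c in clusters:
            if math.hypot(x - c["cx"], y - c["cy"]) <= max_dist:
                c["members"].append(loc)
                n = len(c["members"])
                c["cx"] += (x - c["cx"]) / n  # incremental centroid update
                c["cy"] += (y - c["cy"]) / n
                break
        else:
            clusters.append({"cx": x, "cy": y, "members": [loc]})
    return clusters

fiducial = {0: (10.0, 10.0), 1: (10.5, 10.0), 2: (11.0, 10.0)}
raw = [(0, 5.0, 5.0, 800), (1, 5.5, 5.0, 900), (2, 31.0, 20.0, 700)]
molecules = cluster_localizations(drift_correct(raw, fiducial), max_dist=1.0)
```

Each resulting cluster corresponds to the localization sequence of one molecule; its members keep their frame numbers and photon numbers, which is the information the library stores.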

Further, the training method of the random forest classification model comprises the following steps:

using a random forest classification algorithm, a corresponding number of light spots are randomly selected from the single-molecule library and their center points are placed at random positions within a simulated diffraction-limited area to simulate ultra-bright light spots of the set types; the spatial distribution of each superimposed spot is submitted as the input vector, with the number of constituent spots assigned as its label, and a randomized decision-tree predictor is grown from each subset of the samples;

then, for each ultra-bright light spot type and for the single-molecule type, one part of the data is used for training and the other part for validation, the parameters of the random forest algorithm are optimized, and the trained random forest classification model is finally obtained.

The random forest classification algorithm may be trained using the random forest function library of any language platform, for example MATLAB, Python or C.
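
To make the training idea concrete, the following is a small from-scratch random forest in plain Python (bootstrap sampling, a random feature subset at each split, Gini impurity, majority vote). It is a teaching sketch only: it is not the library routine used in the embodiment, all function names are hypothetical, and the toy four-element feature vectors merely stand in for the flattened spatial distribution of a spot, with the label being the number of emitters.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        vals = sorted({row[f] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, rng, n_sub, depth=0, max_depth=6):
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf node
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [lab for _, lab in left],
                      rng, n_sub, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [lab for _, lab in right],
                      rng, n_sub, depth + 1, max_depth))

def tree_predict(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # features tried at each split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_sub))
    return forest

def forest_predict(forest, row):
    return Counter(tree_predict(t, row) for t in forest).most_common(1)[0][0]

# Toy data: class n (number of emitters) has features clustered around 10 * n.
rng = random.Random(1)
X = [[n * 10 + rng.uniform(-1, 1) for _ in range(4)] for n in (1, 2, 3) for _ in range(20)]
y = [n for n in (1, 2, 3) for _ in range(20)]
forest = train_forest(X, y)
```

In practice an established library implementation would normally be preferred over hand-written trees; the sketch only shows what "a predictor grown from each sample subset" means.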

Further, the method for identifying the positioning number and calibrating the positioning information of the ultra-bright light spots comprises the following steps:

uncalibrated emission spots are extracted from the image sequence of single-molecule localization super-resolution microscopic imaging, and the trained random forest classification model is applied to determine the potential number of localizations in each spot, thereby obtaining the calibrated number of localizations for each ultrastructure.
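
The difference between the uncalibrated and calibrated counts can be illustrated with a few lines of Python. This assumes, as one reading of the text, that the classifier returns the number of emitters N for each spot (1 for a single-molecule spot); the function name is hypothetical.

```python
def localization_counts(predicted_types):
    """predicted_types: per-spot predicted number of emitters (1 = single molecule)."""
    uncalibrated = len(predicted_types)  # every spot counted once
    calibrated = sum(predicted_types)    # each Ultra-N spot counted N times
    return uncalibrated, calibrated

counts = localization_counts([1, 1, 2, 3, 1, 4])  # six spots, twelve localizations
```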

Compared with the prior art, the application has the following beneficial effects: constructing a random forest classification model improves the classification precision for the different types of ultra-bright light spots; identifying the number of localizations within each ultra-bright light spot effectively improves the accuracy of ultra-bright light spot localization; and the capability of single-molecule localization super-resolution microscopic imaging for quantitative and comparative analysis of biological samples, and hence its overall imaging capability, is markedly improved.

Drawings

In order to more clearly illustrate the application or the solutions of the prior art, a brief description will be given below of the drawings used in the description of the embodiments or the prior art, it being obvious that the drawings in the description below are some embodiments of the application and that other drawings can be obtained from them without the inventive effort of a person skilled in the art.

FIG. 1 is a flow chart of the method of the present application;

FIG. 2 is a schematic diagram of the classification performance of the random forest classification model of the present application, wherein panel a shows the classification precision and accuracy of the model, panel b shows the total number of localizations for simulated microtubule segments with different labeling densities, and panel c shows the percentage of correct localizations at labeling densities of 10% to 90%;

FIG. 3 is a schematic diagram of localization calibration in simulated STORM imaging of non-uniform biological ultrastructures according to the present application, wherein panel a is a schematic diagram of simulated STORM imaging of the non-uniform ultrastructures and panel b shows the classification of spot types using the random forest model.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

The technical problem solved by the application is how to determine the number of localizations in different types of ultra-bright light spots when reconstructing images in single-molecule localization super-resolution microscopy, and thereby improve the quantification of single-molecule super-resolution imaging. An ultra-bright light spot is a light spot formed by several molecules within the diffraction limit emitting simultaneously. Although the emission of densely distributed molecules can be separated in time by optical blinking, the random nature of blinking means that such spots can still occur when imaging most biological samples. Depending on the number of molecules (N) within the diffraction limit, the corresponding spot is called Ultra-N. In order that those skilled in the art may better understand the solution of the present application, the technical solution of the embodiments is described clearly and completely below with reference to the accompanying drawings.

Single-molecule localization super-resolution microscopic imaging includes STORM, PALM and DNA-PAINT. The ultra-bright light spot positioning and identification method based on a random forest classification algorithm is applicable to all of these techniques; STORM is used as the example in the description below.

As shown in FIG. 1, the method for positioning and identifying the ultra-bright light spots based on the random forest classification algorithm comprises the following steps:

step one, establishing a single-molecule library: a single-molecule library is established for simulating STORM super-resolution imaging;

step two, simulating ultra-bright light spot data: a plurality of position coordinates are randomly generated within a diffraction-limited area, a corresponding number of single-molecule light spots are randomly drawn from the single-molecule library, and these spots are superimposed in the simulation area so that the localization coordinates of each spot coincide with one of the generated position coordinates;

step three, training a random forest classification model: the single-molecule data in the single-molecule library are used to train a random forest algorithm to obtain the random forest classification model;

step four, identifying and calibrating: the random forest classification model is applied in the STORM imaging analysis of a biological sub-diffraction-limit structure to identify the number of localizations in, and calibrate the localization information of, the ultra-bright light spots.

The ultra-bright light spot positioning and identification method based on a random forest classification algorithm can be applied to the quantification, comparison and analysis of biological samples, improving the accuracy of quantitative analysis. It can also be applied to super-resolution image reconstruction, where it effectively improves the imaging capability.

As a preferred embodiment, the method is applied to super-resolution image reconstruction and further includes step five, a super-resolution image reconstruction step: a calibrated STORM super-resolution image is reconstructed from the identified and calibrated ultra-bright light spot information.

Specifically, as a detailed embodiment of the present application, the step of establishing the single molecule library of the present application includes the following sub-steps:

S101: preparing a single-molecule sample to obtain a fluorescently labeled single-molecule sample.

The specific preparation method comprises the following steps:

first, the cover glass is thoroughly cleaned by sonication in Milli-Q water (ultrapure water);

then, oligonucleotides conjugated at the 5' end with a single labeling molecule, at a concentration of 0.1-1.0 nM, are immobilized on the glass surface and then prepared in 4% paraformaldehyde;

third, fluorescent microspheres 200 nm in diameter are immobilized on the glass surface for a set time at room temperature, to serve as fiducial markers for correcting drift of the sample in the X-Y plane during image acquisition;

finally, the sample is washed with PBS (phosphate-buffered saline) to remove unbound molecules.

The single-molecule sample of the present application may also be prepared by covalent or non-covalent coupling and the like; the preparation method above is only one example.

In this example the fluorescent dye AlexaFluor647 is used; other fluorescent dyes, such as CF568, may also be used.

S102: STORM super-resolution imaging: the single-molecule samples are imaged by STORM in an imaging buffer to obtain an image sequence for each single-molecule sample.

The imaging buffer exists in a number of differently optimized versions. In this example the imaging buffer contains 50 mM Tris (pH 8.0), 10 mM NaCl, 1% (v/v) β-mercaptoethanol, 10% glucose, 0.5 mg/mL glucose oxidase (G2133, Sigma) and 40 µg/mL catalase (C30, Sigma). STORM imaging was performed on each single-molecule sample, and an image sequence of 8000 frames was collected per field of view at a frame rate of 33 Hz.

An imaging buffer containing 50 mM Tris (pH 8.0), 10 mM NaCl, 0.7% (v/v) β-mercaptoethanol, 10% (w/v) glucose, 0.55 mg/mL glucose oxidase (G2133, Sigma) and 40 µg/mL catalase (C30, Sigma) was also used in this example, with the pH adjusted to 8.10 ± 0.05 using NaOH.

S103: extracting single-molecule data: based on the image sequences, a single-molecule STORM imaging dataset is obtained, wherein the dataset comprises image sequences of single molecules, and each image sequence comprises the emission spots and their corresponding localization coordinates, frame numbers and photon numbers.

Specifically, in this example each emission spot in the obtained image sequence is fitted with a two-dimensional Gaussian function to determine its center position (i.e., its localization). The image sequence is then drift corrected, for example by adding fiducial markers such as immobilized fluorescent microbeads to the sample, by analyzing the correlation between images in software, or in hardware by fine adjustment of the stage. The photon number distribution of each spot is extracted over a 7 × 7 pixel region around the spot center. The center coordinates of the spots are then clustered to obtain the localization sequence of each single molecule.
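
The 7 × 7 pixel extraction described above can be sketched as follows. The frame is assumed to be a row-major list of photon counts, border spots are simply skipped (the source does not say how they are handled), and both function names are hypothetical.

```python
def extract_patch(frame, cx, cy, size=7):
    """Return the size x size block of photon counts centred on the pixel
    nearest (cx, cy), or None if the block would cross the frame border."""
    half = size // 2
    col, row = int(round(cx)), int(round(cy))
    if (row - half < 0 or col - half < 0
            or row + half >= len(frame) or col + half >= len(frame[0])):
        return None
    return [r[col - half: col + half + 1] for r in frame[row - half: row + half + 1]]

def to_feature_vector(patch):
    """Flatten the patch row by row into a 49-element classifier input."""
    return [v for r in patch for v in r]

frame = [[10 * r + c for c in range(10)] for r in range(10)]  # toy 10 x 10 frame
patch = extract_patch(frame, cx=4.2, cy=5.1)
features = to_feature_vector(patch)
```

The flattened patch is one way to turn the "spatial distribution" of a spot into the input vector that the classifier receives.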

This example establishes a single-molecule library comprising image sequences of 12000 single molecules, each sequence comprising the emission spots and their corresponding localization coordinates, frame numbers, photon numbers and so on. The library is then used to simulate four types of light spots (Ultra-1 to Ultra-4) and to simulate STORM imaging of biological sub-diffraction-limit structures.

Of course, the number of classification types in this example is not limited to four and may be changed as required.

The training process of the random forest classification model in this example is as follows:

Training is performed with the TreeBagger function of MATLAB. A corresponding number of light spots are randomly selected from the single-molecule STORM imaging dataset and their center points are placed at random positions within a simulated diffraction-limited area to simulate ultra-bright light spots of the set types (Ultra-2 to Ultra-4). The spatial distribution of each superimposed spot is submitted as the input vector, with the number of constituent spots assigned as its label, and a randomized decision-tree predictor is grown from each subset of the samples.

In this example TreeBagger is only one implementation; the random forest algorithm may also use the random forest function libraries of other major language platforms, such as MATLAB, Python or C.

Each of the three ultra-bright light spot types and the single-molecule type contains 150000 pairs of input data and labels, of which 120000 pairs are used for training and 30000 pairs for validation. The trained random forest classification model is finally obtained.
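
The per-class 120000/30000 division corresponds to an 80/20 split applied separately to each spot type. A scaled-down sketch is given below, with 15 samples per class standing in for 150000; the function name and the random shuffling before splitting are assumptions, since the source does not say how the pairs are chosen.

```python
import random

def split_per_class(samples_by_class, n_train, seed=0):
    """Shuffle each class independently and split it into training and validation."""
    rng = random.Random(seed)
    train, val = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train += [(s, label) for s in shuffled[:n_train]]
        val += [(s, label) for s in shuffled[n_train:]]
    return train, val

# Labels 1-4: single-molecule, Ultra-2, Ultra-3, Ultra-4 (toy sample ids 0..14).
data = {n: list(range(15)) for n in (1, 2, 3, 4)}
train, val = split_per_class(data, n_train=12)
```

Splitting within each class keeps the four types equally represented in both subsets.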

In the identification and calibration step, ground truth can be obtained from the single-molecule imaging data and the simulated data of various biological structures, and used to evaluate the identification and calibration results. STORM imaging sequences of biological sub-diffraction-limit structures are therefore simulated for the analysis.

This example simulates STORM images of a 1 µm microtubule segment with labeling densities of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90%. In this simulation, fluorophore molecules are randomly distributed on the microtubule structure according to the desired density, and randomly selected blinking sequences from the single-molecule STORM imaging dataset are then placed at the designated locations on the simulated microtubule. The simulation is repeated five times for each density to improve its accuracy.
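
The labeling simulation described above can be sketched in one dimension as follows. The number of candidate sites (1625) is an assumption obtained by scaling the Background figure of about 325 tubulin dimers per 200 nm to a 1 µm segment; the source does not state the value actually used, and the function names are hypothetical.

```python
import random

SEGMENT_NM = 1000.0  # 1 um microtubule segment
N_SITES = 1625       # assumed: 325 dimers per 200 nm, scaled to 1 um

def simulate_labelled_segment(density_pct, rng, n_sites=N_SITES, length_nm=SEGMENT_NM):
    """Return sorted axial positions (nm) of the labelled sites for one repeat."""
    n_labelled = n_sites * density_pct // 100
    sites = rng.sample(range(n_sites), n_labelled)
    spacing = length_nm / n_sites
    return sorted(i * spacing for i in sites)

def run_simulation(seed=0, repeats=5):
    """Five repeats at each labeling density from 10% to 90%."""
    rng = random.Random(seed)
    return {pct: [simulate_labelled_segment(pct, rng) for _ in range(repeats)]
            for pct in range(10, 100, 10)}

result = run_simulation()
```

Each labelled position would then receive a blinking sequence drawn at random from the single-molecule library, which is the part of the procedure this sketch leaves out.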

The simulated STORM imaging of biological sub-diffraction-limit structures also includes simulated STORM imaging of diffraction-limited cellular ultrastructures, carried out as follows:

A regular or irregular circular model is used, with a diameter of about 240 nm, as determined by the Abbe limit, and containing up to 550 molecules. In the regular circular ultrastructure the molecules are assumed to be randomly distributed. To mimic the irregular circular structure of receptor clusters on cell membranes, the density at the center of the structure is set higher than at the periphery. Each fluorophore in the ultrastructure is assigned a randomly sampled entry from the single-molecule STORM imaging dataset, including its localization, spot, photon number and blinking information. In the simulated image sequence of the ultrastructure, the emission spots from different molecules in the same frame are superimposed according to their center positions.
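
Placing molecules inside the circular model can be sketched as below. The uniform case uses the standard square-root transform for uniform areal density in a disk; the center-weighted radial profile is purely an illustrative assumption, because the source only states that the center is denser than the periphery. The function name is hypothetical.

```python
import math
import random

RADIUS_NM = 120.0  # half of the ~240 nm diameter stated in the text

def sample_cluster(n_molecules, rng, centre_weighted=False, radius=RADIUS_NM):
    """Return n_molecules (x, y) positions in nm inside a disk."""
    points = []
    for _ in range(n_molecules):
        u = rng.random()
        # sqrt(u) gives uniform areal density; using u directly concentrates
        # points towards the centre (assumed profile, not taken from the source).
        r = radius * (u if centre_weighted else math.sqrt(u))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

uniform_cluster = sample_cluster(550, random.Random(0))
dense_centre_cluster = sample_cluster(550, random.Random(0), centre_weighted=True)
```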

As one embodiment of the application, the method for identifying the positioning number and calibrating the positioning information of the ultra-bright light spots comprises the following steps:

uncalibrated emission spots are extracted from the image sequence of single-molecule localization super-resolution microscopic imaging, and the trained random forest classification model is applied to determine the potential type of each localization, thereby obtaining the calibrated number of localizations for each ultrastructure, wherein each localization corresponds to the position, photon number and blinking information of the target light spot.

Model verification

As shown in panel a of FIG. 2, the random forest classification model of this example was validated on four types of light spots. The abscissa is the spot type (single-molecule spots and the simulated Ultra-2, Ultra-3 and Ultra-4 spots of this example), and the ordinate is the classification accuracy, recall or precision. In validation, both precision (gray bars) and recall (white bars) for single-molecule spots exceed 96%, the precision for Ultra-2 and Ultra-4 is about 80%, and the overall classification accuracy on the validation set is 81.75% (the horizontal line in the figure).
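
For reference, the three metrics named above can be computed from true and predicted labels as in the following sketch. The labels here are toy values chosen for illustration, not the validation data of the application, and the function name is hypothetical.

```python
from collections import Counter

def classification_report(true_labels, predicted_labels):
    """Per-class precision and recall, plus overall accuracy."""
    pairs = Counter(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        predicted_c = sum(n for (t, p), n in pairs.items() if p == c)
        actual_c = sum(n for (t, p), n in pairs.items() if t == c)
        report[c] = {
            "precision": tp / predicted_c if predicted_c else 0.0,
            "recall": tp / actual_c if actual_c else 0.0,
        }
    accuracy = sum(pairs[(c, c)] for c in classes) / len(true_labels)
    return report, accuracy

true = [1, 1, 1, 2, 2, 3, 3, 4]  # number of emitters per spot (toy values)
pred = [1, 1, 2, 2, 2, 3, 4, 4]
report, accuracy = classification_report(true, pred)
```

Precision for a type is the fraction of spots predicted as that type that truly belong to it, recall is the fraction of true spots of that type that were recovered, and accuracy is the fraction of all spots classified correctly.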

As shown in panel b of FIG. 2, this example simulates STORM images of a 1 µm microtubule segment with labeling densities from 10% to 90%. In this simulation, fluorophore molecules are randomly distributed across the microtubule structure according to the desired density. A randomly selected blinking sequence from the single-molecule STORM imaging dataset is then placed at each designated location on the simulated microtubule. The simulation was repeated five times for each simulated density. The total number of ground-truth localizations (left-most bar at each simulated density) is calculated by summing the numbers of localizations of all sampled molecules, while the number of uncalibrated localizations (middle bar at each simulated density) is obtained by fitting the spots in the simulated sequence with a two-dimensional Gaussian function. As the labeling density increases, the number of uncalibrated localizations deviates increasingly from the ground truth because of the growing number of ultra-bright localizations.

As shown in panel c of FIG. 2, the random forest classifier is then applied to identify the potential spot types in the simulated microtubules, that is, to determine how many underlying molecules contribute to each spot. For all simulated densities, the calibrated number of localizations from the random forest classification model is closer to the ground truth than the uncalibrated number. Notably, in terms of assigning localizations to the correct type, up to 50% of the localizations are assigned incorrectly without calibration. In contrast, with TreeBagger the assignment improves greatly, and the percentage of correct localizations reaches more than 82%. The random forest classification model therefore effectively improves both classification accuracy and localization accuracy.

As shown in panel a of FIG. 3, each ultrastructure is of diffraction-limited size. This example simulates a series of non-uniform ultrastructures containing 50, 150, 250, 350, 450 and 550 molecules, with a higher photon count at the center than at the periphery, mimicking biological ultrastructures such as receptor clusters on the cell membrane. Uncalibrated emission spots are extracted from the simulated image sequence, and the trained TreeBagger random forest classification model is applied to determine the potential type of each localization, yielding the calibrated number of localizations for each ultrastructure. Compared with the uncalibrated localizations, the number of localizations from the TreeBagger model correlates better with molecular density and agrees more closely with the ground truth, as shown in panel b of FIG. 3. Notably, TreeBagger also classifies spots into the correct type with higher accuracy, as shown in panel c of FIG. 3.

In summary, although the ubiquity of ultra-bright localizations undermines the quantitative and comparative capabilities of single-molecule super-resolution microscopy, the establishment of a single-molecule library and the ultra-bright light spot positioning and identification method based on a random forest classification algorithm provided by the embodiments of the application make it possible to analyze a sufficient number of biological super-resolution images and calibrate the ultra-bright localizations. This greatly improves the capability of single-molecule super-resolution microscopic imaging for quantitative and comparative analysis of biological specimens, and thereby markedly improves the capability of single-molecule localization super-resolution microscopic imaging.

The above embodiments are preferred embodiments of the present application, and are not intended to limit the scope of the present application, which includes but is not limited to the embodiments, and equivalent modifications according to the present application are within the scope of the present application.

Claims

1. The method for positioning and identifying the superbright light spots based on the random forest classification algorithm is characterized by comprising the following steps:

2. The method for positioning and identifying the superbright light spots based on the random forest classification algorithm as claimed in claim 1, wherein the method comprises the following steps: the method also comprises the step of reconstructing the super-resolution image: and reconstructing a calibrated single-molecule positioning microscopic imaging super-resolution image based on the identified and calibrated superbright light spot information.

3. The method for positioning and identifying the superbright light spots based on the random forest classification algorithm according to claim 1 or 2, wherein the method comprises the following steps of: the step of establishing the single molecule library comprises the following sub-steps:

s103: extraction of single molecule data: based on the image sequence, acquiring imaging data of each single-molecule positioning microscopic imaging to form a single-molecule library, wherein the single-molecule library is an imaging data set of each single-molecule imaging data, the imaging data set comprises an image sequence of single molecules, and the image sequence comprises an emission point, corresponding positioning coordinates, frame numbers and photon numbers.

4. A method for positioning and identifying a super bright light spot based on a random forest classification algorithm according to claim 3, wherein the method comprises the following steps: in step S101, the method of single molecule localization super-resolution microscopy imaging includes, but is not limited to, STORM, PALM, DNA-PAINT single molecule localization super-resolution microscopy imaging technique.

5. A method for positioning and identifying a super bright light spot based on a random forest classification algorithm according to claim 3, wherein the method comprises the following steps: in step S103, the method for extracting single molecule data includes:

drift correction is carried out on the positioning of each frame;

6. The method for positioning and identifying the superbright light spots based on the random forest classification algorithm according to claim 1 or 2, wherein the method comprises the following steps of: the training method of the random forest classification model comprises the following steps:

7. The method for positioning and identifying the superbright light spots based on the random forest classification algorithm according to claim 1 or 2, wherein the method comprises the following steps of: the method for identifying the positioning number and calibrating the positioning information of the ultra-bright light spots comprises the following steps: