Abstract
We use a deep neural network (NN) to detect and place region-of-interest (ROI) boxes around ultracold atom clouds in absorption and fluorescence images, with the ability to identify and bound multiple clouds within a single image. The NN also outputs segmentation masks that identify the size, shape and orientation of each cloud, from which we extract the clouds' Gaussian parameters. This allows 2D Gaussian fits to be reliably seeded, thereby enabling fully automatic image processing. The method developed performs significantly better than a more conventional method based on a standardized image analysis library (Scikit-image), both for identifying ROIs and for extracting Gaussian parameters.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Deep neural networks (NNs) have revolutionized data analysis and led to automation of tasks that previously required human supervision. Image analysis has particularly benefited through the use of convolutional neural networks (CNNs) [1] and their derivatives which have allowed for image classification [2, 3], object detection [4, 5] and instance segmentation [6]. Although many of these NNs were developed for tasks such as facial recognition by social media networks [7, 8], they have also been used to identify laser modes [9], classify phases in condensed-matter systems [10, 11], reduce measurement errors for trapped-ion qubits [12] and process images from cold atom experiments [13–15]. In this work, we use an instance segmentation NN (see figure 1) to analyze experimental images containing atom clouds in magneto-optical traps (MOTs) and optical dipole traps (ODTs).
Neural networks consist of an input layer and an output layer with a number of intermediate hidden layers which are connected to one another via 'weights'. Rather than employing hard-coded algorithms, NNs learn to emulate the data they encounter through training cycles, in which data are iteratively passed through the NN and the output compared to the 'ground truth'. The difference between the two is then used to update the weights between the NN's layers, thereby improving its accuracy. Supervised training therefore requires a dataset which includes both input data and their associated ground truth values. For object detection NNs, these ground truth values are rectangular bounding boxes for each object, as well as labels classifying the object types in the bounding boxes. Instance segmentation NNs build upon object detection NNs by also requiring pixel-to-pixel segmentation masks in which image pixels comprising the object have mask values of one, whereas all other pixels have mask values of zero.
Our dataset consists of images of cold atom clouds in a MOT [16] and an ODT [17] (see figures 2(a)–(c)). Atom clouds in these traps form approximately Gaussian density distributions [18]. Fitting a cloud allows the parameters describing the distribution (Gaussian parameters) to be extracted and used to ascertain information such as the cloud's size and density. Furthermore, by using time-of-flight measurements [19] the temperature of the atoms can be determined.
A region-of-interest (ROI) [20] centered on the atom cloud is used during fitting, as objects in the image other than the atom cloud (e.g. an atom cloud in another trap or extraneous noise) can cause an inaccurate fit. Additionally, decreasing the fit area can significantly decrease the fit time when using two-dimensional fitting. Manually determining the ROI is time-consuming when analyzing a large number of images, so an algorithmic procedure is often employed, such as taking the 'center of mass' and then iteratively expanding the ROI around this point until the fractional enclosed 'power' exceeds some threshold. Another method involves performing connected component analysis on a binarized version of the image and then measuring the resulting regions using common image processing libraries [21]. However, if the image is noisy (e.g. contains fringing), such methods propose inaccurate ROIs (see section 6).
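As an aside, a minimal sketch of the center-of-mass/enclosed-power procedure just described (the threshold, step size, background handling and function name are illustrative choices, not those of any particular experiment):

```python
import numpy as np

def com_roi(image, threshold=0.95, step=5):
    """Expand a box around the image's center of mass until it encloses a
    chosen fraction of the total (background-subtracted) 'power'.
    Illustrative sketch only."""
    img = np.clip(image - np.median(image), 0, None)  # crude background subtraction
    total = img.sum()
    ys, xs = np.indices(img.shape)
    cy = int((ys * img).sum() / total)                # center of mass (row)
    cx = int((xs * img).sum() / total)                # center of mass (column)
    half = step
    while True:
        y0, y1 = max(cy - half, 0), min(cy + half, img.shape[0])
        x0, x1 = max(cx - half, 0), min(cx + half, img.shape[1])
        if img[y0:y1, x0:x1].sum() / total >= threshold or half > max(img.shape):
            return x0, y0, x1, y1                     # ROI corners
        half += step
```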
Here we propose a deep NN based approach to ROI determination in which a NN finds the ROI for each atom cloud in an image (see figure 1). Furthermore, the NN differentiates between clouds (classification) in a MOT and those in an ODT and also outputs a segmentation mask for each cloud from which Gaussian parameters are directly extracted. The classification feature is particularly useful when the cloud type is not a priori known from the experimental sequence (e.g. images containing both MOT and ODT clouds during the ODT loading). Although features such as the position or aspect ratio can be used for classification with additional manual or experimental input, the NN needs the image alone (beyond training on an appropriate dataset) to determine cloud types and so is easily adaptable to other cold atom experiments with different numbers of clouds or cloud types.
The rest of the article is arranged as follows: section 2 describes the experimental dataset used for NN training and validation, section 3 describes the training process, section 4 discusses Bayesian optimization (BO) [22, 23] of the NN's hyperparameters [23] and section 5 examines how the Gaussian parameters are calculated from the segmentation mask. In section 6 we compare our proposed NN method to a more conventional method of determining both ROIs and Gaussian parameters.
2. Experiment and dataset
To produce the ultracold atom clouds, erbium atoms are initially trapped and cooled to ∼20 µK in a narrow-line MOT [24] before being loaded into an ODT formed from a 30 W, 1030 nm laser beam focused down to a ∼40 µm waist (see figure 2). Optimization of the trap loading involves maximizing the atom number while minimizing the cloud temperature. The atom number is found by fitting the atom cloud in either a fluorescence or absorption image (see footnote 4) with a two-dimensional (2D) Gaussian (see equation (1)) and then integrating under the curve. The cloud temperature can be determined from how the cloud width evolves during time-of-flight expansion.
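For reference, the standard relations implied here (not given explicitly in the text) are, for the background-subtracted Gaussian of equation (1),

\[ N \propto \iint I(x,y)\,\mathrm{d}x\,\mathrm{d}y = \frac{\pi}{2} I_0 w_x w_y, \qquad w^2(t) = w^2(0) + \frac{4 k_{\mathrm{B}} T}{m}\,t^2, \]

where the proportionality constant depends on the imaging method (e.g. the scattering cross-section for absorption imaging), m is the atomic mass, k_B is Boltzmann's constant and t is the time of flight.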
The experimental dataset consists of 260 fluorescence and absorption images (1936 × 1216 pixels) along with their ROIs, labels and segmentation masks [25]. Of these, 130 images contain clouds released from the MOT with no ODT present (see figure 2(b)) and 130 images contain either just atoms released from the ODT (see figure 2(c)) or images where atoms released from the ODT and the MOT are both present (see figure 2(d)). We manually label the atom clouds in the images as 'MOT' or 'ODT' and draw a ROI box at the clouds' edges which we define as the 1/e² radii along the x and y axes (see figure 2(d)). This definition prevents the ROI boxes from overlapping when the MOT and ODT are both present; however, the ROI boxes are also easily expandable when analysis requires the wings of the distribution.
The manually drawn ROIs were expanded by a factor of two—except where the expanded ROIs would overlap—and the atom clouds inside were fit with a 2D Gaussian:

\[ I(x,y) = I_{\mathrm{bg}} + I_0 \exp\left\{-2\left[\frac{\left((x-x_0)\cos\theta + (y-y_0)\sin\theta\right)^2}{w_x^2} + \frac{\left((y-y_0)\cos\theta - (x-x_0)\sin\theta\right)^2}{w_y^2}\right]\right\} \tag{1} \]

where I(x, y) is the image intensity, I_bg is the background intensity, I0 is the peak intensity, x0 and y0 are the center coordinates, wx and wy are the 1/e² radii along the major and minor axes and θ is the angular orientation of the distribution. To increase the accuracy of the ROIs used for training, the fit parameters were used to calculate the 1/e² radii along the image axes [9] (previously estimated by eye) and the ROI boxes were redrawn using these values. The process of fitting and redrawing the ROI boxes from the fit parameters was then completed a second time; subsequent iterations were neglected due to an insignificant increase in accuracy.
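As an illustration of this fitting step, the following sketch defines the rotated 2D Gaussian of equation (1) and fits a cropped ROI with SciPy; the function names and the seeding convention are ours rather than the experiment's code:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(coords, i_bg, i0, x0, y0, wx, wy, theta):
    """Rotated 2D Gaussian of equation (1); wx and wy are 1/e^2 radii."""
    x, y = coords
    xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
    yr = (y - y0) * np.cos(theta) - (x - x0) * np.sin(theta)
    return i_bg + i0 * np.exp(-2 * (xr**2 / wx**2 + yr**2 / wy**2))

def fit_cloud(roi, seed):
    """Fit a cropped ROI (2D array) with the 2D Gaussian. `seed` is the
    initial guess (i_bg, i0, x0, y0, wx, wy, theta)."""
    y, x = np.indices(roi.shape)
    popt, _ = curve_fit(gaussian_2d, (x.ravel(), y.ravel()), roi.ravel(), p0=seed)
    return popt
```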
A segmentation mask was generated for each atom cloud (see figure 2(e)) with the mask borders placed at the 1/e² contour of the cloud—calculated from the final fit parameters; pixels within the 1/e² contour were set to one, whereas pixels outside were set to zero. Finally, the dataset was randomly split into a training dataset with 200 images and a validation dataset with 60 images.
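A mask of this form can be generated directly from the fit parameters, for example (a sketch using the same symbols as equation (1)):

```python
import numpy as np

def mask_from_fit(shape, x0, y0, wx, wy, theta):
    """Binary segmentation mask whose border is the 1/e^2 contour of the
    fitted Gaussian: pixels inside the contour are set to one."""
    y, x = np.indices(shape)
    xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
    yr = (y - y0) * np.cos(theta) - (x - x0) * np.sin(theta)
    return ((xr / wx) ** 2 + (yr / wy) ** 2 <= 1).astype(np.uint8)
```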
3. Neural network and training
We use the NN Mask R-CNN [26] to detect and bound the atom clouds, as well as to provide segmentation masks for each cloud (see figures 3(a) and (b)). The NN (see figure 1) begins with the experimental image being fed into a CNN base (ResNet-50 [27]). The CNN's output feature map [28] is then passed into a region-proposal network (RPN) which generates ROIs where objects are likely located. Next, these ROIs are cropped from the CNN's feature map in a ROI alignment stage and resized to uniform dimensions. The cropped feature maps are then fed into three parallel branches. The first applies a classifier to determine the object type and give the confidence of its prediction—which is helpful in determining whether to use the ROI in post-NN analysis. The second branch gives a more accurate ROI box prediction (see figure 3(a)) and the third outputs a segmentation mask for the object inside the ROI (see figure 3(b)). Since all three branches share the same base, computation speed is significantly increased [29] for both training and evaluation.
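For illustration, a single image can be passed through a trained Torchvision Mask R-CNN to obtain the outputs of the three branches; the helper below is a sketch (the score threshold and label convention are assumptions, not part of the published pipeline):

```python
import torch

def propose_rois(model, image_tensor, score_threshold=0.5):
    """Run a trained Mask R-CNN on one image tensor (C, H, W) and return the
    boxes, labels, confidence scores and masks of confident detections."""
    model.eval()
    with torch.no_grad():
        output = model([image_tensor])[0]   # Torchvision returns one dict per image
    keep = output["scores"] > score_threshold
    return (output["boxes"][keep],          # ROI boxes (x0, y0, x1, y1)
            output["labels"][keep],         # class index, e.g. 1 = 'MOT', 2 = 'ODT'
            output["scores"][keep],         # classifier confidence
            output["masks"][keep])          # soft masks, one per detection
```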
During training, the NN output is compared to the ground truth (i.e. the expected output from the training dataset) via a loss function; the loss is then back-propagated [30] through the NN to adjust the weights between layers and refine the NN model (see figure 4(a)). The loss function for the RPN stage is L1 loss [31], for the ROI branch it is Smooth L1 loss [32], the classifier uses categorical cross entropy loss [33] and lastly the mask branch utilizes binary cross entropy loss [34]. Although the loss can be separately back-propagated for each branch, a simpler approach is taken here in which the losses are summed together and then back-propagated to update the weights of the NN [26].
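In Torchvision's implementation this summed loss arises naturally, since the model returns a dictionary of per-branch losses in training mode; a minimal training-epoch sketch (assuming a data loader that yields images and targets in Torchvision's detection format):

```python
def train_one_epoch(model, optimizer, data_loader, device):
    """One epoch of the standard Torchvision detection recipe: the per-branch
    losses are summed and back-propagated together."""
    model.train()
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)   # RPN, box, classifier and mask losses
        loss = sum(loss_dict.values())       # summed before back-propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```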
Table 1. The hyperparameter search space used during Bayesian optimization along with the hyperparameter values used to train the NN to the highest average mAP (see figures 4(a) and 5).
Parameters | Lower bound | Upper bound | Log scale | Best value |
---|---|---|---|---|
Learning rate | 0.0001 | 0.009 | Yes | 0.0033 |
Momentum | 0.7 | 0.92 | No | 0.86 |
LRS step size | 3 | 15 | No | 13 |
LRS decay | 0.001 | 1 | Yes | 0.049 |
An epoch denotes a single cycle of training in which every image in the training dataset is passed through the NN, the loss calculated and the model weights updated. Increasing the number of training epochs can increase the NN's final accuracy so long as the NN does not overfit on the training data. However, due to finite computational resources, we restrict the training of an individual NN to 15 epochs—determined to be sufficient, see below—and instead use BO to increase the accuracy of the final trained model (see section 4).
Five hyperparameters, which tune the learning process, are set before the NN's training phase. The first is the learning rate which determines the size of the step the NN takes during stochastic gradient descent [35]. If the learning rate is too low, the NN will take too long to converge to the minimum of the loss function, whereas if the learning rate is too high, the NN will not be able to descend to the minimum, but will oscillate around it or diverge. Since larger learning rates are useful at the beginning of training and smaller learning rates are useful towards the end, a learning rate scheduler [36] is employed which decreases the learning rate by some scalar (decay, the second hyperparameter) after a fixed number of epochs (step size, the third hyperparameter). The fourth hyperparameter is the momentum which prevents the NN from getting stuck in a local minimum during training [37]. The last hyperparameter is the batch size—the number of images simultaneously passed through the NN—which, unlike the other hyperparameters that are tuned with BO, is fixed to four for all NNs.
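In PyTorch these hyperparameters map onto an SGD optimizer and a StepLR scheduler; a sketch using the best values of table 1 as defaults (the helper name is ours):

```python
import torch

def make_optimizer(model, lr=0.0033, momentum=0.86, step_size=13, decay=0.049):
    """SGD optimizer plus step learning-rate scheduler, with defaults set to the
    best hyperparameters of table 1."""
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=momentum)
    # StepLR multiplies the learning rate by `decay` every `step_size` epochs;
    # scheduler.step() is called once at the end of each training epoch.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=decay)
    return optimizer, scheduler
```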
The accuracy of the NN is evaluated (see figure 4(b)) using the Common Objects in Context (COCO) evaluation metric, which entails calculating the intersection-over-union [38]:

\[ \mathrm{IoU} = \frac{\mathrm{area}\left(\mathrm{ROI}_{\mathrm{p}} \cap \mathrm{ROI}_{\mathrm{gt}}\right)}{\mathrm{area}\left(\mathrm{ROI}_{\mathrm{p}} \cup \mathrm{ROI}_{\mathrm{gt}}\right)} \tag{2} \]

where ROI_p is the ROI prediction from the NN and ROI_gt is the ground truth ROI. Values below a predetermined IoU threshold (e.g. IoU = 0.5) are considered a false prediction—mislabelling the object class within the ROI is also a false prediction—whereas values above are considered a true prediction. A precision-recall curve [39] is then constructed and integrated to give the average precision (AP) for the given threshold (e.g. AP50 for the IoU = 0.5 threshold). Finally, the AP is calculated for ten IoU thresholds (0.50–0.95 with a step size of 0.05) and averaged to give the mean average precision (mAP)—which is the metric generally reported for object detection. The mask AP values and mAP can be similarly calculated [40].
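A minimal implementation of the IoU of two ROI boxes, for reference (box format (x0, y0, x1, y1) assumed):

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two ROI boxes given as (x0, y0, x1, y1)."""
    x0 = max(box_a[0], box_b[0])
    y0 = max(box_a[1], box_b[1])
    x1 = min(box_a[2], box_b[2])
    y1 = min(box_a[3], box_b[3])
    intersection = max(0, x1 - x0) * max(0, y1 - y0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)
```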
The NNs are trained in a Google Colab notebook [41] utilizing a GPU backend and implemented in PyTorch [42] using the pre-built Mask R-CNN model in the Torchvision package. Rather than training the NN from scratch, the model weights are loaded from a network pre-trained on the COCO train2017 dataset [43]—which significantly reduces the time required to train the network and increases the model's final accuracy [44]. Additionally, this transfer learning, complemented by data augmentation (in which images are randomly flipped horizontally during training), allows us to use a relatively small training dataset. After each training epoch, the updated NN is evaluated on the validation dataset, which yields the mAP for both the object detection and the segmentation mask branches. These two values are averaged together to give the accuracy of the training epoch (average mAP) and the epoch with the highest average mAP determines the overall accuracy of the NN model. After using BO to determine the hyperparameters which give the highest accuracy, we retrain the NN with these parameters for 30 epochs (see figure 4(c)) and verify that the NN's accuracy becomes asymptotic at the 15 epoch mark—with no overfitting seen thereafter.
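A sketch of how such a pre-trained model can be adapted to our two cloud classes using the standard Torchvision fine-tuning recipe (the class count of three assumes a background class plus 'MOT' and 'ODT'; this is illustrative rather than the authors' exact code):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_model(num_classes=3):
    """Mask R-CNN with a ResNet-50 FPN base, pre-trained on COCO, with its box
    and mask heads replaced for `num_classes` classes (background included)."""
    # pretrained=True loads COCO weights (newer Torchvision uses the `weights` argument).
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # Replace the box-classification head.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Replace the mask head likewise.
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model
```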
4. Bayesian optimization of hyperparameters
The accuracy of the trained NN is sensitive to the hyperparameters used during training. In the past, the hyperparameters were tuned through grid search, random search [45] or by hand; however, in recent years, BO has been successfully employed to find the best set of hyperparameters [23, 46]. BO is particularly useful when trying to find the minimum (or maximum) of a function which is noisy and expensive to evaluate—such as NN training—thereby making a grid search of the parameter space impractical [22].
Bayesian optimization takes the function value (cost) at previously evaluated points and uses a Gaussian process to model the cost as a function of the parameter space [47]. The model also determines the confidence of its predictions in a given region of the parameter space: perfect certainty at evaluated points, low uncertainty near evaluated points and high uncertainty far from evaluated points. The BO loop then determines where to evaluate the function next by weighing the benefits of evaluating the function near the model's predicted minimum (or maximum) or evaluating the function in an unexplored region of the parameter space [48].
When optimizing our NN training with BO, the average mAP of the NN is evaluated as a function of the hyperparameter space, which consists of the learning rate, momentum, and the learning rate scheduler (LRS) step size and decay (see table 1). As a warm start, the NN is initially trained and evaluated at five quasi-randomly distributed (Sobol-generated [49]) points (see black squares in figures 5(a)–(f)). Further evaluation points (red circles in figures 5(a)–(f)) are iteratively determined by the BO loop. The Ax Python package is used for the BO loop as it provides a high-level interface to the BoTorch [50] BO package.
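A sketch of how such a BO loop can be set up with Ax over the search space of table 1 (the evaluation function is a placeholder standing in for the training and evaluation code of section 3, and the trial count is illustrative):

```python
from ax.service.managed_loop import optimize

def train_and_evaluate(params):
    """Train the NN for 15 epochs with the given hyperparameters and return the
    average mAP on the validation set (placeholder, not implemented here)."""
    raise NotImplementedError

# Ax begins with quasi-random (Sobol) trials before switching to
# Gaussian-process-based suggestions.
best_parameters, best_values, experiment, ax_model = optimize(
    parameters=[
        {"name": "learning_rate", "type": "range", "bounds": [1e-4, 9e-3], "log_scale": True},
        {"name": "momentum", "type": "range", "bounds": [0.7, 0.92]},
        {"name": "lrs_step_size", "type": "range", "bounds": [3, 15], "value_type": "int"},
        {"name": "lrs_decay", "type": "range", "bounds": [1e-3, 1.0], "log_scale": True},
    ],
    evaluation_function=train_and_evaluate,
    objective_name="average_mAP",
    minimize=False,
    total_trials=25,
)
```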
With an increasing number of BO evaluation trials the best achieved average mAP rises and converges (see figure 5(g)). The best set of hyperparameters (see table 1) gives an mAP of 86.3% for the object detection branch—locating 88 of the 89 clouds—and an mAP of 85.8% for the mask branch. These mAP values are higher than those for similar NNs trained on the COCO validation dataset [26, 51], which is likely due to our NN only needing to classify two object types rather than the eighty in the COCO validation dataset [38], as well as the relatively simple features of the MOT and ODT clouds. The NN also performs well for clouds with a low signal-to-noise ratio (SNR), which we define as the cloud's peak amplitude divided by the standard deviation of the image's background intensity; the validation dataset's noisiest cloud has an SNR of only 2.1 but returns an ROI IoU score of 0.86.
5. Gaussian parameter analysis
We can extract the parameters characterizing the cloud's 2D Gaussian distribution (see equation (1)) directly from the NN's segmentation mask output (see figure 6(a)). Taking the 1st moments of the mask yields the center coordinates of the atom cloud, whereas the 1/e² radii and the angular orientation θ can be determined by calculating the 2nd moments [52] of the mask. The background intensity I_bg is calculated by generating a histogram of the experimental image's pixel values (where there is no cloud) and then taking the peak position (see figure 6(c)). The amplitude of the 2D Gaussian I0 is calculated by summing the image's intensity inside the 1/e² contour to find the power (P) and then applying:

\[ I_0 = \frac{2P}{\pi w_x w_y \left(1 - \mathrm{e}^{-2}\right)} \tag{3} \]

where the factor (1 − e⁻²) accounts for the fraction of the Gaussian's power enclosed by the 1/e² contour.
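A sketch of this moments-based extraction (variable names are ours; the scaling of the 2nd moments uses the fact that a filled ellipse with semi-axis a has variance a²/4 along that axis):

```python
import numpy as np

def gaussian_params_from_mask(mask, image):
    """Estimate the Gaussian parameters of equation (1) from a binary mask
    whose border lies at the 1/e^2 contour. Illustrative sketch."""
    ys, xs = np.nonzero(mask)
    x0, y0 = xs.mean(), ys.mean()                        # 1st moments: center
    cov = np.cov(np.stack([xs - x0, ys - y0]))           # 2nd central moments
    eigvals, eigvecs = np.linalg.eigh(cov)
    wy, wx = 2 * np.sqrt(eigvals)                        # 1/e^2 radii (minor, major)
    theta = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])   # major-axis orientation
    # Background from the peak of the histogram of pixels outside the mask.
    counts, edges = np.histogram(image[mask == 0], bins=100)
    i_bg = 0.5 * (edges[np.argmax(counts)] + edges[np.argmax(counts) + 1])
    # Amplitude from the background-subtracted power inside the 1/e^2 contour,
    # using equation (3) with the (1 - e^-2) enclosed-power factor.
    power = (image[mask > 0] - i_bg).sum()
    i0 = 2 * power / (np.pi * wx * wy * (1 - np.exp(-2)))
    return i_bg, i0, x0, y0, wx, wy, theta
```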
This method of extracting clouds' Gaussian parameters directly from the segmentation mask was applied to the validation dataset (see figure 3(c)) and the results compared to the Gaussian parameters extracted via a 2D fit of the experimental image (see figure 7, blue bars). Apart from θ, we normalized the differences to obtain a relative error. The differences in the cloud center and the cloud radii were divided by the fitted radii, whereas those in the peak amplitude I0 and the background intensity I_bg were divided by the fitted peak amplitude.
The Gaussian parameters extracted from the NN's segmentation mask can either be used directly or as seed parameters for a conventional 2D fit—increasing the likelihood of fit convergence and reducing the fitting time. Without a seed, the fit time for a full image (1936 × 1216 pixels) is approximately 15 s on an Intel Xeon 2.2 GHz processor core. For the validation dataset, this is reduced to an average of 3 s when using the ROIs proposed by the NN—expanded by a factor of two except where the expanded ROIs would overlap—which substantially increases the fit speed despite the NN's processing time of 0.17 s per image using an Nvidia Tesla V100 GPU. Extracting the Gaussian parameters takes an average of 0.18 s, but results in an average fit speed-up of 2 s per region when used as a seed. Thus, our method of determining ROIs and seed parameters using a NN offers a significant speedup in conjunction with fitting.
6. Comparison with conventional analysis
For comparison with the NN, a conventional method is used to find the ROIs and Gaussian parameters of the clouds (see figure 6(b)). We utilize standard methods based on Python's Scikit-image (Skimage) package [21] to pre-process the images, determine ROIs for the clouds, and extract the Gaussian parameters.
During pre-processing, a histogram is taken of the image's pixel intensities. As the cloud areas are much smaller than the overall image, the histogram shows an approximately Gaussian peak corresponding to the image's background level and noise. The mean µ and standard deviation σ of a Gaussian fit to this peak are used to define a threshold value [53]; the image is then binarized by setting pixels above the threshold to one and the remainder to zero.
After pre-processing, connected component analysis [54] is applied to the binarized image, which groups pixels of the same value together into 'regions' (e.g. an atomic cloud). These regions are fed into Skimage's regionprops method which returns both the ROI coordinates and the geometric parameters for each region. ROIs with an area far outside the expected range (i.e. much larger or smaller than the size of the clouds under measurement) are removed and the two largest remaining ROIs—if there are more than one—are taken as the MOT and ODT ROIs, since smaller regions generally coincide with remaining high noise regions.
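A sketch of this conventional pipeline using Scikit-image (the threshold multiplier and the area cuts are illustrative assumptions, not the values used in the comparison):

```python
import numpy as np
from scipy.optimize import curve_fit
from skimage import measure

def conventional_rois(image, n_sigma=3):
    """Histogram-based thresholding followed by connected-component analysis
    and regionprops, returning up to two candidate ROIs (bounding boxes)."""
    counts, edges = np.histogram(image.ravel(), bins=200)
    centers = 0.5 * (edges[:-1] + edges[1:])
    gauss = lambda x, a, mu, sigma: a * np.exp(-((x - mu) ** 2) / (2 * sigma**2))
    p0 = [counts.max(), centers[np.argmax(counts)], np.std(image)]
    (a, mu, sigma), _ = curve_fit(gauss, centers, counts, p0=p0)
    binary = image > mu + n_sigma * abs(sigma)        # binarize at mu + n*sigma
    labels = measure.label(binary)                    # connected components
    regions = measure.regionprops(labels)
    # Keep only plausibly cloud-sized regions (size cuts are illustrative).
    regions = [r for r in regions if 100 < r.area < 0.25 * image.size]
    regions.sort(key=lambda r: r.area, reverse=True)
    return [r.bbox for r in regions[:2]]              # (min_row, min_col, max_row, max_col)
```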
Applying this method to the validation set locates 68 of the 89 clouds; this is significantly worse than the NN, which locates 88 of the 89 clouds. The mAP is also calculated, but since the conventional method does not differentiate between MOT and ODT clouds, all the validation dataset regions are relabelled as 'cloud'. Even with this simplification, an mAP of only 11.1% is achieved—much lower than the NN's object detection mAP of 86.3%. On closer inspection, the conventional method fails on images with low SNR, such as images with significant fringing.
To determine the Gaussian parameters of the 68 successfully bounded clouds we apply a numerical scaling factor (to account for our thresholding method [52]) to the cloud radii returned from regionprops and then generate a binary mask from the resulting 1/e² contour, which is used to find the cloud's background intensity I_bg and amplitude I0 (see figure 6(c)) as in section 5. The parameters are normalized and compared to those from the 2D fit (see figure 7, orange bars). Since the conventional method only recognizes a subset of the clouds, we quantitatively compare against the NN segmentation method for the same cloud subset (shown as filled blue bars in figure 7); a direct comparison of the root mean squared errors is shown in figure 7(h).
When considering both object detection and Gaussian parameter extraction, the NN method significantly outperforms the conventional method. Additionally, the conventional method's efficacy depends on the SNR of the data, whereas the NN is more robust against high noise levels and fringing since it learns higher-level features of the atom clouds. Furthermore, the conventional method requires manual determination of the best pre-processing procedures and therefore potentially needs further tuning for new data, whereas the NN simply needs to be retrained with more labelled data, making it effective in a laboratory setting.
7. Conclusion
An instance segmentation NN (Mask R-CNN) was trained to identify ultracold atom clouds in MOTs and ODTs. The NN generates both a ROI and a segmentation mask for each cloud—corresponding to the cloud's 1/e² radii—with mAPs of 86.3% and 85.8% on the ROI and the segmentation mask branches, respectively. We show that the Gaussian parameters describing the atom clouds' distributions can also be extracted directly from the segmentation masks. Both ROI determination and Gaussian parameter extraction via the NN are significantly more accurate than a conventional method based on Python's Scikit-image library.
With an appropriate training dataset these techniques could be applied to ultracold atom clouds in other traps such as optical lattices [55, 56] and box potentials [57]; they are also directly applicable to laser beam profiling and other machine vision applications which require analysis of one or more 2D Gaussian distributions.
In the future a custom NN could be created by adding a branch after the ROI alignment stage which would output the cloud parameters directly. This would enable the characterization of non-Gaussian density profiles, useful, for example, in the detection, identification and parameterization of the bimodal clouds seen when a Bose–Einstein condensate [58, 59] is present.
Acknowledgments
We thank Elliot Bentine, Shu Ishida and Jirka Kučera for helpful discussions and comments on the manuscript. This work was supported by EPSRC Grant No. EP/P009565/1, the John Fell Oxford University Press (OUP) Research Fund and the Royal Society. M K acknowledges funding from Trinity College, University of Cambridge.
Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://doi.org/10.5287/bodleian:Y56kydgmj. Data will be available from 7 January 2021.
Conflict of interest
The authors declare no conflicts of interest.
Footnotes
4. Experimental images shown in the paper are processed from two to three raw images. Fluorescence images require both an image with atoms and a background image without atoms. Absorption images additionally require a probe image in which the probe beam is turned on, but no atoms are present.