Article

MPG-Net: A Semantic Segmentation Model for Extracting Aquaculture Ponds in Coastal Areas from Sentinel-2 MSI and Planet SuperDove Images

1 International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
2 Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3 Land Resource Surveying and Mapping Institute of Guangxi Province, Nanning 530022, China
4 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(20), 3760; https://doi.org/10.3390/rs16203760
Submission received: 8 August 2024 / Revised: 27 September 2024 / Accepted: 2 October 2024 / Published: 10 October 2024
Figure 1. Map of the study area. The top half shows the location of the study area; the bottom half shows standard false-color maps of Planet (a) and Sentinel-2 (b). (A,C) are the aquaculture areas of Yingluo Harbor, and (B,D) are the aquaculture areas of Anpu Harbor.
Figure 2. The construction of U-Net.
Figure 3. The structure of MPG-Net. MS and PGC are the two improved structures proposed in this study.
Figure 4. The MS structure: the Inception module on the left; the Dilated residual module with a dilation rate of 5 on the right.
Figure 5. The PGC structure. The upper branch comprises the bottleneck module and the GC module; the lower branch is the PSA module.
Figure 6. The construction process of the aquaculture pond extraction model. The top, middle, and bottom sections of the figure represent data cropping, data augmentation, and model training and prediction, respectively.
Figure 7. Results of testing set segmentation of aquaculture ponds on the Sentinel-2 dataset with different models.
Figure 8. Results of testing set segmentation of aquaculture ponds on the Planet dataset with different models.
Figure 9. Results of ablation experiments on the Sentinel-2 testing set.
Figure 10. Results of ablation experiments on the Planet testing set.
Figure 11. The extraction results of Yingluo Harbor: (a,b) original images; (c,d) extraction results; (e,f) accuracy maps.
Figure 12. The extraction results of Anpu Harbor: (a,b) original images; (c,d) extraction results; (e,f) accuracy maps.

Abstract

Achieving precise and swift monitoring of aquaculture ponds in coastal regions is essential for the scientific planning of spatial layouts in aquaculture zones and the advancement of ecological sustainability in coastal areas. However, because coastal areas contain many intermixed land-cover types and remote sensing images have complex spectral features, the phenomenon of "same spectrum, heterogeneous objects" is common; current deep learning models are therefore susceptible to background noise in complex scenes, resulting in poor generalization ability. Moreover, because aquaculture ponds appear at different scales in imagery, models with limited multi-scale feature extraction ability struggle to extract effective edge features. To address these issues, this work proposes a novel semantic segmentation model for aquaculture ponds, MPG-Net, which is based on an enhanced version of the U-Net model and primarily comprises two structures: MS and PGC. The MS structure integrates the Inception module and the Dilated residual module to enhance the model's ability to extract the features of aquaculture ponds and effectively alleviate the vanishing-gradient problem during training; the PGC structure integrates the Global Context module and Polarized Self-Attention to enhance the model's understanding of contextual semantic information and reduce the interference of redundant information. Using Sentinel-2 and Planet images as data sources, the effectiveness of each refinement is confirmed through ablation experiments on the two structures. Experimental comparison with the classical FCN8S, SegNet, U-Net, and DeepLabV3 semantic segmentation models shows that the MPG-Net model outperforms the other four models in all four precision evaluation indicators; the average precision, recall, IoU, and F1-Score over the two image datasets of different resolutions are 94.95%, 92.95%, 88.57%, and 93.94%, respectively. These results demonstrate that the MPG-Net model has better robustness and generalization ability, can reduce the interference of irrelevant information, effectively improves the extraction precision of individual aquaculture ponds, and markedly reduces edge adhesion between ponds in the extraction results, thereby offering new technical support for the automatic extraction of aquaculture ponds in coastal areas.

1. Introduction

Aquaculture serves as a crucial protein source for humans and is essential for global food security. China accounts for 35 percent of the world's total production of aquatic products and is both the world's largest producer and consumer of fisheries products [1]. In recent decades, the rapid expansion of China's coastal aquaculture industry has produced huge economic benefits; however, it has also damaged the ecological environment of the coastal zone [2,3,4], e.g., exacerbating eutrophication of coastal waters [5], occupying large amounts of wetland resources [6], and destroying mangrove protected areas [7,8]. Pond aquaculture, one of the primary methods practiced in China's coastal areas, faces numerous challenges, including aquaculture pollution, which prevents the local aquaculture business from growing healthily. Therefore, understanding the spatial distribution of aquaculture ponds in China's coastal areas is crucial for the sustainable growth of aquaculture and the scientific management of coastal zones, guaranteeing ecological security and food safety in coastal areas.
The limitations of field surveys can be addressed by satellite remote sensing technology thanks to its wide detection range, time efficiency, and cost effectiveness [9], making it an essential tool for tracking and researching the spatial distribution of coastal aquaculture ponds [10,11,12,13,14]. Using remote sensing imagery, researchers have developed a number of techniques in recent years for locating coastal aquaculture ponds, including the ratio index method, object-oriented classification, machine learning classification, and deep learning techniques. The basic principle of ratio-based index construction is to use the ratio of reflectance or radiance brightness values of each pixel across different bands of a remote sensing image to construct new indices that reduce the influence of background information and highlight the radiance characteristics of aquaculture ponds [15,16,17,18,19]. Although this method is simple, in complex waters exhibiting the "same spectrum, heterogeneous objects" phenomenon, relying only on differences in spectral characteristics is likely to lead to misclassification; moreover, it suffers from insurmountable "salt and pepper" noise [20]. The basic idea of object-oriented classification is to divide image elements into a collection of objects with spatial relationships and then formulate classification rules based on spectral features, shape, texture, proximity, and other characteristics; this can achieve higher accuracy in the extraction of aquaculture ponds [21,22,23,24]. The object-oriented classification method can effectively suppress "salt and pepper" noise in image classification, and its classification effect usually improves as image resolution increases. However, its segmentation parameter selection often depends on the researcher's empirical knowledge and requires repeated experiments [25]. The basic idea of machine learning classification is to learn the characteristics of feature types in remote sensing images from labeled training samples and to automatically assign features to different categories based on those characteristics. Machine learning algorithms commonly used for the extraction of aquaculture ponds include decision trees [26,27], support vector machines [28,29], and random forests [30,31,32]. Despite the widespread use of machine learning algorithms for remote sensing image recognition and classification, and the notable success achieved, problems remain: feature selection requires specialized knowledge, the algorithms have shallow structures, and classification accuracy and the degree of automation are difficult to raise to the level that applications demand [33].
Deep learning methods are based on artificial neural networks and learn the features of target objects by constructing connections between multiple layers of neurons [34]; they are now widely used in the identification of aquaculture regions. Compared with the aforementioned methods, deep learning methods, with their stronger generalization ability, can achieve higher extraction accuracy and can therefore meet the extraction requirements of coastal aquaculture ponds. Cheng et al. integrated U-Net with Hybrid Dilated Convolution to expand the receptive field while avoiding the 'gridding' issue, which enhances the ability of the network model to learn high-level features [35]. Lu et al. improved the ASPP structure and the up-sampling structure to strengthen the network model's understanding of semantic and location information and thus significantly reduced edge adhesion in aquaculture areas [36]. Su et al. employed residual hybrid dilated convolution blocks and dual-channel convolution blocks to enhance the U-Net's up- and down-sampling structures, respectively, overcoming the turbid water environment and the varying scales of rafts in aquaculture areas [37]. Chen et al. used the U²-Net model to extract aquaculture ponds in the Zhoushan Archipelago, China, and analyzed the spatiotemporal evolution of coastal aquaculture ponds in the region over the past 38 years; the model effectively captures multi-scale contextual information to address the challenges of "same object, different spectra" and "different objects, same spectrum", although the study focused on the extraction of entire aquaculture regions and did not achieve segmentation of individual aquaculture ponds [38]. Overall, deep learning effectively suppresses the "same spectrum, heterogeneous objects" phenomenon that occurs in traditional methods, sharpens the blurred edges of extracted aquaculture areas, and enables the fast and accurate extraction of aquaculture ponds.
Remarkable advancements have been achieved in the extraction of coastal aquaculture ponds through the use of deep learning techniques. Nevertheless, the majority of the aforementioned investigations used single-resolution remote sensing images and could not verify a model's effectiveness in extracting aquaculture ponds from remote sensing images of different scales and spatial resolutions. Therefore, it is necessary to construct an extraction model for aquaculture ponds that is applicable to multi-source, multi-resolution remote sensing images. Currently, there are three key problems in the construction of a coastal aquaculture pond extraction model: (1) Complex and diverse feature types are distributed around coastal aquaculture ponds, such as paddy fields, lakes, and rivers, all of which have spectral features comparable to those of the ponds and produce excessive redundant information during extraction. (2) In the task of aquaculture pond segmentation, studies focusing on the extraction of individual ponds are relatively few; most research targets the extraction of entire aquaculture regions, usually ignoring the embankments between ponds, which makes the extracted pond area differ substantially from the actual area and hinders the optimization of the layout and industrial structure of the seawater aquaculture industry. Moreover, treating the whole farming area as one object during training sample production usually mixes in non-farming ponds, so the model learns non-farming pond features during training, resulting in poor segmentation performance. (3) Current deep learning models have limited multi-scale feature extraction ability and weakly combine global context with local features, making it difficult to extract deep features from images.
The U-Net model is characterized by a simple structure and few parameters and performs well on image segmentation tasks. Consequently, to solve the aforementioned issues, we propose a novel semantic segmentation approach, MPG-Net, which takes the U-Net model as its basis, improves the convolutional layers and skip connections, and introduces two new structures, MS and PGC. The MS structure integrates the Inception module and the Dilated residual module in place of the traditional convolutional layers of the U-Net model to extract the multi-scale features of aquaculture ponds, and it effectively alleviates the vanishing-gradient problem during training. The PGC structure incorporates the Global Context module and Polarized Self-Attention to maximize the reuse of the output features passed through the skip connections from the encoding part, thereby enhancing contextual semantic information and reducing the interference of redundant information. Our main contribution is a new semantic segmentation model, MPG-Net, which can accurately extract individual aquaculture ponds from remote sensing images of different resolutions in complex environments and can provide an important reference for optimizing the spatial layout of aquaculture and the sustainable development of resources in coastal areas.

2. Materials

2.1. Study Area

The study area comprises two typical aquaculture areas in the eastern Beibu Gulf of Guangxi, Yingluo Harbor and Anpu Harbor, with geographic coordinates ranging from 109°41′E to 109°57′E and from 21°25′N to 21°36′N, as shown in Figure 1. The region has a semi-tropical monsoon climate with low waves and excellent water quality in the bay, which creates highly favorable conditions for coastal aquaculture, and thus numerous aquaculture ponds are distributed here. The aquaculture ponds within the research area are mainly located in low-lying areas such as tidal flats and are generally independent, closed water bodies [39]. The area's aquaculture sector has grown in size in recent years, and coastal shrimp ponds and aquaculture ponds have been farmed in a disorderly manner, resulting in eutrophication of the surrounding seawater and harm to the ecological balance of the mangrove reserve [40,41,42]. Consequently, developing an efficient and precise remote sensing method for the automatic extraction of aquaculture ponds is crucial. This approach aids in comprehending the spatial distribution of these ponds, making it easier to monitor and safeguard the natural environment of the coastal zone. However, many different feature types, such as paddy fields, rivers, lakes, and construction sites, are extensively dispersed throughout the research area, and the physical geography is relatively complex, posing certain challenges to the extraction of aquaculture pond information.

2.2. Data

For this study, we used Sentinel-2 MultiSpectral Instrument (MSI) imagery, which has a spatial resolution of 10 m. We pre-processed the data on the GEE (https://earthengine.google.com) platform with de-clouding, atmospheric correction, and radiometric correction to obtain Sentinel-2 MSI (Level-2A) images covering the study area. The Sentinel-2 MSI median image from October 2020 contains 4 bands (red, green, blue, and NIR) with a pixel size of 7696 × 7059. In addition, we used a Planet SuperDove satellite image from October 2020, with a spatial resolution of 3 m, containing the same 4 bands (red, green, blue, and NIR) with a pixel size of 8988 × 8088. Table 1 shows the band information for the Sentinel-2 MSI and Planet SuperDove sensors used in this study. The two sources are complementary in spatial and temporal resolution and spectral richness, and their fusion can significantly improve the segmentation accuracy of aquaculture ponds. In future research, we will further explore how to optimize the combined application of these two types of data, especially detailed segmentation under different environmental conditions, in order to fully realize their advantages in applied research.
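To make this preprocessing step concrete, the following is a minimal Earth Engine Python sketch for building such an October 2020 Level-2A median composite. The collection ID, the QA60-based cloud mask, and the rectangle approximating the study area are common defaults and assumptions here, not the study's confirmed processing chain.

```python
# Hedged sketch: Sentinel-2 Level-2A median composite on Google Earth Engine.
# The QA60 cloud mask and band selection are standard choices, assumed here.
import ee

ee.Initialize()

# Rectangle roughly covering 109°41'E-109°57'E, 21°25'N-21°36'N (assumption).
roi = ee.Geometry.Rectangle([109.683, 21.417, 109.95, 21.6])

def mask_clouds(img):
    qa = img.select('QA60')
    # Bits 10 and 11 of QA60 flag opaque clouds and cirrus, respectively.
    clear = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
    return img.updateMask(clear)

composite = (ee.ImageCollection('COPERNICUS/S2_SR')
             .filterBounds(roi)
             .filterDate('2020-10-01', '2020-11-01')
             .map(mask_clouds)
             .median()
             .select(['B4', 'B3', 'B2', 'B8'])  # red, green, blue, NIR
             .clip(roi))
```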

3. Methodology

Because of the high dimensionality of remote sensing images and the complexity and variability of the natural geographic environment of aquaculture pond distribution, it is difficult for the current shallow network model to overcome the interference of strong background information to extract effective image features for accurate segmentation of aquaculture ponds. Therefore, in this research, the U-Net model [43] is used as the basis for improvement to obtain a new model suitable for aquaculture pond segmentation.

3.1. MPG-Net Model

U-Net is a semantic segmentation model based on FCN [44] with two sections, the encoding side and the decoding side, linked via an intermediary layer. The encoding section extracts features from the image, while the decoding section gradually restores the feature maps from the encoding phase to the original image size. As a result, the feature extraction capacity of the encoding end and the image recovery capacity of the decoding end directly impact the network's overall performance. In this work, the U-Net model is the key underlying structure: it can effectively carry out image semantic segmentation and enhance the precision and efficiency of image processing. Figure 2 illustrates its structure.
In this research, the proposed MPG-Net model is mainly improved based on the initial U-Net model, and we use the MS structure instead of the initial traditional convolutional layer and add the PGC attention structure in the skip connection, which provides the model with stronger feature extraction capability. The structure of the model mainly includes the encoding part, decoding part, and skip connection, and the operations used include convolutional operation (MS structure), up-sampling, max pooling, skip connection, and a PGC feature enhancement filtering mechanism; its structure is shown in Figure 3.
The coding part: in the initial U-Net model, the coding part is divided into five levels, each composed of two 3 × 3 convolutions, one ReLU activation function, and one max pooling layer. In the MPG-Net model, the coding part is also divided into five levels, and each level comprises the MS structure, which is mainly composed of the Inception structure [45], the Dilated residual module, and a skip connection. These constituents increase the network's width and depth so that the model can better perform multi-scale extraction of image features and maintain a stable representation of features. Moreover, each convolutional layer is followed by a ReLU layer to speed up the convergence of the model and a BN layer to alleviate the network's gradient vanishing problem. Finally, down-sampling is performed by the max pooling operation after the MS structure.
The skip connection part of the initial U-Net model: in the U-Net model, the skip connection is a key connection method to pass the feature map of the coding part directly to the decoding part and to perform the step-by-step recovery of the features in the decoding part, which is crucial for the multiplexing and effective propagation of the output features of the coding part. However, this traditional skip connection is prone to extracting redundant image information, while suffering from the problem of information loss due to continuous down-sampling in the coding part, which leads to a degradation of model performance. To address these issues, this study introduced the PGC attention mechanism, which maximizes the reuse of the output features of the coded part of the skip connection while greatly reducing the feature loss caused by down-sampling. By incorporating the PGC attention mechanism in the process of using skip connections, the utilization of the aquaculture pond features can be effectively improved, thus further improving the accuracy and robustness of the entire U-Net model.
The decoding part of the initial U-Net model: the decoding part can also be divided into five levels. The first four levels all consist of up-sampling and a 3 × 3 convolution; the fifth level is the output layer. In the MPG-Net model, our decoding part uses the MS structure instead of the original convolution operation. Simultaneously, as in the initial U-Net model, a 1 × 1 convolution is still used to map the output feature vectors of the last layer into a class label. The enhanced structure mentioned above increases the clarity and accuracy of the edges when performing aquaculture pond segmentation.
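The overall layout described above can be summarized in the short Keras sketch below. This is a hedged outline under stated assumptions: the filter widths are illustrative, and ms_block/pgc_block are simplified stand-ins for the MS and PGC structures sketched more fully in Sections 3.2 and 3.3.

```python
# Hedged sketch of the MPG-Net layout: five encoder levels (MS block + max
# pooling), PGC-filtered skip connections, an up-sampling decoder, and a
# final 1x1 convolution. Filter counts are assumptions, not the paper's.
import tensorflow as tf
from tensorflow.keras import layers, Model

def ms_block(x, filters):
    # Simplified stand-in for the MS structure (see the sketch in Section 3.2).
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def pgc_block(x):
    # Simplified stand-in for the PGC structure (see the sketch in Section 3.3).
    return x

def build_mpg_net(input_shape=(256, 256, 4), widths=(64, 128, 256, 512, 1024)):
    inputs = layers.Input(input_shape)
    x, skips = inputs, []
    for f in widths[:-1]:                          # four down-sampling levels
        x = ms_block(x, f)
        skips.append(pgc_block(x))                 # attention-filtered skip
        x = layers.MaxPooling2D(2)(x)
    x = ms_block(x, widths[-1])                    # fifth (bottom) level
    for f, skip in zip(reversed(widths[:-1]), reversed(skips)):
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])
        x = ms_block(x, f)
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)  # class-label map
    return Model(inputs, outputs)
```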

3.2. The MS Structure

Since the scale of aquaculture ponds varies across remote sensing images with different spatial resolutions, a new multi-scale feature extraction structure (MS structure) is proposed in this paper to accurately recognize aquaculture ponds of varying scales. As shown in Figure 4, the structure consists of an Inception module [45], a Dilated residual module, and a skip connection. The MS structure is effective at extracting feature information from aquaculture ponds, strengthening the model's ability to express these features, and improving recognition accuracy and efficiency for the aquaculture pond regions of the remotely sensed image.
As shown in Figure 4, the input features first pass through the four branches of the Inception module, which perform 1 × 1, 3 × 3, 5 × 5, and 7 × 7 convolutions, respectively. Each convolutional kernel size is suited to capturing features at a different scale, and, by concatenating these different-sized convolutional kernels, the network can capture features at multiple scales simultaneously within the same layer, avoiding the information loss that a single convolutional kernel in the U-Net model may cause. The branch outputs are then fused by a Concatenate operation and fed into the Dilated residual module with a dilation rate of 5, which further extracts the fused information; finally, the inputs and outputs are fused by a long skip connection. The multi-scale feature extraction capability of the MS structure therefore enhances the model's ability to capture image features, improves feature reuse and the accuracy of feature expression, reduces the risk of overfitting, and effectively improves recognition accuracy. A hedged sketch of this structure follows.
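Below is a compact Keras sketch of the MS structure as just described: four parallel Inception branches, concatenation, a dilated residual module with a dilation rate of 5, and a long skip connection. The per-branch widths and the 1 × 1 projections used to match channel counts are illustrative assumptions.

```python
# Hedged sketch of the MS structure: Inception branches (1x1/3x3/5x5/7x7),
# concatenation, a dilated residual module (dilation rate 5), and a long
# skip connection fusing the block's input with its output.
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel, dilation=1):
    x = layers.Conv2D(filters, kernel, padding='same', dilation_rate=dilation)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def ms_structure(x, filters):
    # Inception module: four parallel branches at different kernel sizes.
    branches = [conv_bn_relu(x, filters // 4, k) for k in (1, 3, 5, 7)]
    fused = layers.Concatenate()(branches)
    # Dilated residual module with a dilation rate of 5.
    y = conv_bn_relu(fused, filters, 3, dilation=5)
    y = layers.Conv2D(filters, 3, padding='same', dilation_rate=5)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(layers.Add()([y, layers.Conv2D(filters, 1)(fused)]))
    # Long skip connection: fuse the block input with the block output.
    return layers.Add()([y, layers.Conv2D(filters, 1)(x)])
```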

3.3. The PGC Structure

Due to the complexity and diversity of feature types in coastal areas, interference sources such as paddy fields, lakes, and rivers have spectral features similar to those of aquaculture ponds and generate excessive redundant information during feature extraction with deep neural networks, leading to the misclassification of aquaculture ponds. Therefore, we propose a new attention mechanism, the PGC structure. As shown in Figure 5, this structure mainly consists of Polarized Self-Attention (PSA) [46] and Global Context (GC) [47] and assigns weights to the feature map in a data-driven manner to suppress image background noise and redundant information and to strengthen the target features. In this way, the network's computational resources are allocated more toward the aquaculture ponds, and the connection between the shallow and deep information of the network is deepened so that it can better handle the complex features in remote sensing images, improving the performance of the model.
As shown in Figure 5, the structure comprises two main branches. The first branch is Polarized Self-Attention, which consists of spatial-only attention and channel-only attention. During the aquaculture pond extraction task, the spatial-only attention pools the input feature maps along the channel dimension to obtain feature maps representing the spatial information of the whole image and then, by attention weighting, enhances the expression of aquaculture pond features and suppresses non-pond features; the channel-only attention learns the weights of the different channels, identifying channels that carry more aquaculture pond feature information and suppressing channels that carry non-pond information. The Polarized Self-Attention thus enhances the expression of useful information through the complementary effects of the two attentions while effectively suppressing irrelevant features, enabling the model to eliminate non-aquaculture ponds. The second branch consists of an improved bottleneck module and a Global Context module: the image first passes through the bottleneck structure to improve the extraction of deeper features, and the Global Context module then aggregates global feature information and adaptively adjusts the feature weights, overcoming the limitations of traditional convolutional neural networks in handling long-range dependencies and global context. Finally, the feature information output from the two branches is fused. The PGC structure therefore enhances the extraction of useful information, reduces the interference of redundant information, effectively improves the model's ability to capture global information, and strengthens the connection between the network's shallow contour information and deep abstract information, improving classification accuracy. A simplified sketch of this two-branch layout follows.
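The following is a deliberately simplified Keras sketch of that two-branch layout, assuming condensed forms of the published PSA [46] and GC [47] modules; the internal details here are approximations for illustration, not the modules' exact formulations.

```python
# Hedged, simplified sketch of the PGC structure: a PSA-style branch
# (channel-only then spatial-only attention) fused with a bottleneck +
# Global Context branch. Reduction ratios are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(x, channels):
    # Learn per-channel weights and re-weight informative channels.
    w = layers.Lambda(lambda t: tf.reduce_mean(t, axis=[1, 2], keepdims=True))(x)
    w = layers.Conv2D(max(channels // 2, 1), 1, activation='relu')(w)
    w = layers.Conv2D(channels, 1, activation='sigmoid')(w)
    return layers.Multiply()([x, w])

def spatial_attention(x):
    # Per-pixel weight map: emphasize pond pixels, suppress background.
    w = layers.Conv2D(1, 1, activation='sigmoid')(x)
    return layers.Multiply()([x, w])

def global_context(x, channels):
    # GC-style context modeling: softmax attention pools one global
    # descriptor, which is transformed and broadcast-added to the features.
    attn = layers.Conv2D(1, 1)(x)
    def gc_pool(tensors):
        feats, a = tensors
        shape = tf.shape(feats)
        flat = tf.reshape(feats, [shape[0], -1, shape[3]])          # (B,HW,C)
        w = tf.nn.softmax(tf.reshape(a, [shape[0], -1, 1]), axis=1)
        ctx = tf.reduce_sum(flat * w, axis=1)                        # (B,C)
        return ctx[:, tf.newaxis, tf.newaxis, :]                     # (B,1,1,C)
    ctx = layers.Lambda(gc_pool)([x, attn])
    ctx = layers.Conv2D(channels, 1, activation='relu')(ctx)
    return layers.Lambda(lambda t: t[0] + t[1])([x, ctx])

def pgc_structure(x, channels):
    # Assumes x already has `channels` channels.
    psa = spatial_attention(channel_attention(x, channels))        # branch 1
    gc = global_context(layers.Conv2D(channels, 1)(x), channels)   # branch 2
    return layers.Add()([psa, gc])                                  # fuse branches
```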

3.4. Experimental Setup

3.4.1. Parameter Settings

To speed up model training and prediction, the experimental platform ran 64-bit Windows 10 Professional on a PC equipped with an Intel(R) Core i7-10700F CPU running at 2.90 GHz, 64.0 GB of RAM, and an NVIDIA GeForce RTX 3060 GPU. The experimental environment was set up with Anaconda 3, using a Python 3.6 virtual environment, with models built on the TensorFlow 2.4 and Keras 2.2 deep learning frameworks. In addition, the experiments, as well as model training and prediction, were uniformly programmed and debugged in the PyCharm IDE.
As shown in Table 2, to evaluate the performance of each model and enhance robustness, every model uniformly uses the Adam optimizer with an initial learning rate of 0.0001 during training; the dropout rate is set to 0.3, the batch size to 8, and the number of epochs to 100.
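In Keras terms, these stated settings translate to the minimal sketch below. The binary cross-entropy loss is an assumption (the loss function is not specified here), build_mpg_net refers to the hypothetical sketch above, and the arrays are placeholders for the cropped datasets.

```python
# Hedged training setup matching the stated hyperparameters: Adam optimizer,
# initial learning rate 1e-4, batch size 8, 100 epochs. The loss is assumed;
# the stated dropout of 0.3 would sit inside the network blocks themselves.
import numpy as np
import tensorflow as tf

model = build_mpg_net()  # hypothetical constructor from the earlier sketch
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Dummy arrays standing in for the cropped 256x256 image/label pairs.
train_x = np.zeros((16, 256, 256, 4), np.float32)
train_y = np.zeros((16, 256, 256, 1), np.float32)

history = model.fit(train_x, train_y, batch_size=8, epochs=100,
                    validation_split=0.2)  # 20% held out for validation
```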

3.4.2. The Construction Process of the Aquaculture Pond Extraction Model

Figure 6 shows the model construction process, which includes dataset construction, model training, and model testing. For dataset construction, the Sentinel-2 image and the Planet image were used to produce the dataset. Then, through visual interpretation, manual labeling of the aquaculture ponds in the study region was completed using ArcGIS 10.8 software and Google Earth imagery; the Id value was set to 255 to obtain binarized sample labels. We employed random cropping to clip each image and its accompanying labels, because feeding the complete image and labels into the model for training could overload computer memory. This cropping yielded 750 image/label pairs for the Sentinel-2 dataset and 815 pairs for the Planet dataset, and an 8:2 ratio was used to split each sample dataset into training and testing sets. The training set is used to train the model and learn the characteristics of aquaculture ponds in order to obtain the final weights for prediction. The testing set is used to evaluate the performance of the model on real data and verify its generalization ability.
Since satellite sensors capture the same ground feature from different angles, producing different image positions and morphologies, expanding the training dataset with geometric transformations allows the deep learning model to learn more rotation-invariant representations of ground features, improving the model's adaptability to images of different morphologies and reducing the risk of overfitting. Consequently, we applied data augmentation to the training set, encompassing geometric transformations such as horizontal and vertical flips as well as brightness enhancement and contrast enhancement, as sketched below. In the end, we obtained 3000 image/label pairs for the Sentinel-2 training set and 3260 pairs for the Planet training set and randomly used 20% of the augmented training set as the validation set for fine-tuning the model's parameters. The resulting training sets were fed into the model for feature learning, and finally the testing set was used for model prediction.
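A minimal NumPy sketch of this augmentation step is given below; the flip probabilities and the brightness/contrast factor ranges are illustrative assumptions, since the paper states the operations but not their parameters.

```python
# Hedged augmentation sketch: horizontal/vertical flips applied jointly to
# image and label, plus brightness and contrast jitter on the image only.
import numpy as np

def augment_pair(image, label, rng=None):
    """image: (H, W, C) float array in [0, 1]; label: (H, W) binary mask."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:                                # horizontal flip
        image, label = image[:, ::-1], label[:, ::-1]
    if rng.random() < 0.5:                                # vertical flip
        image, label = image[::-1, :], label[::-1, :]
    image = np.clip(image * rng.uniform(0.9, 1.1), 0.0, 1.0)   # brightness
    mean = image.mean(axis=(0, 1), keepdims=True)
    image = np.clip((image - mean) * rng.uniform(0.9, 1.1) + mean,
                    0.0, 1.0)                             # contrast
    return image, label
```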

3.4.3. Accuracy Assessment

In this study, the prediction results on the testing set are used as the standard for quantitative assessment, and four precision evaluation metrics, namely precision, recall, Intersection-over-Union (IoU), and F1-Score, are used to evaluate accuracy and reliability and to reflect the strengths and weaknesses of model performance. Among these metrics, precision denotes the percentage of correctly categorized aquaculture ponds among all extracted aquaculture ponds; recall indicates the proportion of correctly identified aquaculture ponds relative to the total actual aquaculture ponds; IoU denotes the similarity between the predicted results and the ground truth; and F1-Score is the harmonic mean of precision and recall, used to assess the model's performance comprehensively. The formulas for calculating the evaluation indices are provided in Equations (1)–(4):
Precision = TP / (TP + FP)        (1)
Recall = TP / (TP + FN)        (2)
IoU = TP / (TP + FN + FP)        (3)
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)        (4)
TP (True Positive) indicates that both the real situation and the model's prediction are positive; FP (False Positive) indicates that the actual situation is negative but the prediction is positive; FN (False Negative) indicates that the prediction is negative but the actual situation is positive; and TN (True Negative) indicates that the real situation and the prediction are both negative. The confusion matrix is shown in Table 3.
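As a worked counterpart to Equations (1)–(4), the sketch below computes all four metrics from a pair of binary masks; it is a straightforward reading of the formulas, with the mask convention (1 marks aquaculture-pond pixels) assumed.

```python
# Direct implementation of Equations (1)-(4) for binary segmentation masks.
import numpy as np

def evaluate(pred, truth):
    """pred, truth: arrays of 0/1, where 1 marks aquaculture-pond pixels."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)     # predicted pond, actually pond
    fp = np.sum(pred & ~truth)    # predicted pond, actually background
    fn = np.sum(~pred & truth)    # predicted background, actually pond
    precision = tp / (tp + fp)                            # Eq. (1)
    recall = tp / (tp + fn)                               # Eq. (2)
    iou = tp / (tp + fn + fp)                             # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (4)
    return precision, recall, iou, f1
```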

4. Results and Discussion

4.1. Comparison Experiments

In this study, we compare the proposed MPG-Net with several mainstream semantic segmentation models, including FCN8S, SegNet, U-Net, and DeepLabV3. Table 4 presents a comparison of the number of training parameters, training time, and average testing time of the different models. The training time refers to the length of training on the same training set, and the average testing time is the average inference time on the same testing set for each image of size 256 × 256 pixels. As shown in Table 4, the number of training parameters of MPG-Net is significantly lower than that of the other models, mainly because the MS structure achieves multi-scale information extraction through the parallel computation of different convolutional kernels and avoids the repeated stacking of deep convolutional layers. Likewise, the PGC structure reduces the duplicated computation of global contextual information through efficient contextual feature extraction. Together, these optimization strategies cut down unnecessary computation and redundant features, enabling MPG-Net to have fewer training parameters than the other models. In terms of computational time, MPG-Net also outperforms the other models in training and inference speed under GPU acceleration, exhibiting shorter training and inference times. This further demonstrates the efficiency of our model's structural design and its optimized use of computational resources. In summary, MPG-Net significantly reduces the number of model parameters and the inference time, demonstrating excellent efficiency and performance advantages.
In order to evaluate the superiority of the MPG-Net model, the prediction results of the proposed MPG-Net were compared with the prediction results of the mainstream semantic segmentation models, the classical FCN8S, SegNet, and U-Net, as well as the DeepLabV3 model, which were all qualitatively analyzed and quantitatively evaluated. Figure 7 and Figure 8 show a visual comparison between the original images of the testing set, the ground truth labels, and the recognition results generated by the five models. In the images, the white color represents aquaculture ponds, and the black color represents other ground classes.
First, we compared the segmentation results of each model on the testing set from a qualitative point of view, as shown in Figure 7 and Figure 8. The experimental results show that the segmentation results of the FCN8S model are the worst, characterized mainly by a large amount of noise interference and misclassification. Specifically, the continuous down-sampling of FCN8S during feature extraction significantly decreases the resolution of the feature map, which not only loses image detail but also causes background information (e.g., dikes, near-shore seawater) to be incorrectly classified as aquaculture ponds. The jagged edges introduced by its up-sampling stage further reduce segmentation accuracy, manifesting as discontinuous segmentation regions and blurred details; small and irregular aquaculture ponds in particular are often missed or misclassified. In contrast, the segmentation results of SegNet and U-Net are improved. However, while SegNet achieves smoother segmentation through its encoder–decoder architecture, it still falls short in handling complex background information, especially in regions with similar spectral features such as nearby seawater and other water bodies; the model is also prone to adhesion between aquaculture ponds (the yellow rectangular box in Figure 7) and misclassification (the red rectangular boxes in Figure 7 and Figure 8). U-Net, by combining the encoder's high-level semantic information with the decoder's spatial details through skip connections, can reduce the misclassification of the background to a certain extent and make the edges of the aquaculture ponds clearer. However, U-Net still has limitations in complex scenarios, especially when facing water bodies similar to aquaculture ponds (e.g., fine rivers and drainage canals); the model also tends to overfit or underfit, leading to adhesion between ponds (the yellow circular box in Figure 7) and misclassification (the red circular boxes in Figure 7 and Figure 8), and its identification of fine aquaculture ponds remains limited (the blue circular box in Figure 8). DeepLabV3 performs comparatively well, especially on multi-scale targets: by introducing Atrous Convolution to expand the receptive field, it effectively captures large-scale contextual information and reduces the information loss caused by feature down-sampling, which manifests as reduced adhesion with the background region (the orange rectangular box in Figure 7). However, DeepLabV3 still struggles to completely eliminate noise interference against certain complex backgrounds and, despite better detail retention, cannot entirely avoid misclassification with background regions (the green rectangular boxes in Figure 7 and Figure 8). The MPG-Net proposed in this paper performs best among all the models, mainly owing to its integration of the PGC and MS structures.
The PGC structure can effectively reduce the interference of some redundant feature information when using the hopping connection layer to realize the information transfer and can effectively focus on the detailed features of the aquaculture ponds. In addition, the MS structure extracts aquaculture pond features at different scales through multi-scale convolution and combines them with global contextual information to ensure the clarity and accuracy of the edges of the aquaculture ponds. Compared with other models, MPG-Net not only better recognizes fine aquaculture ponds (as shown by the green circular box in Figure 7) but also reduces the phenomenon of background mis-segmentation (as shown by the purple circular box in Figure 8), significantly improves the segmentation accuracy in complex scenes, and reduces the omission of segmentation and adhesion problems of aquaculture ponds (as shown by the orange circular box in Figure 8).
In addition, this study quantitatively analyzed the segmentation performance of the different methods using the evaluation metrics, as shown in Table 5 and Table 6. To assess the overall performance of each model, we calculated the average metric values over the Sentinel-2 and Planet testing sets, as shown in Table 7. FCN8S had the weakest performance, with lower mean precision and mean recall, and its mean IoU (about 54.39%) and mean F1-Score (about 69.94%) in particular fell well short of the other models. This reflects the limitation of FCN8S in processing complex remote sensing images, as it cannot fully exploit detailed information for accurate segmentation. SegNet and U-Net performed at a middle level: SegNet reached a mean precision of about 87.95% and a mean recall of 90.26%, slightly higher than U-Net's mean recall (about 89.60%). Nevertheless, U-Net's overall performance on both testing sets was more balanced; with a higher mean precision of 91.61%, U-Net edged out SegNet in mean IoU and mean F1-Score, showing its stability in complex image segmentation tasks. DeepLabV3 outperformed both SegNet and U-Net, with a mean recall of 91.70% and a mean F1-Score of 92.07%, close to the level of MPG-Net; it was better at capturing global context and was especially good at handling complex textures and blurred regions.
In contrast, MPG-Net performed best among all the compared models: its mean precision, mean recall, mean IoU, and mean F1-Score of 94.95%, 92.95%, 88.57%, and 93.94%, respectively, led all the others, with the largest margins in mean IoU and mean F1-Score. This indicates that MPG-Net can capture features in complex scenes more effectively and has stronger generalization ability.

4.2. Ablation Experiments

To confirm that the PGC and MS structures in the MPG-Net model are effective, ablation experiments were conducted. Specifically, new network models were obtained by sequentially adding the different structures to the initial U-Net model and then testing the segmentation performance of each newly constructed model. All experiments were conducted on the same system, using the same training and validation datasets and the same parameter settings, to ensure reliable ablation results. The models compared in the ablation experiments were of the following four types: U-Net, the initial U-Net model without any enhancements; U-Net_1, which incorporates the MS structure on top of U-Net; U-Net_2, which incorporates the PGC structure on top of U-Net; and MPG-Net, the enhanced U-Net model introduced in this study, which combines U-Net_1 and U-Net_2 by incorporating both the MS structure and the PGC structure. The prediction results on the testing sets of the ablation experiments are shown in Figure 9 and Figure 10, and the accuracy evaluation of each model is shown in Table 8 and Table 9.
The ablation experiments were assessed using a combination of quantitative and qualitative methods based on the identification result plots and the accuracy evaluation tables. Compared with the initial U-Net model, the U-Net_1 model adopts the MS structure, which improved the four evaluation indicators of precision, recall, IoU, and F1-Score on the Sentinel-2 testing set by 1.48%, 2.62%, 3.48%, and 2.08%, respectively, and on the Planet testing set by 3.14%, 0.12%, 2.81%, and 1.63%, respectively. This is attributable to the fact that the MS structure combines the multi-branch structure of the Inception module with the Dilated residual module: these modules expand both the width of the network and the receptive field of the convolutional extraction, while their combination reduces the sparsity introduced by the dilated convolutions. The MS structure can therefore extract feature information effectively at different scales and thus better recognize the contours of the aquaculture ponds.
The U-Net_2 model adopts the PGC structure, which improved precision, recall, IoU, and F1-Score over the U-Net model on both the Sentinel-2 and Planet testing sets, suggesting that the PGC structure enhances the expression of useful information in images, reduces the interference of redundant information, and is more conducive to the extraction of aquaculture ponds. When performing the image segmentation task, the model must segment the image with the least possible error and extract valuable information from it; however, because images contain a large amount of redundant information, the model needs to learn useful features from the global contextual information. The PGC structure improves the model's capacity to represent global contextual information, enabling it to better capture multi-scale feature information, while the Polarized Self-Attention further strengthens the model's proficiency in recognizing objects at different scales. Therefore, both the U-Net_1 model with the MS structure and the U-Net_2 model with the PGC structure obtain better recognition results and higher accuracy.
Compared with the U-Net_1 and U-Net_2 models, the MPG-Net model further improves precision, recall, IoU, and F1-Score on both datasets, and its recognition results show markedly less interference from features resembling the spectral characteristics of the aquaculture ponds, as illustrated by the green and blue boxes in Figure 9 and Figure 10. Moreover, the MPG-Net model further reduces misclassification while enhancing the expression of the ponds' edge features. This model thus effectively combines the advantages of the MS and PGC structures and can extract the multi-scale feature information of aquaculture ponds while overcoming the interference of complex background information, improving the model's robustness and generalization ability.

4.3. Applicability of Methods

The broad range of feature types across the coastal region and the complexity of the remote sensing image background often lead to the 'same spectrum, heterogeneous objects' phenomenon when extracting information on aquaculture ponds. For example, paddy fields, salt pans, lakes, and pits have similar spectral characteristics and are easily confused with coastal aquaculture ponds, introducing redundant information that affects extraction accuracy. Additionally, the differing scales of aquaculture ponds in remote sensing images with varying spatial resolutions make it difficult for conventional convolutional neural networks to extract effective features, which in turn degrades recognition results.
The MPG-Net deep learning semantic segmentation model proposed in this study can diminish the interference from redundant information, effectively enhance the expression of aquaculture pond characteristics, and more comprehensively identify coastal aquaculture ponds. The aforementioned experiments demonstrate that the proposed methodology fulfills the need for coastal aquaculture pond extraction. To further validate the applicability of the MPG-Net model, this paper provides an overall prediction for two typical aquaculture areas, Yingluo Harbor and Anpu Harbor, in 2022. The data used are the Sentinel-2 MSI image and the Planet SuperDove image from October of that year, pre-processed to obtain the image data with the lowest cloud coverage. Based on the MPG-Net model proposed in this paper, transfer learning prediction over the study area yielded the distribution of aquaculture ponds shown in Figure 11 and Figure 12, with the accuracy evaluation given in Table 10 and Table 11.
From the accuracy maps in Figure 11 and Figure 12, it can be seen intuitively that the results extracted with the MPG-Net model contain a small number of misclassifications and omissions, mainly nearby seawater, inland lakes, and pits and ponds wrongly classified as aquaculture ponds. In general, however, the results are consistent with the real distribution of aquaculture ponds, and the identified ponds are accurately localized. The accuracy evaluation indicators in Table 10 and Table 11 show that the method in this paper achieves high accuracy on remote sensing images of different resolutions: with the exception of IoU, the precision, recall, and F1-Score are all around 90% and meet the accuracy requirements. We also compiled area statistics for the extraction results of the two typical farming areas; as the tables show, the difference between the extracted and real areas is larger for the medium-resolution Sentinel-2 image than for the higher-resolution Planet image. This difference has two causes: (1) the aquaculture ponds in the study area are generally small and densely arranged, and the embankments between most ponds are 5 to 10 m wide; given the constraints of image resolution, the edge pixels of the ponds are frequently misidentified as non-pond areas due to mixed pixels, which makes the extracted area smaller than the actual area; and (2) the spectral features of small embankments narrower than 5 m are often missed, and these embankments are merged with the pond water surface because they are difficult to segment, which makes the extracted area larger than the actual area. Overall, the areas of the two typical aquaculture areas extracted from images of different resolutions are close to the real areas, proving that the MPG-Net model has good applicability for the extraction of aquaculture ponds in coastal areas.
In summary, the MS structure and the PGC structure introduced in this research boost the performance of the model and strengthen its generalization ability. Moreover, the MPG-Net model can effectively solve the problems that conventional convolutional neural networks face in the extraction of aquaculture ponds and better realize the fine segmentation of individual aquaculture ponds with high recognition accuracy, providing an important reference for the fast and accurate extraction of aquaculture ponds within coastal regions.

5. Conclusions

This study proposes a new deep neural network model, MPG-Net, and validates it on remote sensing images of two different resolutions, Sentinel-2 and Planet. The MPG-Net model enhances the U-Net model by substituting the MS structure for the traditional convolutional layers and by incorporating the PGC attention structure into the skip connections; these inclusions give the model a stronger ability to extract aquaculture pond features. We applied the model to identify aquaculture ponds in two typical areas of the Beibu Gulf, Guangxi, and achieved high precision. Compared with the four classical semantic segmentation models FCN8S, SegNet, U-Net, and DeepLabV3, the MPG-Net model performed better and with higher precision. The average values of precision, recall, IoU, and F1-Score on the two image datasets of different resolutions were 94.95%, 92.95%, 88.57%, and 93.94%, respectively, outperforming the other compared models. This demonstrates that the method detailed in this paper comprehensively considers contextual semantic information, can achieve fast and accurate extraction of coastal aquaculture ponds in complex environments, enhances the precision of extracting individual aquaculture ponds, and provides a basis for the automatic segmentation of remote sensing images in the future.
Pond aquaculture stands as a significant farming approach in coastal regions. The swift growth of the aquaculture industry has instigated a hasty and unregulated expansion of aquaculture ponds in these coastal areas, increasing the vulnerability of the ecological environment in coastal areas. Utilizing satellite-based remote sensing technology in conjunction with deep learning techniques allows for the precise extraction of aquaculture pond data, thereby optimizing the spatial arrangement of aquaculture land and promoting rational use of land resources. Even though the MPG-Net model introduced in this paper realized the accurate extraction of aquaculture pond information in typical aquaculture areas along the Beibu Gulf coast of Guangxi, some issues need to be further investigated: (1) confirming the model’s validity and suitability when applied to lower spatial resolution remote sensing imagery; and (2) extending the model to larger-scale aquaculture regions and examining the spatial and temporal evolution patterns of these regions based on extensive time-series satellite remote sensing imagery.

Author Contributions

Conceptualization, Y.C. and L.Z.; methodology, Y.C. and B.C.; validation, Y.C., J.Z. and Y.H.; formal analysis, Y.C.; investigation, Y.C., J.Z. and Y.H.; data curation, Y.C.; writing—original draft preparation, Y.C.; writing—review and editing, Y.C., L.Z. and B.C.; visualization, Y.C.; supervision, L.Z.; project administration, L.Z. and B.C.; funding acquisition, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Director Fund of the International Research Center of Big Data for Sustainable Development Goals (Grant No. CBAS2022DF003), the National Natural Science Foundation of China (Grant No. 42071305), and the Wetland Resources Survey Project of International Importance in South China (Grant No. ZD202201321).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors are very grateful to the anonymous reviewers for their valuable comments and suggestions for the improvement of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. FAO. The State of World Fisheries and Aquaculture 2020; FAO: Rome, Italy, 2020; ISBN 978-92-5-132692-3. [Google Scholar]
  2. Rahman, A.F.; Dragoni, D.; Didan, K.; Barreto-Munoz, A.; Hutabarat, J.A. Detecting Large Scale Conversion of Mangroves to Aquaculture with Change Point and Mixed-Pixel Analyses of High-Fidelity MODIS Data. Remote Sens. Environ. 2013, 130, 96–107. [Google Scholar] [CrossRef]
  3. Neofitou, N.; Papadimitriou, K.; Domenikiotis, C.; Tziantziou, L.; Panagiotaki, P. GIS in Environmental Monitoring and Assessment of Fish Farming Impacts on Nutrients of Pagasitikos Gulf, Eastern Mediterranean. Aquaculture 2019, 501, 62–75. [Google Scholar] [CrossRef]
  4. Wang, B.; Cao, L.; Micheli, F.; Naylor, R.L.; Fringer, O.B. The Effects of Intensive Aquaculture on Nutrient Residence Time and Transport in a Coastal Embayment. Environ. Fluid Mech. 2018, 18, 1321–1349. [Google Scholar] [CrossRef]
  5. Hukom, V.; Nielsen, R.; Asmild, M.; Nielsen, M. Do Aquaculture Farmers Have an Incentive to Maintain Good Water Quality? The Case of Small-Scale Shrimp Farming in Indonesia. Ecol. Econ. 2020, 176, 106717. [Google Scholar] [CrossRef]
  6. Spalding, M.D.; Ruffo, S.; Lacambra, C.; Meliane, I.; Hale, L.Z.; Shepard, C.C.; Beck, M.W. The Role of Ecosystems in Coastal Protection: Adapting to Climate Change and Coastal Hazards. Ocean Coast. Manag. 2014, 90, 50–57. [Google Scholar] [CrossRef]
  7. Lv, D.A.; Cheng, J.; Mo, W.; Tang, Y.H.; Sun, L.; Liao, Y.B. Pollution and Ecological Restoration of Mariculture. Ocean Dev. Manag. 2019, 11, 43–48. [Google Scholar]
  8. Richards, D.R.; Friess, D.A. Rates and Drivers of Mangrove Deforestation in Southeast Asia, 2000–2012. Proc. Natl. Acad. Sci. USA 2016, 113, 344–349. [Google Scholar] [CrossRef]
  9. Wen, K.; Yao, H.; Huang, Y.; Chen, H.; Liao, P. Remote Sensing Image Extraction for Coastal Aquaculture Ponds in the Guangxi Beibu Gulf Based on Google Earth Engine. Trans. Chin. Soc. Agric. Eng. 2021, 37, 280–288. [Google Scholar]
  10. Liu, W.; Liu, S.; Zhao, J.; Duan, J.; Chen, Z.; Guo, R.; Chu, J.; Zhang, J.; Li, X.; Liu, J. A Remote Sensing Data Management System for Sea Area Usage Management in China. Ocean Coast. Manag. 2018, 152, 163–174. [Google Scholar] [CrossRef]
  11. Duan, Y.; Li, X.; Zhang, L.; Liu, W.; Chen, D.; Ji, H. Detecting Spatiotemporal Changes of Large-Scale Aquaculture Ponds Regions over 1988–2018 in Jiangsu Province, China Using Google Earth Engine. Ocean Coast. Manag. 2020, 188, 105144. [Google Scholar] [CrossRef]
  12. Wang, L.; Chen, C.; Xie, F.; Hu, Z.; Zhang, Z.; Chen, H.; He, X.; Chu, Y. Estimation of the Value of Regional Ecosystem Services of an Archipelago Using Satellite Remote Sensing Technology: A Case Study of Zhoushan Archipelago, China. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102616. [Google Scholar] [CrossRef]
  13. Chen, C.; Liang, J.; Xie, F.; Hu, Z.; Sun, W.; Yang, G.; Yu, J.; Chen, L.; Wang, L.; Wang, L. Temporal and Spatial Variation of Coastline Using Remote Sensing Images for Zhoushan Archipelago, China. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102711. [Google Scholar] [CrossRef]
  14. Ottinger, M.; Clauss, K.; Huth, J.; Eisfelder, C.; Leinenkugel, P.; Kuenzer, C. Time Series Sentinel-1 SAR Data for the Mapping of Aquaculture Ponds in Coastal Asia. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 9371–9374. [Google Scholar]
  15. Zhu, Z.; Tang, Y.; Hu, J.; An, M. Coastline Extraction from High-Resolution Multispectral Images by Integrating Prior Edge Information with Active Contour Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4099–4109. [Google Scholar] [CrossRef]
  16. Sabjan, A.; Lee, L.; See, K.; Wee, S. Comparison of Three Water Indices for Tropical Aquaculture Ponds Extraction Using Google Earth Engine. Sains Malays 2022, 51, 369–378. [Google Scholar]
  17. Ma, Y.; Zhao, D.; Wang, R.; Su, W. Offshore Aquatic Farming Areas Extraction Method Based on ASTER Data. Trans. Chin. Soc. Agric. Eng. 2010, 26, 120–124. [Google Scholar]
  18. Wu, Y.; Chen, F.; Ma, Y.; Liu, J.; Li, X. Research on Automatic Extraction Method for Coastal Aquaculture Area Using Landsat8 Data. Remote Sens. Nat. Resour. 2018, 30, 96–105. [Google Scholar]
  19. Alexandridis, T.K.; Topaloglou, C.A.; Lazaridou, E.; Zalidis, G.C. The Performance of Satellite Images in Mapping Aquacultures. Ocean Coast. Manag. 2008, 51, 638–644. [Google Scholar] [CrossRef]
  20. Wang, F.; Xia, L.; Chen, Z.; Cui, W.; Liu, Z.; Pan, C. Remote Sensing Identification of Coastal Zone Mariculture Modes Based on Association-Rules Object-Oriented Method. Trans. Chin. Soc. Agric. Eng. 2018, 34, 210–217. [Google Scholar]
  21. Fu, Y.; Deng, J.; Ye, Z.; Gan, M.; Wang, K.; Wu, J.; Yang, W.; Xiao, G. Coastal Aquaculture Mapping from Very High Spatial Resolution Imagery by Combining Object-Based Neighbor Features. Sustainability 2019, 11, 637. [Google Scholar] [CrossRef]
  22. Ren, C.; Wang, Z.; Zhang, Y.; Zhang, B.; Chen, L.; Xi, Y.; Xiao, X.; Doughty, R.B.; Liu, M.; Jia, M. Rapid Expansion of Coastal Aquaculture Ponds in China from Landsat Observations during 1984–2016. Int. J. Appl. Earth Obs. Geoinf. 2019, 82, 101902. [Google Scholar] [CrossRef]
  23. Huang, S.; Wei, C. Spatial-Temporal Changes in Aquaculture Ponds in Coastal Cities of Guangdong Province: An Empirical Study Based on Sentinel-1 Data during 2015–2019. Trop. Geogr. 2021, 41, 622–634. [Google Scholar]
  24. Zhu, H.; Li, K.; Wang, L.; Chu, J.; Gao, N.; Chen, Y. Spectral Characteristic Analysis and Remote Sensing Classification of Coastal Aquaculture Areas Based on GF-1 Data. J. Coast. Res. 2019, 90, 49–57. [Google Scholar] [CrossRef]
  25. Cheng, T.F.; Zhou, W.F.; Fan, W. Progress in the Methods for Extracting Aquaculture Areas from Remote Sensing Data. Remote Sens. Nat. Resour. 2012, 24, 1–5. [Google Scholar]
  26. Duan, Y.; Li, X.; Zhang, L.; Chen, D.; Ji, H. Mapping National-Scale Aquaculture Ponds Based on the Google Earth Engine in the Chinese Coastal Zone. Aquaculture 2020, 520, 734666. [Google Scholar] [CrossRef]
  27. Hou, T.; Sun, W.; Chen, C.; Yang, G.; Meng, X.; Peng, J. Marine Floating Raft Aquaculture Extraction of Hyperspectral Remote Sensing Images Based Decision Tree Algorithm. Int. J. Appl. Earth Obs. Geoinf. 2022, 111, 102846. [Google Scholar] [CrossRef]
  28. Xue, M.; Chen, Y.Z.; Tian, X. Detection of Marine Aquaculture in Sansha Bay by Remote Sensing. Mar. Environ. Sci. 2019, 730–735. [Google Scholar] [CrossRef]
  29. Zeng, Z.; Wang, D.; Tan, W.; Huang, J. Extracting Aquaculture Ponds from Natural Water Surfaces around Inland Lakes on Medium Resolution Multispectral Images. Int. J. Appl. Earth Obs. Geoinf. 2019, 80, 13–25. [Google Scholar] [CrossRef]
  30. Xia, Z.; Guo, X.; Chen, R. Automatic Extraction of Aquaculture Ponds Based on Google Earth Engine. Ocean Coast. Manag. 2020, 198, 105348. [Google Scholar] [CrossRef]
  31. Zhao, C.; Jia, M.; Wang, Z.; Mao, D.; Wang, Y. Identifying Mangroves through Knowledge Extracted from Trained Random Forest Models: An Interpretable Mangrove Mapping Approach (IMMA). ISPRS J. Photogramm. Remote Sens. 2023, 201, 209–225. [Google Scholar] [CrossRef]
  32. Long, C.; Dai, Z.; Zhou, X.; Mei, X.; Van, C.M. Mapping Mangrove Forests in the Red River Delta, Vietnam. For. Ecol. Manag. 2021, 483, 118910. [Google Scholar] [CrossRef]
  33. Shao, Z.; Sun, Y.; Xi, J.; Li, Y. Intelligent Optimization Learning for Semantic Segmentation of High Spatial Resolution Remote Sensing Images. Geomat. Inf. Sci. Wuhan Univ. 2022, 47, 234–241. [Google Scholar]
  34. Wang, E.; Qi, K.; Li, X.; Peng, L. Semantic Segmentation of Remote Sensing Image Based on Neural Network. Acta Opt. Sin. 2019, 39, 93–104. [Google Scholar]
  35. Cheng, B.; Liang, C.; Liu, X.; Liu, Y.; Ma, X.; Wang, G. Research on a Novel Extraction Method Using Deep Learning Based on GF-2 Images for Aquaculture Areas. Int. J. Remote Sens. 2020, 41, 3575–3591. [Google Scholar] [CrossRef]
  36. Lu, Y.; Shao, W.; Sun, J. Extraction of Offshore Aquaculture Areas from Medium-Resolution Remote Sensing Images Based on Deep Learning. Remote Sens. 2021, 13, 3854. [Google Scholar] [CrossRef]
  37. Su, H.; Wei, S.; Qiu, J.; Wu, W. RaftNet: A New Deep Neural Network for Coastal Raft Aquaculture Extraction from Landsat 8 OLI Data. Remote Sens. 2022, 14, 4587. [Google Scholar] [CrossRef]
  38. Chen, C.; Zou, Z.; Sun, W.; Yang, G.; Song, Y.; Liu, Z. Mapping the Distribution and Dynamics of Coastal Aquaculture Ponds Using Landsat Time Series Data Based on U2-Net Deep Learning Model. Int. J. Digit. Earth 2024, 17, 2346258. [Google Scholar] [CrossRef]
  39. Ottinger, M.; Clauss, K.; Kuenzer, C. Aquaculture: Relevance, Distribution, Impacts and Spatial Assessments–A Review. Ocean Coast. Manag. 2016, 119, 244–266. [Google Scholar] [CrossRef]
  40. Chen, Y.; Song, G.; Zhao, W.; Chen, J.W. Estimating Pollutant Loadings from Mariculture in China. Mar. Environ. Sci. 2016, 35, 1–6. [Google Scholar]
  41. Murray, N.J.; Clemens, R.S.; Phinn, S.R.; Possingham, H.P.; Fuller, R.A. Tracking the Rapid Loss of Tidal Wetlands in the Yellow Sea. Front. Ecol. Environ. 2014, 12, 267–272. [Google Scholar] [CrossRef]
  42. Peng, Y.; Chen, G.; Li, S.; Liu, Y.; Pernetta, J.C. Use of Degraded Coastal Wetland in an Integrated Mangrove–Aquaculture System: A Case Study from the South China Sea. Ocean Coast. Manag. 2013, 85, 209–213. [Google Scholar] [CrossRef]
  43. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  44. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  45. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  46. Liu, H.; Liu, F.; Fan, X.; Huang, D. Polarized Self-Attention: Towards High-Quality Pixel-Wise Regression. arXiv 2021, arXiv:2107.00782. [Google Scholar]
  47. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. Gcnet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 1971–1980. [Google Scholar]
Figure 1. Map of the study area. The upper panel shows the location of the study area; the lower panels are standard false-color composites of Planet (a) and Sentinel-2 (b) imagery. (A,C) are the aquaculture areas of Yingluo Harbor, and (B,D) are the aquaculture areas of Anpu Harbor.
Figure 2. The architecture of U-Net.
Figure 3. The structure of MPG-Net. MS and PGC are the two improved structures proposed in this study.
Figure 4. The MS structure: the Inception module (left) and the dilated residual module with a dilation rate of 5 (right).
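To make the caption concrete, the following is a minimal tf.keras sketch of how an MS-style block could be wired: an Inception branch (parallel 1×1, 3×3, and 5×5 convolutions plus a pooled path) alongside a dilated residual branch with dilation rate 5. The layer widths and the elementwise-addition fusion of the two branches are illustrative assumptions; only the two module types and the dilation rate come from the figure.

```python
from tensorflow.keras import layers

def ms_block(x, filters):
    # Inception branch: parallel 1x1, 3x3, 5x5 convolutions and a pooled
    # 1x1 path, concatenated and fused back to `filters` channels.
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(filters, 1, padding="same", activation="relu")(b4)
    inception = layers.Concatenate()([b1, b2, b3, b4])
    inception = layers.Conv2D(filters, 1, padding="same")(inception)

    # Dilated residual branch: two rate-5 atrous convolutions with a
    # 1x1-projected identity shortcut (the "residual" part).
    d = layers.Conv2D(filters, 3, padding="same", dilation_rate=5,
                      activation="relu")(x)
    d = layers.Conv2D(filters, 3, padding="same", dilation_rate=5)(d)
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    dilated = layers.Activation("relu")(layers.Add()([d, shortcut]))

    # Fusing the two branches by elementwise addition is an assumption;
    # the paper's exact wiring may differ.
    return layers.Add()([inception, dilated])
```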
Figure 5. The PGC structure. The upper branch is the bottleneck module and GC module, and the lower branch is the PSA module.
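Of the two PGC branches, the GC module follows the published GCNet design and can be sketched compactly; the PSA branch is omitted here because its wiring cannot be inferred from the caption alone. A minimal tf.keras sketch, assuming the standard GCNet layout of softmax attention pooling followed by a bottleneck transform and a broadcast addition:

```python
from tensorflow.keras import layers

def gc_block(x, channels, reduction=4):
    # Context modelling: a 1x1 conv produces per-pixel logits; softmax over
    # all H*W positions yields attention weights, and the attention-weighted
    # sum of features gives one global context vector per image.
    attn = layers.Conv2D(1, 1)(x)                 # (B, H, W, 1)
    attn = layers.Reshape((-1, 1))(attn)          # (B, H*W, 1)
    attn = layers.Softmax(axis=1)(attn)
    feats = layers.Reshape((-1, channels))(x)     # (B, H*W, C)
    context = layers.Dot(axes=1)([feats, attn])   # (B, C, 1)
    context = layers.Reshape((1, 1, channels))(context)

    # Transform: bottleneck 1x1 convs with LayerNorm, then broadcast-add
    # the context back onto every spatial position of the input.
    t = layers.Conv2D(channels // reduction, 1)(context)
    t = layers.LayerNormalization()(t)
    t = layers.Activation("relu")(t)
    t = layers.Conv2D(channels, 1)(t)
    return layers.Lambda(lambda z: z[0] + z[1])([x, t])
```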
Figure 6. The construction process of the aquaculture pond extraction model. The top, middle, and bottom sections of the figure represent data cropping, data enhancement, and model training and prediction, respectively.
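The caption does not list the enhancement (augmentation) operations themselves; a typical sketch for paired image/mask augmentation, assuming random flips and 90° rotations, is:

```python
import numpy as np

def augment(image, mask, rng=np.random.default_rng()):
    # Apply identical random flips and quarter-turn rotations to an image
    # patch (H, W, C) and its label mask (H, W), keeping them aligned.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]   # vertical flip
    k = int(rng.integers(0, 4))                       # 0-3 quarter turns
    return np.rot90(image, k), np.rot90(mask, k)
```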
Figure 7. Segmentation results for aquaculture ponds on the Sentinel-2 testing set with different models.
Figure 8. Segmentation results for aquaculture ponds on the Planet testing set with different models.
Figure 9. Results of ablation experiments on the Sentinel-2 testing set.
Figure 10. Results of ablation experiments on the Planet testing set.
Figure 11. The extraction results for Yingluo Harbor. Frames (a,b) show the original images; (c,d) the extraction results; (e,f) the accuracy maps.
Figure 12. The extraction results for Anpu Harbor. Frames (a,b) show the original images; (c,d) the extraction results; (e,f) the accuracy maps.
Table 1. Description of the Sentinel-2A MSI and Planet SuperDove.

Sensor Type      | Band Description | Wavelength (nm) | Spatial Resolution (m)
Sentinel-2A MSI  | Band 2 Blue      | 458–523         | 10
                 | Band 3 Green     | 543–578         | 10
                 | Band 4 Red       | 650–680         | 10
                 | Band 8 NIR       | 785–900         | 10
Planet SuperDove | Band 2 Blue      | 465–515         | 3
                 | Band 3 Green     | 513–549         | 3
                 | Band 6 Red       | 650–680         | 3
                 | Band 8 NIR       | 845–885         | 3
Table 2. Model parameters and the optimal values of MPG-Net.

Model Parameter       | Optimal Value
Loss function         | binary_crossentropy
Optimizer             | Adam
Activation            | Sigmoid
Initial learning rate | 0.0001
Epoch                 | 100
Batch size            | 8
Dropout               | 0.3
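For reference, the Table 2 settings map onto a tf.keras training loop roughly as follows. `build_mpg_net`, the 256 × 256 patch size, and the `train_images`/`train_masks` arrays are hypothetical stand-ins; only the hyperparameters themselves come from the table.

```python
import tensorflow as tf

# Hypothetical builder; only the hyperparameters below come from Table 2.
model = build_mpg_net(input_shape=(256, 256, 4), dropout=0.3)  # B/G/R/NIR bands

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # initial LR 0.0001
    loss="binary_crossentropy",   # binary pond-vs-background mask
    metrics=["accuracy"],
)
# The Sigmoid activation sits on the model's single-channel output layer.
model.fit(train_images, train_masks, epochs=100, batch_size=8)
```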
Table 3. Confusion matrix.

Real Situation | Predicted Positive | Predicted Negative
Positive       | TP                 | FN
Negative       | FP                 | TN
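Assuming the usual definitions, the four accuracy metrics reported in the tables below follow directly from these confusion-matrix counts:

```python
def segmentation_metrics(tp, fp, fn):
    # Standard per-class metrics; TN is not needed for these four.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, iou, f1
```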
Table 4. Training parameters and computational complexity of different models.

Experimental Details                          | FCN8S     | SegNet    | DeepLabV3 | U-Net     | MPG-Net
Training parameters                           | 134M      | 29.5M     | 44M       | 31M       | 10M
Training time (Sentinel-2 training set)       | 480.2 min | 275.9 min | 313.7 min | 276.4 min | 269.7 min
Training time (Planet training set)           | 495.6 min | 280.5 min | 319.4 min | 282.2 min | 273.6 min
Average testing time (Sentinel-2 testing set) | 78.3 ms   | 31.6 ms   | 37.6 ms   | 32.1 ms   | 30.3 ms
Average testing time (Planet testing set)     | 78.0 ms   | 31.8 ms   | 37.5 ms   | 32.3 ms   | 30.0 ms
Table 5. Experimental accuracy comparison of different models on the Sentinel-2 image testing set.

Model     | Precision (%) | Recall (%) | IoU (%) | F1-Score (%)
FCN8S     | 60.13         | 63.44      | 44.66   | 61.74
SegNet    | 87.61         | 89.49      | 79.15   | 88.54
U-Net     | 92.48         | 86.72      | 81.01   | 89.51
DeepLabV3 | 93.37         | 90.38      | 84.93   | 91.85
MPG-Net   | 94.57         | 92.73      | 88.04   | 93.64
Table 6. Experimental accuracy comparison of different models on the Planet image testing set.

Model     | Precision (%) | Recall (%) | IoU (%) | F1-Score (%)
FCN8S     | 73.68         | 83.15      | 64.12   | 78.13
SegNet    | 88.29         | 91.03      | 81.22   | 89.64
U-Net     | 90.73         | 92.48      | 84.51   | 91.60
DeepLabV3 | 91.58         | 93.02      | 85.70   | 92.29
MPG-Net   | 95.32         | 93.16      | 89.09   | 94.23

Table 7. Average evaluation metric values for different models on the two testing sets.

Model     | Mean Precision (%) | Mean Recall (%) | Mean IoU (%) | Mean F1-Score (%)
FCN8S     | 66.91              | 73.30           | 54.39        | 69.94
SegNet    | 87.95              | 90.26           | 80.19        | 89.09
U-Net     | 91.61              | 89.60           | 82.76        | 90.56
DeepLabV3 | 92.48              | 91.70           | 85.32        | 92.07
MPG-Net   | 94.95              | 92.95           | 88.57        | 93.94
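As a quick consistency check, the Table 7 means are the arithmetic averages of the corresponding Table 5 and Table 6 entries; for example, for the MPG-Net row:

```python
# MPG-Net rows of Tables 5 and 6 (precision, recall, IoU, F1).
sentinel2 = (94.57, 92.73, 88.04, 93.64)
planet = (95.32, 93.16, 89.09, 94.23)
means = [(a + b) / 2 for a, b in zip(sentinel2, planet)]
# -> 94.945, 92.945, 88.565, 93.935, i.e. Table 7's 94.95, 92.95,
#    88.57, 93.94 after rounding to two decimals.
```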
Table 8. Accuracy assessment of ablation experiments on the Sentinel-2 testing set.

Model   | Precision (%) | Recall (%) | IoU (%) | F1-Score (%)
U-Net   | 92.48         | 86.72      | 81.01   | 89.51
U-Net_1 | 93.96         | 89.34      | 84.49   | 91.59
U-Net_2 | 93.65         | 91.61      | 86.25   | 92.62
MPG-Net | 94.57         | 92.73      | 88.04   | 93.64

Table 9. Accuracy assessment of ablation experiments on the Planet testing set.

Model   | Precision (%) | Recall (%) | IoU (%) | F1-Score (%)
U-Net   | 90.73         | 92.48      | 84.51   | 91.60
U-Net_1 | 93.87         | 92.60      | 87.32   | 93.23
U-Net_2 | 94.21         | 92.77      | 87.77   | 93.48
MPG-Net | 95.32         | 93.16      | 89.09   | 94.23

Table 10. Evaluation of extraction accuracy of aquaculture ponds in Yingluo Harbor.

Source           | Precision (%) | Recall (%) | IoU (%) | F1-Score (%) | Area (km²)
Ground Truth     | 100           | 100        | 100     | 100          | 9.27
Planet Image     | 93.72         | 91.41      | 86.13   | 92.55        | 9.07
Sentinel-2 Image | 90.24         | 89.92      | 81.95   | 90.08        | 8.84

Table 11. Evaluation of extraction accuracy of aquaculture ponds in Anpu Harbor.

Source           | Precision (%) | Recall (%) | IoU (%) | F1-Score (%) | Area (km²)
Ground Truth     | 100           | 100        | 100     | 100          | 16.48
Planet Image     | 92.68         | 91.98      | 86.01   | 92.48        | 16.23
Sentinel-2 Image | 91.05         | 90.03      | 82.71   | 90.53        | 16.88
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
