
Research on Irrigation Grade Discrimination Method Based on Semantic Segmentation

School of Automation, Beijing Information Science and Technology University, Beijing 100192, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(23), 4629; https://doi.org/10.3390/electronics13234629
Submission received: 18 October 2024 / Revised: 17 November 2024 / Accepted: 21 November 2024 / Published: 23 November 2024
(This article belongs to the Special Issue Machine Learning and Computational Intelligence in Remote Sensing)
Figure 1. Geographical location of the North China Plain and distribution map of different crops.
Figure 2. Irrigation assurance capability assessment technology roadmap.
Figure 3. Visualization of irrigation assurance capability indicators and data distribution information map; 2.43E-2 denotes $2.43 \times 10^{-2}$ and 9.52E-2 denotes $9.52 \times 10^{-2}$.
Figure 4. Scatter plot of $ET_0$ calculated from meteorological stations against PET.
Figure 5. Mask2Former model structure.
Figure 6. (a) The Transformer module; (b) the ConvNeXt module; (c) the CONAT module.
Figure 7. (a) Model accuracy for categorizing the level of farmland irrigation guarantee capacity; (b) Intersection over Union (IoU) results.
Figure 8. (a) Tagged image; (b) verification image of the Mask2Former model with CONAT as the backbone network.
Figure 9. (a) Accuracy comparison of Mask2Former models with different backbone networks; (b) Intersection over Union (IoU) comparison.

Abstract

As one of China’s major grain crops, wheat has a high demand for water resources, making it susceptible to drought stress. Traditional irrigation evaluation methods are often based on experience and rule-based calculations, which struggle to cope with complex environmental factors and dynamic changes in crop needs. With technological advancements, deep learning-based research methods, characterized by their strong data-driven analytical capabilities, are expected to improve the accuracy of evaluation results. This paper focuses on the irrigation demand assessment of winter wheat farmland, aiming to explore a new regional-scale irrigation demand assessment method based on deep learning. By establishing samples of different irrigation evaluation levels, this study seeks to better meet the requirements of irrigation demand assessment. For the problem of regional-scale irrigation-level discrimination, the Convolutional Network Attention (CONAT) module was proposed to optimize the backbone network structure of the Mask2Former model. To tackle issues related to data imbalance and underfitting across certain categories, a loss function tailored for imbalanced sample distributions was implemented, accompanied by enhancements to the training scheme. Comparative experiments against alternative irrigation-level discrimination methods demonstrate the viability of this approach.

1. Introduction

Many regions around the world are facing water scarcity issues, making water resource management a complex and increasingly severe challenge. Arid and semi-arid regions in particular are experiencing significant water shortages. This situation puts pressure on agricultural production, drinking water supplies, and ecosystems. In northern China, the issue of water scarcity is even more pronounced due to dry climate conditions and geographical limitations. The North China Plain, as one of China’s major grain-producing areas, faces significant pressures related to water resource constraints. Agricultural water usage accounts for 62.4% of China’s total water resources, with a large portion allocated to the irrigation of winter wheat. Due to issues surrounding the supply and distribution of water resources, improving agricultural irrigation efficiency faces many challenges. Wheat, as one of China’s main staple crops, has a high demand for water resources, making it particularly susceptible to drought stress.
Due to the complexity and diversity of water distribution in agricultural fields, accurately estimating irrigation water demand faces significant challenges. Traditional methods for estimating irrigation water use are constrained in large-scale areas due to their highly labor-intensive nature and their susceptibility to incidental factors such as weather, soil, and crop conditions. Methods based on remote sensing retrieval of surface variables for estimating irrigation water use continue to emerge, as remote sensing technologies enable monitoring from local to global scales [1], aiding in overcoming challenges posed by big data and computational limitations [2,3]. Hain et al. [4] compared remotely sensed actual evapotranspiration values with simulated potential evapotranspiration values, with the difference between the two considered to originate from non-precipitation water sources such as irrigation, thereby establishing the evapotranspiration difference method for estimating irrigation water demand.
With the successful application of deep learning in the fields of visual imaging [5] and natural language processing [6,7], an increasing number of studies have begun to explore its potential in addressing agricultural issues. Semantic segmentation, a deep learning-based image processing technique, enables precise classification of each pixel in remote sensing images into different material categories. Compared with traditional classification methods, semantic segmentation not only classifies ground objects in remote sensing images but also identifies spatially continuous regions. In agricultural remote sensing applications, semantic segmentation is primarily used in the following areas: (1) Farm soil type segmentation [8,9,10]: Semantic segmentation of farm remote sensing images accurately identifies different types of soil, such as sandy, loamy, and clay soils. (2) Crop planting structure segmentation [11,12,13]: Classifies and segments different crop types (such as wheat, rice, maize, etc.) in fields, helping to assess crop diversity and distribution. (3) Pest and disease monitoring [14,15]: Semantic segmentation can identify diseases and pests, such as lesions, eggs, and damaged leaves, thereby enabling monitoring and early warning. (4) Vegetation cover and growth status monitoring [16]: Semantic segmentation of remote sensing images assesses the coverage and health status of vegetation.
Despite some achievements in the application of semantic segmentation technology in agricultural remote sensing, few studies have focused on quantifying irrigation demand assessments at the regional scale using this technology. The innovation of this paper lies in its examination of irrigation demand assessment for winter wheat cultivation, with data primarily sourced from the Moderate Resolution Imaging Spectroradiometer (MODIS), a NASA satellite instrument used for global surface monitoring. MODIS is characterized by its moderate spatial resolution, wide spectral bands, global coverage observation period, and extensive applications across multiple fields. This study integrates the ET and PET parameters from MODIS remote sensing data with observational data from ground meteorological stations in China. Based on the principle of regional water balance, we propose the IGCI (Irrigation Guarantee Capacity Index), defined as the ratio of monthly effective irrigation amount to irrigation water demand. Furthermore, we establish a sample set for different irrigation assessment levels. To achieve regional-scale irrigation grade discrimination, this paper innovatively proposes a backbone network structure of the Mask2Former model optimized by the CONAT module, thereby enhancing the model’s image feature extraction capability. Additionally, to address issues of data imbalance and underfitting in some categories, this paper introduces a loss function suitable for unbalanced sample distributions and improves the training scheme. To comprehensively evaluate the performance of different algorithms and network models, extensive ablation and comparative experiments were conducted, and the results demonstrate significant performance advantages of the proposed irrigation grade discrimination model.

2. Research Area and Data

2.1. Research Area

The research area encompasses the North China Plain (approximately between 113° to 121° east longitude and 32° to 40° north latitude), which extends eastward to the Bohai Sea, westward to the Taihang Mountains, northward to the Yanshan Mountains, and southward to the Yellow River. It includes Beijing, Tianjin, the Hebei Plain, as well as the Yubei and Luxi Plains north of the Yellow River.
The terrain in the research area is flat, with deep soil layers, and most farmland is located at altitudes below 50 m. This region experiences a temperate continental monsoon climate, characterized by hot and rainy summers, cold and dry winters, and an average annual temperature of around 15 °C. The thermal conditions support double cropping of crops such as wheat and maize, with significant diurnal temperature variations conducive to their growth. It is one of the largest grain-producing regions in China. As shown in the distribution map of different crops in Figure 1, arable land accounts for over 80% of the area, with the green areas representing the winter wheat-growing region.

2.2. Data

The data for this study include MODIS remote sensing data, meteorological station data, and arable crop planting distribution maps. MODIS remote sensing data are supported by NASA (National Aeronautics and Space Administration) and are used for global surface monitoring. They are characterized by moderate spatial resolution, extensive spectral bands, global coverage observation cycles, and widespread application across various fields. MOD16A2 is a part of the MODIS remote sensing data product series with a spatial resolution of 1 km, specifically used for monitoring global surface evapotranspiration and vegetation transpiration [17]. The remote sensing data used in this study are the MOD16A2 product from 2010 to 2018, which include actual evapotranspiration (ET) and potential evapotranspiration (PET). After processing the raw data, including projection conversion and clipping, a multi-year monthly ET and PET data series from 2010 to 2018 was formed; detailed information can be found in Table 1.
Meteorological station data are sourced from the China Surface Climate Data Daily Dataset, which includes daily data from benchmark and basic meteorological stations across China since January 1951. The dataset comprises daily values for site pressure, temperature, precipitation, evaporation, relative humidity, wind direction and speed, sunshine duration, and ground temperature at 0 cm [18]. The raw data underwent preprocessing steps including data cleaning, coordinate unit conversion, and integration of monthly data. Regional filtering was conducted using the latitude and longitude coordinates accompanying the meteorological data to compile monthly datasets for each meteorological station within the North China region spanning from 2010 to 2018.
The data on arable land and crop types are sourced from Jiadi Li’s distribution map of crop planting areas in the North China Plain [19], which mainly includes the distribution of six typical crops (winter wheat–summer maize, winter wheat–rice, other bimodal crops, spring maize, cotton, other unimodal crops) from 2001 to 2018 in the North China Plain, with a resolution of 0.2775 km. This study primarily extracts the conclusions on crop planting distribution and the distribution information of winter wheat arable land as a basis for further research.

3. Method

3.1. Preprocessing of Remote Sensing Data

Remote sensing data preprocessing was conducted using the Google Earth Engine (GEE) platform (https://earthengine.google.com/, accessed on 22 April 2024), where the 8-day MOD16A2 data underwent band extraction, data calculation, clipping, and reprojection to the World Geodetic System 1984 (WGS84). GEE is a cloud computing platform provided by Google that integrates a vast amount of satellite remote sensing data from around the world, including Landsat, MODIS, Sentinel, and others. It supports both JavaScript and Python programming languages and provides powerful computational resources and analytical tools specifically designed for processing and analyzing Earth observation data. Therefore, this study uses GEE programming to synchronize MODIS data with ground meteorological data and other data sources in space, ensuring consistency across the datasets. The detailed processing steps are as follows:
  • Data Clipping: Selecting the area of interest and removing unnecessary data from the remote sensing images to reduce the scale of the data being processed.
  • Feature Extraction: Extracting useful land cover information from the images, allowing for the calculation of statistical information within the study area, such as mean and total values.
  • Image Reprojection: The process of converting remote sensing images from one geographic coordinate system (projection) to another. Different satellites and sensors may use various projections and resolutions, making reprojection a necessary step for integrating and comparing different datasets.
  • Band Extraction and Synthesis: Selecting the required bands from remote sensing images and combining information from multiple bands into new remote sensing images to obtain additional information or for specific analyses. This includes extracting the ET and PET bands from the MOD16A2 data and merging these extracted bands with precipitation and arable land data to generate new remote sensing datasets.
  • Data Quality Control: Checking and correcting for outliers, missing values, or other quality issues in the data. This involves calculating the mean and standard deviation for specified bands within the designated area. Subsequently, the range for outliers is determined based on thresholds (a threshold value of 2.5 is selected in this study), and pixels exceeding this range are masked out.
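As a concrete illustration, the following is a minimal sketch of this preprocessing chain in the GEE Python API. It assumes the public MOD16A2 asset ID, a rectangular stand-in geometry for the study area, and the 2.5-standard-deviation outlier threshold from the list above; it is not the authors’ actual script, and the monthly aggregation of the 8-day composites is omitted.

```python
import ee

ee.Initialize()

# Rectangular stand-in for the North China Plain study area (lon/lat bounds).
region = ee.Geometry.Rectangle([113, 32, 121, 40])

mod16 = (ee.ImageCollection("MODIS/006/MOD16A2")        # 8-day ET/PET composites
         .filterDate("2010-01-01", "2018-12-31")
         .select(["ET", "PET"]))

def preprocess(img):
    img = img.multiply(0.1)                             # apply the 0.1 scale factor
    img = img.clip(region)                              # data clipping
    return img.reproject(crs="EPSG:4326", scale=1000)   # WGS84, ~1 km pixels

def mask_outliers(img):
    """Mask ET pixels farther than 2.5 standard deviations from the regional mean."""
    stats = img.select("ET").reduceRegion(
        reducer=ee.Reducer.mean().combine(ee.Reducer.stdDev(), sharedInputs=True),
        geometry=region, scale=1000, maxPixels=1e9)
    mean = ee.Number(stats.get("ET_mean"))
    std = ee.Number(stats.get("ET_stdDev"))
    valid = img.select("ET").subtract(mean).abs().lte(std.multiply(2.5))
    return img.updateMask(valid)

cleaned = mod16.map(preprocess).map(mask_outliers)      # monthly sums would follow
```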
After preprocessing, ET and PET monthly total raster data images can be downloaded from GEE. The calculation of actual evapotranspiration (ET) employed an improved RS-PM algorithm, which considers contributions from both vegetation and non-vegetation cover types. The calculation formula is as follows:
$$ET = f_v \times ET_v + (1 - f_v) \times ET_b$$
In the formula, $f_v$ represents the vegetation cover fraction, while $ET_v$ and $ET_b$ denote the transpiration from vegetated areas and the soil evaporation from non-vegetated areas, respectively. The ET monthly composite data provided by MODIS represent the total cumulative evapotranspiration for the month and should be multiplied by a conversion factor of 0.1 before use.

3.2. The Calculation of Crop Irrigation Evaluation Indicators

3.2.1. Irrigation Assurance Capability of Arable Land

In this study, the ET and PET parameters from the MODIS evapotranspiration product were used, combined with observational data from ground meteorological stations, to calculate the monthly effective irrigation amount and irrigation water requirement based on the principle of regional water balance. The ratio of these two values was then defined as the $IGCI$ [20], which is used to assess the capability of irrigation water use. The calculation formula is as follows:
$$IGCI = EIW / IWD$$
When the actual effective irrigation amount equals the theoretical water requirement, it indicates the optimal irrigation guarantee capability. If the effective irrigation amount is zero, it signifies almost no irrigation capability. Generally, the $IGCI$ value ranges from 0 to 1, with higher values indicating a higher irrigation guarantee capability. The calculation pathway of this index is illustrated in Figure 2. The visualization of the Irrigation Assurance Capability Index is shown in Figure 3.

3.2.2. Effective Irrigation Volume

The farmland irrigation system has a water balance process, where water input includes three components: atmospheric precipitation ($P$), artificial irrigation ($I$), and recharge from shallow groundwater ($E$). Water output occurs in three forms: evapotranspiration ($ET$), soil infiltration ($D$), and surface runoff loss ($R$), while also considering the change in soil moisture content ($\Delta W$). The calculation formula is expressed as
$$P + I + E = ET + R + D + \Delta W$$
Under normal circumstances, the upward recharge from groundwater can be neglected, as its contribution is generally small. Over an annual timescale, changes in soil water storage are usually minimal and can also be disregarded. Additionally, soil infiltration and surface runoff loss have complex non-linear relationships with precipitation and irrigation amounts, making direct calculations challenging. By introducing the concepts of effective precipitation ($P_1$) and effective irrigation water ($EIW$), the water balance equation can be simplified to
$$P_1 + EIW = ET$$
$$EIW = ET - P_1$$
where the actual evapotranspiration can be obtained from the MODIS $ET$ product. The effective precipitation estimation can rely on precipitation observation data from ground meteorological stations and be calculated using empirical models.
Effective precipitation refers to the portion of rainfall that can be absorbed and utilized by crops to meet their transpiration needs. It does not include surface runoff, deep percolation below the root zone, or the deep percolation required for leaching salts. Effective precipitation is closely related to various factors, such as rainfall intensity, duration, soil characteristics, and crop types. Generally, low-intensity, short-duration rainfall is more effective because most of the moisture can be absorbed by the soil, whereas heavy rainfall or prolonged rainfall tends to have lower effectiveness, leading to surface runoff and deep percolation. Even continuous light rain can cause deep percolation if it lasts too long, reducing its effectiveness. Another influencing factor is the distribution of precipitation throughout the entire growing season of the crops. If rainfall occurs shortly after irrigation, much of it will not recharge the soil profile due to already high soil moisture content, resulting instead in surface runoff or deep percolation. Conversely, if a large amount of precipitation concentrates in the later stages of crop growth, the effectiveness of rainfall may be relatively low, despite a larger soil water capacity, because winter wheat absorbs less water at that time.
Effective precipitation is generally calculated using empirical coefficients for rainfall utilization. Reference [21] provides a detailed analysis and modeling of the effective precipitation for winter wheat in the North China region. It conducts a correlation analysis of the actual precipitation and effective precipitation data sequences over 50 winter wheat-growing years, determining the calculation models for effective precipitation for periods such as weekly, ten-day, monthly, and the entire growing period. This paper adopts the monthly model ($R^2 = 0.9125$) as the standard to calculate the effective precipitation for each month of the winter wheat-growing cycle.
$$P_1 = -0.0025P^2 + 1.0593P - 0.529$$
where $P$ represents the precipitation amount during the growth period of winter wheat (mm), and $P_1$ represents the effective precipitation amount during this period (mm).
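A direct transcription of this monthly model in Python is shown below. Note that the signs of the quadratic and constant terms are our reconstruction of the extraction-damaged formula, and the clip at zero for very small rainfall is our own safeguard, not part of the published model.

```python
def effective_precipitation(p_mm: float) -> float:
    """Monthly effective precipitation P1 (mm) for winter wheat from monthly
    precipitation P (mm), using the empirical model above (R^2 = 0.9125).
    Sign placement and the zero clip are assumptions, not from the source."""
    p1 = -0.0025 * p_mm**2 + 1.0593 * p_mm - 0.529
    return max(p1, 0.0)

# Example: a 60 mm month yields roughly 54 mm of effective precipitation.
print(effective_precipitation(60.0))
```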
Winter wheat is generally sown from late September to early October and harvested from late May to mid-June of the following year. This study compiles monthly rainfall observation data from meteorological stations to calculate monthly effective precipitation for the entire growing season of winter wheat in the North China Plain. The inverse distance weighting interpolation method is employed to generate raster data of effective precipitation at a 1 km resolution. The interpolation calculation formula is as follows:
$$d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}$$
$$w_i = \frac{1/d_i}{\sum_{i=1}^{n} 1/d_i}$$
$$Z_0 = \sum_{i=1}^{n} w_i \times Z(x_i, y_i)$$
In the formula, $x_i$ and $y_i$ are the coordinates of the nearby points, $d_i$ is the distance from the target point to the nearby points, $w_i$ is the distance weight from the target point to the nearby points, and $Z_0$ is the value at the interpolation point.
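The interpolation above maps directly onto a few lines of NumPy. This vectorized sketch uses the plain $1/d_i$ weights of the equations, plus a small epsilon of our own to guard against division by zero at station locations.

```python
import numpy as np

def idw_interpolate(xy_stations, z_stations, xy_targets):
    """Inverse distance weighting per the equations above.
    xy_stations: (n, 2) station coordinates; z_stations: (n,) observed values;
    xy_targets: (m, 2) grid-point coordinates. Returns (m,) interpolated Z0."""
    diff = xy_targets[:, None, :] - xy_stations[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))       # d_i for every target/station pair
    d = np.maximum(d, 1e-12)                    # avoid division by zero at stations
    w = 1.0 / d
    w /= w.sum(axis=1, keepdims=True)           # normalized weights w_i
    return w @ z_stations                       # Z0 = sum_i w_i * Z(x_i, y_i)
```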

3.2.3. Irrigation Water Requirement

During the growth process of crops, the amount of water needed for irrigation supplementation is the crop’s irrigation water demand. The calculation formula for crop irrigation water demand is as follows, where $IWD$ represents the irrigation water demand (mm), and $ET_C$ represents the crop’s potential evapotranspiration (mm):
$$IWD = ET_C - P_1$$
Crop water requirement refers to the total amount of water needed by a crop under suitable soil moisture and fertility conditions to ensure normal growth, development, and desired yield. This includes plant transpiration, inter-row evaporation, and the formation of plant tissues. The crop water requirement can be calculated from the reference crop evapotranspiration ($ET_0$) by introducing a crop-specific coefficient, as follows:
$$ET_C = K_C \times ET_0$$
where $K_C$ represents the crop coefficient for a specific crop, reflecting the differences among different crops, typically set as constants.
Reference [22] uses the FAO’s piecewise single-value averaging method to calculate the crop coefficients at different growth stages for major crops at 61 representative points in the water resource tertiary regions of the Yellow River, Huai River, Hai River, and the inland river basins of Northwest China. The results provide the crop coefficients throughout the entire growth cycle of winter wheat in the North China Plain: 0.60 in the initial growth stage, decreasing from 0.60 to 0.40 during the freeze–thaw period, 0.40 in the overwintering period, increasing from 0.40 to 1.15 during the rapid development period, 1.15 in the mid-growth stage, and decreasing from 1.15 to 0.40 during the maturity period.
The reference crop evapotranspiration (potential evapotranspiration) refers to the hypothetical evapotranspiration of a crop under ideal conditions, with no water limitations, good growth conditions, and complete ground cover. It represents the combined actual transpiration of the crop and the total evaporation from the surrounding soil. It is solely related to meteorological factors and reflects an idealized state. The PET parameter in the MOD16A2 product accounts for surface cover conditions and is combined with Penman-type formulas to obtain spatially continuous potential evapotranspiration data. Due to pixel heterogeneity and scale effects, PET parameters may differ from the traditional $ET_0$. For instance, Wang et al. found that PET from MOD16A2 overestimated reference crop water requirements [23]. To address this, this study first utilizes daily observational data from ground meteorological stations and employs the Penman–Monteith (P-M) method recommended by the FAO to calculate $ET_0$ at single points [24]. The calculation formula is as follows:
$$ET_0 = \frac{0.408\Delta(R_n - G) + \gamma\dfrac{900}{T + 273}u_2(e_s - e_a)}{\Delta + \gamma(1 + 0.34u_2)}$$
In the formula, $R_n$ is the net radiation (MJ/m²/day), $G$ is the ground heat flux (MJ/m²/day), $\Delta$ is the slope of the saturation vapor pressure curve with respect to temperature (kPa/°C), $\gamma$ is the psychrometric constant at different elevations (kPa/°C), $T$ is the daily average temperature (°C), $u_2$ is the wind speed at two meters above the ground (m/s), $e_s$ is the average saturation vapor pressure (kPa), and $e_a$ is the actual vapor pressure (kPa). The soil heat flux $G$ represents the energy used to heat the soil; at the daily scale it is usually small and can often be neglected when estimating evapotranspiration.
$$\gamma = \frac{c_p P}{\varepsilon \lambda} = 0.665 \times 10^{-3} P$$
$$P = 101.3\left(\frac{293 - 0.0065z}{293}\right)^{5.26}$$
In the formula, $P$ is the atmospheric pressure (kPa), $\lambda$ is the latent heat of vaporization (2.45 MJ/kg), $c_p$ is the specific heat at constant pressure ($1.013 \times 10^{-3}$ MJ/kg/°C), and $\varepsilon$ is the ratio of the molecular weight of water vapor to dry air (0.622). $z$ represents the elevation of the site (m). Since the elevation of cultivated land is generally less than 100 m, when $z$ is less than or equal to 100, the calculated value of $\gamma$ is approximately 0.067.
$$e_a = e_s \times RH$$
$$e_s = \frac{e^o(T_{\max}) + e^o(T_{\min})}{2}$$
$$e^o(T) = 0.6108\exp\left[\frac{17.27T}{T + 237.3}\right]$$
$$\Delta = \frac{4096\left[0.6108\exp\left(\dfrac{17.27T}{T + 237.3}\right)\right]}{(T + 237.3)^2}$$
In the formula, $e^o(T)$ is the saturation vapor pressure (kPa) at air temperature $T$, and $RH$ is the relative humidity. The average saturation vapor pressure $e_s$ should be the average of the saturation vapor pressures at the daily maximum and minimum temperatures.
$$R_n = R_{ns} - R_{nl}$$
$$R_{ns} = (1 - \alpha)R_s$$
$$R_{nl} = \sigma\left[\frac{T_{\max,K}^4 + T_{\min,K}^4}{2}\right]\left(0.34 - 0.14\sqrt{e_a}\right)\left(1.35\frac{R_s}{R_{so}} - 0.35\right)$$
In the formula, $R_{ns}$ is the net shortwave radiation (MJ/m²/day), and $\alpha$ is the albedo, with a value of 0.23 for the grass reference crop. $R_{nl}$ is the net longwave radiation (MJ/m²/day), and $\sigma$ is the Stefan–Boltzmann constant ($4.903 \times 10^{-9}$ MJ K⁻⁴ m⁻²/day). $T_{\max,K}$ is the highest absolute temperature within 24 h (K = °C + 273.16), and $T_{\min,K}$ is the lowest absolute temperature within 24 h (K = °C + 273.16). $R_{so}$ represents the solar radiation under clear sky conditions (MJ/m²/day).
$$R_s = \left(a_s + b_s\frac{n}{N}\right)R_a$$
$$R_{so} = (0.75 + 2 \times 10^{-5}z)R_a \approx 0.75R_a$$
In the formula, $R_s$ is the incident net shortwave radiation (MJ/m²/day), $n$ is the actual sunshine hours, and $N$ is the maximum possible sunshine duration. The Angstrom coefficients $a_s$ and $b_s$ depend on the local atmospheric conditions and the solar declination (latitude and month). If there are no measured solar radiation data available and the parameters $a_s$ and $b_s$ have not been calibrated, the FAO-recommended empirical values can be used: $a_s = 0.25$ and $b_s = 0.50$.
$$R_a = \frac{24 \times 60}{\pi}G_{sc}d_r\left[\omega_s\sin(\varphi)\sin(\delta) + \cos(\varphi)\cos(\delta)\sin(\omega_s)\right]$$
$$d_r = 1 + 0.033\cos\left(\frac{2\pi}{365}J\right)$$
$$\delta = 0.409\sin\left(\frac{2\pi}{365}J - 1.39\right)$$
$$\omega_s = \arccos[-\tan(\varphi)\tan(\delta)]$$
In the formula, the extraterrestrial radiation $R_a$ for different times and latitudes can be calculated using the solar constant $G_{sc}$ (0.082 MJ/m²/min), the solar declination $\delta$, and the day of the year $J$ (a number between 1 and 366). The inverse relative Earth–Sun distance $d_r$ and the sunset hour angle $\omega_s$ (rad) are also considered, along with the latitude $\varphi$ (rad).
Derived from Equations (7) to (23), the $ET_0$ results can be calculated using data from meteorological stations, including daily maximum and minimum temperatures, relative humidity, average wind speed, duration of sunshine, date, and coordinates of latitude and longitude. The daily results for $ET_0$ are then aggregated into monthly data, which are used to establish a correlation between ground-based $ET_0$ and the PET parameter in MOD16A2. A correction equation is constructed, and spatially continuous inversion of $ET_0$ over the region is achieved through regression analysis.
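Putting the chain of formulas together, the sketch below computes daily $ET_0$ from station records following the FAO-56 equations above. It neglects the soil heat flux $G$, uses the default Angstrom coefficients $a_s = 0.25$ and $b_s = 0.50$, and expects relative humidity as a fraction; it is a compact illustration, not the study’s processing code.

```python
import math

def et0_penman_monteith(tmax_c, tmin_c, rh_frac, u2, n_sun, lat_rad,
                        day_of_year, elev_m):
    """Daily FAO-56 Penman-Monteith ET0 (mm/day) from station records,
    following the equations above (G neglected at the daily scale)."""
    t_mean = (tmax_c + tmin_c) / 2.0

    def e0(t):  # saturation vapor pressure (kPa) at temperature t (deg C)
        return 0.6108 * math.exp(17.27 * t / (t + 237.3))

    es = (e0(tmax_c) + e0(tmin_c)) / 2.0
    ea = es * rh_frac
    delta = 4096.0 * e0(t_mean) / (t_mean + 237.3) ** 2

    p_kpa = 101.3 * ((293.0 - 0.0065 * elev_m) / 293.0) ** 5.26
    gamma = 0.665e-3 * p_kpa

    # Extraterrestrial radiation R_a from date and latitude
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi / 365.0 * day_of_year)
    dec = 0.409 * math.sin(2.0 * math.pi / 365.0 * day_of_year - 1.39)
    ws = math.acos(-math.tan(lat_rad) * math.tan(dec))
    ra = (24.0 * 60.0 / math.pi) * 0.0820 * dr * (
        ws * math.sin(lat_rad) * math.sin(dec)
        + math.cos(lat_rad) * math.cos(dec) * math.sin(ws))

    # Solar, clear-sky, and net radiation
    n_max = 24.0 / math.pi * ws                     # maximum sunshine duration N
    rs = (0.25 + 0.50 * n_sun / n_max) * ra
    rso = (0.75 + 2e-5 * elev_m) * ra
    rns = (1.0 - 0.23) * rs                         # albedo 0.23, reference crop
    rnl = 4.903e-9 * (((tmax_c + 273.16) ** 4 + (tmin_c + 273.16) ** 4) / 2.0) \
        * (0.34 - 0.14 * math.sqrt(ea)) * (1.35 * rs / rso - 0.35)
    rn = rns - rnl

    num = 0.408 * delta * rn + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return num / (delta + gamma * (1.0 + 0.34 * u2))

# Example: a summer day at ~39 deg N, 50 m elevation.
print(et0_penman_monteith(30.0, 18.0, 0.55, 2.0, 9.0,
                          math.radians(39.0), 180, 50.0))
```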
Using regression analysis, a power-function empirical regression model $Y = 0.747X^{1.069}$ was fitted, with a coefficient of determination $R^2$ reaching 0.814, as shown in Figure 4. Using the PET parameters from the MOD16A2 product to calculate spatially continuous $ET_0$ values helps correct the representativeness errors of point-based $ET_0$ measurements. This approach also reduces the impact of surface heterogeneity and scale effects on PET, resulting in more accurate regional assessments of evapotranspiration.
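A sketch of this regression step with SciPy is shown below. The arrays are synthetic stand-ins generated around the reported fit so the snippet runs end to end, since the matched station/pixel pairs themselves are not published here.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
pet = rng.uniform(20.0, 160.0, 200)                     # synthetic monthly PET (mm)
et0 = 0.747 * pet**1.069 + rng.normal(0.0, 5.0, 200)    # synthetic station ET0

def power_law(x, a, b):
    return a * np.power(x, b)

(a, b), _ = curve_fit(power_law, pet, et0, p0=(1.0, 1.0))
pred = power_law(pet, a, b)
r2 = 1.0 - np.sum((et0 - pred) ** 2) / np.sum((et0 - et0.mean()) ** 2)
print(f"Y = {a:.3f} * X^{b:.3f}, R^2 = {r2:.3f}")       # ~0.747, ~1.069 by design
```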

3.3. Model Design

In 2022, the Mask2Former architecture [25] was proposed to handle various image segmentation tasks. Mask2Former is a deep learning-based image segmentation model designed to generate high-quality segmentation masks for images. It combines the Transformer architecture with convolutional neural networks (CNNs), demonstrating outstanding performance in complex visual tasks, particularly in instance segmentation and semantic segmentation, and has become a dominant structure in the field, as shown in Figure 5. The Mask2Former model includes several key improvements:
  • Masked Attention in the Transformer Decoder: The introduction of Masked Attention enables faster convergence during training and enhances overall performance.
  • Efficient Multi-Scale Processing: The model employs a feature pyramid consisting of both low-resolution and high-resolution features, feeding different scales of multi-scale features into various layers of the transformer decoder. This approach helps in better segmentation of small objects and regions.
  • Optimized Network Modules: The model optimizes the arrangement of Self-Attention and Masked Attention, completely removing the dropout module, which simplifies the model’s complexity.
These enhancements allow Mask2Former to effectively address a wide range of image segmentation challenges.
Mask2Former combines powerful image feature extraction capabilities with flexible segmentation strategies. By integrating mask generation with image segmentation tasks, it effectively handles the boundaries and structures of objects in complex scenes within agricultural remote sensing contexts. This model makes efficient use of multi-scale features, which is particularly important for segmenting different crops and terrains in agricultural remote sensing. It leverages self-attention mechanisms to capture long-range dependencies, thereby enhancing segmentation accuracy.
However, the original Mask2Former prototype uses ResNet or Swin Transformer as its backbone network, and these aging architectures no longer fully meet current demands. ResNet [1], as a classical convolutional neural network, has relatively low computational efficiency. Although the Swin Transformer [26] introduced a self-attention mechanism to improve efficiency, it still has limited global modeling capability due to its sliding-window strategy. Moreover, both ResNet and Swin Transformer are general-purpose vision networks, and their generalization performance on custom agricultural datasets may not be ideal in some complex scenarios. Therefore, this paper improves the backbone network of Mask2Former to enhance the model’s feature extraction capability and computational efficiency.
When refining the Mask2Former backbone network, it is crucial to consider the strengths and weaknesses inherent in both existing CNN and Transformer models. CNNs excel at capturing local features but are insufficient in modeling long-range dependencies. Transformers, on the other hand, can effectively model long-range dependencies through self-attention mechanisms but perform poorly in capturing local features. Currently, one of the better-performing CNN models is ConvNeXt [27], which was derived through extensive ablation experiments, making comprehensive structural changes, layer structure adjustments, and detail modifications to the ResNet50 module. Based on observations of various network models, this paper designs a new module called CONAT, as shown in Figure 6c. CONAT combines the best aspects of the ConvNeXt and Transformer modules, enabling it to efficiently extract important features from images while adaptively merging features at different levels; its attention mechanism allows the network to focus on the key parts of the input data. This capability helps improve segmentation performance in complex backgrounds within agricultural remote sensing, allowing the model to maintain efficient performance across diverse environments.

Both of these modules employ an “inverted bottleneck” structure, where the input channels are first expanded and then projected back through 1 × 1 convolutions. The differences are as follows: first, the ConvNeXt module places the depthwise convolution at the beginning of the module, a design validated by experimental results; second, while Transformers capture global information through self-attention, ConvNeXt does not incorporate attention modules. The improvements made by CONAT include replacing the MLP part of the Transformer design with ConvNeXt modules, which retain the 7 × 7 depthwise convolutions and Batch Normalization (BN) to enhance the encoding of local pixel interactions and increase the network’s representational capacity. Additionally, reversing the sequence of the attention and ConvNeXt modules modifies the Transformer module order and delegates the downsampling task to ConvNeXt’s strided depthwise convolution, allowing the downsampling kernel parameters to be optimized more effectively.
Formally, given an input tensor $x \in \mathbb{R}^{H \times W \times C}$, the CONAT module can be expressed as follows:
$$CONAT(x) = x + (A \circ N_2 \circ N_1 \circ D)(BN(x))$$
$$D(x) = LN(DepthConv(x))$$
$$N_1(x) = GELU(Conv(x))$$
$$N_2(x) = Conv(x)$$
$$A(x) = x + Attn(LN(x))$$
In this context, $BN$, $LN$, $GELU$, and $Attn$, respectively, represent Batch Normalization, Layer Normalization, the Gaussian Error Linear Unit, and the Self-Attention operation. The Self-Attention operation also includes residual connections, which are not explicitly shown in the equations for simplicity. The MLP operation is represented by two functions, $N_1$ and $N_2$, corresponding to a 1 × 1 convolution for channel expansion by a factor of 4 and a 1 × 1 convolution for channel projection, respectively. $D$ represents a 7 × 7 depthwise convolution.
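Read as a recipe, the equations translate into the following PyTorch sketch. The head count, channel widths, and the use of full self-attention over all spatial positions are our assumptions for illustration; an actual backbone would interleave such blocks with strided downsampling stages.

```python
import torch
import torch.nn as nn

class CONATBlock(nn.Module):
    """Sketch of a CONAT block per the equations above: an outer residual
    around BN -> D (7x7 depthwise conv + LN) -> N1 (1x1 expand + GELU) ->
    N2 (1x1 project) -> A (pre-LN self-attention with its own residual)."""

    def __init__(self, dim: int, num_heads: int = 4, expansion: int = 4):
        super().__init__()
        self.bn = nn.BatchNorm2d(dim)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.ln_d = nn.GroupNorm(1, dim)               # LayerNorm over channels
        self.expand = nn.Conv2d(dim, dim * expansion, kernel_size=1)
        self.act = nn.GELU()
        self.project = nn.Conv2d(dim * expansion, dim, kernel_size=1)
        self.ln_a = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        y = self.bn(x)                                     # BN(x)
        y = self.ln_d(self.dwconv(y))                      # D
        y = self.project(self.act(self.expand(y)))         # N2(N1(.))
        b, c, h, w = y.shape
        t = y.flatten(2).transpose(1, 2)                   # tokens: (B, H*W, C)
        tn = self.ln_a(t)
        t = t + self.attn(tn, tn, tn, need_weights=False)[0]  # A with residual
        y = t.transpose(1, 2).reshape(b, c, h, w)
        return x + y                                       # outer residual

# Example: one block over a small feature map.
out = CONATBlock(dim=64)(torch.randn(1, 64, 32, 32))       # -> (1, 64, 32, 32)
```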
By combining the mask generation capabilities of Mask2Former with the fine-grained feature extraction of CONAT, it becomes possible to effectively capture the boundaries of crops and ground cover, improving boundary precision. Agricultural remote sensing data are often influenced by weather and seasonal changes; utilizing these modules can enhance the model’s robustness in variable environments and reduce mis-segmentation.

4. Experimental Results

4.1. Experimental Dataset

In the related evaluation research process, data preprocessing plays a crucial role in building network models and often determines the effectiveness of training. This study selects remote sensing data as image input for the model instead of text or tabular data because it is believed that images contain sufficient geographic information.
Meteorological factors and crop distribution are regional rather than completely scattered, and deep learning models may be able to extract more comprehensive and richer information from raw images. Effective precipitation maps and winter wheat planting distribution maps are uploaded to the GEE platform and combined with preprocessed ET and PET images. After ensuring the consistency of spatial range, resolution, and projection coordinates, band synthesis is performed to generate a four-channel remote sensing image as experimental data. The full growth cycle of winter wheat lasts 9 months each year, and relevant data from 2010 to 2018 were collected, totaling 81 months. The data were randomly divided into training and validation sets in a 7:2 ratio. Additionally, due to the large size of a complete image, which is 3100 × 2800 pixels, the original training data were divided into segments of 512 × 512 pixels to save memory resources and improve computational efficiency.
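A minimal sketch of the tiling step is given below. Zero-padding the right and bottom borders so that 3100 × 2800 scenes divide evenly into 512 × 512 patches is our assumption, since the paper does not state how edges are handled.

```python
import numpy as np

def tile_image(img: np.ndarray, tile: int = 512) -> np.ndarray:
    """Split an (H, W, C) array into non-overlapping tile x tile patches.
    Borders are zero-padded so the scene divides evenly (our assumption)."""
    h, w = img.shape[:2]
    pad_h, pad_w = (-h) % tile, (-w) % tile
    img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))
    patches = [img[i:i + tile, j:j + tile]
               for i in range(0, img.shape[0], tile)
               for j in range(0, img.shape[1], tile)]
    return np.stack(patches)

# A 3100 x 2800 x 4 scene yields 7 x 6 = 42 patches of 512 x 512 x 4.
print(tile_image(np.zeros((3100, 2800, 4))).shape)   # (42, 512, 512, 4)
```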
The training data contain all the raw data necessary for computing IGCI, with IGCI data serving as the labeled data for training the model. The labeled data are categorized based on the value of IGCI into the following grades: “0–0.2” indicating severe deficiency, “0.2–0.4” indicating moderate deficiency, “0.4–0.6” indicating general satisfaction, “0.6–0.8” indicating basic satisfaction, “0.8–1” indicating sufficient satisfaction, “>1” indicating excessive water supply, and “<0” indicating negative values (non-real cultivated land, calculation errors, etc.). There are a total of seven grades covering potential outcomes from severe deficiency to sufficient satisfaction, as well as numerical anomalies. In the dataset, each pixel’s label represents the irrigation guarantee level for that location. The distribution of data for each grade is statistically summarized, as shown in Table 2.
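The grading rule can be applied to an IGCI raster in one vectorized pass, as in the sketch below; the particular integer IDs assigned to each grade are our assumption, not specified in the paper.

```python
import numpy as np

def igci_to_grade(igci: np.ndarray) -> np.ndarray:
    """Map an IGCI raster to the seven grades above. Integer IDs are our
    assumption: 0-4 for the 0-0.2 ... 0.8-1 bins, 5 for >1, 6 for <0."""
    edges = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
    g = np.digitize(igci, edges)            # <0 -> 0, [0, 0.2) -> 1, ..., >=1 -> 6
    return np.where(g == 0, 6, g - 1).astype(np.uint8)

print(igci_to_grade(np.array([-0.3, 0.1, 0.5, 0.95, 1.4])))  # [6 0 2 4 5]
```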
As shown in Table 2, the dataset is unevenly distributed across the seven levels of irrigation guarantee capacity, with the “negative value” category having the highest proportion, accounting for 32.6%. In contrast, samples from levels such as “basically met” and “fully met” are relatively sparse, making up 3.9% and 2.8%, respectively. This class imbalance may affect the model’s prediction accuracy and generalization ability, particularly when predicting underrepresented categories, leading to potential biases. To mitigate this imbalance, oversampling and undersampling techniques can be considered, which involve increasing the samples of the smaller categories or reducing the samples of the larger categories to balance the distribution across classes. Additionally, a weighted loss function or adjusting class weights could be used to alleviate the impact of class imbalance.

4.2. Loss Function

In image segmentation tasks, using multiple loss functions for joint optimization is a common and effective strategy. The loss function consists of several components, including region loss, classification loss, and mask loss, among others. The loss function composition of the Mask2Former model is as follows:
$$loss_{total} = \alpha \, loss_{ce} + (1 - \alpha)\, loss_{dice}$$
CE Loss Function:
$$loss_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\log(p_{ic})$$
$N$ denotes the number of samples, $C$ denotes the number of classes, $y_{ic}$ is the true label of the $i$-th sample (one-hot encoded), and $p_{ic}$ is the predicted probability that the $i$-th sample belongs to the $c$-th class.
Dice Loss Function:
$$loss_{dice} = 1 - \frac{2\sum_{i=1}^{N}\sum_{j=1}^{H \times W} y_{ij}p_{ij} + 1}{\sum_{i=1}^{N}\sum_{j=1}^{H \times W}(y_{ij} + p_{ij}) + 1}$$
In the equation, $N$ denotes the number of samples, $H$ and $W$ represent the height and width of the image, respectively, $y_{ij}$ is the label value of the $j$-th pixel in the $i$-th sample, and $p_{ij}$ is the predicted value of the $j$-th pixel in the $i$-th sample.
As observed in Section 4.1, the dataset in this study suffers from an issue of class imbalance. Oversampling may lead to overfitting, as the newly generated samples could introduce noise. On the other hand, undersampling may result in the loss of valuable information, leading to a decline in model performance. To address these issues, this study adopts an optimized loss function approach. WCE Loss, OhemCE Loss, and Focal Loss are all suitable for datasets with severe class imbalance. However, Focal Loss requires careful selection of the adjustment factor, which can easily lead to overfitting of the samples. Therefore, this study attempts to replace CE Loss with WCE Loss and OhemCE Loss.
WCE Loss is an improved version of CE Loss that introduces a weighting mechanism to address the issue of class imbalance. Assigning higher loss weights to minority class samples forces the model to focus more on these underrepresented classes during training.
$$loss_{wce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{ic}\log(p_{ic})$$
In the equation, $N$ represents the number of samples, $C$ represents the number of classes, $w_c$ is the weight assigned to class $c$ (typically larger for minority classes), $y_{ic}$ is the true label of the $i$-th sample (one-hot encoded), and $p_{ic}$ is the predicted probability that the $i$-th sample belongs to the $c$-th class.
OhemCE Loss encourages the model to focus more on the minority class samples, avoiding the dominance of the majority class, reducing overfitting on easy samples, and dynamically adapting to the changing difficulty of samples during training. This effectively alleviates the problem of class imbalance.
$$loss_{Ohem} = -\frac{1}{N_{Ohem}}\sum_{i=1}^{N_{Ohem}}\sum_{c=1}^{C} y_{ic}\log(p_{ic})$$
In the equation, $N_{Ohem}$ represents the number of hard examples retained by online hard example mining, $C$ represents the number of classes, $y_{ic}$ is the true label of the $i$-th sample (one-hot encoded), and $p_{ic}$ is the predicted probability that the $i$-th sample belongs to the $c$-th class.
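Generic formulations of the two losses are sketched below for a (B, C, H, W) logit tensor. The class weights and the kept fraction of hard pixels are tunable assumptions; common OHEM implementations also use a loss threshold and a minimum kept-pixel count rather than a fixed fraction.

```python
import torch
import torch.nn.functional as F

def wce_loss(logits, target, class_weights):
    """Weighted cross-entropy (loss_wce above); logits: (B, C, H, W),
    target: (B, H, W) class indices, class_weights: (C,) tensor."""
    return F.cross_entropy(logits, target, weight=class_weights)

def ohem_ce_loss(logits, target, keep_ratio=0.25, ignore_index=255):
    """OhemCE sketch: keep only the hardest fraction of per-pixel CE losses
    (the N_Ohem pixels) and average them; keep_ratio is an assumption."""
    pixel_loss = F.cross_entropy(logits, target, reduction="none",
                                 ignore_index=ignore_index)      # (B, H, W)
    flat = pixel_loss.flatten()
    n_keep = max(1, int(keep_ratio * flat.numel()))
    hard, _ = torch.topk(flat, n_keep)                           # hardest pixels
    return hard.mean()
```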

4.3. Experimental Evaluation Metrics

For the selection of evaluation metrics for the dataset, in order to facilitate comparison with other related studies, this experiment uses commonly used metrics: Intersection over Union (IoU) and Accuracy. Intersection over Union is a widely used evaluation metric in computer vision, particularly in object detection and image segmentation tasks. It measures the overlap between the predicted results and the ground truth annotations. Accuracy is a commonly used performance evaluation metric in machine learning and statistics, especially in classification tasks. It represents the proportion of correctly predicted samples out of the total samples.
$$IoU = \frac{TP}{TP + FP + FN}$$
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
Among these, $TP$ and $FP$, respectively, represent the numbers of true positive and false positive samples identified by the model, while $TN$ and $FN$, respectively, represent the numbers of true negative and false negative samples identified by the model. Average Accuracy (aAcc) refers to the average pixel classification accuracy of the model across all classes; it evaluates the overall classification accuracy of the model but cannot reflect performance differences between different classes. Mean Intersection over Union (mIoU) refers to the average IoU across all classes, used to evaluate performance differences between different classes of the model, with higher values indicating better segmentation performance of the model across different classes. Mean Accuracy (mAcc) refers to the average pixel classification accuracy across all classes, used to evaluate performance differences between different classes of the model and reflect the overall performance of the model in pixel classification.
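These metrics can be computed from a per-pixel confusion matrix, as in the following sketch; classes absent from the ground truth produce NaN entries that are skipped by the means.

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, label: np.ndarray, num_classes: int):
    """Accumulate a per-pixel confusion matrix from integer class maps."""
    mask = (label >= 0) & (label < num_classes)
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes,
                                                                num_classes)

def segmentation_metrics(cm: np.ndarray):
    """aAcc, mAcc, and mIoU from a confusion matrix, per the definitions above."""
    tp = np.diag(cm).astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        a_acc = tp.sum() / cm.sum()
        acc = tp / cm.sum(axis=1)                        # per-class accuracy
        iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)
    return a_acc, np.nanmean(acc), np.nanmean(iou)
```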
The experiment platform’s hardware and software specifications are listed in Table 3.
In the hyperparameter optimization process of this model, the selection and setting of hyperparameters are based on both the dataset characteristics and prior experimental results, using manual tuning. Based on the experimental training visualization results, in order to ensure that the model can fully learn the data distribution and that the loss value converges to a stable point, the number of iterations is set to 80,000. The batch size is determined by sequentially testing different values against the GPU memory capacity, and the final batch size is set to 4 to balance memory usage and gradient update frequency, ensuring more stable training. The AdamW optimizer is chosen with a learning rate of 0.001, which is suitable for data with sparse features; the lower learning rate helps prevent gradient fluctuations that might affect model convergence. The weight decay coefficient is set to 0.05 to avoid overfitting, while the momentum coefficients (0.9, 0.999) ensure a good update speed and momentum accumulation for the model. Additionally, the PolyLR learning strategy is employed, which gradually decreases the learning rate using polynomial decay. This allows the model to fine-tune its learning of sample details in the later stages of training, thereby improving its generalization ability. The hyperparameter settings of the irrigation grade discrimination model based on Mask2Former are shown in Table 4.
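A sketch of the corresponding PyTorch setup follows; the decay power of the PolyLR schedule is not reported in the text, so the value below is an assumption (0.9 is a common choice in segmentation toolboxes).

```python
import torch

# Stand-in parameter so the snippet runs; in practice, model.parameters().
params = [torch.nn.Parameter(torch.zeros(1))]

optimizer = torch.optim.AdamW(params, lr=1e-3, betas=(0.9, 0.999),
                              weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.PolynomialLR(optimizer,
                                                  total_iters=80_000,
                                                  power=0.9)   # power assumed
# Each training iteration calls optimizer.step() and then scheduler.step().
```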

4.4. Comparative Experiment

The dataset used in this experiment is large and high-dimensional. Traditional methods often encounter the “curse of dimensionality” when dealing with such datasets. Additionally, non-deep-learning methods typically rely on linear assumptions, which fail to capture the non-linear relationships within the data effectively [28]. In contrast, deep learning models can automatically extract features from the data and, through multi-layered architectures, achieve the effect of ensemble learning. This allows them to integrate the advantages of multiple models, thereby improving overall performance. Therefore, deep learning models are chosen for this experiment. To validate the effectiveness of the proposed method, we compared multiple commonly used semantic segmentation models, including FCN, PSPNet, DeepLabV3+, KNet, Segformer, SegNeXt, ConvNeXt, and Mask2Former, on the agricultural irrigation level dataset constructed in this study. We chose Accuracy and Intersection over Union (IoU) as the two evaluation metrics to comprehensively evaluate the performance of the proposed algorithm.
As shown in Table 5, this experiment compared the performance of multiple semantic segmentation models on the task of classifying agricultural irrigation assurance levels. The Mask2former model demonstrated outstanding performance across all evaluation metrics (as shown in Figure 7), highlighting its advantages in agricultural irrigation assurance level classification tasks. Firstly, in terms of comprehensive metrics such as aAcc and mAcc, Mask2former achieved the best scores of 69.47% and 65.83%, respectively, outperforming other models significantly. This indicates that Mask2former excels in overall classification performance and can effectively identify various level categories. Regarding the IoU metric, Mask2former attained the highest mIoU of 46.03% among all models. This suggests that Mask2former exhibits the highest segmentation accuracy for each level category, with the best overlap between predicted and ground truth regions. This performance is attributed to its advanced model design, such as the multi-scale feature fusion mechanism and mask prediction mechanism, which enable better capture of subtle differences between different levels in terms of semantic understanding and feature extraction.

4.5. Ablation Study

As shown in Table 6, this section’s experimental study employed three evaluation metrics, aAcc, mAcc, and mIoU, to assess the semantic segmentation performance of the Mask2former model with different backbone networks from multiple perspectives.
Firstly, in terms of prediction accuracy, the CONAT model performs best on the aAcc metric, achieving 72.48%, indicating the highest segmentation precision at the pixel level; ConvNeXt small follows closely with an aAcc of 72.2%. On the most crucial evaluation metric, mIoU, the Swin small model leads with 48.26%, far surpassing the other models and exhibiting the highest congruence between segmentation predictions and true annotations, capturing target details most precisely. Furthermore, in assessing model robustness through the mAcc metric, Swin small ranks first with a score of 67.19%, demonstrating strong generalization ability and noise resistance; CONAT and ConvNeXt tiny also perform well on this metric. The validation results of the CONAT backbone are shown in Figure 8. The colors have no physical meaning; different colors simply represent different categories (black represents the background, red indicates severe deficiency, green denotes slightly satisfied, blue means generally satisfied, yellow signifies basically satisfied, magenta represents fully satisfied, cyan indicates excess supply, and gray is used for negative values).
In summary, the Mask2former model with CONAT as the backbone network demonstrates strong performance in image segmentation tasks, particularly achieving optimal performance in certain categories with sufficient data. However, this model lags in accuracy for categories with relatively fewer data points, such as “fully satisfied” and “basically satisfied”, with accuracies of only 34.51% and 44.72%, respectively, which is noticeably lower than some other models, as shown in Figure 9. This phenomenon may stem from two main reasons:
  • Excessive pursuit of local optima: The CONAT model incorporates self-attention mechanisms and the Convnext model structure, which enhances its feature learning capability. However, during training, the model may overly prioritize achieving high accuracy on dominant categories, leading to neglect of learning for smaller target categories and resulting in underfitting and decreased generalization performance on these categories.
  • Class imbalance: Due to the significant difference in data quantities among different categories, CONAT may overly focus on the major categories with abundant samples during training, neglecting the attention to minor categories, ultimately causing an imbalance in segmentation performance.
To address the above issue, we attempted to replace CE Loss with WCE Loss and OhemCE Loss. Experimental observations revealed that directly using these two loss functions as classification losses did not yield satisfactory model results. Therefore, we adjusted the training strategy by initially training the model with CE Loss. When the model had not yet fully converged, i.e., when the validation accuracy reached 80% of the previously measured accuracy, we applied early stopping, changed the loss function to WCE Loss or OhemCE Loss, reduced the learning rate, and reloaded the initial training weights for secondary training. The total training duration remained the same as before.
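The two-stage schedule can be sketched as follows. Here `model`, `train_loader`, `evaluate`, `ref_acc`, and `class_weights` are placeholders for the actual pipeline objects, and the per-stage iteration budgets are assumptions within the fixed total.

```python
import itertools
import torch
from torch import nn

def train_stage(model, optimizer, loss_fn, loader, max_iters,
                evaluate=None, stop_acc=None, device="cpu"):
    """One training stage; stops early once validation accuracy reaches
    stop_acc (used below as 80% of the previously measured accuracy)."""
    model.train()
    for it, (images, labels) in enumerate(itertools.cycle(loader)):
        if it >= max_iters:
            break
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        if stop_acc and evaluate and it % 1000 == 999 and evaluate(model) >= stop_acc:
            break

# Stage 1: plain CE Loss with early stopping at 80% of the reference accuracy.
train_stage(model, torch.optim.AdamW(model.parameters(), lr=1e-3),
            nn.CrossEntropyLoss(), train_loader, max_iters=80_000,
            evaluate=evaluate, stop_acc=0.8 * ref_acc)
torch.save(model.state_dict(), "stage1.pth")

# Stage 2: reload the stage-1 weights, switch to WCE (or OhemCE) Loss, and
# lower the learning rate; the total budget matches the single-stage runs.
model.load_state_dict(torch.load("stage1.pth"))
train_stage(model, torch.optim.AdamW(model.parameters(), lr=1e-4),
            nn.CrossEntropyLoss(weight=class_weights), train_loader,
            max_iters=80_000)
```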
As shown in Table 7, WCE Loss exhibits improvements in both mAcc and mIoU compared with CE Loss, particularly with significant enhancements in the “fully satisfied” category, with an increase of around 20% in Accuracy and 7% in IoU. WCE Loss mitigates the data imbalance issue by assigning higher weights to minority class samples, thus increasing the model’s attention to these smaller categories. However, this also means a reduction in focus on major class samples, resulting in a slight decrease in overall Accuracy. From the data perspective, the model results using Ohem Loss achieve notably higher performance, reaching 73.61% for aAcc and 49.16% for mIoU, outperforming the other two loss functions. Ohem Loss effectively alleviates data imbalance issues without causing a decrease in overall Accuracy, unlike WCE Loss. This indicates that Ohem Loss aids CONAT in achieving better performance in overall classification and segmentation accuracy. This may be attributed to Ohem Loss’s ability to focus the model on challenging samples, thereby improving the model’s discriminative power by paying more attention to samples that are easily confused or difficult to classify.
According to Table 8, an ablation study was conducted to evaluate the impact of different components on the model performance. This study involved varying the presence of Mask2Former, CONAT, and different loss functions—CE Loss, WCE Loss, and Ohem Loss—while observing changes in accuracy metrics such as average Accuracy (aAcc), mean class Accuracy (mAcc), and mean Intersection over Union (mIoU). The results show that using Mask2Former with CONAT and CE Loss yields a moderate performance with mIoU at 46.48. However, replacing CE Loss with WCE Loss or Ohem Loss leads to different outcomes; notably, the combination of Mask2Former, CONAT, and Ohem Loss results in the highest mIoU of 49.16. This suggests that while the addition of CONAT generally enhances performance, the choice of loss function significantly affects the model’s effectiveness in classifying and segmenting the data.

5. Conclusions

This study focuses on the evaluation of irrigation demand for winter wheat cultivation and provides an in-depth analysis of Transformer and ConvNeXt semantic segmentation models. Since the prototype of Mask2Former uses ResNet or Swin Transformer as the backbone network, which is unable to meet the current requirements, this paper proposes the CONAT module and optimizes the backbone structure of the Mask2Former model accordingly, making it more suitable for irrigation level classification. To address the data imbalance issue in the dataset, a loss function suitable for imbalanced sample distributions is introduced, and the training strategy is improved. Experiments were conducted on an irrigation-level image dataset, and the experimental results show that the proposed model outperforms several classic segmentation networks in the irrigation-level classification task. Specifically, the model achieved an improvement of 4.14% in overall Accuracy and 3.13% in mean Intersection over Union (IoU), reaching 73.61% and 49.16%, respectively. These results provide a practical method for achieving accurate irrigation level classification at the regional scale.
To improve the research on farmland irrigation demand assessment, we face several significant challenges and limitations. First, due to data availability constraints, agricultural irrigation is often affected by seasonal variations. Timely data would help better understand and assess irrigation demand and effectiveness. However, the most recent data used in this study only go up to 2018, which may impact the model’s accuracy and reliability. To address this issue, future research could explore using updated datasets or developing models that can leverage historical data for predictions. Additionally, the spatial resolution of this dataset is limited, preventing the capture of more detailed surface features, which affects the accuracy of irrigation assessment. To solve this issue, drones could be employed for low-altitude flights to capture more detailed images, but this is only suitable for small-scale and specific area monitoring. A better choice would be satellite imaging technology, which has seen rapid advancements in recent years. Using high-resolution satellites, such as WorldView or GeoIQ, can provide sub-meter-level imagery that can assist in more precise surface feature analysis.
Moreover, transfer learning, which is a hot topic in deep learning, has shown great potential in dealing with supervised learning problems with scarce or missing labeled data. Future research could introduce AI large models and apply transfer learning specifically to address the challenges in agricultural condition assessment. For example, Generative Adversarial Networks (GANs) could generate synthetic images to expand the training dataset, enhancing the model’s adaptability to new crop types, especially in cases of insufficient samples. Transfer learning frameworks, such as TensorFlow Hub and PyTorch Hub, offer various pretrained models, making it easier to perform transfer learning and fine-tune them for specific crop types. This approach could not only improve the model’s prediction accuracy but also extend its applicability to other regions or crops, thereby enhancing the method’s generalizability and practicality. Overall, by updating data sources and leveraging advanced deep learning techniques, the accuracy and applicability of farmland irrigation demand assessment can be effectively improved, which is of great significance for future agricultural management and policymaking.

Author Contributions

Conceptualization and methodology, X.W. and W.C. (Wentao Chen); software and validation, W.C. (Wentao Chen); data curation and writing—original draft preparation, K.Y. and X.Z.; writing—review and editing, Y.W. and K.Y.; supervision, project administration, and funding acquisition, W.C. (Wenbai Chen). All authors have read and agreed to the published version of the manuscript.

Funding

This work is granted by The Major Project of Scientific and Technological Innovation 2030 (2021ZD0113603).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Geographical location of the North China Plain and distribution map of different crops.
Figure 2. Irrigation assurance capability assessment technology roadmap.
Figure 3. Visualization of irrigation assurance capability indicators and data distribution information map. 2.43E-2 denotes 2.43 × 10⁻²; 9.52E-2 denotes 9.52 × 10⁻².
Figure 4. Scatter plot of ET₀ calculated from meteorological stations against PET.
Figure 5. Mask2Former model structure.
Figure 6. (a) The Transformer module; (b) the ConvNeXt module; (c) the CONAT module.
Figure 7. (a) Model accuracy for categorizing the level of farmland irrigation guarantee capacity; (b) Intersection over Union (IoU) results.
Figure 8. (a) Labeled image; (b) verification image of the Mask2Former model with CONAT as the backbone network.
Figure 9. (a) Accuracy comparison of Mask2Former models with different backbone networks; (b) Intersection over Union (IoU) comparison.
Table 1. The MOD16A2 global land evapotranspiration data.

| Channel | Unit | Minimum | Maximum | Scale | Description |
|---|---|---|---|---|---|
| ET | kg/m² | −5 | 453 | 0.1 | Evapotranspiration (total) |
| LE | J/m²/day | −20 | 1671 | 10,000 | Latent heat flux (daily average) |
| PET | kg/m² | −8 | 793 | 0.1 | Potential evapotranspiration (total) |
| PLE | J/m²/day | −40 | 3174 | 10,000 | Potential latent heat flux (daily average) |
| ET_QC | — | — | — | — | ET quality control |
Table 2. The distribution of IGCI data for each grade in the dataset.

| Irrigation Guarantee Level | Number of Instances | Data Proportion |
|---|---|---|
| Severe Deficiency | 77,980,516 | 22.5% |
| Moderate Deficiency | 45,273,101 | 13.1% |
| General Satisfaction | 29,018,465 | 8.4% |
| Basic Satisfaction | 13,357,058 | 3.9% |
| Sufficient Satisfaction | 9,747,072 | 2.8% |
| Excessive Water Supply | 58,008,612 | 16.7% |
| Negative Values | 112,798,254 | 32.6% |
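As a worked example, the class proportions in Table 2 can be turned into weights for the weighted cross-entropy (WCE) loss compared in Table 7. Inverse-frequency weighting with mean normalization is one common choice; the sketch below is illustrative and not necessarily the exact weighting scheme used in this study.

```python
import torch
import torch.nn as nn

# Proportions from Table 2, in the order: Severe Deficiency, Moderate
# Deficiency, General Satisfaction, Basic Satisfaction, Sufficient
# Satisfaction, Excessive Water Supply, Negative Values.
proportions = torch.tensor([0.225, 0.131, 0.084, 0.039, 0.028, 0.167, 0.326])

# Inverse-frequency weights, rescaled so that they average to 1.
weights = 1.0 / proportions
weights = weights / weights.mean()

# Rare grades (e.g., Sufficient Satisfaction at 2.8%) now carry more weight.
wce_loss = nn.CrossEntropyLoss(weight=weights)
```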
Table 3. Hardware and software specifications of the experimental platform.

| Component | Device Information | Version Number |
|---|---|---|
| Operating System | Debian | 5.10.0-23-amd64 |
| Deep Learning Framework | PyTorch | 2.0.1 |
| Graphics Processing Unit | NVIDIA GPU | TITAN Xp |
| Central Processing Unit | Intel | E5-4650 v3 |
| NVIDIA Computing Platform | CUDA | 11.7 |
| NVIDIA GPU Acceleration | CUDNN | 8.5.0 |
| Programming Software | Python | 3.11.6 |
Table 4. Hyperparameter settings of the Mask2Former model.

| Parameter Name | Parameter Value |
|---|---|
| Training iterations | 80,000 |
| Batch size | 4 |
| Optimizer | AdamW |
| Learning rate | 0.001 |
| Weight decay | 0.05 |
| Optimizer betas (β₁, β₂) | (0.9, 0.999) |
| Learning strategy | PolyLR |
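The Table 4 settings translate almost directly into PyTorch. The sketch below assumes a poly-decay power of 1.0, which the table does not specify, and uses a dummy module in place of the actual model.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 7, kernel_size=3, padding=1)  # stand-in for Mask2Former

# AdamW with the Table 4 hyperparameters.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,             # learning rate
    betas=(0.9, 0.999),  # beta factors
    weight_decay=0.05,   # weight decay
)

# PolyLR over 80,000 iterations; the decay power is an assumed value.
scheduler = torch.optim.lr_scheduler.PolynomialLR(
    optimizer, total_iters=80_000, power=1.0
)
```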
Table 5. Comparison results of model performance for the classification of arable land irrigation guarantee capacity.

| Model | aAcc | mAcc | mIoU |
|---|---|---|---|
| HRNet | 67.25 | 56.82 | 41.34 |
| PSPNet | 62.31 | 52.38 | 36.91 |
| DeepLabV3+ | 52.73 | 57.28 | 31.55 |
| KNet (FCN) | 62.82 | 59.31 | 38.78 |
| KNet (UPerNet) | 63.33 | 61.61 | 40.07 |
| KNet (Swin) | 66.01 | 61.46 | 41.66 |
| SegFormer (MiT-B0) | 65.13 | 59.48 | 40.40 |
| SegFormer (MiT-B3) | 64.28 | 61.28 | 40.50 |
| SegNeXt | 59.02 | 59.23 | 36.39 |
| ConvNeXt | 63.71 | 58.94 | 39.49 |
| Mask2Former | 69.47 | 65.83 | 46.03 |
Table 6. Experimental accuracy results for different Mask2Former backbone networks.

| Mask2Former Backbone Network | aAcc | mAcc | mIoU |
|---|---|---|---|
| ResNet50 | 69.47 | 65.83 | 46.03 |
| Swin-tiny | 69.61 | 64.98 | 46.71 |
| Swin-small | 71.10 | 67.19 | 48.26 |
| NAT | 68.94 | 60.68 | 44.13 |
| DINAT | 70.04 | 65.31 | 47.20 |
| ConvNeXt-tiny | 71.16 | 64.06 | 46.97 |
| ConvNeXt-small | 72.20 | 63.91 | 47.48 |
| MOAT | 69.81 | 61.77 | 45.59 |
| CONAT | 72.48 | 62.24 | 46.48 |
Table 7. Comparison of CONAT model results with different loss functions.

| Classification | CE Loss Acc | CE Loss IoU | WCE Loss Acc | WCE Loss IoU | OHEM Loss Acc | OHEM Loss IoU |
|---|---|---|---|---|---|---|
| Severe Deficiency | 71.76 | 52.14 | 69.91 | 52.80 | 72.52 | 56.08 |
| Moderate Deficiency | 65.69 | 49.89 | 72.97 | 50.95 | 65.46 | 49.49 |
| General Satisfaction | 61.54 | 41.40 | 59.63 | 41.34 | 65.93 | 47.12 |
| Basic Satisfaction | 44.72 | 28.54 | 48.01 | 26.51 | 39.55 | 26.12 |
| Sufficient Satisfaction | 34.51 | 26.27 | 54.88 | 33.42 | 44.25 | 32.01 |
| Excessive Water Supply | 79.11 | 62.05 | 77.93 | 61.97 | 84.40 | 64.59 |
| Negative Values | 78.36 | 65.12 | 71.56 | 62.66 | 78.67 | 68.74 |
| aAcc | 72.48 | — | 70.54 | — | 73.61 | — |
| mAcc | 62.24 | — | 64.98 | — | 64.40 | — |
| mIoU | — | 46.48 | — | 47.09 | — | 49.16 |
Table 8. Comparison of model effects across different components (✓ marks the components used; rows correspond to the configurations in Tables 6 and 7).

| Mask2Former | CONAT | CE Loss | WCE Loss | OHEM Loss | aAcc | mAcc | mIoU |
|---|---|---|---|---|---|---|---|
| ✓ | | ✓ | | | 69.47 | 65.83 | 46.03 |
| ✓ | ✓ | ✓ | | | 72.48 | 62.24 | 46.48 |
| ✓ | ✓ | | ✓ | | 70.54 | 64.98 | 47.09 |
| ✓ | ✓ | | | ✓ | 73.61 | 64.40 | 49.16 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
